Disk diagnostics page

af123

Administrator
Staff member
I've been making some small changes to the disk diagnostics page that I'll publish in the next package.
  • Removed the largely pointless Type and Updated columns (they're easily determined by looking at the flags anyway);
  • Added Life Left, which is calculated from the other fields. The exact algorithms are vendor specific, so this is an approximation based on the observed values. Hovering over a life percentage shows more values, as in the screen-shot below. This disk is more likely to be rated at 100,000 hours rather than 90,000, but it's good enough for an estimate;
  • Background row colour will now change if life is getting short, or if any of the thresholds have been breached;
  • Derive actual temperatures on temperature lines.
upload_2015-8-19_15-19-9.png

I have been unable to get rid of those current sectors on my disk - full SMART tests don't find any unreadable sectors. I suspect a firmware bug and at some point I may have to try a security erase.
 
Are the (values) in lines 190 and 194 correct? I have always seen the 194 line as the usable one and the 190 line as 100 minus the 194 value, as shown in the table HERE. That would make line 194 Worst, 71 deg C, the highest temperature the drive has reached, and line 190 Worst 100 - 71 = 29. Similarly, line 194 Value at 42 is the current hard disk temperature of 42 deg C, and line 190 Value is 100 - 42 = 58.
 
You're right, the calculation (100 -) is only needed for 190 and I've done it for both.
 
I was wondering if it would be better to get rid of line 190 altogether; that would stop users reporting that something had failed 'in the past' when it hadn't. The message really only indicates that the 'Fan on' temperature has been reached 'in the past', and even this is misleading when the 'Fan' package is installed.
 
190 is the more useful one as it includes the temperature threshold set by the disk manufacturer (55 degrees in my case). The reason that the fields are shown as 100-x is that the standard for the normalised SMART data is that lower is worse and a failure is indicated by the value dropping below the threshold.
My disk got that hot when it was first installed. I left it in maintenance mode copying data over from the old disk which was connected via USB. At that time, the fan did not run in maintenance mode unfortunately.

The in_the_past does show that the disk has at some point been hotter than the manufacturer's threshold (it actually just means that worst <= threshold). I'd consider suppressing it for temperature but I'm not yet convinced.
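
As a rough illustration of the two rules above (this is not the code in disk.jim), here is a short Python sketch. The function names are made up for the example. It assumes attribute 190 is normalised as 100 minus the temperature while 194 holds the temperature directly, as discussed in this thread; other vendors may well differ. The status check treats a value at or below the threshold as failed, which is how smartctl reports it.

```python
def temperature_c(attr_id, normalised):
    """Derive degrees Celsius from a normalised SMART value.

    Assumption from this thread: 190 is stored as 100 - temperature,
    194 holds the temperature directly. Vendors may differ.
    """
    if attr_id == 190:
        return 100 - normalised
    if attr_id == 194:
        return normalised
    raise ValueError("not a temperature attribute: %d" % attr_id)


def threshold_status(value, worst, thresh):
    """Classify an attribute where lower normalised values are worse."""
    if value <= thresh:
        return "failing_now"   # current value has reached the threshold
    if worst <= thresh:
        return "in_the_past"   # only the worst recorded value has
    return "ok"


# Figures quoted earlier in the thread: 190 Value 58, Worst 29.
# A 55 degree manufacturer limit corresponds to a normalised threshold of 45.
print(temperature_c(190, 58))        # current temperature
print(temperature_c(190, 29))        # hottest recorded temperature
print(threshold_status(58, 29, 45))  # status for attribute 190
```

With those figures the disk is currently at 42 degrees, once reached 71 degrees, and so is flagged in_the_past rather than failing now.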
 
I have been testing a WD Purple drive (WD20PURX-64P6ZY0: see here). Until yesterday the SMART table was fully populated. Now only lines 1 and 3 are present in the table: all the info about power-on hours, disk temperatures, and so on has disappeared.
 
Mine's the same. It says:

/mod/webif/html/diag/disk.jim:135: Error: Division by zero
at file "/mod/webif/html/diag/disk.jim", line 135

Presumably $left is 100. Presumably that is because $val is 100. Need to check $thresh as well.
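
One way this could happen, sketched in Python rather than Jim. I have not seen line 135, so the formulas and names below are guesses for illustration only: the sketch assumes life left is the normalised value scaled between the threshold and 100, and that rated hours are extrapolated by dividing power-on hours by the percentage of life already used. Both divisions need a guard.

```python
def life_left_pct(value, thresh):
    """Hypothetical: scale a normalised value to 0-100 between thresh and 100."""
    span = 100 - thresh
    if span <= 0:
        return None  # threshold of 100 or more: nothing to scale against
    return max(0, min(100, (value - thresh) * 100 // span))


def estimated_rated_hours(power_on_hours, left_pct):
    """Hypothetical: extrapolate total rated hours from hours used so far.

    Returns None when no life has been used yet, because the estimate
    is undefined (this is the suspected division by zero).
    """
    used_pct = 100 - left_pct
    if used_pct <= 0:
        return None  # life left is 100%: avoid dividing by zero
    return power_on_hours * 100 // used_pct


print(estimated_rated_hours(9000, 90))   # 10% used after 9,000 hours
print(estimated_rated_hours(100, 100))   # brand new disk, no estimate
```

A new disk such as the WD Purple would hit the second guard, so the page could show the life row without an hours estimate instead of aborting the rest of the table.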
 