Hard Disk Failure Log

Ezra Pound

It looks as though quite a few Humax boxes may be reaching the 'Autumn of their Years', i.e. their hard disks are beginning to fail, so I thought it may be time to log this. If your hard disk has had to be replaced, or looks like it will need to be soon, it would be helpful to log how many hours it has been running and how many start/stop cycles it has had. This info is available in Web-If >> Diagnostics >> Hard Disk. As my own HDD hasn't failed yet, I hope Brian doesn't mind if I give his as an example:

The Raw figures are,
line 9 : 6023 = Power_On_Hours
line 4 : 4980 = Start_Stop_Count
 
Feel free to use my details as an example, the current figures are

line 9 : 6044 = Power_On_Hours
line 4 : 5000 = Start_Stop_Count

Box purchased in September 2010
Disk is Seagate Pipeline HD 5900.2 - ST3500312CS

Perhaps it would be worth including a copy of the whole Attributes table

ID Name Flags Raw Value Value Worst Thresh Type Updated When Failed
1 Raw_Read_Error_Rate POSR-- 50577172 113 099 006 Pre-fail Always -
3 Spin_Up_Time PO---- 0 097 097 000 Pre-fail Always -
4 Start_Stop_Count -O--CK 5000 096 096 020 Old_age Always -
5 Reallocated_Sector_Ct PO--CK 4273 001 001 036 Pre-fail Always FAILING_NOW
7 Seek_Error_Rate POSR-- 125283699 081 060 030 Pre-fail Always -
9 Power_On_Hours -O--CK 6044 094 094 000 Old_age Always -
10 Spin_Retry_Count PO--C- 0 100 100 097 Pre-fail Always -
12 Power_Cycle_Count -O--CK 2500 098 098 020 Old_age Always -
184 End-to-End_Error -O--CK 0 100 100 099 Old_age Always -
187 Reported_Uncorrect -O--CK 0 100 100 000 Old_age Always -
188 Command_Timeout -O--CK 25770196998 100 099 000 Old_age Always -
189 High_Fly_Writes -O-RCK 3 097 097 000 Old_age Always -
190 Airflow_Temperature_Cel -O---K 40 060 043 045 Old_age Always In_the_past
194 Temperature_Celsius -O---K 40 040 057 000 Old_age Always -
195 Hardware_ECC_Recovered -O-RC- 50577172 047 039 000 Old_age Always -
197 Current_Pending_Sector -O--C- 0 100 100 000 Old_age Always -
198 Offline_Uncorrectable ----C- 0 100 100 000 Old_age Offline -
199 UDMA_CRC_Error_Count -OSRCK 0 200 200 000 Old_age Always -
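
If you would rather pull the two figures out of a pasted table than read them off by eye, here is a minimal Python sketch. It assumes the rows are laid out exactly as in the table above (ten whitespace-separated columns, plain integer raw values); the function name parse_attributes is made up for illustration and is not part of Web-If.

```python
def parse_attributes(text):
    """Parse pasted attribute rows into a dict keyed by attribute ID."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # A data row has 10 whitespace-separated fields and a numeric ID;
        # the header row and blank lines fail this check and are skipped.
        if len(parts) != 10 or not parts[0].isdigit():
            continue
        attrs[int(parts[0])] = {
            "name": parts[1],
            "flags": parts[2],
            "raw": int(parts[3]),      # assumes the raw value is a plain integer
            "value": int(parts[4]),
            "worst": int(parts[5]),
            "thresh": int(parts[6]),
            "type": parts[7],
            "updated": parts[8],
            "when_failed": parts[9],
        }
    return attrs


# Three rows copied from the table above.
SAMPLE = """\
ID Name Flags Raw Value Value Worst Thresh Type Updated When Failed
4 Start_Stop_Count -O--CK 5000 096 096 020 Old_age Always -
5 Reallocated_Sector_Ct PO--CK 4273 001 001 036 Pre-fail Always FAILING_NOW
9 Power_On_Hours -O--CK 6044 094 094 000 Old_age Always -
"""

attrs = parse_attributes(SAMPLE)
print(attrs[9]["raw"], "power-on hours")
print(attrs[4]["raw"], "start/stop cycles")
```

Keying the result by ID matches the "line 9" / "line 4" shorthand used in this thread.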
 
Figures from my "failing(?)" disk.

Line 9 = 4768 - power on hours
Line 4 = 4497 - start stop count

Line 5 = 12629 - reallocated sector count

Box is just approaching 2 years old.
Disk is Seagate pipeline HD 5900.2 - ST31000424CS

All stats:

ID Name Flags Raw Value Value Worst Thresh Type Updated When Failed
1 Raw_Read_Error_Rate POSR-- 174827647 118 099 006 Pre-fail Always -
3 Spin_Up_Time PO---- 0 095 095 000 Pre-fail Always -
4 Start_Stop_Count -O--CK 4497 096 096 020 Old_age Always -
5 Reallocated_Sector_Ct PO--CK 12629 001 001 036 Pre-fail Always FAILING_NOW
7 Seek_Error_Rate POSR-- 81214606 078 060 030 Pre-fail Always -
9 Power_On_Hours -O--CK 4768 095 095 000 Old_age Always -
10 Spin_Retry_Count PO--C- 0 100 100 097 Pre-fail Always -
12 Power_Cycle_Count -O--CK 2249 098 098 020 Old_age Always -
184 End-to-End_Error -O--CK 0 100 100 099 Old_age Always -
187 Reported_Uncorrect -O--CK 2 098 098 000 Old_age Always -
188 Command_Timeout -O--CK 901956894930 100 096 000 Old_age Always -
189 High_Fly_Writes -O-RCK 0 100 100 000 Old_age Always -
190 Airflow_Temperature_Cel -O---K 51 049 039 045 Old_age Always In_the_past
194 Temperature_Celsius -O---K 51 051 061 000 Old_age Always -
195 Hardware_ECC_Recovered -O-RC- 174827647 046 039 000 Old_age Always -
197 Current_Pending_Sector -O--C- 0 100 100 000 Old_age Always -
198 Offline_Uncorrectable ----C- 0 100 100 000 Old_age Offline -
199 UDMA_CRC_Error_Count -OSRCK 0 200 200 000 Old_age Always -
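
For anyone wondering where the "When Failed" column comes from: as far as I understand it, it is derived from the Value, Worst and Thresh columns rather than from the raw figure. The sketch below encodes that rule as an assumption (a zero threshold never flags; Value at or below Thresh means failing now; Worst at or below Thresh means it failed at some point in the past). The function name when_failed is hypothetical, and the sample rows are taken from the table above.

```python
def when_failed(value, worst, thresh):
    """Return the 'When Failed' label implied by the normalised columns.

    Assumed rule, not taken from any official documentation quoted here.
    """
    if thresh == 0:
        return "-"              # a zero threshold is treated as "never flags"
    if value <= thresh:
        return "FAILING_NOW"
    if worst <= thresh:
        return "In_the_past"
    return "-"


# (Value, Worst, Thresh) copied from the table above.
rows = {
    "Reallocated_Sector_Ct": (1, 1, 36),
    "Airflow_Temperature_Cel": (49, 39, 45),
    "Power_On_Hours": (95, 95, 0),
    "Start_Stop_Count": (96, 96, 20),
}

for name, (value, worst, thresh) in rows.items():
    print(name, when_failed(value, worst, thresh))
```

On that reading, the reallocated sector count is the only attribute on this disk currently below its threshold; the airflow temperature flag just records that the box got hot at some point.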
 
Just had a quick look at mine. I do tend to leave it on most of the time, hence the substantial number of power-on hours.

ID Name Flags Raw Value Value Worst Thresh Type Updated When Failed
1 Raw_Read_Error_Rate POSR-- 188697770 118 099 006 Pre-fail Always -
3 Spin_Up_Time PO---- 0 097 097 000 Pre-fail Always -
4 Start_Stop_Count -O--CK 442 100 100 020 Old_age Always -
5 Reallocated_Sector_Ct PO--CK 0 100 100 036 Pre-fail Always -
7 Seek_Error_Rate POSR-- 123495242 081 060 030 Pre-fail Always -
9 Power_On_Hours -O--CK 12302 086 086 000 Old_age Always -
10 Spin_Retry_Count PO--C- 0 100 100 097 Pre-fail Always -
12 Power_Cycle_Count -O--CK 221 100 100 020 Old_age Always -
184 End-to-End_Error -O--CK 0 100 100 099 Old_age Always -
187 Reported_Uncorrect -O--CK 0 100 100 000 Old_age Always -
188 Command_Timeout -O--CK 0 100 100 000 Old_age Always -
189 High_Fly_Writes -O-RCK 5 095 095 000 Old_age Always -
190 Airflow_Temperature_Cel -O---K 54 046 044 045 Old_age Always In_the_past
194 Temperature_Celsius -O---K 54 054 056 000 Old_age Always -
195 Hardware_ECC_Recovered -O-RC- 188697770 046 036 000 Old_age Always -
197 Current_Pending_Sector -O--C- 0 100 100 000 Old_age Always -
198 Offline_Uncorrectable ----C- 0 100 100 000 Old_age Offline -
199 UDMA_CRC_Error_Count -OSRCK 0 200 200 000 Old_age Always -
 
It looks like af123's theory, that it may be the on/off cycles that do the damage rather than the 'spinning time', may be correct. In your case, line 9 shows that at 12302 power-on hours your HDD has been running for 3 times longer than average, BUT line 4 shows that at only 442 start/stop cycles it has had maybe 10 times fewer start/stops than average. And the result? Well, pretty healthy I would say, with zeros in lines 5, 197, 198, 184 and 10 all being good signs.
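
To put rough numbers on that comparison, here is a small Python sketch that works out power-on hours per start/stop cycle for the three disks posted so far. The figures are the raw values of lines 9 and 4 from the tables above; the labels and the helper name hours_per_cycle are just for illustration, and three disks are obviously far too few to prove anything either way.

```python
def hours_per_cycle(disks):
    """Return {label: power-on hours per start/stop cycle} for each disk."""
    return {label: hours / cycles for label, (hours, cycles) in disks.items()}


# (Power_On_Hours, Start_Stop_Count) raw values as posted in this thread.
disks = {
    "ST3500312CS (failing)": (6044, 5000),
    "ST31000424CS (failing?)": (4768, 4497),
    "always-on box (healthy)": (12302, 442),
}

report = hours_per_cycle(disks)
for label, ratio in report.items():
    print(f"{label}: {ratio:.1f} power-on hours per start/stop cycle")

# How the always-on box compares with each of the other two disks.
on_hours, on_cycles = disks["always-on box (healthy)"]
for label, (hours, cycles) in disks.items():
    if label.startswith("always-on"):
        continue
    print(f"vs {label}: {on_hours / hours:.1f}x the hours, "
          f"{cycles / on_cycles:.1f}x fewer start/stops")
```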
 
It looks like af123's theory, that it may be the on/off cycles that do the damage rather than the 'spinning time', may be correct.
I don't remember saying that...
It's certainly true that most complete disk failures (spin-up failures) occur in servers that normally run continuously following a power-cycle - but that is just because the bearings have worn to the point where the motor can't overcome the friction.
 
Sorry if I misquoted you. I do remember someone saying this some time ago, and it was said again recently by prpr HERE
 
Like most things, it's a combination of factors that causes failures. Wear and tear isn't usually attributed to load/unload cycles, though.

Here's what Seagate say about the 2TB disk I recently installed:

[Screenshot from the Seagate data sheet pipeline-hd-data-sheet-ds1693-6-1206us.pdf]

So, for this disk, it's far more important to keep the power-on hours down than it is the start/stop cycles.

These drives do run at a higher temperature than the manufacturers would like, although this same 2TB disk is rated for up to 75 centigrade.

It's refreshing to see the manufacturer specifying read errors in terms of sectors too. Too often I see read error probability calculated as bytes read / disk BER, when the event is actually a sector failing to read due to the CRC.
The BER of the raw disk is much higher, and is damped by the sector CRC function.
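
As a rough illustration of treating the event as "a sector fails to read" rather than as individual bit errors, here is a Python sketch. The rate of one unreadable sector per 1e14 bits read is an assumed figure for the example only (the data sheet screenshot is not reproduced here, so it is not taken from it), and the function name p_unreadable_sector is hypothetical. It models sector failures as independent rare events (a Poisson approximation).

```python
import math


def p_unreadable_sector(bytes_read, bits_per_error=1e14):
    """Probability of at least one unreadable sector while reading bytes_read.

    bits_per_error is an ASSUMED spec of the form "1 sector per N bits read".
    """
    expected_failures = bytes_read * 8 / bits_per_error
    # P(at least one event) = 1 - exp(-expected), computed stably via expm1.
    return -math.expm1(-expected_failures)


# Reading a 2 TB disk end to end once, under the assumed rate.
p = p_unreadable_sector(2e12)
print(f"Chance of at least one unreadable sector: {p:.1%}")
```

The point is only that the unit of failure is the sector; plug in the real figure from the data sheet before drawing any conclusions.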
 
I can't wait for Solid State Drives to get cheaper!! (or would these not have enough oomph to handle the workload?)
 
The oomph would be fine... the write cycle endurance wouldn't be though!

SSDs are great for data which is mainly constant (eg operating systems and software). I admit once recordings are made the data is constant, and there is wear levelling for data that gets updated often, but a nearly full drive would be mostly constant and what little there was left would be subject to major write cycling in the TSR buffer.
 
But isn't the point that SSDs are just memory? With no moving parts to spin up/down and start/stop etc., what would it matter how much they got used? Sorry, I'm not up on stuff inside the boxes, I just look at them from outside the casing!!!
 
It would depend on what solid-state memory technology was used. If you think back to the old days of PROM, RAM and Flash memory devices, they all had different read/write speeds, and flash devices also have a maximum number of write cycles before they die, which is why a USB flash device, for example, would be no use in place of a USB HDD for recordings on the Humax HD-Fox T2.
 
Cool - I was getting so many dodgy playbacks and it is of that age I thought it was having some terrible twos. Have removed sysmon so will keep an eye on it now. Thanks
 
But isn't the point that SSDs are just memory? With no moving parts to spin up/down and start/stop etc., what would it matter how much they got used? Sorry, I'm not up on stuff inside the boxes, I just look at them from outside the casing!!!
Memory - yes. Just memory - no.

Normal volatile Static RAM (the sort that forgets if you turn the power off) is just a logic circuit that can be a 0 or a 1 (what we call a bistable) - but lots of them of course. Very fast, but need juice. Dynamic RAM is a variation where the 0s and 1s are levels of charge stored on a minute capacitor. The storage element is much smaller than for static RAM so huge memory capacities are possible (these are used for the gigabytes of memory in a PC), but the charge leaks away from the capacitor over a short period of time so there has to be a frequent refresh cycle to keep restoring it.

SSDs and USB sticks use non-volatile technology. They work a bit like dynamic RAM in that the data is stored as charges, but the charge does not leak away because it is injected into a highly insulating material and then read by sensing the electric field it creates. The problem is, how do you inject charge into an insulating material (and clear it out again to erase it)? The write mechanism is a bit like an electron gun in a CRT, except in the CRT the electrons only have to cross a vacuum.

The electric fields required, and the ablating effect of the passage of high-energy electrons, are a wear mechanism and the material can only put up with so much of it before it loses its properties and goes leaky. Hence there is a limit on the write cycles, typically 100,000. SSDs have built-in mechanisms to redirect writes to different sections each time, so that the wear is shared out ("wear levelling"), and this can help a lot, but they also impose a performance hit as the SSD fills up.
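
To get a feel for the numbers, here is a back-of-envelope Python sketch of how long the flash would last under continuous writing. Everything in it is an assumption for illustration: ideal wear levelling that spreads writes evenly over whatever space is free, the 100,000 write cycle figure mentioned above, and a made-up steady write rate of 1 MB/s for the buffer. It ignores any extra internal writes the SSD does, so treat the results as optimistic.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_until_worn(free_bytes, write_cycles=100_000, write_rate=1e6):
    """Years of continuous writing before the free area hits its cycle limit.

    free_bytes   -- space the wear levelling can rotate writes over (assumed)
    write_cycles -- endurance per cell (figure quoted in the post above)
    write_rate   -- sustained bytes per second written (assumed)
    """
    total_writable_bytes = free_bytes * write_cycles
    return total_writable_bytes / write_rate / SECONDS_PER_YEAR


# The fuller the drive, the less space there is to share the wear over.
for free_gb in (100, 10, 1):
    years = years_until_worn(free_gb * 1e9)
    print(f"{free_gb:>3} GB free: about {years:.1f} years of continuous writing")
```

The lifetime scales directly with the free space, which is why a nearly full drive is the worst case for this kind of workload.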

In summary then: blisteringly fast for reads, relatively slow for sustained writes (OK in bursts because of caching), and dubious lifetime in continuous write situations (like a PVR).
 
There is also the problem of write amplification: as an SSD fills up, write operations tend to become a sequence of read-erase-modify-write, which cripples write performance. The exception is when the operating system supports something called TRIM, a command understood by SSDs that causes them to prepare areas of flash for overwrite, but this has its own set of problems.
 
I've just uploaded a new version of sysmon that keeps a record of some of the key attributes from your hard disk in the database. They aren't graphed but can be extracted to provide a historical view of things like the reallocated sector count.
 
Just received my replacement Fox T2 HDR 1TB.
Thought it might be worth posting stats. May be useful ref.
Disk Information
SMART Status: PASSED
Model Family: Seagate Pipeline HD 5900.2
Device Model: ST31000424CS
Serial Number: 5VX2XQQP
LU WWN Device Id: 5 000c50 049045f87
Firmware Version: SC13
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Fri Nov 16 13:28:54 2012 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Attributes
ID Name Flags Raw Value Value Worst Thresh Type Updated When Failed
1 Raw_Read_Error_Rate POSR-- 73766344 114 100 006 Pre-fail Always -
3 Spin_Up_Time PO---- 0 095 095 000 Pre-fail Always -
4 Start_Stop_Count -O--CK 126 100 100 020 Old_age Always -
5 Reallocated_Sector_Ct PO--CK 0 100 100 036 Pre-fail Always -
7 Seek_Error_Rate POSR-- 775389 100 253 030 Pre-fail Always -
9 Power_On_Hours -O--CK 53 100 100 000 Old_age Always -
10 Spin_Retry_Count PO--C- 0 100 100 097 Pre-fail Always -
12 Power_Cycle_Count -O--CK 63 100 100 020 Old_age Always -
184 End-to-End_Error -O--CK 0 100 100 099 Old_age Always -
187 Reported_Uncorrect -O--CK 0 100 100 000 Old_age Always -
188 Command_Timeout -O--CK 0 100 100 000 Old_age Always -
189 High_Fly_Writes -O-RCK 0 100 100 000 Old_age Always -
190 Airflow_Temperature_Cel -O---K 39 061 045 045 Old_age Always In_the_past
194 Temperature_Celsius -O---K 39 039 055 000 Old_age Always -
195 Hardware_ECC_Recovered -O-RC- 73766344 047 044 000 Old_age Always -
197 Current_Pending_Sector -O--C- 0 100 100 000 Old_age Always -
198 Offline_Uncorrectable ----C- 0 100 100 000 Old_age Offline -
199 UDMA_CRC_Error_Count -OSRCK 0 200 200 000 Old_age Always -
Self-test logs
No. Description Status Remaining When First Error LBA
# 1 Short offline Completed without error 00% 52 -
# 2 Short offline Completed without error 00% 52 -