Hard Disk Failure?

When accessing my HDR using the webif I get this warning message.

!! WARNING !!

There appear to be some hardware problems with the internal hard disk on this device.

Disk overall health assessment is: FAILED!

Under diagnostics for HD it shows this :-

SMART data read from device /dev/sda
Disk Information
SMART Status: FAILED!
Model Family: Seagate Pipeline HD 5900.2
Device Model: ST31000424CS
Serial Number: 9VX17VJJ
LU WWN Device Id: 5 000c50 02d77df49
Firmware Version: SC13
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Mon May 27 13:48:35 2013 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Attributes
ID  Name  Flags  Raw_Value  Value  Worst  Thresh  Type  Updated  When_Failed
1 Raw_Read_Error_Rate POSR-- 31731537 111 099 006 Pre-fail Always -
3 Spin_Up_Time PO---- 0 096 096 000 Pre-fail Always -
4 Start_Stop_Count -O--CK 3112 097 097 020 Old_age Always -
5 Reallocated_Sector_Ct PO--CK 5229 001 001 036 Pre-fail Always FAILING_NOW
7 Seek_Error_Rate POSR-- 78314390 078 060 030 Pre-fail Always -
9 Power_On_Hours -O--CK 3302 097 097 000 Old_age Always -
10 Spin_Retry_Count PO--C- 0 100 100 097 Pre-fail Always -
12 Power_Cycle_Count -O--CK 1561 099 099 020 Old_age Always -
184 End-to-End_Error -O--CK 0 100 100 099 Old_age Always -
187 Reported_Uncorrect -O--CK 0 100 100 000 Old_age Always -
188 Command_Timeout -O--CK 541174136958 100 098 000 Old_age Always -
189 High_Fly_Writes -O-RCK 0 100 100 000 Old_age Always -
190 Airflow_Temperature_Cel -O---K 55 045 044 045 Old_age Always FAILING_NOW
194 Temperature_Celsius -O---K 55 055 056 000 Old_age Always -
195 Hardware_ECC_Recovered -O-RC- 31731537 047 038 000 Old_age Always -
197 Current_Pending_Sector -O--C- 0 100 100 000 Old_age Always -
198 Offline_Uncorrectable ----C- 0 100 100 000 Old_age Offline -
199 UDMA_CRC_Error_Count -OSRCK 0 200 200 000 Old_age Always -

Self-test logs
No.  Description  Status  Remaining  When  First Error LBA
# 1  Extended offline  Completed: unknown failure  90%  3298  0
# 2  Extended offline  Completed: unknown failure  90%  3298  0
# 3  Short offline  Completed: unknown failure  90%  3298  0
# 4  Short offline  Completed: unknown failure  90%  3298  0

I've run check disk over telnet and get this error message - see file upload.

I've had no problems with my Humax whatsoever and wonder if this is some sort of false positive OR should I be worried.

Any help / opinion welcome.

Alan
 

Attachments

  • disk error.jpg

MartinLiddle

Super Moderator
Staff member
I would say that the reallocated sector count of 5229 is high and the disk is probably on the way out. I suggest you wait until af123 or xyz321, who are the real experts on this, can comment.
 

Ezra Pound

Well-Known Member
I would re-check Web-If >> Diagnostics >> Hard Disk now that you have carried out a fix-disk run; hopefully you should now get a 'SMART Status PASSED'. Some reallocated sectors are normal, although, as stated, 5229 is high. As long as this figure doesn't increase significantly, I would say your hard disk is OK for now, BUT if it does start drifting up, I would back up the recordings you want to keep and plan for a hard disk replacement.
BTW, lines 197 and 198 (currently at zero on your hard disk) are also indicators of trouble if they move away from zero. You can safely ignore line 190 as it is not supported on the Humax.
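
As a rough illustration of how the attribute table can be read, here is a minimal Python sketch. It is not part of the Web-If or of smartmontools. It assumes each row has the space-separated layout shown in the Attributes table (ID, name, flags, raw value, value, worst, threshold, type, updated, when failed), and that an attribute counts as failing when its normalized value is at or below a non-zero threshold, which is my understanding of how FAILING_NOW is derived. The names `SmartAttribute`, `parse_attribute_row` and `is_failing_now` are made up for the example.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    raw: int
    value: int
    worst: int
    thresh: int


def parse_attribute_row(row: str) -> SmartAttribute:
    """Parse one row laid out as in the Attributes table in this thread."""
    # Field order: ID, name, flags, raw, value, worst, thresh, type, updated, when_failed
    fields = row.split()
    return SmartAttribute(
        attr_id=int(fields[0]),
        name=fields[1],
        raw=int(fields[3]),
        value=int(fields[4]),
        worst=int(fields[5]),
        thresh=int(fields[6]),
    )


def is_failing_now(attr: SmartAttribute) -> bool:
    """Assumed rule: failing when value <= threshold; a zero threshold never trips."""
    return attr.thresh > 0 and attr.value <= attr.thresh


# Rows copied from the table posted above.
ROWS = [
    "5 Reallocated_Sector_Ct PO--CK 5229 001 001 036 Pre-fail Always FAILING_NOW",
    "190 Airflow_Temperature_Cel -O---K 55 045 044 045 Old_age Always FAILING_NOW",
    "197 Current_Pending_Sector -O--C- 0 100 100 000 Old_age Always -",
    "198 Offline_Uncorrectable ----C- 0 100 100 000 Old_age Offline -",
]

failing = [a.name for a in map(parse_attribute_row, ROWS) if is_failing_now(a)]
print(failing)  # ['Reallocated_Sector_Ct', 'Airflow_Temperature_Cel']
```

Note that 197 and 198 have a threshold of 000, so under this rule they can never be flagged by the drive; that is why it is the raw value (currently 0) that is worth watching for those two.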
 

Black Hole

May contain traces of nut
According to the screen grab (a text dump would have been better):

Error at LBA 0
Dev: /dev/sda LBA: 0
Bad partition offset -1
 

xyz321

Well-Known Member
Black Hole said: "According to the screen grab (a text dump would have been better):

Error at LBA 0
Dev: /dev/sda LBA: 0
Bad partition offset -1"

This is the first time we have seen this particular fault. It seems that fix-disk is not parsing the output correctly. I cannot see any obvious reason why it should fail. To diagnose it, it would be helpful if the OP could run the following command from the telnet prompt and post the output.
Code:
smartctl -l selftest /dev/sda | hexdump -C

As far as the disk goes, it is probably time to look for a replacement.
 
OP

Suffolkboy

Member
Thanks for all your input so far. Sorry, I don't know how to do a text dump from the cmd prompt. I have run the command above and the results are in the uploaded screen dump. I'm not really bothered about losing anything. Would a reformat do the trick?

Thanks again - Alan
 

Attachments

  • disk error 2.jpg

Ezra Pound

Well-Known Member
A reformat should mark bad areas of the disk so that they can be avoided (if possible). However, the 'bad areas' have a tendency to get bigger, hence the growth of #5, #197 and #198. If this happens, reformatting won't help.
 

Black Hole

May contain traces of nut
Suffolkboy said: "Thanks for all your input so far - sorry don't know how to do a text dump from the cmd prompt. Have run the command above and the results are in the uploaded screen dump. I'm not really bothered about losing anything - would a reformat do the trick?"

You should be able to copy and paste from the Telnet window. What we usually do is paste it into [code]...[/code] tags (available on the edit toolbar as the curly brackets icon).

You could try a reformat from the Humax menus - it might buy some time, but the reallocated sectors are worrying.
 

af123

Administrator
Staff member
Ezra Pound said: "hopefully you should now get a 'SMART Status PASSED'."

In this case, the drive firmware has decided that the disk is broken, hence the overall SMART status of FAILED. This is almost certainly a result of the high reallocation count, but we can't tell for sure as disk vendors don't publish full details of their logic and thresholds. Although you can correct sectors etc., you won't reverse the drive's decision, and it is time for the OP to back up everything possible (if wanted) and replace the disk.
 
OP

Suffolkboy

Member
" hopefully you should now get a 'SMART Status PASSED'." - I don't!!!

OK, I think I know where this is going. The interesting thing is that I have no problems in day-to-day use whatsoever. I'll just ignore it, and if I start getting problems I'll reformat and see what happens. I've only got 115 GB of used space so, I'm thinking, there are plenty of good sectors left.

Thanks for all your comments and time.

By the way, what's an OP? I am an OAP!!!

Alan

 

Brian

Administrator
Staff member
OP is Original Poster.

Out of interest, have you tried the native HDD Test?

MENU > Settings > System > Data Storage > HDD Test

It will probably also indicate a Fail.
 
OP

Suffolkboy

Member
Thanks everyone.

Performed native HDD Test - failed instantly without appearing to do anything.

Message reads " HDD Test Fail. You may recover the HDD through Format Storage ( Error Code 5 )"

Alan
 

Black Hole

May contain traces of nut
You say it is still working - you can carry on as you are, but be aware it might stop working any minute.

If you have nothing on the disk you are particularly concerned about, you could try a format through the Humax menus, but there is a risk the format operation will fail and leave you either as you are or with a non-functioning PVR.

The opinion of our two acknowledged experts is that the disk is beyond recovery. Consider yourself on borrowed time and prepare to replace the hard drive. The physical process is not difficult, just a case of being methodical and careful. The difficult part is tracking down an appropriate drive in the first place: see "Which Drive?" in the Wiki section. The installation will be straightforward if you stick to a drive of 1TB or less (using the Humax menus to initialise it).
 

Brian

Administrator
Staff member
Suffolkboy said: "Performed native HDD Test - failed instantly without appearing to do anything.

Message reads " HDD Test Fail. You may recover the HDD through Format Storage ( Error Code 5 )""

I have had a similar problem in the past, which I reported HERE; I ended up replacing the HDD.
 

af123

Administrator
Staff member
Black Hole said: "Consider yourself on borrowed time and prepare to replace the hard drive"

That's a good way to sum it up. I have seen drives in this state carry on for a long time but there will already be a performance impact from the reallocated sectors and it could completely fail at any moment.

Suffolkboy said: "Performed native HDD Test - failed instantly without appearing to do anything.
Message reads " HDD Test Fail. You may recover the HDD through Format Storage ( Error Code 5 )""

That's what I would expect in this case. It's just seeing the same overall disk health assessment and reporting it as faulty.
 

HarveyB

Active Member
I had a similar problem last year. It got to the stage where all of my recordings kept losing bits (3 or 4 seconds at a time) on playback.

Ended up getting the box exchanged (it was 23 months old) so still under warranty.

I suggest you run the disk check regularly. If the reallocated sector count continues to rise, replace the disk.
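
The "check regularly and watch for a rising count" advice could be sketched in Python like this. It is only an illustration: the readings are hypothetical values you would note down by hand from the Diagnostics page over successive checks (only 5229 comes from this thread), and the `max_increase` tolerance is an arbitrary assumption, not a figure from any drive vendor.

```python
def reallocation_trend(readings, max_increase=0):
    """Classify a series of Reallocated_Sector_Ct raw values, oldest first.

    Returns 'unknown' with fewer than two readings, 'rising' if the count
    grew by more than max_increase between the first and last reading,
    and 'stable' otherwise.
    """
    if len(readings) < 2:
        return "unknown"
    increase = readings[-1] - readings[0]
    return "rising" if increase > max_increase else "stable"


# Hypothetical logs: one where the count holds, one where it drifts up.
print(reallocation_trend([5229, 5229, 5229]))  # stable
print(reallocation_trend([5229, 5240, 5301]))  # rising
```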
 
OP

Suffolkboy

Member
Hello all - have been doing some digging on the netty.

Am I right in thinking that once the SMART status has been set to FAIL by the HD itself there's nowt you can do about it other than change the drive? Reformatting won't clear the failed tag. Does line 190 above (190 Airflow_Temperature_Cel -O---K 55 045 044 045 Old_age Always FAILING_NOW) indicate that the HD is / was too hot? Could this be why the status is set to fail?

Of all the threads I've read on other forums, with people asking how to reset the SMART status, the consensus seems to be that even the Seagate utilities won't do that (it's to stop people making an old HD look new, like clocking a car). I bet most of us don't realise how much information a HD stores about itself: disk speed, run up and down times, hours of use and temperature, to name but a few.

Interestingly, this problem only manifested itself when I upgraded the custom firmware from 2.11 to 2.16. I've tried going back to Humax software v1.2.28 and the native HD test still fails. I tried custom firmware 2.11 and it still fails.

Is it possible that the smartmon in v2.16 has set the SMART status to failed and that can't be undone??

If the HD was in a PC it might be possible to disable SMART in the BIOS - is this possible on the hummy?

Anyway, while it's working I won't lose too much sleep over it, but I do like to try to understand these things until I get to the point where my brain overheats and sets its status to failed!!!!

Alan
 

af123

Administrator
Staff member
Suffolkboy said: "Am I right in thinking that once the SMART status has been set to FAIL by the HD itself there's nowt you can do about it other than change the drive. Reformatting won't clear the failed tag."
Correct - the drive itself has decided it has failed. Almost definitely because of the high number of sector reallocations.

Suffolkboy said: "Does line 190 above (190 Airflow_Temperature_Cel -O---K 55 045 044 045 Old_age Always FAILING_NOW) indicate that the HD is / was too hot? Could this be why the status is set to fail?"
Yes, too hot for the configured threshold set by the drive manufacturer anyway - normal for these PVRs. No, it wouldn't be the reason for the failed status.

Suffolkboy said: "Is it possible that the smartmon in v2.16 has set the SMART status to failed and that can't be undone??"
No, the drive has determined the failure status for itself.

Suffolkboy said: "If the HD was in a PC it might be possible to disable SMART in the BIOS - is this possible on the hummy?"
All that would do is stop the BIOS reporting that the drive has failed during initialisation.
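
To put a number on "too hot for the configured threshold": on many Seagate drives the normalized value of attribute 190 appears to be 100 minus the temperature in deg C. That convention is an assumption here (it is not stated anywhere in this thread), but it fits the figures posted, as this small Python sketch shows. The function name is made up for the example.

```python
def airflow_trip_temperature(thresh: int) -> int:
    """Temperature (deg C) at which attribute 190 reaches its threshold.

    Assumes the normalized value is computed as 100 - temperature.
    """
    return 100 - thresh


# Figures from line 190 of the posted table: raw 55, value 045, thresh 045.
raw_temp, value, thresh = 55, 45, 45

print(100 - raw_temp == value)           # True: the assumed convention fits
print(airflow_trip_temperature(thresh))  # 55
```

If that assumption holds, the drive flags line 190 as soon as it reaches 55 deg C, which the posted figures suggest is a normal running temperature inside this PVR.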
 

Ezra Pound

Well-Known Member
Suffolkboy said: "Does line 190 above (190 Airflow_Temperature_Cel -O---K 55 045 044 045 Old_age Always FAILING_NOW) indicate that the HD is / was too hot?"
The details in line 190 are not displayed correctly on the Humax (see note in #3). The details in line 194 can be used and in your case show the HDD has never got hotter than 56 deg C in its life. This is quite normal: "194 Temperature_Celsius -O---K 55 055 056 000 Old_age Always -"
 