Suspected HDD problems on HDR

Shaggy

Member
Having watched several members suffer HDD problems, I fear it may now be my turn. The HDR has appeared to be recording, but one recording failed yesterday and three today. Both instances of the Formula 1 recording gave only a 5-minute file, with the message 'power failure interrupted recording'.

I've been following BH's "Steps to recover a hard disk ... ", but have a problem running the "Check and Repair Hard Disk" option from maintenance mode. I get the following (sorry - can't remember how to show text as code):

Code:
[ Humax HDR-Fox T2 (humax) 1.02.20/2.19 ]
 
1 - Check and repair hard disk (fix-disk).
2 - Run short hard-disk self test.
3 - Run long hard-disk self test.
4 - Check self-test progress.
epg - Clear persistent EPG data.
x - Leave maintenance mode (Humax will restart).
diag - Run a diagnostic.
cli - System command line (advanced users).
 
Please select option: 1
Any additional options (or press return for none):
Are you sure you wish to run the hard disk checker? [Y/N] y
Running /bin/fix-disk
Custom firmware version 2.19
 
 
Checking disk sda
 
Unmounted /dev/sda1
Unmounted /dev/sda2
Unmounted /dev/sda3
 
Running short disk self test
 
Pending sector error(s) found
 
LBA has not yet been found
A long test is required - this could take 4 hour(s) 15 minutes
Do you wish to continue? [Y/N]: y
Running long disk self test
Unrecognised/invalid output from smartctl
smartctl 5.41 2011-06-09 r3365 [7405b0-smp-linux-2.6.18-7.1] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

This is followed by a history of tests run. Has anyone seen problems similar to this, or am I missing something obvious which is preventing the disk repair from running? Thanks in advance for any help offered.

Edit: code tags added
 
Sorry, the post of mine you are referring to is somewhat out of date - things move on. I think you have done things correctly; it's now a case of waiting for the disk gurus to offer further instructions.

Code fields: if you are using a tablet or the like, enter the [code]...[/code] tags by hand (they are also explained under the "help" tab at the top of the page, see "BB Codes"). If not using a mobile browser, the post editor has a tool bar and all you need to do is select the text and click the "{}#" button (insert code field).
 
Please could you post the output from:
Code:
smartctl -l selftest /dev/sda
within [code] [/code] tags.
 
Thanks for getting back to me, I'll have access again this evening and post the results (with the tags this time!)
 
Here is the output of the smartctl command:

Code:
humax# smartctl -l selftest /dev/sda
smartctl 5.41 2011-06-09 r3365 [7405b0-smp-linux-2.6.18-7.1] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
 
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Aborted by host               90%      1981         -
# 2  Short offline       Completed without error       00%      1981         -
# 3  Short offline       Completed without error       00%      1856         -
# 4  Short offline       Aborted by host               50%      1845         -
# 5  Extended offline    Aborted by host               90%      1845         -
# 6  Short offline       Aborted by host               90%      1845         -
# 7  Short offline       Completed without error       00%      1832         -

When I try and run the Hard Disk diagnostic, I get the following error:
Code:
Runtime Error: /mod/webif/lib/settings.class:122: attempt to write a readonly database in procedure '.00000000000000000006>' called at file "disk.jim", line 19 in procedure 'settings _tval_setting' called at file "/mod/lib/jim/oo.tcl", line 46 at file "/mod/lib/jim/oo.tcl", line 62 at file "/mod/webif/lib/settings.class", line 122

But the last time I looked at the HDD SMART data, the only value of note was 3 for Current_Pending_Sector; the other interesting values were zero.
 
Hmm - seems that you haven't managed to complete an extended offline test ("Aborted by host" usually means that power was removed or the box was rebooted before the test completed).

You need to run a long test in order to determine which sector on the disk has problems, then it can be fixed.

I'd recommend running a long test from the command line and making sure that the box remains turned on until it has finished. Maintenance mode isn't required.

Code:
humax# smartctl --test=long /dev/sda
 
Got back to the box after the 4 hours it estimated and I assume the test had completed, because it had turned itself off. Here's the output from the first command again:

Code:
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      1984         546399616
# 2  Short offline       Aborted by host               90%      1981         -
# 3  Short offline       Completed without error       00%      1981         -
# 4  Short offline       Completed without error       00%      1856         -
# 5  Short offline       Aborted by host               50%      1845         -
# 6  Extended offline    Aborted by host               90%      1845         -
# 7  Short offline       Aborted by host               90%      1845         -
# 8  Short offline       Completed without error       00%      1832         -

Still not able to run the Hard Disk diagnostic from the webif, getting the same error message as above.
 
Code:
humax# hdparm --repair-sector 546399616 --yes-i-know-what-i-am-doing /dev/sda
Then retry the disk test.
You can check periodically whether the test has finished early using the command in post #3 (smartctl -l selftest /dev/sda). You don't need to wait 4 hours each time.
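
For anyone following along on a PC, the step from the self-test log to the repair command can be sketched in Python. This is only a minimal sketch and not part of the custom firmware: it assumes the log has been saved as text, the function name first_error_lba is made up for illustration, and the sample rows are copied from the log posted above. Bear in mind that hdparm --repair-sector overwrites the sector with zeros, so whatever was stored there is lost.

```python
import re

# Sample rows copied from the self-test log posted earlier in the thread.
SAMPLE_LOG = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      1984         546399616
# 2  Short offline       Aborted by host               90%      1981         -
# 3  Short offline       Completed without error       00%      1981         -
"""

# One log row: "# <n>  <description and status>  <remaining>%  <hours>  <LBA or ->"
ROW = re.compile(r"^#\s*(\d+)\s+(.*?)\s+(\d+)%\s+(\d+)\s+(\d+|-)\s*$")


def first_error_lba(log_text):
    """Return the LBA reported by the most recent test (row # 1), or None."""
    for line in log_text.splitlines():
        m = ROW.match(line)
        if m and m.group(1) == "1":
            lba = m.group(5)
            return None if lba == "-" else int(lba)
    return None


lba = first_error_lba(SAMPLE_LOG)
if lba is not None:
    # Only prints the command; running it is left to the reader.
    print(f"hdparm --repair-sector {lba} --yes-i-know-what-i-am-doing /dev/sda")
```

If the most recent test shows "-" in the last column, the function returns None and nothing is printed, which is the cue that no further sector needs rewriting.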
 
Out of interest, how long can the "smartctl --test=long /dev/sda" test take, bearing in mind that this is testing a 2TB disk? So far, I'm up to four sectors which have caused the test to fail, and which I've subsequently rewritten. The most recent instance of this test has been running for 4.5 hours, still with no result listed. Is there a command to be able to see how far it has progressed?

In the meantime, I have a USB disk ready to copy all the content to, and a new HDD on order from Ebuyer (ST2000VM003). Presumably while the above tests are running, it is best not to start copying content off the HDR or do anything else which may interrupt/slow the test?

Thanks to you all for your help, I really appreciate it.
 
The 'remaining' column in the output from smartctl -l selftest /dev/sda is supposed to give an indication of progress. It can be hit and miss, however.
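
As a rough illustration of reading a progress figure, here is a minimal Python sketch that pulls the percentage out of saved smartctl output. Assumptions: the sample text is an assumed example of what smartctl -c /dev/sda shows under "Self-test execution status" while a test is running, not something captured from this box, and the function name is made up.

```python
import re

# Assumed example of the "in progress" status text; not captured from this box.
SAMPLE_STATUS = """\
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
"""


def percent_remaining(text):
    """Return the 'N% of test remaining' figure as an int, or None if absent."""
    m = re.search(r"(\d+)% of test remaining", text)
    return int(m.group(1)) if m else None


remaining = percent_remaining(SAMPLE_STATUS)
if remaining is None:
    print("No self-test in progress (or output not recognised)")
else:
    print(f"Self-test running: about {100 - remaining}% done")
```

The drive reports this figure in steps of 10%, so it is a coarse guide at best.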
 
Hmm. The four failed tests were all shown as failing at 90%. Either that estimate is somewhat off, or the last 10% is really slow going! Up to 5 hours now and counting ...
 
With any luck that means you have found and fixed all the bad sectors.
There is a way of running the long test on a selective range so it would have been possible to start from the last block you fixed. If you find another bad block I can remind myself how that's done!
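
For reference, a sketch of the selective-range idea, written as a small Python helper that only prints the command rather than running it. It assumes that smartctl's -t select,N-M form (with max standing for the end of the disk) is accepted by the 5.41 build on the box, which is not confirmed in this thread. 546399616 is simply the one LBA posted here, and the helper name is invented.

```python
def selective_test_command(start_lba, device="/dev/sda"):
    """Build a smartctl selective self-test command from start_lba to the end of the disk."""
    if start_lba < 0:
        raise ValueError("LBA must be non-negative")
    # "select,N-max" is assumed to be accepted by the smartctl build on the box.
    return f"smartctl -t select,{start_lba}-max {device}"


# Example: resume from the sector that was just rewritten.
print(selective_test_command(546399616))
```

Starting at the sector just rewritten, rather than the one after it, means the repaired sector gets re-read as part of the next test.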
 
The last test finally finished after 8 hours, with no more faults found. The HDR still isn't recording (red ring appears and it seems to think it is, but no file is stored) and no packages can be installed, but I've started copying all the recordings to a NAS (fortunately the automount package was already installed).

I was impressed with Ebuyer: new HDD ordered at 10pm yesterday, arrived at 2pm today, and they didn't even charge for next working day delivery! So the next job will be to install it, make sure the sectors are aligned correctly, and copy content back. Hopefully I'll then have a functioning PVR again.

Thanks again for everyone's help, couldn't have got this far without you.
 
Now that the sectors have been repaired, the fix-disk process (option 1 on the maintenance mode menu) should restore full functionality (there is likely some filesystem corruption that needs addressing). Probably worth doing even though you're going to replace the disk.
 
Shaggy said: "I was impressed with Ebuyer: new HDD ordered at 10pm yesterday, arrived at 2pm today, and they didn't even charge for next working day delivery! So the next job will be to install it, make sure the sectors are aligned correctly, and copy content back. Hopefully I'll then have a functioning PVR again."
You haven't posted your SMART stats, but there's probably nothing wrong with the disk that a fix-disk wouldn't have fixed, so I wouldn't waste my time if I were you.
 
Here it is:
Code:
=== START OF INFORMATION SECTION ===
Device Model:     WDC WD20EURS-63Z9B1
Serial Number:    WD-WCAVY6800263
LU WWN Device Id: 5 0014ee 2b04ef2d5
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Oct 31 19:26:15 2013 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (41100) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
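
A few of these figures can be cross-checked with simple arithmetic. The Python sketch below is illustrative only: the helper names are made up, and it assumes logical sector 0 sits at the start of a physical sector (no alignment offset), which the output above does not confirm.

```python
CAPACITY_BYTES = 2_000_398_934_016   # "User Capacity" above
LOGICAL, PHYSICAL = 512, 4096        # "Sector Sizes" above
FAILED_LBA = 546399616               # first error reported by the extended test

total_sectors = CAPACITY_BYTES // LOGICAL
per_physical = PHYSICAL // LOGICAL


def physical_group(lba):
    """Logical LBAs sharing one physical sector with lba (assumes zero offset)."""
    start = lba - lba % per_physical
    return range(start, start + per_physical)


def hours_minutes(minutes):
    """Split a minute count into (hours, minutes)."""
    return divmod(minutes, 60)


group = physical_group(FAILED_LBA)
print(f"{total_sectors} logical sectors, {per_physical} per physical sector")
print(f"LBA {FAILED_LBA} shares a physical sector with LBAs {group[0]}-{group[-1]}")
print(f"That is {100 * FAILED_LBA / total_sectors:.1f}% of the way through the disk")
print("Extended test polling time: %dh %02dm" % hours_minutes(255))
print("Offline data collection: %dh %02dm" % hours_minutes(41100 // 60))
```

The 255-minute polling time matches the 4 hours 15 minutes quoted by fix-disk, but the last long test in this thread took about 8 hours, so treat it as a guide only.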
 