Hard Disk Failure

Lefteris · Nov 1, 2015

Hi - Yesterday my hard disk failed, 1 year and 6 days after purchase! It is just out of warranty, so I checked the Maintenance option on this great forum site and have been running Opt 1 Fix Disk for a couple of hours now. I'm not sure if I should be worried yet, since your instructions clearly say that it might take several hours. It does seem to be looping, however, struggling with Block Inode 0 as per the enclosed screenshot - it has been round this lot several times. Is that repetition normal, or should it have moved on, i.e. the disk is knackered? Or can I do anything else to try to recover it? Many thanks

It goes on to say -
testb: Invalid block number 0
Block 0 is marked as in use

Searching for inode...
debugfs 1.42.10 (18-May-2014)

prpr · Nov 1, 2015

You don't know the disk is knackered. There might be absolutely nothing wrong with it and it's just the filesystem that's corrupt.
What do your SMART statistics show? Use "smartctl -a /dev/sda" at the maintenance menu's command line option.
If there's stuff on there you want to keep, then you should be thinking about getting it off there and on to another disk pending a re-format.

MontysEvilTwin · Nov 1, 2015

Even if your disk is fubar (which it probably isn't), I would check the manufacturer's website, as you may have a two-year warranty. You have an Advanced Format disk, and the current fix-disk program does not reallocate unreadable sectors effectively. As prpr suggested, take a look at your SMART statistics and post them here (paste within code tags, or use the code option in the forum format menu).

Lefteris · Nov 1, 2015

Hi - the fix disk did not cure the problem so I did the smart thing you suggested - here's the result

-----------------------------------------------------------------------
Now starting a system command prompt. You can make this the default by
enabling 'Expert mode telnet server' on the web interface settings page.
-----------------------------------------------------------------------

Humax HDR-Fox T2 (humax) 1.03.12/3.00

To return to the menu, type: exit

humax# smartctl-a/dev/sda
/bin/sh: smartctl-a/dev/sda: not found
humax# smartctl-a/dev/sda
/bin/sh: smartctl-a/dev/sda: not found

Have I done something wrong, or is it curtains for my disk?

Cheers

prpr · Nov 1, 2015

Try not taking the spaces out that I put in.

Lefteris · Nov 1, 2015

Sorry - not too clear in this font - done it again and got this

-----------------------------------------------------------------------
Now starting a system command prompt. You can make this the default by
enabling 'Expert mode telnet server' on the web interface settings page.
-----------------------------------------------------------------------

Humax HDR-Fox T2 (humax) 1.03.12/3.00

To return to the menu, type: exit

humax# smartctl -a /dev/sda
smartctl 6.0 2013-04-25 r11898M [7405b0-smp-linux-2.6.18-7.1] (local build)
Copyright (C) 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: ST1000VM002-1CT162
Serial Number: S1G2G70M
LU WWN Device Id: 5 000c50 061f17769
Firmware Version: SC23
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5900 rpm
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Sun Nov 1 17:53:04 2015 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 121) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                 ( 107) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 135) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x10b9) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   085   006    Pre-fail  Always       -       81473408
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       1616
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
  7 Seek_Error_Rate         0x000f   076   060   030    Pre-fail  Always       -       8678623205
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       5329
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   020    Old_age   Always       -       1616
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       2161
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   053   053   000    Old_age   Always       -       47
190 Airflow_Temperature_Cel 0x0022   069   043   045    Old_age   Always   In_the_past 31 (0 2 33 31 0)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1614
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1616
194 Temperature_Celsius     0x0022   031   057   000    Old_age   Always       -       31 (0 16 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       7
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       7
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 3347 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 3347 occurred at disk power-on lifetime: 5327 hours (221 days + 23 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 c1 1c 20 00 Error: UNC at LBA = 0x00201cc1 = 2104513

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 c0 1c 20 e0 00 00:03:59.097 READ DMA
ec 00 00 c1 1c 20 a0 00 00:03:59.096 IDENTIFY DEVICE
c8 00 08 c0 1c 20 e0 00 00:03:58.727 READ DMA
ec 00 00 c1 1c 20 a0 00 00:03:58.685 IDENTIFY DEVICE
c8 00 08 c0 1c 20 e0 00 00:03:58.295 READ DMA

Error 3346 occurred at disk power-on lifetime: 5327 hours (221 days + 23 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 c1 1c 20 00 Error: UNC at LBA = 0x00201cc1 = 2104513

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 c0 1c 20 e0 00 00:03:58.727 READ DMA
ec 00 00 c1 1c 20 a0 00 00:03:58.685 IDENTIFY DEVICE
c8 00 08 c0 1c 20 e0 00 00:03:58.295 READ DMA
ec 00 00 c1 1c 20 a0 00 00:03:58.294 IDENTIFY DEVICE
c8 00 08 c0 1c 20 e0 00 00:03:57.924 READ DMA

Error 3345 occurred at disk power-on lifetime: 5327 hours (221 days + 23 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 c1 1c 20 00 Error: UNC at LBA = 0x00201cc1 = 2104513

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 c0 1c 20 e0 00 00:03:58.295 READ DMA
ec 00 00 c1 1c 20 a0 00 00:03:58.294 IDENTIFY DEVICE
c8 00 08 c0 1c 20 e0 00 00:03:57.924 READ DMA
ec 00 00 c1 1c 20 a0 00 00:03:57.882 IDENTIFY DEVICE
c8 00 08 c0 1c 20 e0 00 00:03:57.493 READ DMA

Error 3344 occurred at disk power-on lifetime: 5327 hours (221 days + 23 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 c1 1c 20 00 Error: UNC at LBA = 0x00201cc1 = 2104513

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 c0 1c 20 e0 00 00:03:57.924 READ DMA
ec 00 00 c1 1c 20 a0 00 00:03:57.882 IDENTIFY DEVICE
c8 00 08 c0 1c 20 e0 00 00:03:57.493 READ DMA
ec 00 00 c1 1c 20 a0 00 00:03:57.492 IDENTIFY DEVICE
c8 00 08 c0 1c 20 e0 00 00:03:57.123 READ DMA

Error 3343 occurred at disk power-on lifetime: 5327 hours (221 days + 23 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 c1 1c 20 00 Error: UNC at LBA = 0x00201cc1 = 2104513

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 c0 1c 20 e0 00 00:03:57.493 READ DMA
ec 00 00 c1 1c 20 a0 00 00:03:57.492 IDENTIFY DEVICE
c8 00 08 c0 1c 20 e0 00 00:03:57.123 READ DMA
ec 00 00 c1 1c 20 a0 00 00:03:57.080 IDENTIFY DEVICE
c8 00 20 c0 1c 20 e0 00 00:03:56.691 READ DMA

SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure        90%             5327  2104512
# 2  Short offline       Completed: read failure        90%             5327  2104512
# 3  Short offline       Completed: read failure        90%             5327  2104512
# 4  Short offline       Completed: read failure        90%             5327  2104512
# 5  Short offline       Completed: read failure        90%             5327  2104512
# 6  Short offline       Completed: read failure        90%             5327  2104512
# 7  Short offline       Completed: read failure        90%             5327  2104512
# 8  Short offline       Completed: read failure        90%             5327  2104512
# 9  Short offline       Completed: read failure        90%             5327  2104512
#10  Short offline       Completed: read failure        90%             5327  2104512
#11  Short offline       Completed: read failure        90%             5327  2104512
#12  Short offline       Completed: read failure        90%             5323  2104512
#13  Short offline       Completed: read failure        90%             5323  2104512
#14  Short offline       Completed: read failure        90%             5323  2104512
#15  Short offline       Completed: read failure        90%             5323  2104512
#16  Short offline       Completed: read failure        90%             5323  2104512
#17  Short offline       Completed: read failure        90%             5323  2104512
#18  Short offline       Completed: read failure        90%             5323  2104512
#19  Short offline       Completed: read failure        90%             5323  2104512
#20  Short offline       Completed: read failure        90%             5323  2104512
#21  Short offline       Completed: read failure        90%             5323  2104512

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Hope it makes sense to you - I'm afraid I'm out of my depth with this !

Cheers

prpr · Nov 1, 2015

Whilst the bad sector count is not too bad at this point, it has other errors that would be worrying me.
It might be fixable and carry on OK, but I'd be looking at getting a warranty replacement.
Have you checked on Seagate's web site?
I wouldn't assume it's only 1 year until you've checked here...
Actually, I just tried it (seeing as your model/serial number info. is in the above) and it says it's an OEM component, so you have to refer to whoever you bought it from.

prpr · Nov 1, 2015

If you want to try resurrecting it, you could start with this:

Code:

humax# hdparm --repair-sector 2104512 --yes-i-know-what-i-am-doing /dev/sda
humax# hdparm --repair-sector 2104513 --yes-i-know-what-i-am-doing /dev/sda
humax# badblocks -sv /dev/sda2

You might need to repeat the sector repairs in sequence up to 2104519 before doing the "badblocks" command.
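
The range 2104512 to 2104519 follows from the sector sizes in the SMART output above (512 bytes logical, 4096 bytes physical): eight logical sectors share one physical sector, so the failing LBA is rounded down to a multiple of 8. A minimal sketch of that arithmetic, assuming a POSIX shell; the function name `print_repair_cmds` is made up for illustration, and it only prints the commands so they can be checked before anything is run:

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the hdparm repair commands for every
# logical sector that shares a 4096-byte physical sector with the given LBA.
# Assumes 512-byte logical sectors, i.e. 8 logical sectors per physical one.
print_repair_cmds() {
    lba=$1
    dev=$2
    first=$(( lba / 8 * 8 ))   # round down to the physical-sector boundary
    last=$(( first + 7 ))
    i=$first
    while [ "$i" -le "$last" ]; do
        echo "hdparm --repair-sector $i --yes-i-know-what-i-am-doing $dev"
        i=$(( i + 1 ))
    done
}

# LBA taken from the "UNC at LBA" lines in the SMART error log above.
print_repair_cmds 2104513 /dev/sda
```

Printing rather than executing is deliberate: writing to a sector destroys whatever was in it, so it is worth eyeballing the list first.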

MontysEvilTwin · Nov 1, 2015

If the disk is in warranty you could get it replaced, but you should be able to get it up and running again. Take a look at this thread here. You have one or more sectors that need to be reallocated, but because you have an AF (Advanced Format) disk, 8 logical sectors (1 physical sector) have to be rewritten, and fix-disk can't handle this yet. The sectors can be reallocated from the command line in telnet.
First you might have to install the hdparm package (I'm not sure if this is already installed by default):

Code:

opkg install hdparm

The next steps are best done in maintenance mode. The first problem sector is at LBA 2104512. I would try reading this first:

Code:

hdparm --read-sector 2104512 /dev/sda

I think that fix-disk will have corrected this one so the read command will return lots of zeros. If this is the case, read the next sector:

Code:

hdparm --read-sector 2104513 /dev/sda

If this returns an error message you need to write to this sector to force reallocation:

Code:

hdparm --write-sector 2104513 --yes-i-know-what-i-am-doing /dev/sda

Then read the next sector and write to it (as above) if it is unreadable. You will probably have to rewrite 7 sectors in total (assuming that fix-disk did the first one). When reading the sectors starts returning data rather than error messages, you should stop rewriting them. I would then run fix-disk again and see if it runs to completion. If so, leave maintenance mode, check your SMART statistics to see that the sectors have been reallocated, and check that the unit is working properly again.
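
The read-then-rewrite steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: the function name `rewrite_unreadable` and the `DRY_RUN` variable are invented here, and it assumes that `hdparm --read-sector` exits with a non-zero status when the sector cannot be read. With `DRY_RUN` left at its default of 1, nothing is written to the disk.

```shell
#!/bin/sh
# Sketch of the read-then-rewrite procedure for one 4096-byte physical sector
# (8 logical sectors starting at $2 on device $1).
# DRY_RUN=1 (the default here) only reports what would be rewritten.
# Assumption: hdparm --read-sector exits non-zero on an unreadable sector.
rewrite_unreadable() {
    dev=$1
    first=$2
    i=$first
    while [ "$i" -lt $(( first + 8 )) ]; do
        if hdparm --read-sector "$i" "$dev" >/dev/null 2>&1; then
            echo "sector $i: readable, left alone"
        elif [ "${DRY_RUN:-1}" = 1 ]; then
            echo "sector $i: unreadable, would rewrite"
        else
            # Destroys the sector's contents and forces reallocation.
            hdparm --write-sector "$i" --yes-i-know-what-i-am-doing "$dev"
        fi
        i=$(( i + 1 ))
    done
}

# Example (dry run): rewrite_unreadable /dev/sda 2104512
```

Readable sectors are never touched, which matches the advice to stop rewriting once reads return data.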

Lefteris · Nov 1, 2015

OK, thanks both - I've done the re-write on sectors 2104512 to 2104519 as suggested and am now doing the check for bad blocks - it will take a while!
I'll post the results

prpr · Nov 1, 2015

MontysEvilTwin said:
First you might have to install the hdparm package (I'm not sure if this is already installed by default)

If your hard disk is broken, how and where would it install to?
Obviously the tools for fixing the disk can't be on the disk itself. It's in the flash memory (/sbin/hdparm).

MontysEvilTwin · Nov 1, 2015

prpr said:
If your hard disk is broken, how and where would it install to?
Obviously the tools for fixing the disk can't be on the disk itself. It's in the flash memory (/sbin/hdparm).

I don't understand your post. I thought that the hdparm package was installed in flash, I just was not sure that it was a preinstalled default package. There are a number of packages that are stored in flash which aren't installed by default when the custom firmware is loaded.

Lefteris · Nov 1, 2015

OK, the bad blocks check has finished with zero bad blocks found, so I rebooted into normal mode and the Humax HDD Test passed OK, but it's still showing zero GB as available, used, reserved and total. Is there anything else I can do in Maintenance? Or maybe tomorrow I'll try accessing the disk through File Manager in Windows to recover the files on it. Any advice greatly appreciated - Thanks

prpr · Nov 2, 2015

MontysEvilTwin said:
I don't understand your post. I thought that the hdparm package was installed in flash, I just was not sure that it was a preinstalled default package. There are a number of packages that are stored in flash which aren't installed by default when the custom firmware is loaded.

There is a version of hdparm built into the CF (the bit you install from USB). This is what runs in maintenance mode, for reasons explained previously. There is also an installable package, which resides in /mod/sbin/hdparm when installed. This is a 'useless' package as the file is already built in to the CF. I can only assume that at some point in history it wasn't built in to the CF and that this is the reason why it's still in the package repository.

prpr · Nov 2, 2015

Lefteris said:
OK, the bad blocks check has finished with zero bad blocks found, so I rebooted into normal mode and the Humax HDD Test passed OK, but it's still showing zero GB as available, used, reserved and total. Is there anything else I can do in Maintenance? Or maybe tomorrow I'll try accessing the disk through File Manager in Windows to recover the files on it.

Did you run fix-disk again? You've fixed the disk surface (for now), so now you need to fix the filesystem.

Lefteris · Nov 2, 2015

OK - I thought after my last reply that you'd probably say that, so I will run fix disk now - I'll let you know the result

That was quick - about 20 mins doing lots of Cleared and Fixed - rebooted into normal mode and it now sees the disk data - only it's reporting

885.8 gb Available
20.2 gb Used
114.0 gb Reserved
1000.0 gb Total size

So I checked the Browse Media List and the files all appear to be there -

reporting -82849247232 bytes (-9%) as Used
903.54 gb as Free (101%)
79.68 gb in Dustbin (8%)
on a Total Space of 906.06 gb

I am now copying the files off to a spare disk, and presumably I will then have to re-format?

Cheers

Lefteris · Nov 2, 2015

By the way - for copying off onto an external hard disk, I use Cute FTP 8.3 - I've found it very efficient over the years !

Black Hole · Nov 2, 2015

What do you mean by "external hard disk"? I take that to imply a drive connected to the Humax by USB. I don't see how copying to that using FTP can be efficient at all, because unless you have somehow invoked FXP (file exchange protocol, which I am not sure the server is capable of), the data would need two passes on your network followed by one on USB.

Lefteris · Nov 2, 2015

Hi Black Hole - no, I mean a portable hard disk connected to my desktop computer USB port, which is running Cute FTP and copying direct from the Humax to the H/Disk over my ethernet network

prpr · Nov 2, 2015

Lefteris said:
reporting -82849247232 bytes (-9%) as Used
903.54 gb as Free (101%)
79.68 gb in Dustbin (8%)
on a Total Space of 906.06 gb

They are some bizarre numbers, which probably indicate your filesystem is not totally happy.

Lefteris said:
I am now copying the files off to a spare disk, and presumably will have to re-format or what ?

I probably would. You'll have to reinstall the Webif and all your packages afterwards.
Need to keep an eye on those disk SMART statistics as well...
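
For keeping an eye on the statistics, the three sector counters from the table posted earlier (Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable) are the obvious ones to track. A small sketch, assuming `awk` is available at the command prompt and that the raw value is the 10th whitespace-separated field, as in that table; the function name `smart_sector_counts` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical filter: read "smartctl -A" style attribute rows on stdin and
# print the raw values of the three sector-health counters.
# Assumes the raw value is field 10, as in the table posted in this thread.
smart_sector_counts() {
    awk '$2 == "Reallocated_Sector_Ct" ||
         $2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable" { print $2 "=" $10 }'
}

# Example: smartctl -A /dev/sda | smart_sector_counts
```

If the pending or reallocated counts keep climbing between checks, that points towards replacing the disk rather than repairing it again.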
