Hard drive failure?

rob4x4

Member
Hello all.
I've been getting glitches in my recordings and I suspect the hard drive might be faulty.
The Humax HDD test shows no errors. I tried running the fix-disk command via telnet but I get the message:
humax# fix-disk
Checking disk sda

Unmounted /dev/sda1
umount: can't umount /mnt/hd2: Device or resource busy

Any suggestions?
I'm running Custom firmware version: 2.12 Humax Version: 1.02.29.

Is there another way to check the hard drive without removing it from the case?
Thanks in advance.
 
No worries. I tried fix-disk again about an hour later and it started running (and is still running). Maybe I was too eager to run it after the reboot into maintenance mode. I await the results!
 
Sounds reasonable - you need to wait until it's finished mounting the disks etc. before it's really in maintenance mode. The next custom firmware should improve things in this area.
 
Hmm, fix-disk threw up a load of output but nothing that said corrupted, just lots of 'invalid' messages.
I tried the Humax HDD test again from the system menu but now it just freezes halfway through.
I thought I would try the diskattr diagnostic but when I try to run it I get...
>>> Beginning diagnostic diskattr
Running: diskattr
No internal disk found.
>>> Ending diagnostic diskattr

How do I get diskattr to run?
 
I've been getting glitches in my recordings and I suspect the hard drive might be faulty.

What sort of glitches in what sort of recordings? HiDef in particular or StDef as well? How long have you had the Humax and how full is the disk?
 
It's mainly on HD recordings (I don't really record much SD). The programme stutters and skips at times and I get a corrupt bar across the screen. It looks different from the usual bad-signal corruption that I sometimes get.
500GB Humax is 21 months old (got 24 month warranty from John Lewis). Disk was 3/4 full so I deleted a load of stuff then copied off the remainder. Formatted the hard drive then copied the stuff back on. Now the disk is about 1/4 full.

I ran diskattr again and only got this...
>>> Beginning diagnostic diskattr
Running: diskattr

>>> Ending diagnostic diskattr

I also ran fix-disk again and got this result...

humax# fix-disk
Checking disk sda

Unmounted /dev/sda1
Unmounted /dev/sda2
Unmounted /dev/sda3

Checking partition /dev/sda1...
e2fsck 1.41.14 (22-Dec-2010)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
hmx_int_stor: 17/65808 files (5.9% non-contiguous), 14956/263062 blocks

Checking partition /dev/sda3...
e2fsck 1.41.14 (22-Dec-2010)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
hmx_int_stor: 11/655776 files (0.0% non-contiguous), 79731/2622611 blocks

Creating swap file...
Setting up swapspace version 1, size = 1073737728 bytes

Checking partition /dev/sda2...
e2fsck 1.41.14 (22-Dec-2010)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
hmx_int_stor: 3176/29860704 files (3.1% non-contiguous), 32112680/119209984 blocks
Are you having problems with a delete loop [Y/N]? n

Finished - type 'reboot' to return to normal operation
Nothing invalid this time. I haven't had a chance to test any new recordings yet. Does that all look ok?
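
As a rough cross-check of the fix-disk output above: the last pair of numbers on each e2fsck summary line is used/total blocks, so the fill level of a partition can be worked out from it. A minimal sketch, assuming awk is available on the box (the function name e2fsck_usage is made up for illustration):

```shell
# Read e2fsck output on stdin and print, for every summary line ending in
# "blocks", the percentage of blocks in use.
# e2fsck_usage is a hypothetical helper name, not part of the custom firmware.
e2fsck_usage() {
    awk '$NF == "blocks" {
        # The field before "blocks" looks like used/total
        n = split($(NF - 1), parts, "/")
        if (n == 2 && parts[2] > 0)
            printf "%.1f\n", 100 * parts[1] / parts[2]
    }'
}
```

Fed the /dev/sda2 summary line above, this prints 26.9, which fits with the disk being about 1/4 full.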
 
The diskattr diagnostic requires the smartmontools package to be loaded (although it doesn't say that). Do you have that installed?

I need to complete the work on adding disk inspection to the web interface!
 
Ah I didn't know that. I will load the package and try again, although it will be tomorrow now.
Thanks!
 
It is interesting that the diskattr diagnostic will not run when fix-disk runs fine. Diskattr uses a simplified version of the same test used by fix-disk so should not fail. The difference is that fix-disk runs entirely from flash. I think your version of busybox may be corrupt, try:
Code:
sha1sum /mod/bin/busybox/busybox
or
Code:
md5sum /mod/bin/busybox/busybox
They should return '96d504ca2b522d192b6caad1cc1b1f327f777a5b' and 'b588c1d0957f629a91f0867c4f0a95d2' respectively. If busybox is corrupt then these commands will either return incorrect hashes or not run at all.
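
The comparison can also be left to the shell rather than done by eye. A minimal sketch, assuming sha1sum and awk are on the path (verify_sha1 is a made-up helper name); if busybox itself is corrupt the helper may fail to run at all, which is informative in its own right:

```shell
# verify_sha1 FILE EXPECTED
# Succeeds only if the SHA-1 of FILE equals EXPECTED.
# verify_sha1 is a hypothetical helper, not part of the custom firmware.
verify_sha1() {
    actual=$(sha1sum "$1" 2>/dev/null | awk '{ print $1 }')
    if [ -n "$actual" ] && [ "$actual" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: $1 (got '$actual')"
        return 1
    fi
}
```

Usage with the hash quoted above would be: verify_sha1 /mod/bin/busybox/busybox 96d504ca2b522d192b6caad1cc1b1f327f777a5b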
 
It's not getting any better. Grand Designs recorded on 4HD last night was very glitchy, skipping and jumping about.

@xyz321
I ran the checksums and they both came back with the numbers you quoted so I guess busybox is not corrupt.

@af123
I installed smartmontools and then ran diskattr and got this... I don't really understand what it's telling me. Is it good or bad? I've uploaded the diskattr output as an attachment too, as it kept the columns in the right place so it's easier to read.

>>> Beginning diagnostic diskattr
Running: diskattr
smartctl 5.41 2011-06-09 r3365 [7405b0-smp-linux-2.6.18-7.1] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family: Seagate Pipeline HD 5900.2
Device Model: ST3500312CS
Serial Number: 6VV3BJZS
LU WWN Device Id: 5 000c50 027e2fffe
Firmware Version: SC13
User Capacity: 500,107,862,016 bytes [500 GB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Thu Oct 25 09:15:18 2012 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   095   094   006    Pre-fail  Always   -           64376395
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always   -           0
  4 Start_Stop_Count        0x0032   096   096   020    Old_age   Always   -           4348
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           1
  7 Seek_Error_Rate         0x000f   077   060   030    Pre-fail  Always   -           51020073
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always   -           1851
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   098   098   020    Old_age   Always   -           2174
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always   -           0
187 Reported_Uncorrect      0x0032   099   099   000    Old_age   Always   -           1
188 Command_Timeout         0x0032   100   100   000    Old_age   Always   -           0
189 High_Fly_Writes         0x003a   080   080   000    Old_age   Always   -           20
190 Airflow_Temperature_Cel 0x0022   076   045   045    Old_age   Always   In_the_past 24 (Min/Max 24/24)
194 Temperature_Celsius     0x0022   024   055   000    Old_age   Always   -           24 (0 10 0 0)
195 Hardware_ECC_Recovered  0x001a   040   031   000    Old_age   Always   -           64376395
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           5
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           5
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0


>>> Ending diagnostic diskattr
 

Attachment: diskattr_table.txt.txt (2.7 KB)
 
Yes, that isn't good. In particular the fact that you have current pending sectors and offline uncorrectable ones explains the glitches you are seeing. These are parts of the disk (sectors) that the disk firmware isn't sure about. Next time an attempt is made to write to one of those sectors that will be resolved - either by the disk firmware deciding it's really fine or by it reallocating the sector to one of the spare ones it keeps in reserve for this. This has already happened once before as you can see by the Reallocated_Sector_Ct field.
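
To pick those three fields out of a long attribute table without reading every row, something like the following could be used. It is only a sketch: it assumes the ten-column layout shown in the output above (ID in column 1, raw value in column 10), and smart_sector_warnings is a made-up name.

```shell
# Read a smartctl attribute table on stdin and print NAME=RAW for the
# sector-health attributes (IDs 5, 197, 198) whose raw value is non-zero.
# smart_sector_warnings is a hypothetical helper name.
smart_sector_warnings() {
    awk '($1 == 5 || $1 == 197 || $1 == 198) && $10 + 0 > 0 {
        print $2 "=" $10
    }'
}
```

It could be fed from the same attribute listing that diskattr prints, for example: smartctl -A /dev/sda | smart_sector_warnings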

The first thing to do is to look at the disk self-test log, and kick off a short self-test manually if one hasn't been run recently.

To view the log:

Code:
humax# smartctl -l selftest /dev/sda

To run a short selftest (which is what the Humax on-screen disk test does):

Code:
humax# smartctl --test=short /dev/sda
(assuming your hard disk is currently /dev/sda which it will be unless you have any USB drives connected)

Then view the log again.
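
When the log gets long it can help to show only the entries that did not finish cleanly. A minimal sketch, assuming the log lines start with "# <number>" as smartctl prints them (selftest_failures is a made-up name); note that it also lists tests that are still in progress or were interrupted, not just read failures:

```shell
# Read a smartctl self-test log on stdin and print every entry whose
# status is anything other than "Completed without error".
# selftest_failures is a hypothetical helper name.
selftest_failures() {
    awk '/^# *[0-9]+ / && !/Completed without error/ { print }'
}
```

For example: smartctl -l selftest /dev/sda | selftest_failures. No output means every logged test completed without error.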
 
The reason I suggested that your version of busybox might be corrupt: in my experience, and I can't entirely explain it, if these Seagate disks have any 'pending sectors' then they will occasionally return bad data for any file read from them. It may be related to the different error detection/correction algorithm used on the AV drives.
 
I ran the disk test. First log said this....
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1847         -
# 2  Short offline       Completed without error       00%      1847         -
# 3  Short offline       Completed without error       00%      1834         -
# 4  Short offline       Completed without error       00%      1820         -
# 5  Short offline       Completed without error       00%      1534         -

Then after I ran the test the second log said this....

humax# smartctl -l selftest /dev/sda
smartctl 5.41 2011-06-09 r3365 [7405b0-smp-linux-2.6.18-7.1] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1852         -
# 2  Short offline       Completed without error       00%      1847         -
# 3  Short offline       Completed without error       00%      1847         -
# 4  Short offline       Completed without error       00%      1834         -
# 5  Short offline       Completed without error       00%      1820         -
# 6  Short offline       Completed without error       00%      1534         -

So I guess that means no errors returned?
 
Ok, you need to run a long test then - that can take several hours unfortunately.

Code:
humax# smartctl --test=long /dev/sda

(I'm sure you could have guessed that one! : )
 
Running the long test now. I'll post the results when complete. Thanks for all the help so far. At this stage do you think I should just return it under warranty, or is it fixable? I'm trying to avoid a trip to John Lewis as it's a bit out of the way for me to get to one.
 
Once these five sectors are fixed or reallocated then it depends on whether the issue comes back. It's perfectly normal and expected for disks to suffer sector failures and perform reallocations from time to time, the problem seems to be with the way these Seagates react to that event!
 
Hmm, I'm not sure if the test is still running or completed. It says completed, but then it also says 90% remaining?


=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      1853         14435421
# 2  Short offline       Completed without error       00%      1852         -
# 3  Short offline       Completed without error       00%      1847         -
# 4  Short offline       Completed without error       00%      1847         -
# 5  Short offline       Completed without error       00%      1834         -
# 6  Short offline       Completed without error       00%      1820         -
# 7  Short offline       Completed without error       00%      1534         -
 
That's good - it stopped when it found the first error (10% of the way through the disk) which is at logical block address (LBA) 14435421.
Next we need to find out which file that is - can you post the output of your disk partition table please?

Code:
humax# fdisk -lu /dev/sda

and the block size of the filesystem?

Code:
humax# /mod/sbin/tune2fs -l /dev/sda2
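
For reference, this is roughly what those two outputs get used for. The disk LBA is turned into a filesystem block number by subtracting the partition's start sector (from fdisk -lu) and dividing by the number of 512-byte sectors per filesystem block (block size from tune2fs). A minimal sketch; lba_to_fs_block is a made-up name, and the start sector and block size in the example are placeholders, not this disk's real values, which haven't been posted yet:

```shell
# lba_to_fs_block LBA PART_START BLOCK_SIZE
# Convert a disk LBA (512-byte sectors) into a block number within the
# filesystem on the partition that starts at sector PART_START.
# lba_to_fs_block is a hypothetical helper name.
lba_to_fs_block() {
    lba=$1
    part_start=$2
    block_size=$3
    sectors_per_block=$((block_size / 512))
    # Integer division: the sector falls somewhere inside this block
    echo $(( (lba - part_start) / sectors_per_block ))
}

# Placeholder example: start sector 1000000 and 4096-byte blocks are assumed
# lba_to_fs_block 14435421 1000000 4096
```

The LBA must fall inside the partition for the result to mean anything. If debugfs from e2fsprogs is available on the box (an assumption, nothing so far confirms it), its icheck command maps a block number to an inode and ncheck maps the inode to a file name.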
 