
Hard drive failure?

Discussion in 'HD/HDR-FOX T2 Customised Firmware' started by rob4x4, Oct 24, 2012.

  1. rob4x4

    rob4x4 Member

    Hello all.
    I've been getting glitches in my recordings and I suspect the hard drive might be faulty.
    The Humax HDD test shows no errors. I tried running the fix-disk command via telnet but I get the message:
    humax# fix-disk
    Checking disk sda

    Unmounted /dev/sda1
    umount: can't umount /mnt/hd2: Device or resource busy

    Any suggestions?
    I'm running custom firmware version 2.12, Humax version 1.02.29.

    Is there another way to check the hard drive without removing it from the case?
    Thanks in advance.
     
  2. rob4x4

    rob4x4 Member

    No worries. I tried fix-disk again about an hour later and it started running (and is still running). Maybe I was too eager to run it after the reboot into maintenance mode. I await the results!
     
  3. af123

    af123 Administrator Staff Member

    Sounds reasonable - you need to wait until it's finished mounting the disks etc. before it's really in maintenance mode. The next custom firmware should improve things in this area.
     
  4. rob4x4

    rob4x4 Member

    Hmm fix-disk threw up a load of stuff but nothing that said corrupted. Lots of invalid things.
    I tried the Humax HDD test again on the system menu but now it just freezes halfway through.
    I thought I would try diskattr diagnostic but when I try to run it I get...
    >>> Beginning diagnostic diskattr
    Running: diskattr
    No internal disk found.
    >>> Ending diagnostic diskattr

    How do I get diskattr to run?
     
  5. Black Hole

    Black Hole Felinos Guru

    What sort of glitches in what sort of recordings? HiDef in particular or StDef as well? How long have you had the Humax and how full is the disk?
     
  6. af123

    af123 Administrator Staff Member

    Can you try running it again and post the output? I've modified it to print out some information when it can't find an internal disk.
     
  7. rob4x4

    rob4x4 Member

    It's mainly on HD recordings (I don't really record much SD). The program stutters and skips at times and I get a corrupt bar across the screen. It looks different from the usual bad signal corruption that I sometimes get.
    500GB Humax is 21 months old (got 24 month warranty from John Lewis). Disk was 3/4 full so I deleted a load of stuff then copied off the remainder. Formatted the hard drive then copied the stuff back on. Now the disk is about 1/4 full.

    I ran diskattr again and only got this...
    >>> Beginning diagnostic diskattr
    Running: diskattr

    >>> Ending diagnostic diskattr

    I also ran fix-disk again and got this result...

    humax# fix-disk
    Checking disk sda

    Unmounted /dev/sda1
    Unmounted /dev/sda2
    Unmounted /dev/sda3

    Checking partition /dev/sda1...
    e2fsck 1.41.14 (22-Dec-2010)
    Pass 1: Checking inodes, blocks, and sizes
    Pass 2: Checking directory structure
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Pass 5: Checking group summary information
    hmx_int_stor: 17/65808 files (5.9% non-contiguous), 14956/263062 blocks

    Checking partition /dev/sda3...
    e2fsck 1.41.14 (22-Dec-2010)
    Pass 1: Checking inodes, blocks, and sizes
    Pass 2: Checking directory structure
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Pass 5: Checking group summary information
    hmx_int_stor: 11/655776 files (0.0% non-contiguous), 79731/2622611 blocks

    Creating swap file...
    Setting up swapspace version 1, size = 1073737728 bytes

    Checking partition /dev/sda2...
    e2fsck 1.41.14 (22-Dec-2010)
    Pass 1: Checking inodes, blocks, and sizes
    Pass 2: Checking directory structure
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Pass 5: Checking group summary information
    hmx_int_stor: 3176/29860704 files (3.1% non-contiguous), 32112680/119209984 blocks
    Are you having problems with a delete loop [Y/N]? n

    Finished - type 'reboot' to return to normal operation
    Nothing invalid this time. I haven't had a chance to test any new recordings yet. Does that all look ok?
     
  8. af123

    af123 Administrator Staff Member

    The diskattr diagnostic requires the smartmontools package to be loaded (although it doesn't say that). Do you have that installed?

    I need to complete the work on adding disk inspection to the web interface!
     
  9. rob4x4

    rob4x4 Member

    Ah I didn't know that. I will load the package and try again, although it will be tomorrow now.
    Thanks!
     
  10. Black Hole

    Black Hole Felinos Guru

    Sounds like you have made a good effort at defragging.
     
  11. xyz321

    xyz321 Well-Known Member

    It is interesting that the diskattr diagnostic will not run when fix-disk runs fine. Diskattr uses a simplified version of the same test used by fix-disk so should not fail. The difference is that fix-disk runs entirely from flash. I think your version of busybox may be corrupt. Try:
    Code:
    sha1sum /mod/bin/busybox/busybox
    
    or
    Code:
    md5sum /mod/bin/busybox/busybox
    
    They should return '96d504ca2b522d192b6caad1cc1b1f327f777a5b' and 'b588c1d0957f629a91f0867c4f0a95d2' respectively. If busybox is corrupt then these commands will either return incorrect hashes or not run at all.
     
  12. rob4x4

    rob4x4 Member

    It's not getting any better. Grand Designs recorded on 4HD last night was very glitchy, skipping and jumping about.

    @xyz321
    I ran the checksums and they both came back with the numbers you quoted so I guess busybox is not corrupt.

    @af123
    I installed smartmontools and then ran diskattr and got this... I don't really understand what it's telling me. Is it good or bad? I've uploaded the diskattr output as an attachment too, as it kept the columns in the right place so it's easier to read.

    >>> Beginning diagnostic diskattr
    Running: diskattr
    smartctl 5.41 2011-06-09 r3365 [7405b0-smp-linux-2.6.18-7.1] (local build)
    Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

    === START OF INFORMATION SECTION ===
    Model Family: Seagate Pipeline HD 5900.2
    Device Model: ST3500312CS
    Serial Number: 6VV3BJZS
    LU WWN Device Id: 5 000c50 027e2fffe
    Firmware Version: SC13
    User Capacity: 500,107,862,016 bytes [500 GB]
    Sector Size: 512 bytes logical/physical
    Device is: In smartctl database [for details use: -P show]
    ATA Version is: 8
    ATA Standard is: ATA-8-ACS revision 4
    Local Time is: Thu Oct 25 09:15:18 2012 BST
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled

    === START OF READ SMART DATA SECTION ===
    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000f   095   094   006    Pre-fail  Always   -           64376395
      3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always   -           0
      4 Start_Stop_Count        0x0032   096   096   020    Old_age   Always   -           4348
      5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           1
      7 Seek_Error_Rate         0x000f   077   060   030    Pre-fail  Always   -           51020073
      9 Power_On_Hours          0x0032   098   098   000    Old_age   Always   -           1851
     10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
     12 Power_Cycle_Count       0x0032   098   098   020    Old_age   Always   -           2174
    184 End-to-End_Error        0x0032   100   100   099    Old_age   Always   -           0
    187 Reported_Uncorrect      0x0032   099   099   000    Old_age   Always   -           1
    188 Command_Timeout         0x0032   100   100   000    Old_age   Always   -           0
    189 High_Fly_Writes         0x003a   080   080   000    Old_age   Always   -           20
    190 Airflow_Temperature_Cel 0x0022   076   045   045    Old_age   Always   In_the_past 24 (Min/Max 24/24)
    194 Temperature_Celsius     0x0022   024   055   000    Old_age   Always   -           24 (0 10 0 0)
    195 Hardware_ECC_Recovered  0x001a   040   031   000    Old_age   Always   -           64376395
    197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           5
    198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           5
    199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0


    >>> Ending diagnostic diskattr
     


  13. af123

    af123 Administrator Staff Member

    Yes, that isn't good. In particular the fact that you have current pending sectors and offline uncorrectable ones explains the glitches you are seeing. These are parts of the disk (sectors) that the disk firmware isn't sure about. Next time an attempt is made to write to one of those sectors that will be resolved - either by the disk firmware deciding it's really fine or by it reallocating the sector to one of the spare ones it keeps in reserve for this. This has already happened once before as you can see by the Reallocated_Sector_Ct field.
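
As a rough illustration of the reading described above, the sketch below pulls the raw values out of pasted diskattr/smartctl attribute rows and reports the three counters mentioned here when they are non-zero. It is meant to be run on a PC against saved output, not on the Humax itself. The function names and the choice of which IDs to watch are assumptions made for this example; `SAMPLE` reuses a few rows from the output posted earlier in the thread.

```python
# Attribute IDs singled out in this post: reallocated, pending and
# offline-uncorrectable sectors. Which IDs to watch is a judgment call.
WATCHED_IDS = {5, 197, 198}


def parse_attributes(text):
    """Map attribute ID -> (name, raw value) from smartctl attribute rows."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have a hex FLAG in column 3.
        if len(parts) < 10 or not parts[0].isdigit() or not parts[2].startswith("0x"):
            continue
        raw = parts[9]  # first token of RAW_VALUE; any "(Min/Max ...)" tail is ignored
        if raw.isdigit():
            attrs[int(parts[0])] = (parts[1], int(raw))
    return attrs


def worrying(attrs):
    """Return {name: raw value} for watched attributes with a non-zero raw value."""
    return {
        name: raw
        for attr_id, (name, raw) in attrs.items()
        if attr_id in WATCHED_IDS and raw > 0
    }


# A few rows copied from the diskattr output earlier in the thread.
SAMPLE = """\
  5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 1
190 Airflow_Temperature_Cel 0x0022 076 045 045 Old_age Always In_the_past 24 (Min/Max 24/24)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 5
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 5
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
"""

if __name__ == "__main__":
    for name, raw in worrying(parse_attributes(SAMPLE)).items():
        print(name, raw)
```

The parser looks only at the RAW_VALUE column because, for these three counters, the raw figure is the sector count itself, while the normalised VALUE can still read 100.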

    The first thing to do is to look at the disk self test log and kick off a short selftest manually if one hasn't been run recently:

    To view the log:

    Code:
    humax# smartctl -l selftest /dev/sda
    
    To run a short selftest (which is what the Humax on-screen disk test does):

    Code:
    humax# smartctl --test=short /dev/sda
    
    (assuming your hard disk is currently /dev/sda which it will be unless you have any USB drives connected)

    Then view the log again.
     
  14. af123

    af123 Administrator Staff Member

    In my experience, and I can't entirely explain it, if these Seagate disks have any 'pending sectors' then they will occasionally return bad data for any file read from them. It may be something related to the different error detection/correction algorithm used on the AV drives.
     
  15. rob4x4

    rob4x4 Member

    I ran the disk test. First log said this....
    === START OF READ SMART DATA SECTION ===
    SMART Self-test log structure revision number 1
    Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Short offline       Completed without error  00%        1847             -
    # 2  Short offline       Completed without error  00%        1847             -
    # 3  Short offline       Completed without error  00%        1834             -
    # 4  Short offline       Completed without error  00%        1820             -
    # 5  Short offline       Completed without error  00%        1534             -

    Then after I ran the test the second log said this....

    humax# smartctl -l selftest /dev/sda
    smartctl 5.41 2011-06-09 r3365 [7405b0-smp-linux-2.6.18-7.1] (local build)
    Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

    === START OF READ SMART DATA SECTION ===
    SMART Self-test log structure revision number 1
    Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Short offline       Completed without error  00%        1852             -
    # 2  Short offline       Completed without error  00%        1847             -
    # 3  Short offline       Completed without error  00%        1847             -
    # 4  Short offline       Completed without error  00%        1834             -
    # 5  Short offline       Completed without error  00%        1820             -
    # 6  Short offline       Completed without error  00%        1534             -

    So I guess that means no errors returned?
     
  16. af123

    af123 Administrator Staff Member

    Ok, you need to run a long test then - that can take several hours unfortunately.

    Code:
    humax# smartctl --test=long /dev/sda
    
    (I'm sure you could have guessed that one! : )
     
  17. rob4x4

    rob4x4 Member

    Running long test now. I'll post the results when complete. Thanks for all the help so far. At this stage do you think I should just return it under warranty or is it fixable? I'm trying to avoid a trip to John Lewis as it's a bit out of the way for me to get to one.
     
  18. af123

    af123 Administrator Staff Member

    Once these five sectors are fixed or reallocated, it depends on whether the issue comes back. It's perfectly normal and expected for disks to suffer sector failures and perform reallocations from time to time; the problem seems to be with the way these Seagates react to that event!
     
  19. rob4x4

    rob4x4 Member

    Hmm, I'm not sure if the test is still running or completed. It says completed, but then it also says 90% remaining?


    === START OF READ SMART DATA SECTION ===
    SMART Self-test log structure revision number 1
    Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Extended offline    Completed: read failure  90%        1853             14435421
    # 2  Short offline       Completed without error  00%        1852             -
    # 3  Short offline       Completed without error  00%        1847             -
    # 4  Short offline       Completed without error  00%        1847             -
    # 5  Short offline       Completed without error  00%        1834             -
    # 6  Short offline       Completed without error  00%        1820             -
    # 7  Short offline       Completed without error  00%        1534             -
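
For anyone reading a log like this from a saved text file, the sketch below shows one possible way to pick out the entry that matters. It assumes rows shaped like the ones pasted above (entry number, description and status, remaining percentage, lifetime hours, then an LBA or `-`) and that smartctl lists the newest test first, as it does in this thread. The function names are made up for the example.

```python
import re

# One log row: "# 1 <description and status> 90% 1853 14435421"
ROW = re.compile(r"^#\s*(\d+)\s+(.*?)\s+(\d+)%\s+(\d+)\s+(\S+)$")


def parse_selftest_log(text):
    """Return one dict per self-test row found in pasted smartctl output."""
    entries = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # headers, banners and blank lines are skipped
        num, desc_status, remaining, hours, lba = match.groups()
        entries.append({
            "num": int(num),
            "text": desc_status,              # test description plus status
            "remaining_pct": int(remaining),  # how much of the test was left
            "hours": int(hours),              # power-on hours when the test ran
            "lba": None if lba == "-" else int(lba),
        })
    return entries


def latest_error_lba(entries):
    """LBA of the first error in the newest failing test, or None if none failed."""
    for entry in entries:  # rows are assumed to be newest first
        if entry["lba"] is not None:
            return entry["lba"]
    return None
```

A row with a status of "read failure" and 90% remaining means the test stopped early at that LBA instead of scanning the rest of the disk.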
     
  20. af123

    af123 Administrator Staff Member

    That's good - it stopped when it found the first error (10% of the way through the disk) which is at logical block address (LBA) 14435421.
    Next we need to find out which file that is - can you post the output of your disk partition table please?

    Code:
    humax# fdisk -lu /dev/sda
    
    and the block size of the filesystem?

    Code:
    humax# /mod/sbin/tune2fs -l /dev/sda2
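
Those two outputs feed a small piece of arithmetic: subtract the partition's starting sector from the failing LBA, convert sectors to bytes, and divide by the filesystem block size to get the block number inside the partition. A minimal sketch follows. The 512-byte sector size comes from the diskattr output earlier in the thread, but the partition start and block size below are placeholders, since the real values have not been posted yet.

```python
# From "Sector Size: 512 bytes logical/physical" in the diskattr output.
SECTOR_SIZE = 512


def lba_to_fs_block(lba, partition_start, block_size, sector_size=SECTOR_SIZE):
    """Convert a whole-disk LBA into a block number within one partition.

    `partition_start` is the partition's first sector as listed by
    `fdisk -lu`; `block_size` is the "Block size" reported by `tune2fs -l`.
    """
    if lba < partition_start:
        raise ValueError("LBA lies before the start of this partition")
    return (lba - partition_start) * sector_size // block_size


FAILING_LBA = 14435421        # from the extended self-test log above
ASSUMED_START = 2104515       # HYPOTHETICAL start sector, not taken from the thread
ASSUMED_BLOCK_SIZE = 4096     # ASSUMPTION until the tune2fs output is posted

if __name__ == "__main__":
    print(lba_to_fs_block(FAILING_LBA, ASSUMED_START, ASSUMED_BLOCK_SIZE))
```

The integer division deliberately rounds down, because several 512-byte sectors share one filesystem block. The result only means something once the real start sector and block size are substituted, and only if the LBA actually falls inside that partition.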