Disk problem

aekostas

Hello,

My HDR-FOX T2 developed HDD problems. Over the weekend I ran fixdisk for two sessions, each lasting hours (6? more?) until I stopped it, and things have improved a bit: from memory, I went down from three rows of problems to one, as follows:

Code:
SMART Status    PASSED    
Device Model     WDC WD10EZRX-00L4HB0
Serial Number     WD-WCC4J0771360
LU WWN Device Id     5 0014ee 2b3de474f
Firmware Version     01.01A01
User Capacity     1,000,204,886,016 bytes [1.00 TB]
Sector Sizes     512 bytes logical, 4096 bytes physical
ATA Version is     8
ATA Standard is     Exact ATA specification draft version not indicated
Local Time is     Mon Dec 10 07:23:45 2018 GMT
SMART support is     Available - device has SMART capability.
SMART support is     Enabled
Attributes
ID     Name     Flags     Raw Value     Value     Worst     Thresh     Type     Updated     When Failed
1     Raw_Read_Error_Rate     POSR-K     32     200     200     051     Pre-fail     Always     -
3     Spin_Up_Time     POS--K     4133     137     130     021     Pre-fail     Always     -
4     Start_Stop_Count     -O--CK     3861     097     097     000     Old_age     Always     -
5     Reallocated_Sector_Ct     PO--CK     0     200     200     140     Pre-fail     Always     -
7     Seek_Error_Rate     -OSR-K     0     200     200     000     Old_age     Always     -
9     Power_On_Hours     -O--CK     3508     096     096     000     Old_age     Always     -
10     Spin_Retry_Count     -O--CK     0     100     100     000     Old_age     Always     -
11     Calibration_Retry_Count     -O--CK     0     100     100     000     Old_age     Always     -
12     Power_Cycle_Count     -O--CK     3858     097     097     000     Old_age     Always     -
192     Power-Off_Retract_Count     -O--CK     3795     195     195     000     Old_age     Always     -
193     Load_Cycle_Count     -O--CK     21559     193     193     000     Old_age     Always     -
194     Temperature_Celsius     -O---K     27     116     082     000     Old_age     Always     -
196     Reallocated_Event_Count     -O--CK     0     200     200     000     Old_age     Always     -
197     Current_Pending_Sector     -O--CK     0     200     200     000     Old_age     Always     -
198     Offline_Uncorrectable     ----CK     3     200     200     000     Old_age     Offline     -
199     UDMA_CRC_Error_Count     -O--CK     759     200     192     000     Old_age     Always     -
200     Multi_Zone_Error_Rate     ---R--     2     200     200     000     Old_age     Offline     -

(I think I used to have 196 and 197 highlighted; now only 198.)

Fixdisk now gives up quickly thus:

Code:
Please select option: fixdisk
Any additional options (-h for list or press return for none):
Are you sure you wish to run the hard disk checker? [Y/N] y
Running /bin/fix-disk

Checking disk sda (4096 byte sectors)

Partition /dev/sda1 is already unmounted
Partition /dev/sda2 is already unmounted
Partition /dev/sda3 is already unmounted

Running short disk self test

Pending sector error(s) found

LBA has not yet been found
A long test is required - this could take 2 hour(s) 25 minutes
Do you wish to continue? [Y/N]: y
Running long disk self test

Error - pending sectors but LBA not found
fix-disk: session terminated with exit status 1

Press return to continue:

This exited in seconds.

Any thoughts will be most appreciated.
 
Hello again,

Is this the wrong forum for this question? Or is my problem not disconcerting?

Thanks for any answers.
 
Is this the wrong forum for this question?
No.
Or is my problem not disconcerting?
It is in need of fixing. I'm not sure what to suggest. Maybe fix-disk can't deal with what that WD disk is saying (it was probably written/tested against Seagate drives).
What does the raw output of "smartctl -a /dev/sda" look like? And post it using [code] tags.
 
Thanks for the suggestion. Here is the command output:

Code:
# smartctl -a /dev/sda
smartctl 5.41 2011-06-09 r3365 [7405b0-smp-linux-2.6.18-7.1] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD10EZRX-00L4HB0
Serial Number:    WD-WCC4J0771360
LU WWN Device Id: 5 0014ee 2b3de474f
Firmware Version: 01.01A01
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Dec 12 07:46:08 2018 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (12660) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 145) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       32
  3 Spin_Up_Time            0x0027   137   130   021    Pre-fail  Always       -       4141
  4 Start_Stop_Count        0x0032   097   097   000    Old_age   Always       -       3866
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       3513
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   097   097   000    Old_age   Always       -       3863
192 Power-Off_Retract_Count 0x0032   195   195   000    Old_age   Always       -       3800
193 Load_Cycle_Count        0x0032   193   193   000    Old_age   Always       -       21574
194 Temperature_Celsius     0x0022   119   082   000    Old_age   Always       -       24
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       3
199 UDMA_CRC_Error_Count    0x0032   200   192   000    Old_age   Always       -       759
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       2

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      3508         -
# 2  Short offline       Completed without error       00%      3496         -
# 3  Short offline       Completed: read failure       90%      3496         1142627824
# 4  Short offline       Completed: read failure       90%      3496         1142627824
# 5  Selective offline   Completed: read failure       90%      3495         963535768
# 6  Extended offline    Completed: read failure       90%      3495         963535760
# 7  Short offline       Completed: read failure       90%      3495         963535760
# 8  Conveyance offline  Completed without error       00%        58         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

humax#
 
When fix-disk reports
Code:
Pending sector error(s) found
it means that one of Current_Pending_Sector or Offline_Uncorrectable is greater than 0.
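
As a rough illustration of that check (a sketch only, not fix-disk's actual code; the function name flagged_attrs is made up for this example), the two attributes can be picked out of smartctl -A style output like this:

```shell
# Hypothetical helper, NOT taken from fix-disk: reads `smartctl -A` style text
# on stdin and prints attributes 197/198 whose RAW_VALUE (last column) is > 0.
flagged_attrs() {
    awk '($1 == 197 || $1 == 198) && $NF + 0 > 0 { print $1, $2, $NF }'
}

# Sample rows copied from the smartctl -a output posted earlier in this thread.
sample='197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       3'

printf '%s\n' "$sample" | flagged_attrs
# prints: 198 Offline_Uncorrectable 3
```

On the box itself the same filter could presumably be fed with smartctl -A /dev/sda | flagged_attrs.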

You have no Pending Sectors and 3 Offline Uncorrectable: the previously failed sectors 963535760, 963535768 and 1142627824 have been corrected without remapping.
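
A quick bit of arithmetic on those three LBAs, assuming the 4096-byte physical sectors are aligned to LBA 0 (an assumption; the drive only reports 512 bytes logical, 4096 bytes physical), suggests they fall in three different physical sectors, which would be consistent with a raw value of 3. The helper name phys_sector is just for illustration:

```shell
# Assumption: 4096-byte physical sectors start at LBA 0, so each one covers
# 8 consecutive 512-byte LBAs (4096 / 512 = 8).
phys_sector() {
    echo $(( $1 / 8 ))
}

# LBAs taken from the self-test log posted above.
for lba in 963535760 963535768 1142627824; do
    echo "LBA $lba -> physical sector $(phys_sector "$lba")"
done
# prints:
# LBA 963535760 -> physical sector 120441970
# LBA 963535768 -> physical sector 120441971
# LBA 1142627824 -> physical sector 142828478
```

The first two are neighbouring physical sectors; the third is well away from them.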

With the Seagate Pipeline drives, these parameters are normally seen to be the same value; correcting a pending sector also decrements Offline_Uncorrectable. But the definition of these parameters depends on the drive firmware, and in particular will vary by vendor (or vendor's subcontractor).

Perhaps WD's Green firmware treats Offline Uncorrectable as a historical running total, although there's no sign of this when comparing https://www.smartmontools.org/wiki/AttributesWestern-Digital with https://www.smartmontools.org/wiki/AttributesSeagate.

If so, your disk may now be fine. Two suggestions:
  • install the smartmontools package which will give you a newer version of smartctl than that bundled with the CF;
  • run another extended selftest
    Code:
    # smartctl -t long /dev/sda
    and see what smartctl -a says then.
 
Thanks. I upgraded smartmontools (it appears it was already installed). How do I invoke it? I also have a shedload more upgrades to be doing, apparently, but I will wait for smartctl to finish.

In the meantime I kicked off the long smartctl test; it estimated that it will finish just after 21:00 today.
 
How does this look? (And does it look like it has finished?)

Code:
humax# smartctl -a /dev/sda
smartctl 6.4 2015-06-04 r4109 [7405b0-smp-linux-2.6.18-7.1] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Green
Device Model:     WDC WD10EZRX-00L4HB0
Serial Number:    WD-WCC4J0771360
LU WWN Device Id: 5 0014ee 2b3de474f
Firmware Version: 01.01A01
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Dec 12 21:10:45 2018 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 241) Self-test routine in progress...
                                        10% of test remaining.
Total time to complete Offline
data collection:                (12660) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 145) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       32
  3 Spin_Up_Time            0x0027   137   130   021    Pre-fail  Always       -       4141
  4 Start_Stop_Count        0x0032   097   097   000    Old_age   Always       -       3867
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       3516
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   097   097   000    Old_age   Always       -       3864
192 Power-Off_Retract_Count 0x0032   195   195   000    Old_age   Always       -       3801
193 Load_Cycle_Count        0x0032   193   193   000    Old_age   Always       -       21574
194 Temperature_Celsius     0x0022   089   082   000    Old_age   Always       -       54
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       3
199 UDMA_CRC_Error_Count    0x0032   200   192   000    Old_age   Always       -       759
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       2

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      3508         -
# 2  Short offline       Completed without error       00%      3496         -
# 3  Short offline       Completed: read failure       90%      3496         1142627824
# 4  Short offline       Completed: read failure       90%      3496         1142627824
# 5  Selective offline   Completed: read failure       90%      3495         963535768
# 6  Extended offline    Completed: read failure       90%      3495         963535760
# 7  Short offline       Completed: read failure       90%      3495         963535760
# 8  Conveyance offline  Completed without error       00%        58         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 
See package auto-update.

I just did that, thanks. But first I clicked "Update all packages", which broke the web interface. I saw the fixweb option on telnet and I am back online, despite some errors.

What CF version are you running? Something ancient by the look of smartctl 5.41.
Install 3.13 and re-run the fix-disk procedure.

I am and have been on 3.13 for a while, I think.
 
How does this look? (And does it look like it has finished?)

Code:
humax# smartctl -a /dev/sda
...
SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      3508         -

tl;dr: it doesn't even look like it has started.

The posted output shows no sign of the extended (long) selftest on this drive; there should be a new #1 entry different from the one in your output posted previously (with "Extended offline" in the 2nd column and a higher LifeTime).

After starting smartctl -t long /dev/sda, wait a few minutes and then run smartctl -l selftest /dev/sda. The "-l" (ell) option shows a subset of SMART data covering logs, as excerpted above; it should show a new "Extended offline" entry.

After the predicted finish of the test, or perhaps about 2/3 of the way through, you can run the log display again to see whether the test has actually finished.
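
If you want to pull just the newest log entry out of that output, something along these lines might do (a sketch; latest_selftest is a made-up name, and it assumes the log columns are separated by two or more spaces as in the listings in this thread):

```shell
# Hypothetical helper: reads `smartctl -l selftest` text on stdin and prints
# the type, status and remaining percentage of the newest entry (# 1).
# Assumes columns are separated by runs of two or more spaces.
latest_selftest() {
    awk -F'  +' '/^# *1 / { print $2 " | " $3 " | " $4 " remaining" }'
}

# Sample rows copied from the log posted above (before any new long test shows up).
log_before='# 1  Short offline       Completed without error       00%      3508         -
# 2  Short offline       Completed without error       00%      3496         -'

printf '%s\n' "$log_before" | latest_selftest
# prints: Short offline | Completed without error | 00% remaining
```

On the box that would be smartctl -l selftest /dev/sda | latest_selftest. The "Self-test execution status" line in the full smartctl -a output (it read "Self-test routine in progress... 10% of test remaining" in the listing posted above) is another place to look while a test is still running.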

The SMART selftests are carried out by the drive firmware, interleaved at low priority with normal disk activity. Apart from possibly making the test run longer, there should be no interaction with activity such as recording and package updates.
 
You think? Can't you check and positively confirm?

Ok, ambiguous statement, sorry: I am running 3.13:

Code:
Custom firmware version: 3.13 (build 4028)
Humax Version: 1.03.12 (kernel HDR_CFW_3.13)

Given that I did not go near the device with a USB stick, I assume I have been running it "for a while". But I saw no obvious way to find out when I updated it to that.

I just started smartctl -t long /dev/sda. I will try smartctl -l selftest /dev/sda in 30 minutes or so and see what happens. I will then have to leave the box until late in the day; no recordings planned.

And thank you all for the interest, it is very much appreciated.
 
And here we go:

Code:
humax# smartctl -l selftest /dev/sda
smartctl 6.4 2015-06-04 r4109 [7405b0-smp-linux-2.6.18-7.1] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Aborted by host               90%      3518         -
# 2  Extended offline    Completed without error       00%      3516         -
# 3  Short offline       Completed without error       00%      3508         -
# 4  Short offline       Completed without error       00%      3496         -
# 5  Short offline       Completed: read failure       90%      3496         1142627824
# 6  Short offline       Completed: read failure       90%      3496         1142627824
# 7  Selective offline   Completed: read failure       90%      3495         963535768
# 8  Extended offline    Completed: read failure       90%      3495         963535760
# 9  Short offline       Completed: read failure       90%      3495         963535760
#10  Conveyance offline  Completed without error       00%        58         -
5 of 5 failed self-tests are outdated by newer successful extended offline self-test # 2

Given the above message about outdated tests, I went on the web interface, clicked "Check for updates" which concluded successfully, then clicked on "Upgrade all packages" and got "No packages are available for upgrade; try updating the package list from the Internet using the button above."
 
Did the "Check for updates" screen produce a list of updates available? If not, you seem to be fully updated as indicated by the message "No packages are available for upgrade; try updating the package list from the Internet using the button above.".

As to the 'print out' above, Pass.
 
To clarify: last night "Check for updates" gave a shedload of updates, which I did, also last night.

This morning I ran smartctl -l selftest /dev/sda and, given the message about outdated tests, I checked for updates again; none.
 
In my Firefox setup, quotes containing BBcode code tags as above are being truncated. Is this a general problem with the new forum software?

The line after the list of test results is telling you that the 5 test failures (numbered 5-9) have been superseded (which would have been a better word than "outdated") by a newer successful test, # 2. It has nothing to do with CF updates.

Result #1 is aborted, presumably because the box was powered off, but that doesn't matter because test #2, started 2 hours earlier, was successful.
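
To make the "superseded" idea concrete, here is a rough sketch that counts failed entries older than the newest successful extended test. It only approximates the idea and is not how smartctl itself computes that line; the function name superseded_failures is invented, and it again assumes columns separated by two or more spaces:

```shell
# Hypothetical helper, NOT smartctl's own logic: reads the self-test log on
# stdin (newest entry first) and counts "failure" entries that appear after
# (i.e. are older than) the first successful "Extended offline" entry.
superseded_failures() {
    awk -F'  +' '
        /^#/ && !ok && $2 == "Extended offline" && $3 == "Completed without error" { ok = 1; next }
        /^#/ && ok && $3 ~ /failure/ { n++ }
        END { print n + 0 }
    '
}

# Log rows copied from the output quoted above.
log='# 1  Extended offline    Aborted by host               90%      3518         -
# 2  Extended offline    Completed without error       00%      3516         -
# 3  Short offline       Completed without error       00%      3508         -
# 4  Short offline       Completed without error       00%      3496         -
# 5  Short offline       Completed: read failure       90%      3496         1142627824
# 6  Short offline       Completed: read failure       90%      3496         1142627824
# 7  Selective offline   Completed: read failure       90%      3495         963535768
# 8  Extended offline    Completed: read failure       90%      3495         963535760
# 9  Short offline       Completed: read failure       90%      3495         963535760
#10  Conveyance offline  Completed without error       00%        58         -'

printf '%s\n' "$log" | superseded_failures
# prints: 5
```

That matches the "5 of 5 failed self-tests" in the smartctl message for this log.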

So for now everything is fine, except that your smartctl -A output indicates a fix-disk issue per my post #7.

Fix-disk is @xyz321's work but doesn't have its own thread. The problem is that it checks SMART parameters 197 and 198 to find pending sectors, but apparently the WD Green drive firmware defines parameter 198 as Offline_Uncorrectable and, unlike the Seagate Pipeline firmware, doesn't decrement its value when a pending sector is rehabilitated (this case provides no evidence of what happens if a bad sector is remapped).
 