Maintenance mode rewrites hundreds of bad sectors

snafu

New Member
Hiya,
(newbie first posting).
I've had a HDR-FOX T2 for several years and following disk issues ran fixdisk.

I upgraded the firmware using HDR_FOX_T2_1.03.12_mod_3.10.zip and also reinstalled the webif.
Fixdisk found multiple bad sectors and was always able to rewrite them successfully, but pressing "y" for over an hour was a waste of time.
So ... I re-ran fixdisk with the "-y" option and the counter got down to 863 then the fun started.
Currently it has been fixing bad sectors for around 6 hours issuing telnet dialogue of the form:

Code:
Running select disk self test
Error at LBA 1827680878

/dev/sda:
re-writing sector 1827680878: succeeded
(Just to be clear, the sector count increases, i.e. it isn't repeatedly fixing the same sector!)

What I'd like to do is speed this up if possible or, alternatively, check if the thing is in a loop.

Would it be any quicker to boot off a USB and run a disk check from there?
I don't know exactly what flavour of linux would be best suited for this.
Alternatively is there any way from the telnet command line I can check to see that the box is actually doing something (without breaking the continuing disk check?).
I appreciate that my hdd may be failing but - so far - all the errors have reportedly been fixed OK and I'm in, unfortunately, an area prone to short power outages. I'm guessing the box was actually recording when a power cut hit and that caused disk issues.

Anyway, any thoughts, comments, suggestions gratefully received!

Thanks in advance.
 

af123

Administrator
Staff member
What I'd like to do is speed this up if possible or, alternatively, check if the thing is in a loop.
Running with the -B option (as well as -y) will skip the block search part (where it tries to find out which file contains the problem sector). That should speed it up quite a lot.

To check if it's making progress, you can look at the pending sector count from another command line session:
Code:
humax# /bin/smartctl -A /dev/sda | grep Pending
Hopefully that will be decreasing.

I appreciate that my hdd may be failing...
I'd definitely consider it suspect with that many suspect sectors, but you may be lucky if it was just a firmware glitch!
 

snafu

New Member

Thanks for the reply, af123, much appreciated :)
Tried that a few times after it'd done a few sector repairs and got this (apologies for the wrapping, hope it makes sense):

Code:
humax# /bin/smartctl -A /dev/sda | grep Pending
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
humax# /bin/smartctl -A /dev/sda | grep Pending
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
humax# /bin/smartctl -A /dev/sda | grep Pending
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
humax# /bin/smartctl -A /dev/sda | grep Pending
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
humax# /bin/smartctl -A /dev/sda | grep Pending
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
humax# /bin/smartctl -A /dev/sda | grep Pending
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
humax# /bin/smartctl -A /dev/sda | grep Pending
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
humax# /bin/smartctl -A /dev/sda | grep Pending
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
humax# /bin/smartctl -A /dev/sda | grep Pending
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
humax# /bin/smartctl -A /dev/sda | grep Pending
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
Hmm ... Pending isn't reducing ... any thoughts?
 

af123

Administrator
Staff member
Pending is actually 0, which is good (assuming that sda is your internal disk - do you have any other disks connected?)
What about the output of the self-test log:

Code:
humax# smartctl -l selftest /dev/sda
 

snafu

New Member
Ran it a few times and it does reflect the sector repairs on the fixdisk session.
Am I glad you can read wrapped output properly! :)
Code:
humax# /bin/smartctl -l selftest /dev/sda
smartctl 6.4 2015-06-04 r4109 [7405b0-smp-linux-2.6.18-7.1] (local build)
Copyright (C) 2002-15, 

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Selective offline   Completed: read failure       90%     14976         1827682433
# 2  Selective offline   Completed: read failure       90%     14976         1827682431
# 3  Selective offline   Completed: read failure       90%     14976         1827682429
# 4  Selective offline   Completed: read failure       90%     14976         1827682427
# 5  Selective offline   Completed: read failure       90%     14976         1827682426
# 6  Selective offline   Completed: read failure       90%     14976         1827682425
# 7  Selective offline   Completed: read failure       90%     14976         1827682424
# 8  Selective offline   Completed: read failure       90%     14976         1827682422
# 9  Selective offline   Completed: read failure       90%     14976         1827682420
#10  Selective offline   Completed: read failure       90%     14976         1827682418
#11  Selective offline   Completed: read failure       90%     14976         1827682416
#12  Selective offline   Completed: read failure       90%     14976         1827682415
#13  Selective offline   Completed: read failure       90%     14976         1827682413
#14  Selective offline   Completed: read failure       90%     14976         1827682411
#15  Selective offline   Completed: read failure       90%     14976         1827682409
#16  Selective offline   Completed: read failure       90%     14976         1827682407
#17  Selective offline   Completed: read failure       90%     14976         1827682405
#18  Selective offline   Completed: read failure       90%     14976         1827682404
#19  Selective offline   Completed: read failure       90%     14976         1827682403
#20  Selective offline   Completed: read failure       90%     14976         1827682402
#21  Selective offline   Completed: read failure       90%     14976         1827682400
Is there any way I can get an estimate of how long it's likely to take to complete?
Also, would rebooting and re-running with "-B -y" undo the existing disk fixes (I'd guess not but would like to check with someone more knowledgeable than myself - I tend to try it and see...)
Thanks again for your help.
 

af123

Administrator
Staff member
Is there any way I can get an estimate of how long it's likely to take to complete?
You could try looking at other attributes:

Code:
humax# smartctl -A /dev/sda
Since pending sectors is zero, try "offline uncorrectable" and see if that gives a decreasing number.

Also, would rebooting and re-running with "-B -y" undo the existing disk fixes (I'd guess not but would like to check with someone more knowledgeable than myself - I tend to try it and see...)
No, you won't lose the fixes so far. Given the expected speed increase, I'd be inclined to give it a go.
 

snafu

New Member
snafu: Newbies' Guide to the Forum (click)

The above tells you how to go about presenting terminal dumps (amongst other things).

Update: I see you have in the latest post - but there's nowt stopping you patching up the others.
Roger wilco - apologies!
 

snafu

New Member
Since pending sectors is zero, try "offline uncorrectable" and see if that gives a decreasing number.
Thanks for your patience ... errr ... how do I do that? --help assumes rather more knowledge about where stuff is than I possess.
Sorry to be dim!
 

af123

Administrator
Staff member
Thanks for your patience ... errr ... how do I do that? --help assumes rather more knowledge about where stuff is than I possess.
Sorry to be dim!
Just running 'smartctl -A /dev/sda' at the humax# prompt will dump out all of the disk attributes - just look for the line relating to offline uncorrectable sectors.
 

snafu

New Member
OK, I ran it a few times with a couple of minutes between them. I also tidied up the output.
Code:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       3
...
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       3
...
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       3
Nothing seems to be changing there. Do these values raise alarm bells with you? Or does it mean (as I'd assume) that whatever it's fixing isn't reported here?
Do you still think it's worth using "-B -y" (I assume the order doesn't matter) or is it twiddling its thumbs?
I'm about to capture the entire output and look for any changes over time...
Again, thanks for taking the time to help.
 

af123

Administrator
Staff member
I'd kick the whole thing off again with -B -y - I can't see what it's fixing in the attribute output, which is odd in itself.
 

snafu

New Member
I'd kick the whole thing off again with -B -y - I can't see what it's fixing in the attribute output, which is odd in itself.
Roger wilco.

BTW it took a bit of waiting but there are changes in the output:

Code:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       314841148
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       3
...
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       314841360
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       2
I'd guess the seek error rate isn't significant but the sector and offline numbers have gone down.
Also, the counter did start at 863, now it's 859 so something's changed.
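For anyone who would rather compare two attribute dumps on a PC than by eye, here is a minimal sketch. It assumes the `smartctl -A` output has been pasted unwrapped, one attribute per line, with the raw value in the tenth column as in the dump above. The function names `raw_values` and `changes` are made up for this illustration and are not part of smartmontools.

```python
def raw_values(smartctl_output):
    """Map attribute name -> raw value for each attribute row of `smartctl -A` text."""
    values = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and carry the raw value in column 10.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            values[fields[1]] = int(fields[9])
    return values


def changes(before, after):
    """Return {attribute name: difference} for raw values that moved between two dumps."""
    old, new = raw_values(before), raw_values(after)
    return {name: new[name] - old[name]
            for name in old
            if name in new and new[name] != old[name]}


# The two snapshots posted above.
first = """\
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       314841148
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       3
"""
second = """\
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       314841360
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       2
"""

print(changes(first, second))
```

A negative difference for Current_Pending_Sector or Offline_Uncorrectable is the sign of progress being looked for here.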

I'll let you know how I get on but it may not be for a while.

Again, many thanks for your help, much appreciated.

Cheers! :)
 

snafu

New Member
Update:

Unfortunately although these sector repairs (re-writes) always appear to succeed, there doesn't seem to be any way to determine how many remain.

I'm looking for a way to speed things up. Sadly, fixdisk with "-B -y" didn't make a great deal of difference.

The time maths are what worry me:

It takes about 3 seconds for the system to repair 1 sector of 512 bytes.
1MB is roughly 2000 sectors, so repairing 1MB worth of errors would take 2000 * 3 = 6000 seconds, or about 100 minutes.
1GB would require about 70 days!
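The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the 3 seconds per sector is the figure observed above, decimal megabytes and gigabytes are assumed, and it is a worst case in which every sector in the range needs rewriting. The name `repair_seconds` is invented for the example.

```python
SECTOR_BYTES = 512        # logical sector size quoted above
SECONDS_PER_SECTOR = 3    # observed repair time per sector (approximate)


def repair_seconds(bad_bytes):
    """Seconds needed to rewrite every 512-byte sector covering bad_bytes."""
    sectors = -(-bad_bytes // SECTOR_BYTES)  # ceiling division
    return sectors * SECONDS_PER_SECTOR


one_mb = repair_seconds(10**6)
one_gb = repair_seconds(10**9)
print(f"1 MB: {one_mb / 60:.0f} minutes")
print(f"1 GB: {one_gb / 86400:.0f} days")
```

Using exact sector counts instead of the rounded 2000 per MB gives slightly lower figures than those quoted, but the same order of magnitude.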

This ignores the additional time taken for the system to seek up to the errors; around 2 to 2.5 hours.
(This is significant because I run the repair overnight and have to reboot every morning so scheduled recording can happen).

I have run the disk repair for a total of around 16 hours now and have no idea how many bad sectors remain.


So I'm asking for options.

The nuclear option, I guess, is to copy off the recordings and reformat the hard drive.

I was wondering if there's any way of speeding up the disk checking?
One thought was to create a boot USB and run fsck from that.
However my worries/problems are:
  • I don't know if the version of linux the Humax runs is customised in some way that would cause vanilla linux problems. Ditto encryption. (BTW, anyone know what version of linux the Humax uses?).
  • Another thought was to boot into maintenance mode and maybe run fsck directly from the cli?
  • This approach may not, in fact, speed things up.

So if anyone has any thoughts or comments I'd be very grateful.
Until then, I'll be running fixdisk each night :)

Many thanks in advance for your time and trouble.
 

af123

Administrator
Staff member
It takes about 3 seconds for the system to repair 1 sector of 512 bytes.
There should never be more than a handful of sectors that need repairing though. These are the offline-uncorrectable sectors and from the output you posted earlier, there are only two that need fixing.
Also, the counter did start at 863, now it's 859 so something's changed.
Which counter is this? I assumed it was the number of pending sectors but that doesn't tie up with the output.
The nuclear option, I guess, is to copy off the recordings and reformat the hard drive.
That won't fix the underlying disk faults.
One thought was to create a boot USB and run fsck from that.
It isn't possible to boot the Humax from USB. You could pull out the disk and attach it to a PC running Linux booted from USB though.
Another thought was to boot into maintenance mode and maybe run fsck directly from the cli?
This has nothing to do with fsck - you haven't got to that stage yet. It's still trying to repair the damaged/suspect sectors on the disk. Once they're all done, it will run fsck.
 

snafu

New Member
Which counter is this? I assumed it was the number of pending sectors but that doesn't tie up with the output.
Sorry, wasn't clear. When the long disk check starts there's a countdown that starts (for my box anyway) at over 13000.
It isn't until this gets down to 859 (was 863) that the sector repairs start appearing.


It isn't possible to boot the Humax from USB. You could pull out the disk and attach it to a PC running Linux booted from USB though.
Would the version of linux be significant? I do have an old linux box lying around. Not fired it up for ages. Might try that if all else fails...

What would you do in my situation?

As ever thanks for your help, much appreciated. Have to reboot the thing now as it'll be recording in 5 minutes...
 

af123

Administrator
Staff member
Sorry, wasn't clear. When the long disk check starts there's a countdown that starts (for my box anyway) at over 13000.
It isn't until this gets down to 859 (was 863) that the sector repairs start appearing.
Ah, ok. That's just an estimate of the time remaining for the check to complete.

What would you do in my situation?
If it's the long disk checks that are taking the time, I would get to the command line and manually run selective tests. You can choose to just scan a particular range of the disk which will speed them up. I'd then repair any defective sectors by hand.

Interestingly, your test log shows that selective tests are already being used, but evidently not quickly enough. I haven't read through that part of fix-disk to see how it uses selective tests (xyz321, can you provide any insight?).

Find the last block that had an error from the selftest log (smartctl -l selftest /dev/sda), for example 1827682433.

Then you can start a selective test like this:

Code:
humax# smartctl -t select,1827682433-max /dev/sda
smartctl 6.4 2015-06-04 r4109 [7405b0-smp-linux-2.6.18-7.1] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Selective self-test routine immediately in off-line mode".
SPAN         STARTING_LBA           ENDING_LBA
   0           1827682433           3907029167
Drive command "Execute SMART Selective self-test routine immediately in off-line mode" successful.
Testing has begun.
Monitor the progress with 'smartctl -l selftest /dev/sda' and, when it finds a bad sector, repair it with:

Code:
hdparm --repair-sector 123456789 --yes-i-know-what-i-am-doing /dev/sda
and start another selective test from there on.
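The find, repair and resume cycle described above could be scripted roughly as follows. This is a hedged sketch, not what fix-disk does: it only builds the two command strings from a pasted self-test log and runs nothing. It assumes one log entry per line, that entry # 1 is the most recent test, and that resuming from the repaired LBA itself (so the sector gets re-read) is acceptable. The helper names are hypothetical.

```python
import re

# Matches rows such as:
# "# 1  Selective offline   Completed: read failure   90%   14976   1827682433"
ROW = re.compile(
    r"^#\s*(\d+)\s+Selective offline\s+Completed: read failure"
    r"\s+\d+%\s+\d+\s+(\d+)\s*$"
)


def latest_error_lba(selftest_log):
    """Return the LBA reported by entry # 1, or None if it is not a read failure."""
    for line in selftest_log.splitlines():
        match = ROW.match(line.strip())
        if match and int(match.group(1)) == 1:
            return int(match.group(2))
    return None


def next_commands(selftest_log, device="/dev/sda"):
    """Build (but do not run) the repair command and the follow-up selective test."""
    lba = latest_error_lba(selftest_log)
    if lba is None:
        return []
    return [
        f"hdparm --repair-sector {lba} --yes-i-know-what-i-am-doing {device}",
        f"smartctl -t select,{lba}-max {device}",
    ]


log = """\
# 1  Selective offline   Completed: read failure       90%     14976         1827682433
# 2  Selective offline   Completed: read failure       90%     14976         1827682431
"""
for command in next_commands(log):
    print(command)
```

Printing the commands rather than executing them keeps a human in the loop, which seems sensible given that the hdparm command overwrites the sector.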

If this is an advanced-format disk (one with 4K physical sectors) then you should repair all of the logical sectors that make up the physical one (eight 512-byte logical sectors per 4K physical sector). One way to check if it's an AF disk is 'diag 4kalign'
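To illustrate the advanced-format point, this sketch lists the logical sectors that share a physical sector with a given LBA. It assumes physical sectors start at LBA 0 with no alignment offset, which may not hold for every drive, and the function name is invented for the example.

```python
def logical_sectors_in_physical(lba, logical_bytes=512, physical_bytes=4096):
    """Return every logical LBA that sits in the same physical sector as lba."""
    per_physical = physical_bytes // logical_bytes
    first = lba - (lba % per_physical)  # round down to the physical boundary
    return list(range(first, first + per_physical))


# Example with the LBA used above, on a hypothetical 4K-sector drive.
for sector in logical_sectors_in_physical(1827682433):
    print(sector)
```

On a standard-format drive (512-byte physical sectors) the same call with `physical_bytes=512` returns just the one LBA.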

Another option is to scan the entire disk for bad blocks using the 'badblocks' utility, just:

Code:
humax# badblocks -b 512 -s /dev/sda
Checking for bad blocks (read-only test):   0.01% done, 0:03 elapsed. (0/0/0 errors)
which should produce a definitive list of bad blocks that you can then repair.
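If the list turns out to be long, it can help to collapse it into contiguous runs before repairing anything. The sketch below assumes the bad block numbers have been saved one per line (badblocks can normally write them to a file with `-o`, though I have not checked the Humax build). The sample numbers are borrowed from the self-test log earlier in the thread purely for illustration and are not real badblocks output.

```python
def collapse_to_ranges(block_numbers):
    """Group block numbers into (first, last) tuples of consecutive blocks."""
    ranges = []
    for block in sorted(set(block_numbers)):
        if ranges and block == ranges[-1][1] + 1:
            ranges[-1][1] = block          # extend the current run
        else:
            ranges.append([block, block])  # start a new run
    return [tuple(r) for r in ranges]


# Illustrative listing only, one block number per line.
listing = """\
1827682400
1827682402
1827682403
1827682404
1827682405
"""
blocks = [int(token) for token in listing.split()]
for first, last in collapse_to_ranges(blocks):
    print(first if first == last else f"{first}-{last}")
```

With `-b 512` each block number corresponds to one 512-byte sector, so the numbers can be used directly as LBAs.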
 

snafu

New Member
One way to check if it's an AF disk is 'diag 4kalign'
Sadly not, I fear:

Code:
humax# diag 4kalign
Running: 4kalign

--> This is a Standard Format drive.

        Model Number:       ST31000424CS
        Logical/Physical Sector size:           512 bytes
        Nominal Media Rotation Rate: 5900

Disk /dev/sdb: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device     Boot      Start        End    Sectors   Size Id Type
/dev/sdb1                2    2104514    2104513     1G 83 Linux
/dev/sdb2          2104515 1932539174 1930434660 920.5G 83 Linux
/dev/sdb3       1932539175 1953520064   20980890    10G 83 Linux


Standard format drive - partitions are always aligned.
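As an aside, the partition table above makes it possible to work out which partition a failing LBA falls in. The sketch below hard-codes the start and end sectors from that table and assumes the self-test log (read via /dev/sda earlier) and this table (shown as /dev/sdb) describe the same physical drive, which the thread does not confirm. `partition_for` is a made-up helper name.

```python
# (name, first sector, last sector) copied from the table above.
PARTITIONS = [
    ("/dev/sdb1", 2, 2104514),
    ("/dev/sdb2", 2104515, 1932539174),
    ("/dev/sdb3", 1932539175, 1953520064),
]


def partition_for(lba):
    """Return (partition name, sector offset inside it), or None if outside all."""
    for name, start, end in PARTITIONS:
        if start <= lba <= end:
            return name, lba - start
    return None


print(partition_for(1827682433))
```

Under those assumptions the error LBAs from the self-test log land in the second, largest partition.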
Running badblocks, but it's slow (9% and creeping up). Also, I should've more'd it into a text file - if it's a huge list then it'll overflow the screen buffer...


I doubt I could manually repair sectors as quickly as fix-disk, but I might be able to hack a quick and dirty script that does the job.
As ever, thanks!
 

snafu

New Member
After over an hour badblocks completed:
Code:
humax# badblocks -b 512 -s /dev/sda
Checking for bad blocks (read-only test):   0.00% done, 0:00 elapsed. (0/0/0 err
done
Also, tried "smartctl -l selftest /dev/sda" but got:
Code:
humax# smartctl -l selftest /dev/sda
smartctl 6.4 2015-06-04 r4109 [7405b0-smp-linux-2.6.18-7.1] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

Smartctl open device: /dev/sda failed: No such device or address
fstab:
Code:
# /etc/fstab: static file system information.
#
# <file system> <mount pt>     <type>   <options>         <dump> <pass>
/dev/root       /               jffs2   rw,noauto       0       1
proc            /proc           proc    defaults        0       0
devpts          /dev/pts        devpts  defaults,gid=5,mode=620 0       0
tmpfs           /tmp            tmpfs   defaults        0       0
sysfs           /sys            sysfs   defaults        0       0
#/dev/sda1      /mnt/hd1        ext3    rw,defaults,data=journal        0 0
#/dev/sda2      /mnt/hd2        ext3    rw,defaults,data=ordered        0 0
The mount command doesn't help either. As it's embedded I'm rather reluctant to fire off mount commands (at least until it stops taping, anyway!).

I notice that there's a fix-disk package listed in the webif. If I could run that in the background with the box in "normal" mode, that'd do.
I'm learning quite a bit about the innards of my pvr!

Heh, now it stopped taping it went into standby and I've lost my telnet connections.
Well, I need to reboot my PC anyway...

Thanks for your help. Not sure of my best way forward just yet but I'll reboot everything then look into webif->fix-disk.
 