Fixdisk help

MikeSh · Mar 25, 2020

One of the boxes has 197/198 errors again, so today I ran fixdisk. I have done it before, but about 2 years or more back, so basically starting from scratch.

I hooked the box up with an Ethernet cable, to avoid Wi-Fi dongle problems, and went in via webif. I followed the Maintenance Mode Disk Check page linked from there and ran fixdisk. It took a couple of hours and ... didn't fix it :confused:

(It also said LBA not found, or similar ... Is that normal?)

So I've gone into the wiki directly and found a different page - Maintenance Mode - which has a bit more info including:
Fix-Disk Notes :-
...
3. If the web interface hard disk diagnostic highlights attributes 197 or 198 in red, then the additional fix-disk test/repair should run

So does that mean it should have run automatically, or is there a 'be' missing and I need to invoke it? If the latter, then when and how exactly?

/df · Mar 26, 2020

If you didn't specify the long SMART test through an option or by agreeing to it, try doing so: tell fixdisk to use the "additional option" -l.

Otherwise, "LBA not found" seems to happen because of differences in the SMART test and logging implementations between disk firmware versions/vendors. (@af123, @xyz321: something for the next CF?).

Ezra Pound · Mar 26, 2020

MikeSh said:
So does that mean it should have run automatically, or is there a 'be' missing and I need to invoke it? If the latter, then when and how exactly?

For what it's worth, I think there should have been a 'be' in that line. However, the wiki page needs updating to reflect how fixdisk works now. The current options are:-

Code:

fix-disk options:
    Partitions to check - without any of these options all three partitions
    will be checked:
        -1          Check partition 1
        -2          Check partition 2
        -3          Check partition 3
    Other options:
        -d          Additional diagnostic output
        -B          Skip block search
        -F          Skip file system checks
        -P          Skip pending sector error (SMART) tests
        -l          Perform a long SMART disk test
        -n          No polling during SMART tests
        -y          Assume yes for all sector repairs
        -w          Wait for user input at end of run
        -x <opts>   Additional options for filesystem check (e2fsck)
                    (Must be last option)

MikeSh · Mar 26, 2020

/df said:
If you didn't specify the long SMART test through an option or by agreeing to it, try doing so: tell fixdisk to use the "additional option" -l.

So does the -l switch invoke the repair as well, or just a fuller test?
If it does repair should I use -y as well, per Ezra's post?

MartinLiddle · Mar 26, 2020

MikeSh said:
So does the -l switch invoke the repair as well, or just a fuller test?
If it does repair should I use -y as well, per Ezra's post?

Yes to both.

MikeSh · Mar 26, 2020

So at the prompt "fixdisk_-l_-y" (where _ is a space)?

Oh, also, does the computer need to stay awake throughout? If not how does one get back in to check progress if/when the connection drops?

MartinLiddle · Mar 26, 2020

MikeSh said:
So at the prompt "fixdisk_-l_-y" (where _ is a space)?

I think so; try it and see.

Black Hole · Mar 26, 2020

MikeSh said:
Oh, also, does the computer need to stay awake throughout?

No.

MikeSh said:
If not how does one get back in to check progress if/when the connection drops?

Try it and see.

af123 · Mar 26, 2020

If you run fixdisk from the prompt, then it doesn't wrap it inside abduco so you need to keep the connection open.

Alternatively, run it from the telnet menu and enter -l -y when it asks if you want to provide any other options; this is the way I'd recommend.

MikeSh · Mar 26, 2020

af123 said:
If you run fixdisk from the prompt, then it doesn't wrap it inside abduco so you need to keep the connection open.

I ran the whole thing from webif - invoked Maintenance Mode from the Diagnostics page and then it magically gave me a Telnet option. (I don't remember how - I was expecting to have to go find Putty and just gratefully took what it offered.)

af123 said:
Alternatively, run it from the telnet menu and enter -l -y when it asks if you want to provide any other options; this is the way I'd recommend.

OK. I think I remember it asking that, but absent any instructions or suggestions on the wiki page I just hit Enter.

Maybe I'll just do what I did before and set the PC to not sleep, and then just add those options at the prompt. Does that sound like a reasonable plan?

/df · Mar 26, 2020

As long as the PC doesn't crash ... Probably reasonable.

Black Hole · Mar 26, 2020

MikeSh said:
then it magically gave me a Telnet option. (I don't remember how - I was expecting to have to go find Putty and just gratefully took what it offered.)

There's a basic webshell terminal running in maintenance mode on the WebIF URL, so you have access to command line tools without needing a Telnet console. Running fixdisk using the menu prompt provides you with a protected session (meaning the session can close without terminating the job). I don't advise doing it any other way - the fixdisk run can be long, and you have no idea what might arise to close a session.

MikeSh · Mar 27, 2020

Um. That's a bit strange.
A few years back I got 8x 197/198 errors. Ran fixdisk and they moved up to 5 (Reallocated_Sector_Ct) and it's been like that until a few days ago.
Then I got 16x 197/198 errors. I ran fixdisk again today as af suggested above:

af123 said:
run it from the telnet menu and enter -l -y when it asks if you want to provide any other options

but afterwards those 16 errors are still in 197/198, and the attribute 5 count is now 16.
I did an extra reboot and then a hard reboot in case it needed a bit of a nudge, but the errors persist.

I'm now even more lost. Any suggestions?

MartinLiddle · Mar 27, 2020

MikeSh said:
Any suggestions?

Try running fix-disk again.

MikeSh · Mar 27, 2020

MartinLiddle said:
Try running fix-disk again.

Yes, I sort of thought I'd need to.
I just want to be sure I'm not doing something wrong or missing a switch. At a few hours a run it's not a trivial exercise.

MikeSh · Mar 28, 2020

I've run it again today. Still not fixing.

Code:

      /---------------------------------------------\
      |  M A I N T E N A N C E   M O D E   M E N U  |
      \---------------------------------------------/

  [ Humax HDR-Fox T2 (humax) 1.03.12/3.13 ]

 fixdisk - Check and repair hard disk.
   short - Run short hard-disk self test.
    long - Run long hard-disk self test.
   check - Check self-test progress.
    gptf - Re-format disk using GPT scheme.
     epg - Clear persistent EPG data.
    dlna - Reset DLNA server database.
       x - Leave maintenance mode (Humax will restart).
    diag - Run a diagnostic.
     cli - System command line (advanced users).

Please select option: fixdisk
Any additional options (-h for list or press return for none): -l -y
Are you sure you wish to run the hard disk checker (-l -y)? [Y/N] y
Running /bin/fix-disk

Checking disk sda (4096 byte sectors)

Unmounted /dev/sda1
Unmounted /dev/sda2
Unmounted /dev/sda3


Running long disk self test

(Countdown for about 2.5 hours)

Error - pending sectors but LBA not found
fix-disk: session terminated with exit status 1

Press return to continue:

I assume the error at the end is significant, but what does it mean?

/df · Mar 29, 2020

So apparently the "LBA not found" error is occurring because the long SMART test (#1, power-on hour 9843) is succeeding, yet there are still pending sectors.

This is a different case from the issue I mentioned in #2 above. Either the 16 pending/uncorrectable sectors have actually been relocated (as the 16 reallocated sectors) but a disk firmware bug is causing them still to be reported, or the long test is not managing either to correct or to detect the errors. If there's a bug you have to live with it, or perhaps try something like a security erase of the disk.
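The situation described here can be checked by hand by comparing the most recent self-test entry ("# 1" in smartctl's self-test log) with the raw Current_Pending_Sector value (attribute 197). This is a minimal sketch, not fix-disk's own code: the function names `last_test_lba` and `pending_without_lba` are hypothetical, and it assumes smartctl's usual table layouts (LBA_of_first_error as the last column of the self-test log, shown as `-` when nothing was found; RAW_VALUE as column 10 of the attribute table).

```shell
#!/bin/sh
# Hypothetical helper: print the LBA_of_first_error column of the most
# recent self-test (the "# 1" entry) from `smartctl -l selftest` output
# read on stdin. Assumes the LBA is the last column ("-" if none).
last_test_lba() {
    awk '$1 == "#" && $2 == "1" { print $NF; exit }'
}

# Hypothetical helper: succeed (exit 0) when there are pending sectors but
# the most recent self-test reported no failing LBA.
#   $1 = file holding `smartctl -l selftest` output
#   $2 = file holding `smartctl -A` output
pending_without_lba() {
    _lba=$(last_test_lba < "$1")
    # RAW_VALUE of attribute 197 (Current_Pending_Sector), assumed column 10
    _pending=$(awk '$1 == "197" { print $10; exit }' "$2")
    [ "${_pending:-0}" -gt 0 ] && [ "$_lba" = "-" ]
}

# Possible use on the box (assumes smartctl is available there):
#   smartctl -l selftest /dev/sda > /tmp/selftest.txt
#   smartctl -A /dev/sda > /tmp/attrs.txt
#   pending_without_lba /tmp/selftest.txt /tmp/attrs.txt &&
#       echo "pending sectors but the last self-test found no LBA"
```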

There are suggestions that the "offline" test may be able to fix the latter case: smartctl -t offline /dev/sda.

man smartctl said:
The effects of this test are visible only in that it updates the SMART Attribute values, and if errors are found they will appear in the SMART error log, visible with the '-l error' option.
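Since the offline test shows up only in the attribute values and the error log, one way to judge whether it helped is to compare the relevant raw counts before and after. A minimal sketch, assuming the usual `smartctl -A` table where RAW_VALUE is column 10; `sector_counts` is a hypothetical helper name, and the commented commands are the smartctl options already mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: from `smartctl -A` output on stdin, print ID, name
# and raw value of the three attributes discussed in this thread:
#   5   Reallocated_Sector_Ct
#   197 Current_Pending_Sector
#   198 Offline_Uncorrectable
# Assumes the usual attribute table, where RAW_VALUE is column 10.
sector_counts() {
    awk '$1 == "5" || $1 == "197" || $1 == "198" { print $1, $2, $10 }'
}

# Possible before/after comparison around the offline test (not run here):
#   smartctl -A /dev/sda | sector_counts    # counts before
#   smartctl -t offline /dev/sda            # starts the test and returns at once
#   ... wait for the drive to finish ...
#   smartctl -A /dev/sda | sector_counts    # counts after
#   smartctl -l error /dev/sda              # any errors the test logged
```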

Otherwise you might try to run an external disk repair tool that doesn't rely on the SMART test, but that will take a lot longer than fixdisk.

If you want to keep the box in question working, you could set it up with a spare or replacement disk, and access the suspect disk using a USB SATA dock. But if you're doing that and have a Linux PC (or a spare that can be booted from a live CD), it will be more reliable and faster to use the dock with that, especially if the PC and dock support USB3 (or if it's a desktop PC with a spare SATA port, better to use that).

A filesystem checker command like this should carry out a non-destructive read-write test of the disk area allocated to each filesystem and then repair the filesystem. There are three separate filesystems on the HDR disk:

Code:

# disk to check: replace as appropriate
dev=/dev/sda
# for each of the 3 partitions: check with: -cc read-write block test; -f force check; -y agree to repairs
for nn in 1 2 3; do e2fsck -cc -f -y ${dev}${nn}; done

This is supposed to be non-destructive but the usual caveats apply if any of the disk contents are precious and not otherwise backed up.
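The loop above runs e2fsck on all three partitions unconditionally, but e2fsck should not be run on a mounted filesystem (the script further down unmounts them first for that reason). A minimal guard, assuming /proc/mounts lists each mounted partition by its device path in the first field; `is_mounted` and `check_partitions` are hypothetical names, not part of fixdisk.

```shell
#!/bin/sh
# Hypothetical guard: succeed if block device $1 appears as a mount source
# in the mounts table $2 (defaults to /proc/mounts).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Hypothetical wrapper around the loop above: report and skip any
# partition that is still mounted instead of checking it.
check_partitions() {
    dev=$1
    for nn in 1 2 3; do
        if is_mounted "${dev}${nn}"; then
            echo "${dev}${nn} is mounted; unmount it first" >&2
            continue
        fi
        # -cc read-write block test; -f force check; -y agree to repairs
        e2fsck -cc -f -y "${dev}${nn}"
    done
}

# Possible use: check_partitions /dev/sda
```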

To run this on the HDR, especially when repairing the large partition, the above would need a paging (aka swap) file, which the HDR doesn't have in Maintenance Mode. fixdisk checks the last partition first and then creates a swap file on it; you could try to emulate that:

Code:

# disk to check: replace as appropriate
dev=/dev/sda

fix1() { # partition
    # check partition with: -cc read-write block test; -f force check; -y agree to repairs
    e2fsck -cc -f -y "$1"
}

# partitions to be checked shouldn't be mounted
for nn in 1 2 3; do umount -l "${dev}${nn}"; done

SWAPFILE=/mnt/hd3/.swap
setup_swap() {
    # make a 1GB paging file, quite slow
    dd if=/dev/zero of="$SWAPFILE" bs=1M count=1024 2> /dev/null &&
        mkswap "$SWAPFILE"
}

swap_on() {
    setup_swap &&
        swapon "$SWAPFILE" || return 1
    mount -o remount,size=512M /tmp
}

# Do partition 3 of the disk first, mount /mnt/hd3 if necessary, make a "big" swap file and enlarge /tmp
fix1 "${dev}3"
mount | grep -q " on /mnt/hd3 " || mount "${dev}3" /mnt/hd3
# release any swap file left active from an earlier run before overwriting it
swapoff "$SWAPFILE" 2> /dev/null
swap_on
grep -qE "^$SWAPFILE " /proc/swaps || echo "Swap file problem" >&2

# do each of the remaining partitions; e2fsck exits with 1 when it has
# corrected errors, so only stop on an operational error (status 8 or more)
for nn in 1 2; do
    fix1 "${dev}${nn}"
    [ $? -lt 8 ] || break
done

I can't remember trying this on a running HDR but it may be asking too much for the HDR to run the PVR functions as well as the disk check, even with a much bigger swap file.

However, even quite an old desktop PC should have enough RAM for a live filesystem plus the whole check program, the badblocks program it spawns and their buffers, as well as a beefier CPU.

Black Hole · Mar 29, 2020

/df said:
it may be asking too much for the HDR to run the PVR functions as well as the disk check

Wouldn't that be in maintenance mode anyway?

MikeSh · Mar 29, 2020

Thank you for your extensive reply.

/df said:
This is a different case from the issue I mentioned in #2 above. Either the 16 pending/uncorrectable sectors have actually been relocated (as the 16 reallocated sectors) but a disk firmware bug is causing them still to be reported, or the long test is not managing either to correct or to detect the

For the last 2-3 years the Reallocated count has been 8, until this new set of 16 appeared in 197/8. Those original 8 were successfully moved by fixdisk, so it seems odd that this bug has now surfaced unless it's due to a change in fixdisk.

/df said:
If you want to keep the box in question working,

I do, but TBH the rest of your post is on the edge of my understanding and attempting those sorts of procedures would be taxing, at the least.

Although this disk hasn't done a lot of miles it is the original and must be going on 10 years old, so may simply be ageing. These errors don't seem to be causing problems at present, so I think I'll start working toward a new disk instead. It's going to need it sooner or later.

Thanks again for your time and advice.

/df · Mar 30, 2020

Black Hole said:
Wouldn't that be in maintenance mode anyway?

Not necessarily if the suspect disk had been replaced and was being connected via a USB-SATA bridge.
