Current_Pending_Sector/Offline_Uncorrectable errors - will fix-disk help?

Ian Manning · Oct 13, 2020

I'll check the signal strength and quality the next time it happens.
In the meantime I kicked off another fix-disk about 40 minutes ago and it hasn't moved on from "Running short disk self test - Waiting... 0" (see below). Does that sound normal? (it's a 2TB drive)
The front display says "Fixdisk Init"

MartinLiddle · Oct 13, 2020

Ian Manning said:
In the meantime I kicked off another fix-disk about 40 minutes ago and it hasn't moved on from "Running short disk self test - Waiting... 0" (see below). Does that sound normal? (it's a 2TB drive)

No that isn't normal; wait for one of the fix-disk gurus to tell you what to do next.

MymsMan · Oct 13, 2020

Not a guru ...
Try opening a new telnet session and connecting to the abduco session,
but if the front panel is also frozen it looks to have crashed very early, so probably just restart fix-disk and hope it gets further.

Ian Manning · Oct 13, 2020

I get "connection refused" if I try to connect in from another telnet session. Presumably I should just power off the Humax and start again?

Ian Manning · Oct 14, 2020

I've just retried the fix-disk and it's doing exactly the same thing as before (stuck at "Running short disk self test - Waiting... 0"). I'm guessing that this is not good news?

/df · Oct 14, 2020

Try the -l option to fix-disk. It forces a different and possibly more correct test procedure.

Don't expect to use the disk for a few hours while it runs.

Ian Manning · Oct 15, 2020

I ran fix-disk overnight with the -l parameter. Approx. 10 hours later the screen was full of these messages:

ncheck: EXT2 directory corrupted while calling ext2_dir_iterate
ncheck: Invalid argument while calling ext2_dir_iterate
ncheck: EXT2 directory corrupted while calling ext2_dir_iterate
ncheck: Invalid argument while calling ext2_dir_iterate
ncheck: EXT2 directory corrupted while calling ext2_dir_iterate
ncheck: Invalid argument while calling ext2_dir_iterate
ncheck: Invalid argument while calling ext2_dir_iterate
ncheck: Invalid argument while calling ext2_dir_iterate
ncheck: Invalid argument while calling ext2_dir_iterate
ncheck: Invalid argument while calling ext2_dir_iterate

...and the job did not appear to complete - i.e. there was no message indicating that it had completed. I've now rebooted it.

Any ideas??

Ian Manning · Oct 15, 2020

Here are the disk diags following the latest fix-disk:

MartinLiddle · Oct 15, 2020

Ian Manning said:
Here are the disk diags following the latest fix-disk:

The hard drive SMART data looks a lot better with the Current_pending_sector count down to zero. I would try running fix-disk again to allow it to sort out any errors in the file system.

/df · Oct 15, 2020

If it was like this, the ncheck messages arise when fix-disk is trying to navigate the filesystem to identify which file or directory is affected by a bad sector. This can work well if bad sectors haven't affected directories, but not otherwise.

"I agree with him". You just won't have a reliable list of affected files. Use -P to avoid an extra SMART test.

Ian Manning · Oct 15, 2020

Thanks for the feedback. So the conclusion is: no need to replace the HDD just yet, but run another fix-disk with -l, -P and -y?

/df · Oct 15, 2020

Just -P -y. That'll run just the filesystem check/fix, which should result in a valid filesystem (actually 3, one per partition).

A corrupt filesystem has appeared to be the root cause of various system crashes, hangs and glitches, so possibly also your signal issue.

The OEM firmware doesn't let you correct such errors except by backing up and re-formatting.

MartinLiddle · Oct 15, 2020

Ian Manning said:
So the conclusion is: no need to replace the HDD just yet.

My advice would be to keep an eye on the reallocated sector count (say once a week) and if it is not increasing significantly then keep using the hard drive.
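For a weekly check like this, a small helper can pull a single raw value out of the SMART attribute table. This is only a sketch: it assumes smartctl and awk are available on the box, that the drive is /dev/sda, and that the table has the usual ten columns with the attribute name in column 2 and the raw value in column 10. The function name smart_raw is made up for illustration.

```shell
# Sketch: print the raw value of one SMART attribute from `smartctl -A` output.
# Assumption: standard ten-column attribute table, name in column 2,
# raw value starting in column 10.
smart_raw() {
    # $1 = attribute name, e.g. Reallocated_Sector_Ct; the table is read from stdin
    awk -v name="$1" '$2 == name { print $10 }'
}

# Hypothetical usage (the device name /dev/sda is an assumption):
#   smartctl -A /dev/sda | smart_raw Reallocated_Sector_Ct
#   smartctl -A /dev/sda | smart_raw Current_Pending_Sector
```

Noting the number each week makes it easy to see whether the count is steady or climbing.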

MymsMan · Oct 15, 2020

Ian Manning said:
Thanks for the feedback. So the conclusion is: no need to replace the HDD just yet, but run another fix-disk with -l, -P and -y?

It is a pity the ncheck messages give no indication as to which directories they referred so that you could have some idea which files you may have lost.

The file system can't be all bad otherwise you wouldn't be able to use the webif or play recordings but until it has been repaired I wouldn't risk new recordings and avoid anything that writes too much (decryption, shrink, cropping etc) to avoid exacerbating the problems.
You might also want to watch unwatched recordings, or copy any recordings you want to save to another device, in case re-formatting the disk becomes necessary.

Ian Manning · Oct 15, 2020

OK I'll do another fix-disk with -P -y. In the meantime I just logged into the web IF and got this:

Reading between the lines, would it be sensible just to replace the drive, rather than waiting for it to fail (and lose my recordings)?

Ian Manning · Oct 15, 2020

The fix-disk finally finished this time. A lot of these messages:

...and finally this:

MymsMan · Oct 15, 2020

Ian Manning said:
OK I'll do another fix-disk with -P -y. In the meantime I just logged into the web IF and got this:
View attachment 4974
Reading between the lines, would it be sensible just to replace the drive, rather than waiting for it to fail (and lose my recordings)?

This is not a new error; it is just telling you what you already knew: that fix-disk has successfully reallocated the sectors that were in pending state.

Once you have sorted out the file system issues caused by those bad sectors you should be OK provided the numbers don't start climbing steeply.

/df · Oct 15, 2020

As the filesystem check ran out of memory, it probably needs to be run again. With any luck the fixes so far will have left it (I assume the Video partition) in a good enough state to run the check to completion now. If it completed the check on the 1st and 3rd partitions last time, use -2 -P -y to check just the Video partition.

Ian Manning · Oct 15, 2020

This one finished more quickly (-2 -P -y), but still had the memory allocation error, and still spat out shedloads of "illegal indirect block" errors:

/df · Oct 15, 2020

Ian Manning said:
This one finished more quickly (-2 -P -y), but still had the memory allocation error, and still spat out shedloads of "illegal indirect block" errors:

It's moved on. The problem is described here.

You could just continue until it works. Or, if you don't mind using the maintenance mode shell, this might sort it in one go (based on the link above, untested).

At the maintenance mode command line (cli), run mount[Enter] to check whether partition 3 (probably /dev/sda3) is mounted. If not, run mount /dev/sda3 /mnt/hd3[Enter]. Then create a cache directory on that partition for e2fsck: mkdir -p /mnt/hd3/e2fscache[Enter].

Create a configuration file for e2fsck:

Code:

cat <<EOM >/var/lib/humaxtv_backup/mod/e2fsck.conf 
[scratch_files]
directory = /mnt/hd3/e2fscache
EOM

Check that the new file in fact contains lines 2 and 3 above, as it should: cat /var/lib/humaxtv_backup/mod/e2fsck.conf[Enter].

Now, go to an environment where this configuration should be used: E2FSCK_CONFIG=/var/lib/humaxtv_backup/mod/e2fsck.conf tmenu[Enter] (not E2FSCK_OPTS!). This should bring up the familiar maintenance mode menu (after PIN entry). When you run fix-disk from this menu, e2fsck should find out about the cache directory, fill it with data needed to fix a giant inode, and not run out of memory.
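The create-and-check part of the steps above could also be wrapped in one small function, so that a mistyped config file is caught before fix-disk is started. Like the steps themselves this is an untested sketch: the function name write_e2fsck_conf is made up, and it assumes the shell's grep supports -q, -x and -F.

```shell
# Sketch: write an e2fsck.conf that points [scratch_files] at a cache directory,
# then confirm the file holds exactly the two expected lines.
# $1 = config file path, $2 = cache directory (both supplied by the caller)
write_e2fsck_conf() {
    conf=$1
    cache=$2
    # Create the cache directory first; give up if that fails
    mkdir -p "$cache" || return 1
    cat <<EOM >"$conf" || return 1
[scratch_files]
directory = $cache
EOM
    # Succeeds only if both lines are present verbatim
    grep -qxF '[scratch_files]' "$conf" &&
        grep -qxF "directory = $cache" "$conf"
}

# Hypothetical usage with the paths from the steps above
# (assumes /dev/sda3 is already mounted on /mnt/hd3):
#   write_e2fsck_conf /var/lib/humaxtv_backup/mod/e2fsck.conf /mnt/hd3/e2fscache &&
#       E2FSCK_CONFIG=/var/lib/humaxtv_backup/mod/e2fsck.conf tmenu
```

Chaining with && means tmenu is only started if the file was written and verified.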

If this works it would be a candidate for updating fix-disk in the next CF. fix-disk uses a swap file to gain memory headroom for scanning the large partition, but it's obviously not enough. The [scratch_files] option might not have been available when fix-disk was created, but we have /sbin/e2fsck which is linked to v1.42.13 in /usr/lib/ext2 in the CF now (as well as two statically linked versions in the repository, which wouldn't be accessible to fix-disk as their installations would be on the filesystem being fixed).
