@Ian's post reminded me of the OP's original issue, a month ago:
have left it for a few hours and nothing. given my experience in Feb and again this weekend, looks like it will keep crashing and so won't get to the end of the fix disk. Is there any way around this crash?
...
As far as I can see, the OP's crash was that fix-disk ran e2fsck on partition 2 of the disk and this hung after running overnight.
The following procedure may help if fix-disk didn't work. You need a week (say) when you won't need the HDR for watching or recording programmes. Alternatively, you can extract the disk, put it into a USB SATA caddy and run a similar procedure from a Linux PC/laptop: a live CD such as GParted Live should be fine. The description below assumes the disk is being fixed in the HDR as /dev/sda.
- Get the HDR into Maintenance Mode with a telnet connection: choose the "cli" option.
- Run a SMART report on the disk with smartctl -x /dev/sda: note in particular the physical sector size in the "Sector Sizes:" line of the "INFORMATION SECTION" and the raw values of the following Vendor Specific SMART Attributes: Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable.
- Unmount the disk's partitions:
for i in 1 2 3; do umount /mnt/hd$i; done
- Run the badblocks program to cause the disk firmware to fix any bad blocks. Unlike the built-in SMART test (smartctl -t long /dev/sda), this can cause the disk firmware to remap failed sectors, but as it's run across the SATA interface it's much slower. Unlike reformatting or running dd if=/dev/zero of=/dev/sda bs=4M, the filesystems need not be erased.
Code:
# -b block size - use the physical sector size
# -c blocks at once - something bigger than 1000 to speed it up
# -n non-destructive - some chance of recovering the file system
# -s show progress
# -t pattern - write zeroes to disk (but non-destructively!)
# -o output bad block list (empty, we hope; would need to be munged to make filesystem block numbers)
# use abduco to keep it running as session "bb" across telnet connections
abduco -c bb badblocks -b 4096 -c 1024 -n -s -t0 -o /tmp/bb.lst /dev/sda
- Wait a long time, perhaps most of a week if the disk is large. Use abduco -a bb if necessary to reconnect to the session and check progress. If a power cut or other interruption occurs (the output file will be lost), you should probably start again. Alternatively, if you believe all blocks up to some point have been tested and not found bad, you could restart from a (guessed) block number. Note that badblocks takes its optional arguments after the device in the order last-block then first-block, so both must be given, counted in units of the -b block size (physical sectors here).
- Check the output bad block list. If all bad disk sectors were corrected by firmware (remapped to spare good sectors when written with zeroes), the output file should be empty. Run smartctl -A /dev/sda and compare the new raw values of Reallocated_Sector_Ct (may be higher), Current_Pending_Sector (should be 0, if the bad block list was empty) and Offline_Uncorrectable (if previously >0, may have been reduced, or not, apparently depending on the vendor).
- Some sectors may have been remapped, possibly resulting in corrupt files or filesystem structures, especially if the new Current_Pending_Sector count is less than the original one. If the new Current_Pending_Sector count is greater than 0, some sectors are still bad, again possibly resulting in corrupt files or filesystem structures. In this case you may wish to give up on fixing the disk, or you may still try to fix it to recover some precious recording; the (uncorrected) bad physical sector numbers should be listed in the output file.
- If no (uncorrected) bad blocks were found, run e2fsck to check the partitions on the disk:
Code:
# -f force checking
# -y go ahead and fix
# -v verbosely
# -tt detailed timing statistics
# use abduco to keep it running as session "fsck" across telnet connections
# do it for the first and third partitions (small)
# single quotes, so that $i is expanded by the inner sh and not by the calling shell
abduco -c fsck sh -c 'for i in 1 3; do e2fsck -fyvtt /dev/sda$i; done'
- If these both worked, proceed to partition 2, for which the file system check may need more memory:
Code:
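# create and enable a 128MB swap file on the (already checked) third partition
mount /dev/sda3 /mnt/hd3
dd if=/dev/zero of=/mnt/hd3/.swap0 bs=1M count=128
mkswap /mnt/hd3/.swap0 && swapon /mnt/hd3/.swap0
# then check the large partition as session "fsck"
abduco -c fsck e2fsck -fyvtt /dev/sda2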
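For the before-and-after SMART comparison described in the steps above, a small helper along the following lines could save some squinting. It is only a sketch: the function name smart_raw and the /tmp file names are made up for illustration, and it assumes the usual ten-column smartctl -A attribute table, with the attribute name in column 2 and the raw value in column 10.

```shell
#!/bin/sh
# Sketch only: smart_raw is a hypothetical helper, not part of smartctl.
# Reads `smartctl -A` style output on stdin and prints "NAME RAW_VALUE"
# for the three attributes of interest. Assumes the usual table layout:
# ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
smart_raw() {
    while read -r id name flag value worst thresh type updated failed raw rest; do
        case "$name" in
            Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)
                echo "$name $raw" ;;
        esac
    done
}

# Illustrative usage (file names are assumptions; /tmp does not survive a reboot):
#   smartctl -A /dev/sda | smart_raw > /tmp/smart.before
#   ... run badblocks ...
#   smartctl -A /dev/sda | smart_raw > /tmp/smart.after
#   diff /tmp/smart.before /tmp/smart.after
```

It uses only shell built-ins, so it should not depend on which extra tools are present in Maintenance Mode.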
If (uncorrected) bad blocks were found by badblocks and (as for the OP's disk) the physical and logical sector sizes differ, it will be necessary to convert the listed physical sector numbers to logical ones, multiplying each by physical_sector_size/logical_sector_size (8 in this case):
Code:
cat /tmp/bb.lst | (while read num _; do echo $((8*num)); done )>/tmp/bbl.lst
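As a worked example of that conversion, here is the same loop wrapped in a function that takes the two sector sizes instead of a hard-coded 8. The function name phys_to_logical is made up for illustration. One assumption to check first: logical sector numbers on a large disk can exceed 2^31, so it is worth confirming that the shell's arithmetic copes, for example with echo $((4000000000+1)).

```shell
#!/bin/sh
# Sketch only: phys_to_logical is a hypothetical helper name.
# Converts physical sector numbers (one per line on stdin) to logical
# sector numbers. $1 = physical sector size, $2 = logical sector size (bytes).
phys_to_logical() {
    ratio=$(( $1 / $2 ))
    while read -r num _; do
        [ -n "$num" ] || continue      # skip blank lines
        echo $(( num * ratio ))
    done
}

# With 4096-byte physical and 512-byte logical sectors the ratio is 8:
printf '1000\n1001\n' | phys_to_logical 4096 512
# prints 8000 and 8008
```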
Run fdisk -l /dev/sda to display the starting logical sector numbers of the partitions. Split the list of bad logical sector numbers into three partition bad block lists: those less than the starting sector of partition 2 in file #1, those greater than the last sector of partition 2 in file #3, and the rest in file #2. Adjust the block offsets in the non-empty resulting files by subtracting the partition's starting sector, say for partition 2 starting at p_start:
Code:
# set p_start to the starting sector of partition 2 (from fdisk) before running this
cat /tmp/bbl2.lst | (while read num _; do echo $((num-p_start)); done )>/tmp/bbl20.lst
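The split and the offset adjustment can be done in one pass per partition. The sketch below is one way to do it; the function name part_bad_blocks and the example sector numbers are made up, and the optional third argument is my addition rather than part of the procedure above. It is there because the e2fsck man page says the numbers in a bad block list are based on the filesystem's block size, so if that is larger than the logical sector size (say 4096 against 512) the offsets may need dividing by the ratio. Leaving it at its default of 1 reproduces the plain subtraction shown above.

```shell
#!/bin/sh
# Sketch only: part_bad_blocks is a hypothetical helper name.
# Usage: part_bad_blocks START END [SECTORS_PER_FS_BLOCK] < logical_sector_list
# Keeps the sectors lying within START..END (inclusive, as shown by fdisk)
# and prints them as offsets from START, optionally scaled down to
# filesystem blocks (assumption: the partition start is block-aligned).
part_bad_blocks() {
    start=$1
    end=$2
    spb=${3:-1}
    while read -r num _; do
        [ -n "$num" ] || continue      # skip blank lines
        if [ "$num" -ge "$start" ] && [ "$num" -le "$end" ]; then
            echo $(( (num - start) / spb ))
        fi
    done
}

# Made-up example: a partition spanning logical sectors 2048..10239
printf '100\n2048\n2056\n10239\n20000\n' | part_bad_blocks 2048 10239
# prints 0, 8 and 8191
```

Run it once per partition with that partition's start and end sectors, redirecting the output to that partition's list file; an empty result means no list is needed for that partition.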
When running the e2fsck commands in the two file system check steps above (partitions 1 and 3, then partition 2), if there is an adjusted bad block list for the partition being checked, just add the -L option with the name of that list file. Also, order your replacement drive!
After file system checks have run successfully, you should be able to reboot into a working system which will have enough life to recover any files that weren't broken by bad sectors.