Disk failure...

af123 · Aug 3, 2012

Green Armchair said:
Did you try a reformat?

I will try a filesystem check once I've finished copying off as much as I can. Given that the drive itself is reporting read errors (now up to ATA Error Count: 16308) I don't think it's recoverable. I'll try a reformat as well - can't hurt at that point.

I'm not even sure how usable the copied-off content will be.

af123 · Aug 3, 2012

xyz321 said:
195 & 197 may also be useful. Syslogging would help to target the SMART tests at a particular range of disk sectors.

Yes, 197 is a good indicator of problems:

Code:

197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1

Ezra Pound · Aug 3, 2012

Some interesting stuff on SMART interpretation here: http://hebb.mit.edu/people/jfmurray/publications/Murray2003.pdf and here: http://www.cropel.com/library/smart-attribute-list.aspx

af123 · Aug 3, 2012

I ran some more SMART diagnostics and selftests to see the scale of the problem - interesting stuff - with many thanks to http://smartmontools.sourceforge.net/badblockhowto.html

(my drive is sdb at the moment as I last booted with a USB disk attached)

Code:

humax# smartctl --test=short /dev/sdb
...
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
 
humax# smartctl -l selftest /dev/sdb
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%      4640         496774356
 
humax# fdisk -lu /dev/sdb
   Device Boot      Start         End      Blocks  Id System
/dev/sdb1               2     2104514     1052256+ 83 Linux
/dev/sdb2         2104515  1932539174   965217330  83 Linux
/dev/sdb3      1932539175  1953520064    10490445  83 Linux

So the first problem block is within partition 2 - calculate the filesystem block:
fsblock = (int)((<problem LBA> - <partition start LBA>) * 512 / <fs block size>)

Code:

humax# /mod/sbin/tune2fs -l /dev/sdb2 | grep Block\ si
Block size:               4096

humax# dc
16 o 496774356 2104515 - 512 * 4096 / p
3af8202

So my bad block is within filesystem block hex 3af8202 (61833730)
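The same arithmetic can be sketched as a small shell function. This is only an illustration: the names fs_block and in_partition are made up here, and the figures are the ones from the fdisk and tune2fs output above. It divides by (block size / 512) rather than multiplying by 512 first, which gives the same result whenever the block size is a multiple of 512 and avoids overflow on shells with 32-bit arithmetic.

```shell
#!/bin/sh
# Illustrative sketch; function names are hypothetical.

# fs_block LBA PART_START FS_BLOCK_SIZE
# Prints the filesystem block (decimal) that contains a 512-byte sector.
fs_block() {
    echo $(( ($1 - $2) / ($3 / 512) ))
}

# in_partition LBA START END
# Succeeds when the LBA lies inside the partition's sector range.
in_partition() {
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Values from this post: failing LBA, start/end of sdb2, 4096-byte blocks.
if in_partition 496774356 2104515 1932539174; then
    blk=$(fs_block 496774356 2104515 4096)
    printf '%d (hex %x)\n' "$blk" "$blk"   # prints: 61833730 (hex 3af8202)
fi
```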
If debugfs was working (it crashes) I could find out which file this block is in, but I'll just write zeros to it, which should force the disk firmware to reallocate the sector. That will break whichever file is using the block, but nothing else should be affected.
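Since a wrong seek value would zero the wrong block, it may be worth rehearsing the write on a scratch file first. The sketch below is an assumption-laden illustration, not the procedure used in this thread: zero_block is a hypothetical helper, and conv=notrunc is added so that a regular file is not truncated at the seek offset (on a block device it makes no difference).

```shell
#!/bin/sh
# Illustrative sketch; zero_block is a hypothetical helper name.

# zero_block TARGET BLOCK [BLOCK_SIZE]
# Overwrites exactly one block of TARGET with zeros, leaving the rest alone.
# seek= counts in units of bs on the output side, so BLOCK is a block number.
zero_block() {
    dd if=/dev/zero of="$1" bs="${3:-4096}" count=1 seek="$2" conv=notrunc 2>/dev/null
}

# Rehearsal on a scratch file (three 4096-byte blocks of 'x'):
#   head -c 12288 /dev/zero | tr '\0' 'x' > /tmp/scratch.img
#   zero_block /tmp/scratch.img 1
#   # block 1 is now zeros; blocks 0 and 2 are untouched
```

Only once the rehearsal behaves as expected would the same call be pointed at the real partition and block number.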

Code:

humax# smartctl -A /dev/sdb
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
 
humax# dd if=/dev/zero of=/dev/sdb2 bs=4096 count=1 seek=61833730
1+0 records in
1+0 records out
humax# sync
 
humax# smartctl -A /dev/sdb
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0

The reallocated sector count hasn't changed, but the other two attributes have gone to zero, which is a good sign!
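For comparing these counters before and after a fix, a small filter can pull out just the raw value. This is a sketch under the assumption that the attribute table has the ten-column layout shown above, with the raw value in the tenth column; smart_raw is a made-up name.

```shell
#!/bin/sh
# Illustrative sketch; smart_raw is a hypothetical helper name.

# smart_raw ID
# Reads `smartctl -A` output on stdin and prints the raw value (tenth
# whitespace-separated field) of the first attribute row whose ID matches.
smart_raw() {
    awk -v id="$1" '$1 == id { print $10; exit }'
}

# Usage on the box (device name as in this post):
#   smartctl -A /dev/sdb | smart_raw 197
```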

Now, try another disk test (btw, it seems that the disk test performed from the Humax menus is a SMART short-offline self test too):

Code:

humax# smartctl -t short /dev/sdb
... wait a couple of minutes ...
humax# smartctl -l selftest /dev/sdb
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      4640         -
# 2  Short offline       Completed: read failure       90%      4640         496774356

Look Ma, no errors!
Next I'll drop to maintenance mode and run a full disk check.
It's odd that a single faulty disk block would break playback of all recordings though... More investigation is required and I'm still swapping the disk; I'm just more confident of getting my content back now. At the end of the day it's only TV though, and it will get rid of all of those films we were never going to get around to watching!

Kev_w · Aug 3, 2012

af123 said:
It's odd that a single faulty disk block would break playback of all recordings though... More investigation is required and I'm still swapping the disk; I'm just more confident of getting my content back now.

You're not kidding anyone. You loved it that your disk broke and gave you a reason to upgrade to 2TB - I'd be the same!

Can you play back the recordings over the network? Like you say - strange that one bad block would break playback for all recordings unless it's corrupted something the box requires to actually play the files.

P.S. I don't mind taking your 1TB hdd off you after you've fixed it - wouldn't mind an upgrade myself.

af123 · Aug 3, 2012

Kev_w said:
Can you play back the recordings over the network? Like you say - strange that one bad block would break playback for all recordings unless it's corrupted something the box requires to actually play the files.

Yes - to my HD box, but they suffer the same problems in that the playback keeps freezing for 5-10 seconds then resuming.
I'll get debugfs fixed if I can and work out which file was broken. In the meantime, I'm running an extended disk check (smartctl -t long /dev/sdb) to see if it finds any more errors. This, from the smartmontools FAQ page, seems to explain what I saw (and why the reallocated sector count is still zero - presumably the disk now has confidence in that sector, having successfully written to it).

If the disk can read the sector of data a single time, and the damage is permanent, not transient, then the disk firmware will mark the sector as 'bad' and allocate a spare sector to replace it. But if the disk can't read the sector even once, then it won't reallocate the sector, in hopes of being able, at some time in the future, to read the data from it. A write to an unreadable (corrupted) sector will fix the problem. If the damage is transient, then new consistent data will be written to the sector. If the damage is permanent, then the write will force sector reallocation.

As long as no further errors are found, it seems that a reformat would probably have solved my immediate problem. I would, of course, have lost everything.

Interestingly, all of the .ts files I copied off yesterday with rsync now have a different checksum following the block fix, so whatever's going on it was affecting all large reads - the .hmt and .thm files seem ok.

Ezra Pound · Aug 3, 2012

Your last block of {code} in #24 was confusing at first. I presume it shows a history of past tests, with the previous 'failed' test at #2 and the new passed test at #1.
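That reading matches how smartctl lists the self-test log: newest entry first, so # 1 is always the most recent run. A rough way to check only the latest result is sketched below; latest_selftest is a made-up name and the filter simply assumes the column layout shown in the log above.

```shell
#!/bin/sh
# Illustrative sketch; latest_selftest is a hypothetical helper name.

# latest_selftest
# Reads `smartctl -l selftest` output on stdin and prints the last column
# (LBA_of_first_error) of the first entry line, i.e. the most recent test.
# A "-" means that test did not report a failing LBA.
latest_selftest() {
    awk '/^#/ { print $NF; exit }'
}

# Usage on the box (device name as in this thread):
#   smartctl -l selftest /dev/sdb | latest_selftest
```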

LDW · Aug 3, 2012

Kev_w said:
"...has the display on your geeky box thingy always been like that? It's distracting."

I do have to admit that the scrolling message during playback does keep catching my eye now it's brighter.

It sounds like everyone except me has their box near their telly. I've got mine on a hifi rack behind my viewing chair - and a 10 metre HDMI cable. Am I odd?

Black Hole · Aug 3, 2012

Yes

(you did ask)

af123 · Aug 3, 2012

Well, here's the file with the bad block. I suppose it gets written to fairly regularly (the file at least, if not that particular block).

Code:

humax# /mod/sbin/debugfs
debugfs 1.41.14 (22-Dec-2010)
debugfs:  open /dev/sda2
debugfs:  testb 61833730
Block 61833730 marked in use
debugfs:  icheck 61833730
Block    Inode number
61833730    15310949
debugfs:  ncheck 15310949
Inode    Pathname
15310949    /mod/monitor/monitor.db
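The same block-to-file lookup can probably be scripted rather than typed interactively, since debugfs accepts a single request via -R. The sketch below is untested on the Humax and rests on assumptions: block_to_inode is a made-up helper, and it expects the two-column icheck output shown above.

```shell
#!/bin/sh
# Illustrative sketch; block_to_inode is a hypothetical helper name.

# block_to_inode
# Reads `debugfs -R "icheck BLOCK" DEVICE` output on stdin and prints the
# inode number from the first all-numeric "block inode" row. Prints nothing
# if no inode owns the block.
block_to_inode() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { print $2; exit }'
}

# Usage on the box (paths and device as in this post):
#   inode=$(/mod/sbin/debugfs -R "icheck 61833730" /dev/sda2 | block_to_inode)
#   /mod/sbin/debugfs -R "ncheck $inode" /dev/sda2
```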

Ezra Pound · Aug 3, 2012

I wonder if there is a Linux utility to prevent heavily used files (e.g. ones with lots of read/write cycles) being written to the same physical place on a hard disk.

xyz321 · Aug 4, 2012

It should be possible to mark bad blocks with e2fsck and the badblocks program. Alternatively the disk manufacturer's disk test utility may be able to force the bad sectors to be relocated.
