Error when running fixdisk

Black Hole · Jan 18, 2020

Andrea Edwards said:
What 'messing' are you suggesting?

Andrea Edwards said:
then using the Linux distro of your choice, use the erase all partitions option, you can also run some surface scanning software to search for bad blocks (sectors)

Modern drives do that stuff for themselves, which is where the SMART reallocation figures come from.

Code is a Hole · Jan 18, 2020

Black Hole said:
Correct, but nonetheless it has been available within the CF for many years and has solved many people's problems without having to remove the HDD

I was unaware of fixdisk; I was even unaware of the Maintenance Mode accessible from Diag. When it was run, it took four days.

MartinLiddle · Jan 18, 2020

Andrea Edwards said:
when it was run, it took four days

How long it will take to run is a function of how the disk is formatted initially and how damaged the file system is; I don't know of any way of predicting in advance. I would guess 24 hours might be typical and for a drive formatted with a substantially reduced number of inodes (compared to the standard Humax format) it can be as quick as 30 minutes.

Code is a Hole · Jan 18, 2020

Black Hole said:
but now all the controller stuff is on-board the HDD itself, with complex mapping schemes calibrated at factory to map out defective blocks, it isn't recommended to go messing with that (even if such facilities are accessible)

So are you saying that modern hard drives no longer fail due to having the electronics built on to the hard drive?

I use Linux to remove all traces of partitions used by a previous OS, then certify the drive as fit for purpose before installing it. If it is going into a mission-critical environment, running a surface test before installation is far less risky than trying to recover recordings that might otherwise be lost.

Code is a Hole · Jan 18, 2020

MartinLiddle said:
How long it will take to run is a function of how the disk is formatted initially and how damaged the file system is; I don't know of any way of predicting in advance. I would guess 24 hours might be typical and for a drive formatted with a substantially reduced number of inodes (compared to the standard Humax format) it can be as quick as 30 minutes.

When the Humax started to indicate hard drive issues, I used a cloning box (a device with two SATA bays and a clone button) to duplicate one drive onto a second, and the cloned drive then went back into the Humax Fox-T2. As each failing drive was cloned in turn, it is very likely that the inode errors accumulated over time, which is why fixdisk took so long to complete. Running fixdisk when a crash is indicated now takes 1 hr 20 mins.

Trev · Jan 18, 2020

I'm not 100% convinced that a DVR is 'mission critical'. It's only telly! :frantic:

Code is a Hole · Jan 18, 2020

Trev said:
I'm not 100% convinced that a DVR is 'mission critical'. It's only telly!

I appreciate your view, but 'mission critical' is a term for the user to apply, not the onlooker. You may deem a DVR just telly, but I prefer to watch all of a programme and every programme in a series, so I have just plugged my DVR into a UPS. I got fed up with missing recordings due to power outages, and I don't want to miss recordings due to disk errors either. Please don't judge.

Testing hard drives before use is second nature, a process I have followed for many years. It just happens that this tested drive ended up in the Humax Fox-T2.

Trev · Jan 18, 2020

Fair enough.

EEPhil · Jan 19, 2020

Andrea Edwards said:
So are you saying that modern hard drives no longer fail due to having the electronics built on to the hard drive?

I doubt that is what BH meant. Modern drives do fail, but bad block recovery is usually automatic - up to a point. So, surface scan is usually not needed. (I've been known to do it on my computer, but get fed up because it takes such a long time!)

Trev said:
It's only telly!

That is a running joke on these forums.

Andrea Edwards said:
please don't judge

If only. (As in, if only I had a UPS and could prevent my Humaxes from bombing, I would.)

Black Hole · Jan 19, 2020

The problem is this: the drive has had a very detailed surface scan at time of manufacture, and hard defects have been mapped out before it gets to the user. All the user can do now is view logical block addresses without any real knowledge of where those blocks are actually located on the disk itself, and that view won't include any blocks which were mapped out before delivery. Then the SMART stuff monitors for subsequent read errors, and if it decides those errors are "hard" maps them out as well (reallocated sectors) until it runs out of spare blocks.
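For anyone who wants to watch those figures, here is a minimal sketch of pulling the three sector counters out of text in the table layout printed by `smartctl -A`. It is not part of fixdisk or the CF. The sample table and its numbers are invented for illustration, the function name `sector_counters` is hypothetical, and the attribute names are only the conventional ones, so a given drive may label or interpret them differently.

```python
import re

# Illustrative sample in the layout printed by `smartctl -A`; the numbers are invented.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   072   072   000    Old_age   Always       -       24891
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
"""

# Conventional names only; an assumption, since drives are free to differ.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def sector_counters(text):
    """Return {attribute_name: raw_value} for the watched attributes found in text."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            # RAW_VALUE can carry trailing annotations; keep the leading integer.
            match = re.match(r"\d+", fields[9])
            if match:
                counters[fields[1]] = int(match.group())
    return counters


print(sector_counters(SAMPLE))
```

A rising reallocated count means the drive has been using up its spare blocks; non-zero pending or offline-uncorrectable counts mean there are sectors it could not read and has not yet been able to map out.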

In the old days of "bare" drives and separate controller cards, you could get a raw unmapped view of the drive surface, and in any case hard errors had to be accounted for in the formatting by marking sectors as "bad" for the operating system to avoid. None of that is now the case.

Everything you need to keep a disk as healthy as possible is built into the modern HDD, and mostly runs automatically in the background. I regard external tools purporting to interfere with that process as unnecessary at best, and potentially damaging. Am I wrong?

MartinLiddle · Jan 19, 2020

Black Hole said:
Everything you need to keep a disk as healthy as possible is built into the modern HDD, and mostly runs automatically in the background. I regard external tools purporting to interfere with that process as unnecessary at best, and potentially damaging. Am I wrong?

If it is true why do we see Offline_uncorrectable_sectors that need software intervention to map out?

Black Hole · Jan 20, 2020

Black Hole said:
mostly

And note they were detected automatically without need for external services.

Something I am not clear about: the "software intervention to map out" - I believe this intervention is mediated by firmware resident on the HDD controller itself (ie the "long test")?

/df · Jan 20, 2020

MartinLiddle said:
If it is true why do we see Offline_uncorrectable_sectors that need software intervention to map out?

First of all, the well-known SMART attributes are only conventional and don't follow any documented interface specification. Any attribute may be differently interpreted and implemented by each drive firmware, even from the same vendor. Even the format of the vendor-specific data in which they are returned has not been standardised since 1998.

With that caveat, consider what happens when a potentially bad sector is found.

If it's a write, the bad sector is going to be trashed anyway, so it can be remapped directly. This is how hdparm --repair-sector forces a remap, by writing zeroes to the sector.

If it's a read, the firmware can retry several times with these possible results:

the sector data is returned with a valid CRC, in which case the firmware can either deem it good or remap it, or
the read never succeeds, and it becomes a "Pending" sector; maybe a subsequent attempt to read the sector will have better luck, or it will be overwritten, as above.

A read may be triggered through the OS, or by the internal "Offline" testing function of the firmware. Hence, perhaps, the use of "Offline_uncorrectable_sectors". The disk firmware can't generally know whether the sector being read contains valid current data. In fact, if the read comes through the OS, the only safe assumption is that it does, or it wouldn't be being read. The firmware can only remap an unreadable sector under the conditions above.

Hence fixdisk spends a lot of effort working out if a sector reported bad by SMART testing belongs to an actual file or directory.
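The first step in any such check is arithmetic: turning the absolute LBA that SMART reports into a block number inside the filesystem. The sketch below shows only that generic calculation; how fixdisk itself does it is not described in this thread. The function name, the 512-byte sector size, the 4096-byte filesystem block size and the example numbers are all assumptions.

```python
SECTOR_SIZE = 512  # bytes per logical sector; assumed, some drives report 4096


def lba_to_fs_block(lba, part_start_lba, fs_block_size=4096):
    """Map an absolute LBA to a block number within the filesystem on a partition.

    Returns None if the LBA lies before the start of the partition.
    (A fuller version would also check the LBA against the partition end.)
    """
    if lba < part_start_lba:
        return None
    byte_offset = (lba - part_start_lba) * SECTOR_SIZE
    return byte_offset // fs_block_size


# Hypothetical numbers: bad LBA 1,000,000 on a partition starting at LBA 2048.
print(lba_to_fs_block(1_000_000, 2048))  # 124744
```

On an ext2/3/4 filesystem the resulting block number can then be looked up with `debugfs` (`icheck` gives the owning inode, `ncheck` the path name), which tells you whether the unreadable sector belongs to a file, a directory, or free space.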

All this depends on the pre-allocated spare sectors not having been exhausted. Once that happens, a bad sector that can't be remapped becomes a permanently bad sector that the OS has to work around, as if it was still 1990. But such a disk would have been so unreliable by that point that only extreme circumstances would have prevented its replacement.

As a security note, if a remapped sector happened to contain a temporary unencrypted copy of something like your password file or world domination plans, the "right" tools could potentially recover that data from the original sector even after wiping the disk at the OS level.

af123 · Jan 20, 2020

/df said:
If it's a write, the bad sector is going to be trashed anyway, so it can be remapped directly. This is how hdparm --repair-sector forces a remap, by writing zeroes to the sector.

Most firmwares will try the write and if it succeeds then it just clears the pending flag on the sector, otherwise it gets remapped.
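Putting the read and write cases together, the behaviour described above can be sketched as a toy model. This is an illustration of the thread's description only, not real firmware: `ToyDrive`, its fields, and the split into "weak" sectors (unreadable, but fine once rewritten) and "dead" sectors (cannot be rewritten in place) are all hypothetical, and actual drives vary.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDrive:
    """Toy model of the sector handling described in the thread; not real firmware."""
    spares: int                                  # spare sectors still available
    weak: set = field(default_factory=set)       # unreadable now, fine once rewritten
    dead: set = field(default_factory=set)       # cannot be rewritten in place
    pending: set = field(default_factory=set)    # sectors flagged "Pending"
    remapped: set = field(default_factory=set)   # sectors reallocated to a spare

    def read(self, lba):
        if lba in self.remapped:
            return "ok"
        if lba in self.weak or lba in self.dead:
            # Unreadable, and the data may be live, so it cannot be remapped yet.
            self.pending.add(lba)
            return "error"
        return "ok"

    def write(self, lba):
        if lba in self.remapped:
            return "ok"
        if lba in self.dead:
            # The in-place write fails: remap if a spare is left.
            if self.spares == 0:
                return "error"  # permanently bad; the OS has to work around it
            self.spares -= 1
            self.dead.discard(lba)
            self.remapped.add(lba)
            self.pending.discard(lba)
            return "remapped"
        # The in-place write succeeded: just clear the pending flag.
        self.weak.discard(lba)
        self.pending.discard(lba)
        return "ok"


drive = ToyDrive(spares=1, weak={10}, dead={20, 30})
print(drive.read(10), drive.write(10), drive.read(10))  # error ok ok
print(drive.read(20), drive.write(20), drive.read(20))  # error remapped ok
print(drive.read(30), drive.write(30))                  # error error
```

Sector 10 is recovered by the write alone, sector 20 takes the only spare, and sector 30 has nowhere to go once the spares are exhausted.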
