"Unable to find disk" problem with permanent delete showing

stuving

New Member
I have a variant of the "Unable to find disk" issue: it only arose because I have the "permanent delete" problem and was trying to use the custom firmware to get rid of that. I can't find this exact case on the forum, so apologies if I have missed it.

Details:
HDR-FOX-T2, never modified (so the original 500GB disk), running FHTCP 1.03.12 dated 7/2/14.
No CF ever loaded.
Two things I think are irrelevant, but for the record:
* for about 3 months, the radio screen display behind the "wandering clock" screensaver has flashed randomly when it should be blanked.
* in the last week some recordings were incomplete, so they do not decode properly and almost hang the machine. I've seen this before and believe it's a software issue, not a signal one; it needs a power cycle to cure it.
After I had powered it off and on and deleted a folder, I saw that the star just kept spinning. It's been there ever since. I suspect it may have started doing that a day or so earlier, as some recordings were missing.

Now it will delete files and folders, and replay OK, and recordings can be set and appear to happen - but no new file appears at any time. Other than what I tell it, no extra files are being deleted, as far as I can see. The Media button works, but the Data Storage entry in the Menu/Settings/System menu is greyed out.

So I loaded the CF for this version, and it seems OK - I can use Telnet and start maintenance mode, but when I start fix-disk it tells me there is no disk. I see the intro page in the browser, but again it says there is no disk. Most things appear to work from the remote, but not the Media button (just a brief flash of the screen) or the Data Storage function.

I then reloaded the standard 1.03.12, and I am now back to the fault state I had immediately before loading the CF, i.e. I can replay via the Media button, but it is still "deleting" all the time.

Please, can someone suggest what my best bet is here? I don't think there is any evidence of disk failure, and I would like to keep the recordings on the disk.
 

af123

Administrator
Staff member
Try CFW version 3.02 as that may have better luck seeing the disk.
What model of disk is it?
 
OP

stuving

New Member
I've never looked in the box - unusually, for me.
It was bought in 2012, originally with version 1.02.18 dated 26/5/12, if that helps.
 
OP

stuving

New Member
OK, CFW mod_3.02 does see the disk, and the delete is still running. I'm currently running the long disk test for fix-disk, so we'll see what that says.
 

af123

Administrator
Staff member
Was that long test started by fix disk?
Usually it's best just to pick the fix disk option from the menu and let it do its thing (which includes running self tests if necessary).
 
OP

stuving

New Member
Yes, that's what I did. Then I went out, hoping it would run for the predicted 3:40. But it didn't get that far before finding bad sectors, and quite a lot of them. Note that it stopped and asked [y/n] before every repair, which I didn't expect, as the Wiki just says "Once the disk check and repair has completed, you will be returned to the Maintenance Mode Menu." Did I miss an option to turn the prompts off?

In fact, there were 180 sectors with errors - which sounds like bad news to me, though I don't understand what the test output is telling me.
All 180 were repaired, each reported with a message such as "re-writing sector 1924340533: succeeded".

Now it's chugging its way through every bad sector to see if it held data. I suspect not, as the sectors (LBA) run from 1460601704 to 192482935. (Yes, it is 1TB; I'm not sure why I remembered it being the smaller one.) So if the disk fills up from the bottom, and it shows 434 GB free (it has never been much higher), these bad sectors may be in unused space. And indeed, so far all have been found to be unused. But at nearly 2 minutes per sector, it'll be the middle of the night before it's done.
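Where these LBAs sit on the disk is simple arithmetic. The sketch below assumes 512-byte sectors and the standard 1,953,525,168-sector size of a "1TB" drive (the actual capacity is reported by `smartctl -i`). It is only a rough guide, since ext-family filesystems do not necessarily fill strictly from the bottom up.

```python
# Rough position of a bad-sector LBA on the whole disk (a sketch, not part of fix-disk).
SECTOR_BYTES = 512                 # assumption: 512-byte logical sectors
TOTAL_SECTORS_1TB = 1_953_525_168  # assumption: standard LBA count for a 1 TB drive


def lba_to_gb(lba: int, sector_bytes: int = SECTOR_BYTES) -> float:
    """Byte offset of the sector from the start of the disk, in decimal GB."""
    return lba * sector_bytes / 1e9


def fraction_of_disk(lba: int, total_sectors: int = TOTAL_SECTORS_1TB) -> float:
    """How far through the whole disk the sector lies (0.0 = start, 1.0 = end)."""
    return lba / total_sectors


if __name__ == "__main__":
    for lba in (1460601704, 1924340533):  # LBAs quoted in the posts above
        print(f"LBA {lba}: {lba_to_gb(lba):.1f} GB in, {fraction_of_disk(lba):.1%} of the disk")
```

For the two LBAs quoted above this gives roughly 748 GB and 985 GB from the start of the disk, i.e. about 75% and 98.5% of the way through. With 434 GB free, roughly 566 GB is in use, so a purely bottom-up fill would put these sectors in free space, but that fill pattern is not guaranteed.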
 

Black Hole

May contain traces of nut
The bad sectors are not in the unused space. These are file system checks, and faults are only detected by analysing the consistency of the file system. To detect bad sectors in unused space would require write/read tests on each sector of the disk.
 

af123

Administrator
Staff member
They could be in unused space. This first phase consists of disk self tests performed by the disk firmware which has no idea which blocks are in use, or even what a file system is.
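The disk firmware's own record of these tests can be read with `smartctl -l selftest /dev/sda`. Each entry records only the LBA of the first error, with nothing about files, so finding out whether a file uses that sector is a separate job. A minimal sketch that pulls those LBAs out of the log text, assuming the usual ATA self-test log layout (the sample log is hypothetical, not from this box):

```python
# Sketch: extract the LBA_of_first_error column from `smartctl -l selftest` output.
# Assumption: each log entry starts with "#" and its last column is an LBA or "-".
from typing import List


def first_error_lbas(selftest_log: str) -> List[int]:
    lbas = []
    for line in selftest_log.splitlines():
        if not line.lstrip().startswith("#"):
            continue  # skip the header and any blank lines
        last = line.split()[-1]
        if last.isdigit():  # "-" means the test recorded no error
            lbas.append(int(last))
    return lbas


# Hypothetical log text, for illustration only.
SAMPLE = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      1000         1460601704
# 2  Short offline       Completed without error       00%       990         -
"""

if __name__ == "__main__":
    print(first_error_lbas(SAMPLE))
```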

It's unusual to have more than a handful of bad sectors, though; it will take time to check them all.

File system checks are the next phase.
 
OP

stuving

New Member
af123 said: ...

It's unusual to have more than a handful of bad sectors, though; it will take time to check them all.

File system checks are the next phase.

That's now looking optimistic. Having spent about two hours checking 53 sectors that were unused, it then spent nearly five hours over eight that were all in a single programme file, seven of them close together (spanning ten sectors). It's now been (so it says) looking for the owner of the same sector for at least six hours. And there's still another 118 to go after that ...

Oh dear! (Other more or less similar ejaculations are available.)

I guess I have to stop it now, but is there a more kosher way than the mains switch? And then what?
 
OP
S

stuving

New Member
stuving said:
It's now been (so it says) looking for the owner of the same sector for at least six hours. And there's still another 118 to go after that ...
Unfair of me. I'd forgotten that the laptop I was using does, eventually, hibernate if left, and that stopped the logging. So I don't know how the fix-disk run ended.

The Telnet session was dead, but it was happy enough to accept a new one. On exiting, and following the restart, the box appears to be in the same state - still showing as deleting. Now, do I re-run fix-disk, or try to copy the recordings onto USB first? Hmm...
 

af123

Administrator
Staff member
Sorry, I haven't been around today.

I would start by checking how many pending sectors are left to repair by running:

smartctl -A /dev/sda

from the command line and looking at the lines that start with 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable).
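A minimal sketch of the same check, assuming the standard `smartctl -A` table layout in which RAW_VALUE is the tenth column. The sample table is illustrative only, not output from this box:

```python
# Sketch: read the raw counts for SMART attributes 197 and 198 from `smartctl -A` output.
# 197 = Current_Pending_Sector, 198 = Offline_Uncorrectable.
from typing import Dict

WATCHED = {197, 198}


def pending_counts(attr_table: str) -> Dict[int, int]:
    counts = {}
    for line in attr_table.splitlines():
        fields = line.split(None, 9)  # keep RAW_VALUE (which may contain spaces) whole
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header, blank or non-attribute line
        attr_id = int(fields[0])
        if attr_id in WATCHED:
            counts[attr_id] = int(fields[9].split()[0])
    return counts


# Hypothetical table, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    print(pending_counts(SAMPLE))
```

A non-zero count for 197 means sectors are still waiting to be re-written or remapped.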

I've thought for some time that it would be good to have an option to skip the checks for which files are affected by the damaged sectors. For most people it's just a couple of sectors but for a case like yours with 180 it just takes too long.
 
OP

stuving

New Member
Thanks for that - though I'm not sure what I then do with that information.

In fact, I started another run of fix-disk, thinking it would not re-find the "re-written" sectors. If it does, that'll be a pain - is there a way of stopping it?

This time, the short disk self test found and re-wrote four bad sectors (none last time), all at higher LBAs than any the long test found last time. The last run ended with "Skipped repair of LBA 1924829357" after I'd said "Y" to it, so I was wondering if it gave up looking when the count got to 180.

Final question: what's the simplest way of copying the recordings off this disk? I have enough space on a USB disk, which is NTFS, so can I just plug it in and use Linux commands?
 

af123

Administrator
Staff member
So is fix-disk still running? With only four sectors to check it should be much quicker.

If the number of pending sectors (as shown by the smartctl command) is zero, then you could run fix-disk with the -P option to skip the sector check and just repair the filesystem.

You can copy recordings from the internal disk to a USB one with just the 'cp' command. Unfortunately there is nothing more powerful built into the root filesystem image.
 

prpr

Well-Known Member
It might be nice if maintenance mode could copy some of the more useful utilities off the disk (if it can access it) into /tmp before unmounting the partitions. Just a thought...
 

af123

Administrator
Staff member
It looks as if fix-disk supports the -B option to tell it to skip the block search pass (which is what seems to be taking the time for the OP).
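The two options mentioned in this thread can be turned into a simple decision. The helper below is hypothetical (it is not part of fix-disk) and only encodes the advice given here: `-P` when no sectors are pending, otherwise `-B` if the slow block search pass is to be skipped.

```python
# Hypothetical helper: choose fix-disk options from the advice in this thread.
# -P: skip the sector check (suggested only when the pending-sector count is zero).
# -B: skip the block search pass (the slow "which file owns this sector" step).
# Whether fix-disk accepts both together is not stated, so they are never combined.
from typing import List


def fix_disk_args(pending: int, skip_block_search: bool) -> List[str]:
    if pending < 0:
        raise ValueError("pending sector count cannot be negative")
    if pending == 0:
        return ["fix-disk", "-P"]  # nothing pending: just repair the filesystem
    if skip_block_search:
        return ["fix-disk", "-B"]  # sectors pending, but skip the block search pass
    return ["fix-disk"]            # full run, including the block search pass
```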
 
OP

stuving

New Member
What's tedious about the bad sector search is that it stops to ask permission to repair each one, and it can take minutes to get from one error to the next. And I don't see why it asks, as I can't really imagine the answer is going to be different each time. Checking usage is just very slow, but can be left to itself; presumably it has to be done by searching each file in turn.

This time the first bad sector was the one (no. 181) it said it had skipped before going on to check usage for the first 180. (That fits with my notion of how automatic re-vectoring works, which is unusual for anything to do with disk internals.) I had expected not to see more than 180 this time either, but I think (without counting) it must be more than that already.

In terms of lost capacity, even a few thousand sectors is a tiny fraction of the total. So if the disk had a nasty experience when young, and the damaged sectors were never found before because that bit of the disk was never used, that huge number is not tragic. But if sectors are failing at a high rate, the bin beckons.

But while programme files may build up from low addresses in their partition, what is at the very top of the disk? Presumably that's partition sda3, but what is that used for?
 

af123

Administrator
Staff member
It's fairly small and used for buffering downloads from youtube and iplayer.
 

af123

Administrator
Staff member
stuving said:
What's tedious about the bad sector search is that it stops to ask permission to repair for each one, and it can take minutes to get from an error to one at the next LBA. And I don't see why, as I can't really imagine the answer is going to be different each time.
Useful feedback. I am updating fix-disk for the next release, so I will add an option to skip the prompt when there are lots of bad sectors.
You're right, the key thing with bad sectors is the rate of increase.
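Because the rate of increase is what matters, two readings of the relevant SMART count taken a known time apart are enough for a first estimate. A minimal sketch; the choice of attribute (commonly 5, Reallocated_Sector_Ct, or 197, Current_Pending_Sector) and the readings used are assumptions, not figures from this thread:

```python
# Sketch: estimate how fast a SMART sector count is growing from two readings.
from datetime import datetime


def sectors_per_day(count_then: int, when_then: datetime,
                    count_now: int, when_now: datetime) -> float:
    days = (when_now - when_then).total_seconds() / 86400
    if days <= 0:
        raise ValueError("the second reading must be later than the first")
    return (count_now - count_then) / days
```

A count that keeps climbing between readings is the worrying case; a one-off batch that then stays flat is much less so.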
 
OP
S

stuving

New Member
What does stop fix-disk during its usage scan?

The first time, it took several hours finding 180 bad sectors, waiting for my input each time. It then said "Skipped repair of LBA__", found 62 unused bad sectors, and then took no more than five hours (probably much less) finding the file ID and name for the sectors in use. Then the PC went to sleep and its Telnet client stopped recording.

After a few more hours, when I started Telnet again, I got the maintenance mode menu. But why did fix-disk stop? I tried the CTRL+] code at an earlier stage, and it did nothing.

The second run started similarly, though the short test found four high-numbered bad sectors. But it took longer finding bad sectors, as it found 625! (Aaaargh!!) Note that this time a couple of the "re-writing sector ___" messages did not end with "succeeded". Then it said "Skipped repair of LBA__" and started its search with the four sectors found in the short test, before going on to the long list. It has now found all of twelve files in eight hours.

Clearly, I need to stop it. Presumably this file search is read-only, and there is no issue of interrupting a repair action, so even powering off wouldn't harm anything (unless the disk is really bad and can't safely park its heads). But is that all?

One more scary factor is that the usage scan produced three messages of the form "icheck: Attempt to read block from filesystem resulted in short read while calling ext2fs_block_iterate", "icheck: Can't read next inode while doing inode scan", all from sectors found in the short test, above LBA 1927818688. (Both scans gave an "ncheck: Can't read next inode while doing inode scan" message on every file search.)
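The icheck/ncheck messages suggest the usage scan follows the usual manual procedure for mapping a bad sector to a file on an ext filesystem: convert the LBA into a filesystem block number, ask debugfs `icheck` which inode owns that block, then ask `ncheck` for that inode's path. `icheck` has to scan the inodes to answer, which is consistent with how slow the search is here. A minimal sketch of the first step, assuming the partition start sector (from the partition table) and the filesystem block size (from `tune2fs -l`) are known; the values and device name in the example are hypothetical:

```python
# Sketch: map a bad-sector LBA to an ext filesystem block and the debugfs lookups.
# Assumptions: part_start comes from the partition table (e.g. `fdisk -lu`) and
# block_size from `tune2fs -l`; neither is given in this thread.
from typing import List

SECTOR_BYTES = 512  # assumption: 512-byte logical sectors


def lba_to_fs_block(lba: int, part_start: int, block_size: int) -> int:
    """Filesystem block containing the LBA, counted from the partition start."""
    if lba < part_start:
        raise ValueError("LBA lies before the start of this partition")
    return (lba - part_start) * SECTOR_BYTES // block_size


def debugfs_lookup(lba: int, part_start: int, block_size: int, device: str) -> List[str]:
    block = lba_to_fs_block(lba, part_start, block_size)
    return [
        f'debugfs -R "icheck {block}" {device}',  # block -> owning inode number
        f'debugfs -R "ncheck <inode>" {device}',  # inode -> path; fill in <inode> by hand
    ]


if __name__ == "__main__":
    # Hypothetical partition start, block size and device, for illustration only.
    for cmd in debugfs_lookup(1_460_601_704, part_start=2_048, block_size=4_096,
                              device="/dev/sda2"):
        print(cmd)
```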
 