"Unable to find disk" problem with permanent delete showing

prpr · Mar 7, 2016

stuving said:
But why did fix-disk stop?

Because processes running under a terminal session get killed when the session disconnects. The session disconnected when the PC went to sleep.
So don't let the PC go to sleep!

I've tried the CTRL+] code, at an earlier stage, and it did nothing.

Ctrl+] followed by what? All that does is get you back to the Telnet prompt. Why did you do that, what did you do after it, and what did you hope to achieve?

stuving · Mar 7, 2016

I did stop it - again ctrl+] did nothing, but closing the telnet session and starting a new one worked. Of course the "deleting" was still there.

So I've been trying to copy my recordings off the HDR to my USB disk, as I can't see any way to avoid at least a reformat, and probably a new hard disk. However ...

* If I plug it in with maintenance mode running, it doesn't mount (which I think USB devices are meant to), nor does it appear as an unmounted device, so I can't mount it manually either.
* With humaxtv running, it mounts ro. That's expected, for an NTFS disk.
* If I then enter maintenance mode, I can umount and mount it again but it's still ro.

Now I thought full NTFS support was part of the basic custom firmware - but what I read here isn't at all clear. For one thing there is a distinction made sometimes between firmware (from USB) and software (downloaded), but mostly it's all called firmware. There's a package list, but some of the items on it are obviously on the USB load - e.g. telnet and fix-disk. Nothing on the list says "this is part of the initial USB package", "this is always downloaded with the web i/f", or "this is only loaded if you select it". What's downloaded is referred to as the full web i/f, but that does include some packages (if only the ones for the i/f itself).

In any case, I can't download web-if or packages - it says "read-only file system" if I try, and indeed the mount command does show /dev/sda2 mounted as /mnt/hd2 and ro. (Is /hd2 where the firm-ishware goes?) I guess this is how this "permanently deleting" state comes about, but the older one (which deletes all your files) must be something different.

So, currently I'm at catch-22. I should add that I never was a linuxophone, so I'm having to find commands one by one to do this, which adds to the ... fun?

af123 · Mar 7, 2016

Did you ever run that smartctl command to see how many sectors are marked bad by the disk firmware? Once we know that, we can advise on the best way to try and fix it, which may involve sending you a custom firmware build that doesn't do as much prompting.

Regardless, I would try running fix-disk again but with the -P option. That will repair the filesystem if possible but won't try and fix any more bad disk blocks. That will likely get you up and running well enough to copy things off.

The Humax firmware only supports read-only NTFS. There is a custom firmware package (ntfs-3g) which adds read/write but that needs to be installed on disk via a package download.

stuving · Mar 7, 2016

I didn't run smartctl before - but this is what it says:

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   102   095   006    Pre-fail  Always       -       158449151
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   095   095   020    Old_age   Always       -       5438
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       307446620
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       18701
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   098   098   020    Old_age   Always       -       2720
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       914
188 Command_Timeout         0x0032   100   096   000    Old_age   Always       -       90195689493
189 High_Fly_Writes         0x003a   001   001   000    Old_age   Always       -       1645
190 Airflow_Temperature_Cel 0x0022   051   042   045    Old_age   Always   In_the_past 49 (1 225 51 49 0)
194 Temperature_Celsius     0x0022   049   058   000    Old_age   Always       -       49 (0 19 0 0 0)
195 Hardware_ECC_Recovered  0x001a   048   032   000    Old_age   Always       -       158449151
197 Current_Pending_Sector  0x0012   097   096   000    Old_age   Always       -       156
198 Offline_Uncorrectable   0x0010   097   096   000    Old_age   Offline      -       156
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

I don't know what, specifically, the numbers mean - but did you mean option -a? That gives loads more stuff, in which it says inter much alia:
"SMART overall-health self-assessment test result: PASSED".

What other bits are worth examining?

prpr · Mar 7, 2016

187, 188, 189, 197, 198 all give cause for concern. I would backup what you want to keep and buy a new disk.

af123 · Mar 7, 2016

That shows (attribute 198) that there are 156 sectors left with faults rather than the 180 you started with, which is at least an improvement. There are no reallocated sectors either, so they could just be due to a firmware glitch and recoverable. A filesystem fix-disk should at least get things working again, or you could just use FTP to pull the files off over the network and replace the disk. Reformatting the drive will not fix this properly; those blocks need specifically rewriting.
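For anyone who would rather pull those figures out of the smartctl text than read them by eye, here is a minimal Python sketch. It assumes the ten-column attribute layout shown in the output posted above; the function name `parse_attributes` and the embedded sample rows are only illustrative, and it would have to run on a PC against saved output rather than on the box itself.

```python
# Selected rows copied from the smartctl output posted earlier in the thread.
POSTED_ROWS = """\
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 914
188 Command_Timeout 0x0032 100 096 000 Old_age Always - 90195689493
189 High_Fly_Writes 0x003a 001 001 000 Old_age Always - 1645
197 Current_Pending_Sector 0x0012 097 096 000 Old_age Always - 156
198 Offline_Uncorrectable 0x0010 097 096 000 Old_age Offline - 156
"""


def parse_attributes(text):
    """Map attribute ID -> name and leading integer of RAW_VALUE (hypothetical helper)."""
    attrs = {}
    for line in text.splitlines():
        # Ten columns; the last one (RAW_VALUE) may itself contain spaces.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skips the header row and anything that is not an attribute row
        attrs[int(fields[0])] = {
            "name": fields[1],
            "raw": int(fields[9].split()[0]),
        }
    return attrs


attrs = parse_attributes(POSTED_ROWS)
for attr_id in (5, 187, 188, 189, 197, 198):
    print(attr_id, attrs[attr_id]["name"], attrs[attr_id]["raw"])
```

Watching attributes 5, 197 and 198 between fix-disk runs is the quickest way to see whether pending sectors are being cleared by rewriting or turned into reallocations.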

4ndy · Mar 7, 2016

Nearly 19,000 hours power on! Is that a record?


MymsMan · Mar 7, 2016

4ndy said:
Nearly 19,000 hours power on! Is that a record?

That is only around 2 years of continuous operation and the box has been out since 2010.

stuving · Mar 7, 2016

MymsMan said:
That is only around 2 years of continuous operation and the box has been out since 2010.

Indeed - bought September 2012, so 41 months * 17 hours per day * 325 days per year hits that figure pretty exactly. I turn it off at night, but otherwise it runs all day if I'm not away (and I retired in 2012).
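That estimate can be checked with a few lines of Python. This is only a sketch of the arithmetic in the post: the inputs are the poster's own rough figures, and the variable names are made up for illustration.

```python
# Figures taken from the post; none of them are measured values.
months_owned = 41        # September 2012 to March 2016, approximately
hours_per_day = 17       # box on all day, off at night
days_per_year = 325      # allowing for time away

estimated_hours = months_owned / 12 * days_per_year * hours_per_day
reported_hours = 18701   # Power_On_Hours raw value from smartctl

difference = estimated_hours - reported_hours
print(f"estimated {estimated_hours:.0f} h, reported {reported_hours} h, "
      f"difference {difference / reported_hours:.1%}")
```

The estimate comes out within about one percent of the SMART figure, which fits the counter being cumulative powered-on time.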

prpr · Mar 7, 2016

4ndy said:
Nearly 19,000 hours power on! Is that a record?

It's cumulative, not continuous!

4ndy · Mar 7, 2016

I know it's cumulative. Both disks I have replaced died long before that.


Black Hole · Mar 7, 2016

That's nothing! My original HDR is still on its original 500GB disk and registers 33145 hours. The same goes for the other two, but 22687 and 14980 hours respectively.

prpr · Mar 8, 2016

4ndy said:
I know it's cumulative. Both disks I have replaced died long before that.

I'd say you'd been rather unlucky then.
I've still got a few disks at work which have at least 90000 hours on them. A couple may even be over 100000 by now. I firmly believe turning them on and off constantly is bad for them. These run 24/7... and yes, the machines are being replaced in the not too distant future.

My PATA drives here got to about 10 years old before I replaced them with SATA a couple of years ago and were still in perfect working order (maybe 1 sector error between all 5 disks). I've already had one of the new ones die with ever increasing amounts of bad sectors (and it was less than 2 years old, although Seagate's warranty checker said otherwise, which kinda annoyed me).

af123 · Mar 8, 2016

stuving said:
I didn't run smartctl before - but this is what it says:

If you are interested, I can send you a beta version of the next custom firmware which has an updated fix-disk utility which can be told to skip the time-consuming block tests and assume yes to all questions.

stuving · Mar 8, 2016

af123 said:
If you are interested, I can send you a beta version of the next custom firmware which has an updated fix-disk utility which can be told to skip the time-consuming block tests and assume yes to all questions.

Currently I've got another couple of hours of FTP transfer to do. At least FTPing has a lot less hassle than I was expecting, based on past experience, but at 100 Mb/s max it's not ideal. I was then going to do as you suggested, and use fix-disk -P to rewrite all the flagged sectors. So that beta version would be ideal for that. Are there other options as well?

Are we to take that figure of zero reallocated sectors literally? By which I understand that fix-disk led to the disk controller doing something like (1) put good, or best-guess recovered, data somewhere (2) write to sector (3) validate by reading, and find no errors (4) restore saved data. If the read check failed it would have substituted a spare sector, but this never happened in all the sectors rewritten so far. Given the disk says it had 914 errors, and two runs of fix-disk have rewritten 180 and 625 of them, there should be 109 left - not 156. But there may be things in some counts that are not what I think, of course. And two of those rewrites didn't say "succeeded" as well - so what state are they left in?
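The bookkeeping behind that "109, not 156" can be written out as a short Python sketch. It assumes, as the post does, that Reported_Uncorrect and the fix-disk rewrite counts are tallying the same sectors, which may well not be true; the variable names are illustrative only.

```python
# Numbers as quoted in the post and in the smartctl output.
reported_uncorrect = 914       # attribute 187 raw value
rewritten_per_run = [180, 625] # sectors fix-disk reported rewriting in two runs
pending_now = 156              # attribute 197 raw value

expected_remaining = reported_uncorrect - sum(rewritten_per_run)
unexplained = pending_now - expected_remaining

print("expected remaining:", expected_remaining)
print("pending reported by the disk:", pending_now)
print("unexplained difference:", unexplained)
```

A gap of 47 suggests the two counters are not measuring the same thing, rather than that rewrites went missing.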

However you look at it, such a huge number of sectors that once read with error and now don't, all apparently in one or two files, suggests to me something wrong in the write process - more likely low-level (e.g. a current or voltage) than data handling. Which would be a disk fault in any case, and mean scrapping it.

stuving · Mar 9, 2016

Running fix-disk -P has got rid of the "deleting...." problem. As you said, I still have the flagged bad sectors, and smartctl now reports a total of 960 errors (up from 914) and 152 pending (down from 156). I think I'd prefer to process (rewrite) those and then see how many come up in the future. I gather the figure for high-fly writes (1645) is very high, and while they should be detected and not write unreadable data, it is suggestive of an error mechanism.

My initial problem was that CF mod_3.03 didn't see the hard disk. Is that an incompatibility with my box, or just with the state it had got into? I may as well move on to mod_3.03 if it's likely to be OK. Incidentally, I can't find any words to tell me whether the CF has to be loaded after the relevant standard update, or whether it includes a copy of that plus the custom bits. The use of both "custom" and "customised" firmware (or software) to describe it provides evidence for both.

af123 · Mar 9, 2016

3.03 is not likely to see your hard disk. It has a different SATA driver to 3.02 which solved disk issues for some people and introduced them for others.
With CFW 3.xx, you don't need to load the standard firmware first (with CFW 2.xx you did have to, which is probably the source of the confusion - there may be old documentation around).

I'll send you a beta version of 3.10 which should give you a way of repairing those sectors. They were weakly written for some reason but it does not (at the moment) appear to be due to physical defects on the disk so it could just have been a transient problem... let's see what happens!

stuving · Mar 13, 2016

Well, so far so ... I'm not sure.

fix-disk, run with the new options, was left running for ages; after which it has reduced the current_pending_sector count from 152 to 24. But in doing that it found (and says it successfully re-wrote) 9,411 bad sectors. Yup, that many. In the end I stopped it, so I could see if the box was still working (and use it) - seems it is. So there are presumably more to find if I run it again.

Now, I know the raw SMART data are manufacturer's secrets, and they aren't letting on what they really mean (not Seagate, at least). And I can't find any straightforward description of how these modern disks work and are tested. But as far as I can make out:

This long disk test, in its scan, reads every sector in LBA order. So the data it reads is what the box wrote in a file, or - for sectors it's never used - it's data written during formatting. What kind of formatting did that (i.e. write every sector) I don't know.

The Reported_Uncorrect count hasn't budged, from which I conclude that this only shows in-service read errors, not ones during this self-test. Raw_Read_Error_Rate and Hardware_ECC_Recovered are equal; allegedly that means no unrecoverable errors - I don't understand that.
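One commonly repeated but unofficial reading of these Seagate raw values may explain the "no unrecoverable errors" claim. The assumption (not documented by Seagate and not confirmed anywhere in this thread) is that the 48-bit raw value holds an error count in its upper 16 bits and an operation count in its lower 32 bits. A minimal Python sketch under that assumption, with a hypothetical helper name:

```python
def split_seagate_raw(raw):
    """Split a 48-bit raw value into (upper 16 bits, lower 32 bits).

    ASSUMPTION: upper 16 bits = error count, lower 32 bits = operation count.
    This layout is community folklore, not a published specification.
    """
    return (raw >> 32) & 0xFFFF, raw & 0xFFFFFFFF


# Raw values from the smartctl output posted earlier in the thread.
posted = {
    "Raw_Read_Error_Rate": 158449151,
    "Hardware_ECC_Recovered": 158449151,
}

for name, raw in posted.items():
    errors, operations = split_seagate_raw(raw)
    print(f"{name}: upper16={errors}, lower32={operations}")
```

Both posted values are below 2^32, so the upper field is zero in each case. If the assumption holds, the large raw number is just a count of operations and the error part is zero, which would be consistent with the claim.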

Reallocated_Sector_Ct is still zero, confirming that all these sectors that read as "bad" on test then read OK after being written. From earlier tests it does look as if the errors are in unused space, with files just recently reaching the lowest LBA. No rewritten bad sector has been found as bad a second time.

So something has made a lot of the upper (high LBA, or inner?) part of the disk report errors, in areas never written to since the disk was new. But there are also errors in files written within the last two weeks. That's a puzzle, if the problem is in the write process.

Back in post #1, I said "from ca. 3 months ago, radio screen display behind the screensaver "wandering clock" flashes randomly when it should be blanked." I wonder if that might be related after all. For example, if a power supply voltage was out of spec., or not regulated, what might that do to the disk? Are they easy to find and look at inside?

With over 10,000 LBAs for bad sectors, there is scope for a lot of analysis. But only in LBA space - is there any way of relating that to physical position on the disks? I know the head/track/sector numbers are pure fiction, and the manual doesn't offer anything more real. There are other parameters there that allow estimates of (for example) sectors per turn, but the answer does not make any sense of the pattern of errors.

stuving · Jun 21, 2016

Well, after partly fixing the disk errors I was busy with other things, and have only spent a little time on this problem. I've checked the pending error count, which shot up from 24 to 271 after a couple of weeks. However, since then it has only gone up by four more - but obviously 275 is a worrying number and calls for precautions. So I did decrypt the files and copy them all (via FTP, as slowly as that demands), and since then have been trying to set up an incremental copy to a backup. That had some success and some issues, which I was going to post about, but then it waits until a couple of days before I go away and does this:

On-screen, I now see all the folders I should, with the right titles, counts of recordings, etc., but no recordings themselves whether inside the (series) folders or in the root. I have two created folders, one shows as empty (it should have some folders in it) and the other as containing (series) folders that are themselves empty. However, the contents are still visible in the web interface or via FTP from Windows Explorer (but with exceptions; see below).

I did run fix-disk again, though only for about an hour as I don't have time to let it run through. It found (and corrected OK) one error quickly, then no more in that hour. But that didn't affect this issue, so I've now set an FTP copy going of whatever is dated since the last one. That's probably all I can do just now.

Is this a known problem? My first thought was a directory issue, but it only affects the Humax process - or almost so. The exceptions I can see are the two programmes recorded today. In the web I/F both of these are also missing, but the folder shows an unviewed count that's correct for them being there. I suspect a few other older recordings may have also gone, though.

stuving · Jun 21, 2016

Oh, I was forgetting. Since yesterday morning it has been showing a message saying the HDD needs formatting to allow recording. But yesterday it was still recording and deleting, while today it was deleting but does not now appear to have recorded anything.