SOLVED: The CF causes processor overload UNLESS fixdisk used to optimise hard drive again

nor the 2nd recording
ID# ATTRIBUTE_NAME           FLAGS   RAW_VALUE   VALUE       WORST       THRESH      %     FAIL
  1 Raw_Read_Error_Rate      POSR--  239264634   120         079         006               -
  3 Spin_Up_Time             PO----  0           097         097         000               -
  4 Start_Stop_Count         -O--CK  22854       078         078         020         73%   -
  5 Reallocated_Sector_Ct    PO--CK  0           100         100         036         100%  -
  7 Seek_Error_Rate          POSR--  1534387878  091         060         030               -
  9 Power_On_Hours           -O--CK  43396       051         051         000         51%   -
 10 Spin_Retry_Count         PO--C-  0           100         100         097         100%  -
 12 Power_Cycle_Count        -O--CK  11427       089         089         020         87%   -
184 End-to-End_Error         -O--CK  0           100         100         099               -
187 Reported_Uncorrect       -O--CK  14774       001         001         000               -
188 Command_Timeout          -O--CK  0           100         100         000               -
189 High_Fly_Writes          -O-RCK  0           100         100         000               -
190 Airflow_Temperature_Cel  -O---K  54          046 (54°C)  043 (57°C)  045 (55°C)        In_the_past
194 Temperature_Celsius      -O---K  54          054         057         000               -
195 Hardware_ECC_Recovered   -O-RC-  239264634   044         033         000               -
197 Current_Pending_Sector   -O--C-  0           100         100         000               -
198 Offline_Uncorrectable    ----C-  0           100         100         000               -
199 UDMA_CRC_Error_Count     -OSRCK  0           200         200         000               -
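For anyone puzzled by the bracketed temperatures in the Airflow_Temperature_Cel row: for Seagate attribute 190 the normalised figure is just 100 minus the temperature in Celsius, so you can decode VALUE/WORST/THRESH with a one-liner (the numbers below are taken from that row):

```shell
# Decode the normalised Airflow_Temperature_Cel figures from the table above.
# Seagate convention for attribute 190: temperature (C) = 100 - normalised value.
for value in 46 43 45; do          # VALUE, WORST, THRESH from the 190 row
    echo "$value -> $((100 - value))°C"
done
```

That reproduces the 54°C / 57°C / 55°C annotations shown against that attribute.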
 
I don't like that "Reported_Uncorrect" figure. I suppose the higher-than-desirable temperature figure is down to your running in Safe mode without the Fan package and a decent minimum speed set.
 
There's a discussion of the interpretation of "Reported_Uncorrectable_Errors" (SMART parameter 187) here. tl;dr: any value > 0 is a Bad Thing.
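If you want to keep an eye on that figure without wading through the whole table, the raw value for attribute 187 can be pulled out with awk (a sketch only: the sample table is pasted in as a here-document so the snippet is self-contained; on the box you would pipe in real smartctl output instead):

```shell
# Extract the raw value of SMART attribute 187 (Reported_Uncorrect).
# On the real box: smartctl -A /dev/sda | awk '$1 == 187 { print $NF }'
raw=$(awk '$1 == 187 { print $NF }' <<'EOF'
ID# ATTRIBUTE_NAME     FLAG    VALUE WORST THRESH TYPE    UPDATED WHEN_FAILED RAW_VALUE
187 Reported_Uncorrect 0x0032  001   001   000    Old_age Always  -           14774
EOF
)
if [ "$raw" -gt 0 ]; then
    echo "WARNING: Reported_Uncorrect raw value is $raw - any value > 0 is a Bad Thing"
fi
```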

It would be reassuring to see that "Hardware_ECC_Recovered" == "Raw_Read_Error_Rate", if I didn't suspect that the firmware is just reporting the same data in both fields.
 
Those guys have new drives that fail within 70 power cycles. How is that relevant to my 22k cycles?

There was an instance in the past, about 3-5 years ago, when a sector had failed, but af123 helped me sort it out. I don't know whether that has any bearing on this matter.
 
BackBlaze had (2014) many thousands of drives. Some fail on burn-in. As the blog points out, their usage doesn't involve regularly turning drives off and on, so their data for power cycling isn't a good match for your HDR.

The ones that survive, like your disk, turn out to be likely to fail once SMART 187 > 0. In fact, if 10 drives have values in the range 65-120, their stats say that between 1 and 3 will probably fail in the next year. Your disk's value is some 1500 times greater, way off the scale of the graphs presented by BackBlaze.

According to your stats, the firmware marked the bad sector as good without relocating it. It (and perhaps its neighbours) may be responsible for the large number of "Reported_Uncorrectable_Errors".
 
Speculative hypothesis:

The Reported Uncorrect figures could refer to a single bad sector that is being frequently referenced, but for some reason the disk hardware is unable to reallocate the sector.

Since pixellation occurs at ten-minute intervals while the CF is running (but with auto turned off), yet not during recording with the CF turned off, the faulty sector could be in one of the databases that the CF accesses every ten minutes.

If you were to rename the affected database, that portion of the disk would no longer be accessed and hopefully the problems would cease. (Missing databases are normally recreated on next use, though the reservation database (rsv.db) would need to be restored from a backup of the recording schedule.)

AFAIK the two processes that run every 10 minutes with auto turned off are the EPG update, which uses /mnt/hd1/dvbepg/epg.dat and the sqlite dump of it in /mnt/hd1/epg.db, and rs_processes, which uses /var/lib/humaxtv/rsv.db; I am not sure what else they might access.
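To make the renaming idea concrete, here is the operation demonstrated on a throwaway directory so it can be tried safely (on the box you would mv /mnt/hd1/epg.db itself; the ".suspect" suffix is just my own convention, not anything the CF uses):

```shell
# Demonstrate the rename-the-database idea on a scratch directory.
# On the Humax you would rename /mnt/hd1/epg.db itself; a missing epg.db is
# normally recreated on next use, but back up rsv.db before touching that one.
demo=$(mktemp -d)
touch "$demo/epg.db"                        # stand-in for the suspect database
mv "$demo/epg.db" "$demo/epg.db.suspect"    # its disk blocks are no longer read
ls "$demo"
```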

BTW my own disk has a Reported Uncorrect figure of 2052, so I will have to keep an eye on it.
 
Speculative hypothesis:

The Reported Uncorrect figures could refer to a single bad sector that is being frequently referenced, but for some reason the disk hardware is unable to reallocate the sector.
...
Or just kept deciding it wasn't necessary. The stats indicate that this is a one in 20 thing at worst, if the box has been CFed from the start, or worse in inverse proportion to the fraction of uptime using CF.

hdparm --fibmap filename shows which disk sectors are used by the file "filename". A fix-file script could be envisaged that would test the sectors listed for the file using a selective SMART test. Any sectors that were marginal but never remapped could be added to a bad-block file to be fed to e2fsck -l. This would be less intrusive than running fix-disk or a full OS level disk check, eg e2fsck -cc.
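As a rough sketch of that fix-file idea (the function name and the sample extent numbers below are mine, and this is untested against a real box): the extents that hdparm --fibmap prints can be turned into BEGIN-END ranges suitable for smartctl's selective self-test:

```shell
# Convert "hdparm --fibmap" output into BEGIN-END LBA ranges. On the real box,
# each printed range would then be used as: smartctl -t select,BEGIN-END /dev/sda
fibmap_to_ranges() {
    # extent lines have four numeric fields: byte_offset begin_LBA end_LBA sectors
    awk 'NF == 4 && $1 ~ /^[0-9]+$/ { print $2 "-" $3 }'
}

# Sample fibmap output for a file in two extents (made-up numbers):
ranges=$(fibmap_to_ranges <<'EOF'
/mnt/hd1/epg.db:
 filesystem blocksize 4096, begins at LBA 2048; assuming 512 byte sectors.
 byte_offset  begin_LBA    end_LBA    sectors
           0      10240      10247          8
        4096      20480      20487          8
EOF
)
echo "$ranges"
```

Any sectors that the selective test flags but the drive never remaps could then be converted to filesystem block numbers and fed to e2fsck -l as described.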
 
Can I allow myself a sigh of relief?
While it may not be dying imminently, you still have the ongoing problem that the disk is not performing well enough to allow you to run the normal basic CF operations without getting pixellation of recordings. If you are prepared to live without the CF facilities, that is your choice, but if you want to use the CF then you need to improve the disk's performance or replace it.

You could try my earlier suggestions: run fix-disk to see if it finds anything, or find another disk to see if it performs better.
 
I've viewed quite a few recordings recently and no pic breaks at all.

Is there a possibility that fix-disk might make things worse?
 