Hard drive failure?

rob4x4 · Oct 25, 2012

Disk Partition

humax# fdisk -lu /dev/sda

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               8     2104510     1052251+  83  Linux
Partition 1 does not end on cylinder boundary
/dev/sda2         2104512   955787166   476841327+  83  Linux
/dev/sda3       955787168   976768062    10490447+  83  Linux

Block Size command does not work. Is there a command missing at the start?
humax# /mod/sbin/tune2fs -l /dev/sda2
-/bin/sh: /mod/sbin/tune2fs: not found

prpr · Oct 25, 2012

Just /sbin/tune2fs without the /mod at the front...

"find / -name tune2fs" is your friend in situations like this.

EDIT: actually you don't need /sbin in this instance either as that directory is on the search PATH.

rob4x4 · Oct 25, 2012

Ah, I am learning, slowly...

humax# /sbin/tune2fs -l /dev/sda2
tune2fs 1.41.14 (22-Dec-2010)
Filesystem volume name: hmx_int_stor
Last mounted on: <not available>
Filesystem UUID: a1157a5c-31cf-4391-8afc-15fac5455ca6
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Filesystem flags: signed_directory_hash
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 29860704
Block count: 119209984
Reserved block count: 5960516
Free blocks: 86340236
Free inodes: 29854786
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 995
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8208
Inode blocks per group: 513
Filesystem created: Sun Oct 21 11:23:56 2012
Last mount time: Thu Oct 25 08:14:48 2012
Last write time: Thu Oct 25 08:14:48 2012
Mount count: 6
Maximum mount count: 33
Last checked: Wed Oct 24 17:07:14 2012
Check interval: 15552000 (6 months)
Next check after: Mon Apr 22 17:07:14 2013
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Journal inode: 8
Default directory hash: tea
Directory Hash Seed: 395fd507-c590-47cf-8355-5a012719c372
Journal backup: inode blocks
humax#

af123 · Oct 25, 2012

OK, the filesystem block containing the bad sector is:

fsblock = (int)((<problem LBA> - <partition start LBA>) * <sector size> / <fs block size>)

fsblock = ((14435421 - 2104512) * 512)/4096

Code:

humax# dc
16 o 14435421 2104512 - 512 * 4096 / p
1784F3

That's 1784F3 in hex (the "16 o" sets dc's output radix to 16), i.e. block 1541363.
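The same calculation can be done with the shell's own integer arithmetic, which skips the hex step. A minimal sketch, assuming a POSIX-style shell such as the box's busybox sh; the function name `fsblock` is hypothetical and the defaults are just the values from this thread (512-byte sectors from fdisk, 4096-byte blocks from tune2fs):

```shell
# fsblock LBA PART_START [SECTOR_SIZE] [FS_BLOCK_SIZE]
# Hypothetical helper: print the filesystem block number that contains
# disk sector LBA, for a partition starting at sector PART_START.
fsblock() {
    lba=$1
    part_start=$2
    sector_size=${3:-512}
    fs_block_size=${4:-4096}
    if [ "$lba" -lt "$part_start" ]; then
        echo "LBA $lba is before the partition start $part_start" >&2
        return 1
    fi
    # Dividing by sectors-per-block gives the same integer result as
    # (lba - start) * sector_size / fs_block_size, but keeps the numbers
    # small enough for shells limited to 32-bit arithmetic.
    sectors_per_block=$(( fs_block_size / sector_size ))
    echo $(( (lba - part_start) / sectors_per_block ))
}
```

Pasting the function in and running `fsblock 14435421 2104512` should print 1541363, matching the dc result; for the second bad LBA that turns up later in the thread (14436956) it gives 1541555.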

So, fire up debugfs

Code:

humax# debugfs
debugfs 1.41.14 (22-Dec-2010)
debugfs:  open /dev/sda2
debugfs:  testb 1541363
Block 1541363 marked in use
debugfs:  icheck 1541363
Block    Inode number
1541363  1234
debugfs:  ncheck 1234
Inode    Pathname
1234    /xxx

So, you type:

open /dev/sda2
testb 1541363

then

icheck 1541363

which will give you an inode number, then you do:

ncheck <inode number>

to get the path to the file which is located on the problem sector.
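If you want to chain the icheck and ncheck steps rather than copy the inode number across by hand, something like the following could work. It is a sketch only, assuming debugfs's `-R` option (run one request and exit) and the two-column icheck output shown above; `inode_from_icheck` is a hypothetical helper name. Note that icheck can take a while on a large filesystem.

```shell
# Hypothetical helper: read debugfs "icheck" output on stdin and print the
# inode number from the first "<block> <inode>" line. Header lines and
# non-numeric results (e.g. "<block not found>" for a block that belongs to
# no file) produce no output.
inode_from_icheck() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { print $2; exit }'
}

# Example use on the box (not run here; assumes debugfs -R is available):
#   blk=1541363
#   ino=$(debugfs -R "icheck $blk" /dev/sda2 2>/dev/null | inode_from_icheck)
#   if [ -n "$ino" ]; then
#       debugfs -R "ncheck $ino" /dev/sda2
#   else
#       echo "block $blk is not in any file"
#   fi
```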

rob4x4 · Oct 25, 2012

Sorry, am I doing something wrong? I get "Block 1541363 not in use", then debugfs seems to hang. Did I need to run the humax# dc command at the top first?

humax# debugfs
debugfs 1.41.14 (22-Dec-2010)
debugfs: open /dev/sda2
debugfs: testb 1541363
Block 1541363 not in use
debugfs: icheck 1541363

I'll have to pick this up later tonight. Got to go out now. Thanks for your help.

af123 · Oct 25, 2012

That's good news: the bad sector is not within a file, so you can just write zeros over it (the dd below overwrites the whole 4 KiB filesystem block containing it) to make the drive re-evaluate the sector and potentially reallocate it:

Code:

humax# dd if=/dev/zero of=/dev/sda2 bs=4096 count=1 seek=1541363

Then look at the diskattr output again: the offline uncorrectable and pending sector counts should have decreased, and the reallocated sector count may have increased by one.

Now rinse and repeat for the other four bad sectors (start with another long disk test).
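Because `bs=4096 count=1 seek=N` zeroes an entire 4 KiB filesystem block (eight 512-byte sectors), it may be worth confirming that the block really covers the LBA from the self-test before running dd. A minimal sketch, assuming 512-byte sectors and 4096-byte blocks as reported by fdisk and tune2fs above; `block_covers_lba` is a hypothetical name:

```shell
# block_covers_lba LBA PART_START FSBLOCK
# Hypothetical helper: succeed (exit status 0) if filesystem block FSBLOCK of
# a partition starting at sector PART_START contains disk sector LBA.
# Assumes 512-byte sectors and 4096-byte blocks, i.e. 8 sectors per block.
block_covers_lba() {
    lba=$1
    part_start=$2
    blk=$3
    sectors_per_block=$(( 4096 / 512 ))
    first=$(( part_start + blk * sectors_per_block ))  # first sector dd overwrites
    last=$(( first + sectors_per_block - 1 ))          # last sector dd overwrites
    [ "$lba" -ge "$first" ] && [ "$lba" -le "$last" ]
}

# Only run dd if the check passes (shown, not executed here):
#   block_covers_lba 14435421 2104512 1541363 &&
#       dd if=/dev/zero of=/dev/sda2 bs=4096 count=1 seek=1541363
```

For the first bad sector, block 1541363 spans disk sectors 14435416 to 14435423, which includes 14435421.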

rob4x4 · Oct 26, 2012

Thanks, that has worked so far for the first sector. The offline uncorrectable and pending sector counts have reduced, and reallocated sectors have increased by one.

Running long disk test again and will try the next sector.

rob4x4 · Oct 26, 2012

Well, I've cleared two of the bad sectors, but now the extended offline disk check seems to hang at 90% remaining and never completes, so it won't give me the next "LBA_of_first_error" number. I'll leave it running a bit longer and keep my fingers crossed, but it's hung twice now.

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                         Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Self-test routine in progress  90%        1860             -
# 2  Extended offline    Aborted by host                90%        1860             -
# 3  Extended offline    Aborted by host                90%        1860             -
# 4  Extended offline    Completed: read failure        90%        1858             14436956
# 5  Extended offline    Completed: read failure        90%        1853             14435421
# 6  Short offline       Completed without error        00%        1852             -
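Picking the LBA out of this log by eye is easy enough, but when repeating the test-and-clear cycle it could be scripted. A sketch, assuming the log is in the layout above (as printed by `smartctl -l selftest /dev/sda`, newest entry first) with the LBA as the last column; `last_failed_lba` is a hypothetical name:

```shell
# Hypothetical helper: read a SMART self-test log on stdin and print the
# LBA_of_first_error of the newest entry that ended in a read failure.
# Entries are listed newest first (# 1 is the most recent), so the first
# matching line is the one we want.
last_failed_lba() {
    awk '/read failure/ && $NF ~ /^[0-9]+$/ { print $NF; exit }'
}

# On the box (assumption: smartctl is on the PATH, as used by diskattr):
#   smartctl -l selftest /dev/sda | last_failed_lba
```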

af123 · Oct 26, 2012

I'd give it as long as you can; the long test does take a while and can appear to hang. The fact that you were able to abort it indicates it was probably still running. You may still end up having to send this one back under warranty.

rob4x4 · Oct 26, 2012

Yep, I was too impatient again. The Extended Offline test completed whilst I was out and gave me the next LBA error number, so I reallocated that sector. Now diskattr tells me there are no Offline_Uncorrectable or Current_Pending_Sector sectors left to reallocate, and Reallocated_Sector_Ct is still only 2. I have manually reallocated 3 sectors, and originally diskattr told me there were 5 sectors giving errors. Is that correct?
I am running the Extended Offline Test again after the last reallocation to see what it tells me.
>>> Beginning diagnostic diskattr
Running: diskattr
smartctl 5.41 2011-06-09 r3365 [7405b0-smp-linux-2.6.18-7.1] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family: Seagate Pipeline HD 5900.2
Device Model: ST3500312CS
Serial Number: 6VV3BJZS
LU WWN Device Id: 5 000c50 027e2fffe
Firmware Version: SC13
User Capacity: 500,107,862,016 bytes [500 GB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Fri Oct 26 16:17:03 2012 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   095   094   006    Pre-fail  Always   -           92679828
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always   -           0
  4 Start_Stop_Count        0x0032   096   096   020    Old_age   Always   -           4354
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           2
  7 Seek_Error_Rate         0x000f   077   060   030    Pre-fail  Always   -           51230691
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always   -           1862
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   098   098   020    Old_age   Always   -           2177
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always   -           0
187 Reported_Uncorrect      0x0032   099   099   000    Old_age   Always   -           1
188 Command_Timeout         0x0032   100   100   000    Old_age   Always   -           0
189 High_Fly_Writes         0x003a   080   080   000    Old_age   Always   -           20
190 Airflow_Temperature_Cel 0x0022   072   045   045    Old_age   Always   In_the_past 28 (Min/Max 15/28)
194 Temperature_Celsius     0x0022   028   055   000    Old_age   Always   -           28 (0 10 0 0)
195 Hardware_ECC_Recovered  0x001a   037   031   000    Old_age   Always   -           92679828
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0

>>> Ending diagnostic diskattr

rob4x4 · Oct 27, 2012

The Extended Offline Test completed (after 15 hours! - is that normal?) with no errors. Stats from Diskattr now say 12 reallocated sectors. Would that be a result of running the Extended Offline Test? Is everything good now? Thanks again for all the help!

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   095   094   006    Pre-fail  Always   -           99027595
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always   -           0
  4 Start_Stop_Count        0x0032   096   096   020    Old_age   Always   -           4354
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           12
  7 Seek_Error_Rate         0x000f   077   060   030    Pre-fail  Always   -           51542698
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always   -           1878
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   098   098   020    Old_age   Always   -           2177
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always   -           0
187 Reported_Uncorrect      0x0032   099   099   000    Old_age   Always   -           1
188 Command_Timeout         0x0032   100   100   000    Old_age   Always   -           0
189 High_Fly_Writes         0x003a   080   080   000    Old_age   Always   -           20
190 Airflow_Temperature_Cel 0x0022   046   045   045    Old_age   Always   In_the_past 54 (Min/Max 15/55)
194 Temperature_Celsius     0x0022   054   055   000    Old_age   Always   -           54 (0 10 0 0)
195 Hardware_ECC_Recovered  0x001a   030   030   000    Old_age   Always   -           99027595
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0

>>> Ending diagnostic diskattr

af123 · Oct 29, 2012

I'd keep a close eye on the numbers: if the reallocated sector count continues to grow then, although not definitive, it is an indication that the disk may be on the way out. It's normal for a disk to have to reallocate some sectors (there are always some manufacturing defects), and this is the first time your disk has done an extended scan across the whole surface, which explains the jump to 12.

Otherwise the box should be back to normal.
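One way to keep an eye on the numbers is to pull out just the raw values of the three attributes discussed in this thread. A sketch, assuming the `smartctl -A` column layout shown above (RAW_VALUE in the tenth column); `watch_attrs` is a hypothetical name:

```shell
# Hypothetical helper: read SMART attribute output (smartctl -A layout) on
# stdin and print NAME=RAW_VALUE for the attributes worth tracking.
watch_attrs() {
    awk '$2 == "Reallocated_Sector_Ct" ||
         $2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable" { print $2 "=" $10 }'
}

# On the box (assumption: smartctl is on the PATH, as used by diskattr):
#   smartctl -A /dev/sda | watch_attrs
```

Running it after each long test and comparing the output gives a quick view of whether the reallocated count is still climbing.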

rob4x4 · Oct 29, 2012

Hmmm. Unfortunately it is just as bad as ever. Every recording over the weekend glitched. I also noticed a few stopped recording before the scheduled finish time, e.g. an hour-long programme stopped recording after 28 minutes. Is there anything else to check, or is it a warranty job? Would running another Extended Offline Test help, or fix-disk? I just checked diskattr: reallocated sectors is still at 12, with no new error sectors.

af123 · Oct 29, 2012

Another fix-disk wouldn't hurt, but those other symptoms sound more like a reception fault now. Have you had a look at signal strength and quality while watching live TV on those channels?

rob4x4 · Oct 29, 2012

I will check it later. I've not noticed any problems/glitches whilst watching live TV through the PVR, just on recordings or when timeshifting. And I'm not getting messages saying recording failed due to poor reception, which I would previously get very occasionally. Also, all the Freeview equipment in the house runs off the same aerial, and nothing else is showing reception problems.
I've had a quick look at reception strength/quality using the Humax/system/signal detection. Channels 54, 55, 58, 59, 61 and 62 are returning strength about 75%, quality 100%, while channel 57 is at strength about 40%, quality 100%. I think I'm using the Winter Hill transmitter at Manchester. I generally only watch/record BBC SD/HD and Channel 4 HD, all of which have been glitchy in recordings only. I'll have a good look at live broadcasts tonight, record the same thing and compare.

As far as I remember, signal strength has always been about 75% and it never had recording glitches until a couple of weeks ago.

MartinLiddle · Oct 29, 2012

rob4x4 said:
I've had a quick look at reception strength / quality using the Humax/system/signal detection. Channels 54,55,58,59,61,62 are returning Strength about 75%, Quality 100%, except for channel 57 which is strength about 40%, quality 100%. I think I'm using the WinterHill transmitter at Manchester.

The first six multiplexes are coming from Winter Hill but channel 57 is coming from somewhere else. I would suggest a manual tune using the instructions at http://hummy.tv/forum/threads/hdr-fox-t2-tuning-advice.472/page-2#post-5824

rob4x4 · Oct 29, 2012

If I understand the Digitaluk.co.uk postcode checker correctly, then I am not missing any channels from Winter Hill, as I have 54, 55, 58, 59, 61 and 62.
Channel 57 doesn't feature on any of the transmitter charts for the local area.

MartinLiddle · Oct 29, 2012

rob4x4 said:
If I understand the Digitaluk.co.uk postcode checker correctly then I am not missing any channels from Winter Hill as I have 54,55,58,59,61 and 62.

Agreed.

rob4x4 said:
Channel 57 doesn't feature on any of the Transmitter Charts for the local area.

Presumably you have channels above 800? If so, are they coming from the multiplex on channel 57? If they are, you can just delete them. If, however, the channels in the 800s are coming from one of the Winter Hill multiplexes, then you would be better advised to do a manual tune to eliminate channel 57.

rob4x4 · Oct 29, 2012

I don't have any channels listed above 800. I've found that channel 57 carries the following, none of which I watch and none of which are repeated elsewhere in my channel list:

57 762.0 MHz: North West Local 5

51 movies4men
52 mov4men+1
53 SONY SAB
54 ARGOS TV 24/7
56 CAPITAL TV
Ezra Pound · Oct 29, 2012

That is 'Entertainment TV Ltd (Manchester TV Network) (Ch57)'. See details HERE, select Mux >> List and scroll to the bottom of the page. So it looks like in your area you do have seven different muxes.
