linux-ext4 - Re: e2fsck not fixing deleted inode referenced errors?


Message-ID: <542B1220.8020208@bitsync.net>
Date:	Tue, 30 Sep 2014 22:27:12 +0200
From:	Zlatko Calusic <zcalusic@...sync.net>
To:	Theodore Ts'o <tytso@....edu>
CC:	"Darrick J. Wong" <darrick.wong@...cle.com>,
	linux-ext4@...r.kernel.org
Subject: Re: e2fsck not fixing deleted inode referenced errors?

On 30.09.2014 21:54, Theodore Ts'o wrote:
> On Tue, Sep 30, 2014 at 08:43:04PM +0200, Zlatko Calusic wrote:
>> Full error message from the kernel log, together with data check I did in
>> the evening:
>>
>> Sep 29 05:07:51 atlas kernel: ata2.00: exception Emask 0x10 SAct 0x0 SErr
>> 0x4010000 action 0xe frozen
>> Sep 29 05:07:51 atlas kernel: ata2.00: irq_stat 0x00400040, connection
>> status changed
>> Sep 29 05:07:51 atlas kernel: ata2: SError: { PHYRdyChg DevExch }
>> Sep 29 05:07:51 atlas kernel: ata2.00: failed command: FLUSH CACHE EXT
>> Sep 29 05:07:51 atlas kernel: ata2.00: cmd
>> ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0\x0a         res
>> 40/00:f4:e2:7f:14/00:00:3a:00:00/40 Emask 0x10 (ATA bus error)
>> Sep 29 05:07:51 atlas kernel: ata2.00: status: { DRDY }
>> Sep 29 05:07:51 atlas kernel: ata2: hard resetting link
>> Sep 29 05:07:57 atlas kernel: ata2: link is slow to respond, please be
>> patient (ready=0)
>> Sep 29 05:08:00 atlas kernel: ata2: SATA link up 3.0 Gbps (SStatus 123
>> SControl 300)
>> Sep 29 05:08:00 atlas kernel: ata2.00: configured for UDMA/133
>> Sep 29 05:08:00 atlas kernel: ata2.00: retrying FLUSH 0xea Emask 0x10
>> Sep 29 05:08:00 atlas kernel: ata2: EH complete
>
> That looks really bad; it sounds like you have a hardware error on at
> least one of your disks.  Have you tried running badblocks on
> both disks to make sure the disk isn't flagging more bad blocks, and
> then resynchronizing the RAID 1 array?   Then try running e2fsck again.
>

Yep, both disks are pretty old, somewhere around the end of warranty. 
The interesting thing is that this exact error (FLUSH CACHE EXT) has 
happened from time to time, say once a year, but never before did I get 
into such trouble that e2fsck wouldn't save the day after one quick run.
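
The SError value in the quoted log (0x4010000) can be decoded by hand to confirm that it carries exactly the two flags the kernel printed. This is a minimal sketch, assuming the standard SATA SError bit layout (bit 16 for PHYRdyChg, bit 26 for DevExch). The table is deliberately limited to those two flags, and the function name is made up for illustration.

```python
# Bit positions of the two SError flags seen in the log, per the SATA
# SError register layout. Other flags exist but are not listed here.
SERROR_FLAGS = {
    16: "PHYRdyChg",  # PhyRdy signal changed state
    26: "DevExch",    # device presence changed (device exchanged)
}


def decode_serror(value):
    """Return (names of known flags set in value, leftover unknown bits)."""
    names = [name for bit, name in sorted(SERROR_FLAGS.items())
             if value & (1 << bit)]
    known_mask = sum(1 << bit for bit in SERROR_FLAGS)
    return names, value & ~known_mask


names, unknown = decode_serror(0x4010000)
print(names, hex(unknown))
```

Both flags describe the link itself going away and coming back, which fits the "connection status changed" line and the hard reset that follows in the log.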

I now remember Darrick also asked for smartctl data. Here it is:

/dev/sda
========
Power_On_Hours 40984

and only 2 SMART READ/WRITE LOG errors in the log, from a long time ago...

ATA Error Count: 2
Error 1 occurred at disk power-on lifetime: 14493 hours (603 days + 21 hours)
Error 2 occurred at disk power-on lifetime: 14493 hours (603 days + 21 hours)

Full: http://pastebin.com/GnQhACXf

/dev/sdb (I believe the disk responsible for the problem)
========
Power_On_Hours 40978

No Errors Logged

Full: http://pastebin.com/nUB2q0Tk
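
To put the hour counters in perspective, the numbers above can be converted the same way smartctl does for the error entries (hours divided by 24). This is a small sketch using only the values quoted in this message. The helper name is hypothetical, and the year figure assumes 8766 hours per year (365.25 days).

```python
HOURS_PER_YEAR = 8766  # assumption: 365.25 days * 24 hours


def hours_to_days(hours):
    """Split a power-on hour count into whole days and leftover hours."""
    return divmod(hours, 24)


readings = [
    ("sda power-on", 40984),
    ("sdb power-on", 40978),
    ("sda logged errors", 14493),
]

for label, hours in readings:
    days, rest = hours_to_days(hours)
    years = hours / HOURS_PER_YEAR
    print(f"{label}: {hours} h = {days} days + {rest} h (~{years:.1f} years)")

# Runtime elapsed on sda since its two logged errors.
since_errors = 40984 - 14493
print(f"sda hours since logged errors: {since_errors}")
```

The two logged errors on sda therefore date from roughly three years of runtime before the current trouble.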

Unless you have other ideas, I will run badblocks. Since the ext4 fs is 
on /dev/md2, though, I think I should run it on /dev/md2 only? Do you 
really mean to run it on the underlying devices, /dev/sda2 and 
/dev/sdb2? I'm not sure how MD would cope with that.
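
To make the two options concrete, a read-only scan of either layer could be set up as below. This is only a sketch: it assumes badblocks in its default read-only mode with -s (show progress) and -v (verbose), it builds the command lines without running them, and the helper name is hypothetical. Whether scanning the member partitions under an assembled array is advisable remains the open question.

```python
import shlex


def badblocks_readonly_cmd(device):
    """Build a read-only badblocks command line for the given device.

    Neither -n (non-destructive read-write) nor -w (destructive write)
    is passed, so badblocks would only read from the device.
    """
    return ["badblocks", "-s", "-v", device]


# The array itself, then the two underlying member partitions.
for dev in ["/dev/md2", "/dev/sda2", "/dev/sdb2"]:
    print(shlex.join(badblocks_readonly_cmd(dev)))
```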

But I'm pretty sure that it will come out clean. The md check I did 
last night would surely have detected bad blocks if there were any. Or not?
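
The outcome of an md check can be inspected through sysfs. This is a minimal sketch, assuming the array is md2 and that the kernel exposes the usual sync_action and mismatch_cnt files under /sys/block/md2/md. The function name and its parameters are made up for illustration. Note that mismatch_cnt counts sectors where the mirrors disagreed. Unreadable sectors hit during a check would, as far as I know, show up as read errors in the kernel log rather than in this counter.

```python
from pathlib import Path


def md_check_status(md_name="md2", sysfs_root="/sys/block"):
    """Read the current sync action and mismatch count of an md array."""
    md_dir = Path(sysfs_root) / md_name / "md"
    return {
        # "idle" when no check or resync is running, "check" during one.
        "sync_action": (md_dir / "sync_action").read_text().strip(),
        # Number of sectors found inconsistent by the last check.
        "mismatch_cnt": int((md_dir / "mismatch_cnt").read_text().strip()),
    }


if __name__ == "__main__":
    try:
        print(md_check_status())
    except OSError as exc:
        # No such array on this machine, or sysfs is not readable.
        print(f"could not read md status: {exc}")
```

The sysfs_root parameter exists only so the function can be pointed at a test directory instead of the real /sys/block.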

Thanks for your help!
-- 
Zlatko
