Message-ID: <20140410221702.GD31614@thunk.org>
Date: Thu, 10 Apr 2014 18:17:02 -0400
From: Theodore Ts'o <tytso@....edu>
To: Nathaniel W Filardo <nwf@...jhu.edu>
Cc: Mike Rubin <mrubin@...gle.com>, Frank Mayhar <fmayhar@...gle.com>,
admins@....jhu.edu, linux-ext4@...r.kernel.org
Subject: Re: ext4 metadata corruption bug?
On Thu, Apr 10, 2014 at 12:33:51PM -0400, Nathaniel W Filardo wrote:
>
> Shouldn't cache reordering or failure to flush correctly only matter if the
> machine is crashing or otherwise losing power? I suppose it's possible
> there's a bug that would cause the cache to fail to write a block at all,
> rather than simply "too late". But as I said before, we've not had any
> crashes or otherwise lost uptime anywhere: host, guest, storage providers,
> etc.
If it's a cache flush problem, yes, it would only matter if there had
been a crash.  Knowing that what you are doing is an AFS mirror, this
seems even stranger, since writes would be very rare, and it's not
like there would be a whole lot of opportunities for races --- when
you mirror an FTP site, you write a new file sequentially, and it's
not like there are multiple CPUs trying to modify the file at the
same time, etc.
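(If you want to rule the flush path in or out quickly, something like
the following, run inside the guest, would tell you whether the
emulated disk even advertises a write cache and whether ext4 is
issuing flushes at all.  Untested sketch --- I'm assuming the guest
sees the disk as /dev/sda and the filesystem in question is ext4:)

    # Is the (virtual) drive's volatile write cache enabled?  If it
    # reports as off, FLUSH CACHE is effectively a no-op and a flush
    # bug is very unlikely to be the cause.
    hdparm -W /dev/sda

    # Is ext4 mounted with barriers?  barrier=1 is the default; if you
    # see "nobarrier" or "barrier=0", journal commits never force a
    # cache flush, which changes the analysis completely.
    grep ext4 /proc/mounts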
And if you are just seeing the results of random bit flips, one would
expect other types of corruption to be reported as well.  So I don't
know.  This is a mystery so far...
> That said, we do occasionally, though much less often than we get reports
> of corrupted metadata, get messages from the ATA stack that I don't know
> how to decode (though naively they all seemed to be successfully resolved
> transients). One of our VMs, nearly identically configured, though not the
> one that's been reporting corruption on its filesystem, spat this out the
> other day, for example:
>
> [532625.888251] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> [532625.888762] ata1.00: failed command: FLUSH CACHE
> [532625.889128] ata1.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
> [532625.889128] res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (time out)
> [532625.889945] ata1.00: status: { DRDY }
> [532630.928064] ata1: link is slow to respond, please be patient (ready=0)
> [532635.912178] ata1: device not ready (errno=-16), forcing hardreset
> [532635.912220] ata1: soft resetting link
> [532636.070087] ata1.00: configured for MWDMA2
> [532636.070701] ata1.01: configured for MWDMA2
> [532636.070705] ata1.00: retrying FLUSH 0xe7 Emask 0x4
> [532651.068208] ata1.00: qc timeout (cmd 0xe7)
> [532651.068216] ata1.00: FLUSH failed Emask 0x4
> [532651.236146] ata1: soft resetting link
> [532651.393918] ata1.00: configured for MWDMA2
> [532651.394533] ata1.01: configured for MWDMA2
> [532651.394537] ata1.00: retrying FLUSH 0xe7 Emask 0x4
> [532651.395550] ata1.00: device reported invalid CHS sector 0
> [532651.395564] ata1: EH complete
Yeah, that doesn't look good, but you're using some kind of remote
block device here, right?  I'm not sure how qemu is translating that
into pseudo ATA commands.  Maybe that corresponds with a broken
network connection which required creating a new TCP connection or
some such?  I'm not really that familiar with the remote block device
code.  So I also can't really give you any advice about whether it
would be better to use virtio versus AHCI.  I would expect that virtio
will probably be faster, but it might not matter for your application.
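(For reference, the difference I mean is just the -drive interface on
the qemu side.  This is only a sketch, not your actual command line,
and file= stands in for however your remote block device is handed to
qemu:)

    # Emulated IDE/ATA disk (what the MWDMA2 lines in the log above
    # suggest you have now): guest flushes show up as ATA FLUSH CACHE
    # commands like the one that timed out.
    qemu-system-x86_64 ... \
        -drive file=<remote-block-device>,if=ide,cache=writeback

    # virtio-blk instead: the guest issues VIRTIO_BLK_T_FLUSH requests
    # rather than pseudo-ATA commands, and skips the whole ATA error
    # handling / link reset machinery you're seeing.
    qemu-system-x86_64 ... \
        -drive file=<remote-block-device>,if=virtio,cache=writeback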
Cheers,
- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html