linux-ext4 - RE: ext4 filesystem bad extent error review

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <AE39A478622CF340ABEC2418D74074F61FC59ACAC4@SGPMBX05.APAC.bosch.com>
Date:	Mon, 6 Jan 2014 10:23:17 +0800
From:	"Huang Weller (CM/ESW12-CN)" <Weller.Huang@...bosch.com>
To:	"Juergens Dirk (CM-AI/ECO2)" <Dirk.Juergens@...bosch.com>,
	Theodore Ts'o <tytso@....edu>
CC:	"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: RE: ext4 filesystem bad extent error review



>On Thu, Jan 03, 2014 at 17:30, Theodore Ts'o [mailto:tytso@....edu]
>wrote:
>> 
>> On Fri, Jan 03, 2014 at 11:16:02AM +0800, Huang Weller (CM/ESW12-CN)
>> wrote:
>> >
>> > It sounds like the barrier test. We wrote such a test tool
>> > before; the test program used ioctl(fd, BLKFLSBUF, 0) to set a
>> > barrier before the next write operation.  Do you think this ioctl is
>> > enough? I ask because I saw ext4 use it. I will run the test with that
>> > tool and then let you know the result.
>> 
>> The BLKFLSBUF ioctl does __not__ send a CACHE FLUSH command to the
>> hardware device.  It forces all of the dirty buffers in memory to the
>> storage device, and then it invalidates all the buffer cache, but it
>> does not send a CACHE FLUSH command to the hardware.  Hence, the
>> hardware is free to leave the data in its on-disk cache, and there is no
>> guarantee that it has been written to stable storage.  (For an example
>> use case of BLKFLSBUF, we use it in e2fsck to drop the buffer cache
>> for benchmarking purposes.)
>> 
>> If you want to force a CACHE FLUSH (or barrier, depending on the
>> underlying transport different names may be given to this operation),
>> you need to call fsync() on the file descriptor open to the block
>> device.
>> 
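Ted's distinction between BLKFLSBUF and fsync() can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the test tool discussed in this thread: the helper names drop_buffer_cache() and write_durably() are hypothetical, and the sketch assumes a Linux system where <linux/fs.h> provides BLKFLSBUF.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKFLSBUF */

/* Hypothetical helper. Writes back dirty buffers for the device and
 * invalidates the buffer cache. As described above, this does NOT send
 * a CACHE FLUSH to the hardware, so data may still sit in the drive's
 * cache. Returns -1 if fd is not a block device (or on other errors). */
static int drop_buffer_cache(int fd)
{
    return ioctl(fd, BLKFLSBUF, 0);
}

/* Hypothetical helper. Writes len bytes at offset off, then calls
 * fsync() on the same descriptor. For a descriptor open on a block
 * device, fsync() is the call that asks the kernel to issue a CACHE
 * FLUSH (or the transport's equivalent) to the device.
 * Returns 0 on success, -1 on error. */
static int write_durably(int fd, const void *buf, size_t len, off_t off)
{
    assert(buf != NULL);
    ssize_t n = pwrite(fd, buf, len, off);
    if (n < 0 || (size_t)n != len)
        return -1;
    return fsync(fd);
}
```

On a real test rig, fd would be opened with O_RDWR on the device node under test (a placeholder such as /dev/sdX); only the fsync() step, not the BLKFLSBUF ioctl, asks the device to flush its cache.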
>> > More information about the journal block which caused the bad extents
>> > error: we enabled the mount option journal_checksum in our test.  We
>> > reproduced the same problem, and the journal checksum was correct,
>> > because the journal block is not replayed if its checksum is wrong.
>> 
>> How did you enable the journal_checksum option?  Note that this is not
>> safe in general, which is why we don't enable it or the async_commit
>> mount option by default.  The problem is that currently the journal
>> replay stops when it hits a bad checksum, and this can leave the file
>> system in a worse state than it would otherwise be in.  There is a way we
>> could fix it, by adding per-block checksums to the journal, so we can
>> skip just the bad block, and then force an e2fsck afterwards, but that
>> isn't something we've implemented yet.
>> 
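The trade-off between stopping replay at the first bad checksum and the per-block checksums Ted proposes can be illustrated with a deliberately simplified model. Everything here (struct toy_block, toy_csum(), and both replay functions) is hypothetical; it does not reflect jbd2's on-disk format, its transaction-level commit checksums, or the checksum algorithm it actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical journal record for illustration only (not jbd2's format). */
struct toy_block {
    uint32_t target; /* file system block number the record overwrites */
    uint32_t data;   /* stand-in for the block contents */
    uint32_t csum;   /* checksum recorded when the block was journaled */
};

/* Toy mixing function standing in for a real checksum (hypothetical). */
static uint32_t toy_csum(uint32_t target, uint32_t data)
{
    return (target * 2654435761u) ^ (data * 40503u) ^ 0x9e3779b9u;
}

/* Behaviour described above: replay stops at the first bad checksum, so
 * intact blocks after the bad one are never written back.
 * Returns the number of blocks replayed. */
static size_t replay_stop_on_error(const struct toy_block *j, size_t n,
                                   uint32_t *disk, size_t disk_len)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (toy_csum(j[i].target, j[i].data) != j[i].csum)
            break;
        assert(j[i].target < disk_len);
        disk[j[i].target] = j[i].data;
    }
    return i;
}

/* Proposed per-block scheme: skip only the bad block, keep replaying,
 * and report that a full e2fsck run is needed afterwards. */
static size_t replay_skip_bad(const struct toy_block *j, size_t n,
                              uint32_t *disk, size_t disk_len,
                              bool *need_fsck)
{
    size_t replayed = 0;
    *need_fsck = false;
    for (size_t i = 0; i < n; i++) {
        if (toy_csum(j[i].target, j[i].data) != j[i].csum) {
            *need_fsck = true; /* leave this block untouched */
            continue;
        }
        assert(j[i].target < disk_len);
        disk[j[i].target] = j[i].data;
        replayed++;
    }
    return replayed;
}
```

With three records and only the middle one corrupted, replay_stop_on_error() writes back just the first block, while replay_skip_bad() writes back the first and third and sets need_fsck.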
>> That being said, if the journal checksum was valid, and so the
>> corrupted block was replayed, it does seem to argue against
>> hardware-induced corruption.

>Yes, this was also our feeling. Please see my other mail, sent
>a few minutes ago. We know about the possible problems with
>journal_checksum, but we thought it was a good option in our case
>to identify whether this is a HW- or SW-induced issue.

>> 
>> Hmm....  I'm stumped, for the moment.  The journal layer is quite
>> stable, and we haven't had any problems like this reported in many,
>> many years.
>> 
>> Let's take this back to first principles.  How reliably can you
>> reproduce the problem?  How often does it fail?  

>With kernel 3.5.7.23, about once per overnight long-term test.

>> Is it something where
>> you can characterize the workload leading to this failure?  Secondly,
>> is a power drop involved in the reproduction at all, or is this
>> something that can be reproduced by running some kind of workload, and
>> then doing a soft reset (i.e., force a kernel reboot, but _not_ do it
>> via a power drop)?

>As I stated in my other mail, it can also be reproduced with soft resets.
>Weller can give more details about the test setup.
 
My test case is as follows:
1. About 700 MB of free space is left for the test.
2. Most test runs are done with stress (some runs are done without stress, and we reproduced the issue there too).
3. Both the power loss and the CPU watchdog timer (WDT) reset happen during file write operations.

> 
> The other thing to ask is when did this problem first start appearing?
> With a kernel upgrade?  A compiler/toolchain upgrade?  Or has it
> always been there?
> 
> Regards,
> 
> 							- Ted


Mit freundlichen Grüßen / Best regards

Dr. rer. nat.  Dirk Juergens

Robert Bosch Car Multimedia GmbH