Message-ID: <B8A948099C53E0408BDBCE749AAECA9A2A80C78545@SI-MBX10.de.bosch.com>
Date: Fri, 3 Jan 2014 17:40:11 +0100
From: "Juergens Dirk (CM-AI/ECO2)" <Dirk.Juergens@...bosch.com>
To: Theodore Ts'o <tytso@....edu>
CC: "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
"Huang Weller (CM/ESW12-CN)" <Weller.Huang@...bosch.com>
Subject: Re: ext4 filesystem bad extent error review
On Fri, Jan 03, 2014 at 17:30, Theodore Ts'o [mailto:tytso@....edu]
wrote:
>
> On Fri, Jan 03, 2014 at 11:16:02AM +0800, Huang Weller (CM/ESW12-CN)
> wrote:
> >
> > It sounds like the barrier test. We wrote such a test tool
> > before; the test program used ioctl(fd, BLKFLSBUF, 0) to set a
> > barrier before the next write operation. Do you think this ioctl
> > is sufficient? I ask because I saw ext4 use it. I will run the
> > test with that tool and then let you know the result.
>
> The BLKFLSBUF ioctl does __not__ send a CACHE FLUSH command to the
> hardware device. It forces all of the dirty buffers in memory to the
> storage device, and then it invalidates all the buffer cache, but it
> does not send a CACHE FLUSH command to the hardware. Hence, the
> hardware is free to keep the data in its volatile on-device cache,
> with no guarantee that it reaches stable store. (For an example
> use case of BLKFLSBUF, we use it in e2fsck to drop the buffer cache
> for benchmarking purposes.)
>
> If you want to force a CACHE FLUSH (or a barrier; depending on the
> underlying transport, different names may be given to this operation),
> you need to call fsync() on the file descriptor open to the block
> device.
>
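Thanks for the clarification. To make sure we understood you correctly,
here is a minimal sketch in C of the difference as we now see it (the
device path is just a placeholder, not our real test device):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKFLSBUF */

int main(void)
{
        int fd = open("/dev/sdX", O_WRONLY);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* ... write test pattern to the device here ... */

        /* What our tool did so far: flush dirty buffers to the device
         * and invalidate the buffer cache -- but no CACHE FLUSH is
         * sent to the hardware. */
        if (ioctl(fd, BLKFLSBUF, 0) < 0)
                perror("ioctl(BLKFLSBUF)");

        /* What you suggest instead: fsync() on the block device fd,
         * which does force the CACHE FLUSH to the hardware. */
        if (fsync(fd) < 0)
                perror("fsync");

        close(fd);
        return 0;
}

We will change the tool accordingly before re-running the test.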
> > More information about the journal block which caused the bad
> > extents error: we enabled the journal_checksum mount option in our
> > test. We reproduced the same problem, and the journal checksum was
> > correct; a journal block would not have been replayed if its
> > checksum had been wrong.
>
> How did you enable the journal_checksum option? Note that this is not
> safe in general, which is why we don't enable it or the async_commit
> mount option by default. The problem is that currently the journal
> replay stops when it hits a bad checksum, and this can leave the file
> system in a worse state than it would otherwise be in. There is a way
> we could fix it, by adding per-block checksums to the journal, so we
> can skip just the bad block and then force an e2fsck afterwards, but
> that isn't something we've implemented yet.
>
> That being said, if the journal checksum was valid, and so the
> corrupted block was replayed, it does seem to argue against
> hardware-induced corruption.
Yes, this was also our feeling. Please see my other mail sent a few
minutes ago. We know about the possible problems with journal_checksum,
but we thought it a good option in our case for determining whether
this is a HW- or SW-induced issue.
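For reference, we enable it at mount time; a minimal C equivalent
(device and mount point are placeholders for our test partition):

#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
        if (mount("/dev/sdX1", "/mnt/test", "ext4", 0,
                  "journal_checksum") < 0) {
                perror("mount");
                return 1;
        }
        return 0;
}

This is equivalent to passing -o journal_checksum to mount(8).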
>
> Hmm.... I'm stumped, for the moment. The journal layer is quite
> stable, and we haven't had any problems like this reported in many,
> many years.
>
> Let's take this back to first principles. How reliably can you
> reproduce the problem? How often does it fail?
With kernel 3.5.7.23, about once per overnight long-term test.
> Is it something where
> you can characterize the workload leading to this failure? Secondly,
> is a power drop involved in the reproduction at all, or is this
> something that can be reproduced by running some kind of workload, and
> then doing a soft reset (i.e., force a kernel reboot, but _not_ do it
> via a power drop)?
As I stated in my other mail, it is also reproducible with soft resets.
Weller can give more details about the test setup.
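For completeness, a sketch of one way such a soft reset can be forced
without a power drop (magic SysRq from C; whether this is exactly the
mechanism our rig uses, Weller can confirm):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        int fd = open("/proc/sysrq-trigger", O_WRONLY);
        if (fd < 0) {
                perror("open");
                return 1;
        }
        /* 'b' reboots the machine immediately, without syncing or
         * unmounting filesystems. */
        if (write(fd, "b", 1) != 1)
                perror("write");
        close(fd);
        return 0;
}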
>
> The other thing to ask is when did this problem first start appearing?
> With a kernel upgrade? A compiler/toolchain upgrade? Or has it
> always been there?
>
> Regards,
>
> - Ted
Best regards
Dr. rer. nat. Dirk Juergens
Robert Bosch Car Multimedia GmbH