Message-ID: <B8A948099C53E0408BDBCE749AAECA9A2A80C78551@SI-MBX10.de.bosch.com>
Date: Fri, 3 Jan 2014 19:45:40 +0100
From: "Juergens Dirk (CM-AI/ECO2)" <Dirk.Juergens@...bosch.com>
To: Eric Sandeen <sandeen@...hat.com>, Theodore Ts'o <tytso@....edu>,
"Huang Weller (CM/ESW12-CN)" <Weller.Huang@...bosch.com>
CC: "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: AW: AW: ext4 filesystem bad extent error review
On Fri, Jan 03, 2014 at 19:24, Eric Sandeen wrote:
>
> On 1/3/14, 10:29 AM, Juergens Dirk (CM-AI/ECO2) wrote:
> > So, I think there _might_ be a kernel bug, but it could be also a problem
> > related to the particular type of eMMC. We did not observe the same issue
> > in previous tests with another type of eMMC from another supplier, but this
> > was with an older kernel patch level and with another HW design.
> >
> > Regarding a possible kernel bug: Is there any chance that the invalid
> > ee_len or ee_start are returned by, e.g., the block allocator ?
> > If so, can we try to instrument the code to get suitable traces ?
> > Just to see or to exclude that the corrupted inode is really written
> > to the eMMC ?
>
> From your description it does sound possible that it's a kernel bug.
> Adding testcases to the code to catch it before it hits the journal
> might be helpful - but then maybe this is something getting overwritten
> after the fact - hard to say.
>
> Can you share more details of the test you are running? Or maybe even
> the test itself?
Yes, for sure, we can. Weller, please provide additional details
or corrections.
In short:
We use an automated cyclic test that writes many small files
(a few kBytes each) into a separate test partition. Each file carries
a CRC checksum for easy consistency checking, plus meta information
(filename, sequence number and a random number). The meta information
lets us tell from block device image dumps whether we are looking at
a fragment of an old, deleted file or at a still valid one.
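To illustrate the idea, here is a minimal sketch of such a self-describing
test file. The field layout, the MAGIC marker and the function names are
assumptions made for this example only, not the format used by the actual test.

```python
import os
import struct
import zlib

MAGIC = b"TSTF"  # hypothetical marker, helps to spot records in raw image dumps
# Assumed header layout: magic, sequence number, random tag, filename length
HEADER = struct.Struct("<4sIQH")


def build_record(name, seq, payload, tag=None):
    """Return header + filename + payload, followed by a CRC32 over all of it."""
    if tag is None:
        tag = int.from_bytes(os.urandom(8), "little")  # random number per file
    name_bytes = name.encode("utf-8")
    body = HEADER.pack(MAGIC, seq, tag, len(name_bytes)) + name_bytes + payload
    return body + struct.pack("<I", zlib.crc32(body))


def check_record(data):
    """Return (name, seq, tag) if the record is intact, otherwise None."""
    if len(data) < HEADER.size + 4:
        return None
    body = data[:-4]
    (stored_crc,) = struct.unpack("<I", data[-4:])
    if zlib.crc32(body) != stored_crc:
        return None
    magic, seq, tag, name_len = HEADER.unpack_from(body)
    if magic != MAGIC or HEADER.size + name_len > len(body):
        return None
    name = body[HEADER.size:HEADER.size + name_len].decode("utf-8")
    return name, seq, tag
```

With such a layout, the sequence number and random tag found in a dump can be
compared against the list of files the current loop is known to have written.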
Each test loop looks like this:
1) Boot the device after power-on or reset
2) Run fsck -n BEFORE mounting
2a) (optional) Take a binary dump of the journal
3) Mount the test partition
4) Check the content of all files from the previous loop
5) Erase all files from the previous loop
6) Start writing hundreds/thousands of test files
   in multiple directories with several threads
7) After a random time, cut the power or do a soft reset
If 2), 3), 4) or 5) fails, the test stops.
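The control flow of one loop iteration could be sketched roughly as follows.
The device node, the mount point and all function names are hypothetical;
the file check, erase and writer steps are passed in as callables because
their real implementation is not shown here.

```python
import subprocess

# Hypothetical device node and mount point; adjust for the real target.
DEVICE = "/dev/mmcblk0p3"
MOUNTPOINT = "/mnt/test"


def run(cmd):
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(cmd).returncode == 0


def run_cycle(verify_files, erase_files, start_writers, run_cmd=run):
    """Run steps 2) to 6) once after boot.

    Returns the name of the first failed step, or None if the writers
    were started. Step 7) (power cut or reset) happens outside this code.
    """
    checks = [
        ("fsck", lambda: run_cmd(["fsck", "-n", DEVICE])),          # step 2
        ("mount", lambda: run_cmd(["mount", DEVICE, MOUNTPOINT])),  # step 3
        ("verify", verify_files),                                   # step 4
        ("erase", erase_files),                                     # step 5
    ]
    for name, step in checks:
        if not step():
            return name  # stop the test, keep the device state for analysis
    start_writers()      # step 6
    return None
```

Returning the name of the failed step makes it easy to log why a run stopped
before the device is touched again.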
We usually run the test with a kind of transaction-safe handling,
i.e. we use fsync/rename, to avoid zero-length files
or file fragments.
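For reference, the usual fsync/rename pattern looks roughly like the sketch
below. This is a generic illustration, not the code of the actual test; the
".tmp" suffix and the function name are assumptions.

```python
import os


def write_file_atomically(path, data):
    """Write data so that path holds either the old or the complete new content."""
    tmp_path = path + ".tmp"  # hypothetical naming convention for the temp file
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's buffer to the kernel
        os.fsync(f.fileno())  # force the file data to stable storage
    os.rename(tmp_path, path)  # atomically replace the target name (POSIX)
    # Sync the directory as well so the rename itself is persistent.
    dir_fd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

After a power cut, a reader should then find either no file, the previous
version, or the complete new version under the final name, but no partially
written one.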
>
> I've used a test framework in the past to simulate resets w/o needing
> to reset the box, and do many journal replays very quickly. It'd be
> interesting to run it using your testcase.
>
> Thanks,
> -Eric
Best regards
Dirk Juergens
Robert Bosch Car Multimedia GmbH