Message-ID: <B8A948099C53E0408BDBCE749AAECA9A2A80C78553@SI-MBX10.de.bosch.com>
Date: Fri, 3 Jan 2014 19:56:45 +0100
From: "Juergens Dirk (CM-AI/ECO2)" <Dirk.Juergens@...bosch.com>
To: Eric Sandeen <sandeen@...hat.com>, Theodore Ts'o <tytso@....edu>,
"Huang Weller (CM/ESW12-CN)" <Weller.Huang@...bosch.com>
CC: "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: AW: AW: AW: ext4 filesystem bad extent error review
On Fri, Jan 03, 2014 at 19:49, Eric Sandeen wrote:
>
> On 1/3/14, 12:45 PM, Juergens Dirk (CM-AI/ECO2) wrote:
> >
> > On Fri, Jan 03, 2014 at 19:24, Eric Sandeen wrote:
> >>
> >> On 1/3/14, 10:29 AM, Juergens Dirk (CM-AI/ECO2) wrote:
> >>> So, I think there _might_ be a kernel bug, but it could also be a
> >>> problem related to the particular type of eMMC. We did not observe
> >>> the same issue in previous tests with another type of eMMC from
> >>> another supplier, but this was with an older kernel patch level and
> >>> with another HW design.
> >>>
> >>> Regarding a possible kernel bug: Is there any chance that the
> >>> invalid ee_len or ee_start are returned by, e.g., the block
> >>> allocator? If so, can we try to instrument the code to get suitable
> >>> traces? Just to see or to exclude that the corrupted inode is really
> >>> written to the eMMC?
> >>
> >> From your description it does sound possible that it's a kernel bug.
> >> Adding testcases to the code to catch it before it hits the journal
> >> might be helpful - but then maybe this is something getting
> >> overwritten after the fact - hard to say.
> >>
> >> Can you share more details of the test you are running? Or maybe
> >> even the test itself?
> >
> > Yes, for sure, we can. Weller, please provide additional details
> > or corrections.
> >
> > In short:
> > Basically we use an automated cyclic test writing many small
> > (some kBytes) files with CRC checksums for easy consistency check
> > into a separate test partition. Files also contain meta information
> > like filename, sequence number and a random number, which lets us
> > identify from block device image dumps whether we see just a fragment
> > of an old deleted file or a still valid one.
> >
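A minimal sketch of such a self-describing test file, in Python. The
layout is an assumption for illustration only, not the format used by
the real test tool: the marker MAGIC, the field order and the names
build_record/parse_record are all hypothetical. It only shows the idea
of storing filename, sequence number and a random number together with
a CRC over the whole record.

```python
import os
import struct
import zlib

MAGIC = b"TF01"                    # hypothetical marker to spot records in raw dumps
HEADER = struct.Struct("<4sIIHI")  # magic, sequence, nonce, name length, payload length
CRC = struct.Struct("<I")          # CRC32 trailer over header + name + payload


def build_record(filename, sequence, payload, nonce=None):
    """Serialize one test file: header, filename, payload, CRC32 trailer."""
    if nonce is None:
        # Random number that distinguishes this file from older files
        # with the same name and sequence number.
        nonce = int.from_bytes(os.urandom(4), "little")
    name = filename.encode("utf-8")
    body = HEADER.pack(MAGIC, sequence, nonce, len(name), len(payload)) + name + payload
    return body + CRC.pack(zlib.crc32(body) & 0xFFFFFFFF)


def parse_record(blob):
    """Return the metadata and payload as a dict, or None if the record is damaged."""
    if len(blob) < HEADER.size + CRC.size:
        return None
    magic, sequence, nonce, name_len, payload_len = HEADER.unpack_from(blob, 0)
    end = HEADER.size + name_len + payload_len
    if magic != MAGIC or len(blob) != end + CRC.size:
        return None
    (stored_crc,) = CRC.unpack_from(blob, end)
    if stored_crc != zlib.crc32(blob[:end]) & 0xFFFFFFFF:
        return None
    name = blob[HEADER.size:HEADER.size + name_len].decode("utf-8", "replace")
    return {
        "filename": name,
        "sequence": sequence,
        "nonce": nonce,
        "payload": blob[HEADER.size + name_len:end],
    }
```

With such a layout, a block found in an image dump can be matched
against the list of files written in the current loop by comparing
sequence number and random number.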
> > Each test loop looks like this:
>
> 0) mkfs the filesystem - with what options? How big?
Here we do need the details from Weller, because
he has done all of this.
>
> > 1) Boot the device after power on or reset
> > 2) Do fsck -n BEFORE mounting
> > 2 a) (optional) binary dump of the journal
> > 3) Mount test partition
>
> Again with what options, if any?
Details again have to be given by Weller, sorry.
>
> > 4) File content check for all files from prev. loop
> > 5) erase all files from previous loop
> > 6) start writing hundreds/thousands of test files
> > in multiple directories with several threads
>
> I guess this is where we might need more details in order
> to try to recreate the failure, but perhaps
> this is not a case where you can simply share the IO
> generation utility...?
I think we can share the code; please let me check on Monday.
>
> Thanks,
> -Eric
>
> > 7) after random time cut the power or do soft reset
> >
> > If 2), 3), 4) or 5) fails, stop test.
> >
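Steps 2) to 6) and the stop condition can be sketched roughly as
follows, in Python. This is not the real test harness. The function
name run_cycle and the three callbacks are hypothetical, the mount
call uses no options because the options are not stated here, and
treating any nonzero fsck exit status as a failure is an assumption.

```python
import subprocess


def run_cycle(device, mountpoint, check_files, erase_files, write_files,
              run=subprocess.run):
    """Run one loop after boot; return the name of the failing step, or None.

    check_files, erase_files and write_files are hypothetical callbacks
    taking the mount point; the first two return True on success.
    """
    # 2) read-only check BEFORE mounting (assumption: nonzero exit = failure)
    if run(["fsck", "-n", device]).returncode != 0:
        return "fsck"
    # 3) mount the test partition (mount options are not known, so none are given)
    if run(["mount", "-t", "ext4", device, mountpoint]).returncode != 0:
        return "mount"
    # 4) verify the content of all files from the previous loop
    if not check_files(mountpoint):
        return "check"
    # 5) erase all files from the previous loop
    if not erase_files(mountpoint):
        return "erase"
    # 6) write new test files; step 7) interrupts this from outside
    #    by cutting the power or doing a soft reset
    write_files(mountpoint)
    return None
```

A return value other than None corresponds to "stop test" and names
the step whose state should be preserved for analysis.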
> > We usually run the test with a kind of transaction-safe
> > handling, i.e. using fsync/rename, to avoid zero-length files
> > or file fragments.
> >
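The fsync/rename handling mentioned above is commonly written like
this (Python sketch for a POSIX system). The helper name
write_file_atomically and the ".tmp" suffix are hypothetical, and the
final fsync of the directory is an assumption; whether the real test
does that is not stated here.

```python
import os


def write_file_atomically(path, data):
    """Write data to path so that a reader sees the old or the new file, never a partial one."""
    tmp_path = path + ".tmp"  # hypothetical temporary name in the same directory
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Flush the file data to the device before the rename.
        os.fsync(fd)
    finally:
        os.close(fd)
    # On POSIX, rename atomically replaces an existing destination.
    os.rename(tmp_path, path)
    # Assumption: also sync the directory so the new name itself is durable.
    directory = os.path.dirname(os.path.abspath(path))
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Without the fsync before the rename, the new name may become durable
before the file data does, which is one way to end up with the
zero-length files mentioned above after a power cut.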
> >>
> >> I've used a test framework in the past to simulate resets w/o
> >> needing to reset the box, and do many journal replays very quickly.
> >> It'd be interesting to run it using your testcase.
> >>
> >> Thanks,
> >> -Eric
> >
> > Best regards
> >
> > Dirk Juergens
> >
> > Robert Bosch Car Multimedia GmbH
> >
Best regards
Dirk Juergens
Robert Bosch Car Multimedia GmbH