linux-ext4 - AW: AW: AW: ext4 filesystem bad extent error review


Message-ID: <B8A948099C53E0408BDBCE749AAECA9A2A80C78553@SI-MBX10.de.bosch.com>
Date:	Fri, 3 Jan 2014 19:56:45 +0100
From:	"Juergens Dirk (CM-AI/ECO2)" <Dirk.Juergens@...bosch.com>
To:	Eric Sandeen <sandeen@...hat.com>, Theodore Ts'o <tytso@....edu>,
	"Huang Weller (CM/ESW12-CN)" <Weller.Huang@...bosch.com>
CC:	"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: AW: AW: AW: ext4 filesystem bad extent error review

On Fri, Jan 03, 2014 at 19:49, Eric Sandeen wrote:
> 
> On 1/3/14, 12:45 PM, Juergens Dirk (CM-AI/ECO2) wrote:
> >
> > On Fri, Jan 03, 2014 at 19:24, Eric Sandeen wrote:
> >>
> >> On 1/3/14, 10:29 AM, Juergens Dirk (CM-AI/ECO2) wrote:
> >>> So, I think there _might_ be a kernel bug, but it could be also a problem
> >>> related to the particular type of eMMC. We did not observe the same issue
> >>> in previous tests with another type of eMMC from another supplier, but this
> >>> was with an older kernel patch level and with another HW design.
> >>>
> >>> Regarding a possible kernel bug: Is there any chance that the invalid
> >>> ee_len or ee_start are returned by, e.g., the block allocator?
> >>> If so, can we try to instrument the code to get suitable traces?
> >>> Just to see or to exclude that the corrupted inode is really written
> >>> to the eMMC?
> >>
> >> From your description it does sound possible that it's a kernel bug.
> >> Adding testcases to the code to catch it before it hits the journal
> >> might be helpful - but then maybe this is something getting overwritten
> >> after the fact - hard to say.
> >>
> >> Can you share more details of the test you are running?  Or maybe even
> >> the test itself?
> >
> > Yes, for sure, we can. Weller, please provide additional details
> > or corrections.
> >
> > In short:
> > Basically we use an automated cyclic test writing many small
> > (some kBytes) files with CRC checksums for easy consistency check
> > into a separate test partition. Files also contain meta information
> > like filename, sequence number and a random number, so that we can tell
> > from block device image dumps whether we are looking at a fragment of an
> > old deleted file or at a still valid one.
> >
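
To make the file format described above more concrete, here is a minimal sketch of one self-describing, CRC-protected test file. This is not the actual test tool. The byte layout, the MAGIC marker and the names build_record and parse_record are assumptions made up for illustration; only the ingredients (filename, sequence number, random number, CRC) come from the description.

```python
import os
import struct
import zlib
from typing import Optional, Tuple

MAGIC = b"TFIL"  # hypothetical marker so records are easy to spot in a raw dump


def build_record(filename: str, seq: int, payload: bytes,
                 rnd: Optional[int] = None) -> bytes:
    """Serialize one test file: header, payload, trailing CRC32 (assumed layout)."""
    if rnd is None:
        rnd = int.from_bytes(os.urandom(4), "big")  # per-file random tag
    name = filename.encode("utf-8")
    # Fixed part: magic (4) + name length (2) + sequence (4) + random (4)
    header = MAGIC + struct.pack(">HII", len(name), seq, rnd) + name
    body = header + struct.pack(">I", len(payload)) + payload
    # The CRC covers everything that precedes it
    return body + struct.pack(">I", zlib.crc32(body))


def parse_record(data: bytes) -> Tuple[str, int, int, bytes]:
    """Return (filename, seq, rnd, payload); raise ValueError if inconsistent."""
    if len(data) < 22 or data[:4] != MAGIC:
        raise ValueError("not a test record")
    body = data[:-4]
    (crc,) = struct.unpack(">I", data[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("CRC mismatch")
    name_len, seq, rnd = struct.unpack(">HII", body[4:14])
    name = body[14:14 + name_len].decode("utf-8")
    (payload_len,) = struct.unpack(">I", body[14 + name_len:18 + name_len])
    payload = body[18 + name_len:]
    if len(payload) != payload_len:
        raise ValueError("payload length mismatch")
    return name, seq, rnd, payload
```

With such a layout, a fragment found in a block device dump can be matched against the sequence and random numbers of the current loop to tell stale data from live data.
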
> > Each test loop looks like this:
> 
> 0) mkfs the filesystem - with what options?  How big?

Here we need the details from Weller, because
he has done all this.

> 
> > 1) Boot the device after power on or reset
> > 2) Do fsck -n BEFORE mounting
> > 2 a) (optional) binary dump of the journal
> > 3) Mount test partition
> 
> Again with what options, if any?

Again, the details will have to come from Weller, sorry.

> 
> > 4) File content check for all files from prev. loop
> > 5) erase all files from previous loop
> > 6) start writing hundreds/thousands of test files
> >     in multiple directories with several threads
> 
> I guess this is where we might need more details in order
> to try to recreate the failure, but perhaps
> this is not a case where you can simply share the IO
> generation utility...?

I think we can share the code; please let me check on Monday.

> 
> Thanks,
> -Eric
> 
> > 7) after random time cut the power or do soft reset
> >
> > If 2), 3), 4) or 5) fails, stop test.
> >
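
The stop conditions of the loop above can be sketched roughly as follows. This is only an illustration under assumptions: run_cycle and its parameters are hypothetical names, mount is called without options because the real options are not stated in this thread, and an fsck exit status of 0 is taken to mean "clean". Boot (step 1), the optional journal dump (step 2a) and the power cut (step 7) happen outside this function.

```python
import subprocess
from typing import Callable, Optional


def run_cycle(device: str, mountpoint: str,
              check_files: Callable[[], bool],
              erase_files: Callable[[], bool],
              write_files: Callable[[], None],
              run=subprocess.run) -> Optional[str]:
    """Run steps 2) to 6) once; return the name of the failed step, or None."""
    steps = [
        # 2) read-only check BEFORE mounting; non-zero exit status means trouble
        ("fsck", lambda: run(["fsck", "-n", device]).returncode == 0),
        # 3) mount the test partition (no options: an assumption)
        ("mount", lambda: run(["mount", device, mountpoint]).returncode == 0),
        # 4) content check of all files from the previous loop
        ("check", check_files),
        # 5) erase all files from the previous loop
        ("erase", erase_files),
    ]
    for name, step in steps:
        if not step():
            return name  # "If 2), 3), 4) or 5) fails, stop test."
    # 6) start writing; the power cut or reset (step 7) interrupts this
    write_files()
    return None
```

The run parameter is injectable so that the control flow can be exercised without root access or a real block device.
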
> > We usually run the test with a kind of transaction-safe
> > handling, i.e. using fsync/rename, to avoid zero-length files
> > or file fragments.
> >
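
For readers unfamiliar with the fsync/rename pattern mentioned above, a minimal sketch on a POSIX system might look like this. It is not the code of the actual test tool. The function name, the ".tmp" suffix and the additional fsync of the parent directory are assumptions for illustration.

```python
import os


def write_file_atomically(path: str, data: bytes) -> None:
    """Write data to a temp file, fsync it, then rename it over path."""
    directory = os.path.dirname(path) or "."
    tmp_path = path + ".tmp"  # hypothetical temp-name convention
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        # Flush file data to the device before the name becomes visible
        os.fsync(fd)
    finally:
        os.close(fd)
    # rename() atomically replaces the target within one filesystem
    os.rename(tmp_path, path)
    # Assumption: also fsync the directory so the rename itself is durable
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The idea is that after a power cut a reader should find either the complete old file or the complete new one under the final name, not an empty or partial file.
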
> >>
> >> I've used a test framework in the past to simulate resets w/o needing
> >> to reset the box, and do many journal replays very quickly.  It'd be
> >> interesting to run it using your testcase.
> >>
> >> Thanks,
> >> -Eric
> >
> > Mit freundlichen Grüßen / Best regards
> >
> > Dirk Juergens
> >
> > Robert Bosch Car Multimedia GmbH
> >


Mit freundlichen Grüßen / Best regards

Dirk Juergens

Robert Bosch Car Multimedia GmbH