Message-ID: <AE39A478622CF340ABEC2418D74074F61FC59ACA67@SGPMBX05.APAC.bosch.com>
Date: Mon, 6 Jan 2014 09:44:41 +0800
From: "Huang Weller (CM/ESW12-CN)" <Weller.Huang@...bosch.com>
To: Eric Sandeen <sandeen@...hat.com>,
"Juergens Dirk (CM-AI/ECO2)" <Dirk.Juergens@...bosch.com>,
Theodore Ts'o <tytso@....edu>
CC: "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: RE: AW: AW: ext4 filesystem bad extent error review
> On Thu, Jan 03, 2014 at 19:24, Eric Sandeen wrote
>>
>> On 1/3/14, 10:29 AM, Juergens Dirk (CM-AI/ECO2) wrote:
>>> So, I think there _might_ be a kernel bug, but it could be also a
>> problem
>>> related to the particular type of eMMC. We did not observe the same
>> issue
>>> in previous tests with another type of eMMC from another supplier,
>> but this
>>> was with an older kernel patch level and with another HW design.
>>>
>>> Regarding a possible kernel bug: Is there any chance that the invalid
>>> ee_len or ee_start are returned by, e.g., the block allocator ?
>>> If so, can we try to instrument the code to get suitable traces ?
>>> Just to see or to exclude that the corrupted inode is really written
>>> to the eMMC ?
>>
>> From your description it does sound possible that it's a kernel bug.
>> Adding testcases to the code to catch it before it hits the journal
>> might be helpful - but then maybe this is something getting overwritten
>> after the fact - hard to say.
>>
>> Can you share more details of the test you are running? Or maybe even
>> the test itself?
>
> Yes, for sure, we can. Weller, please provide additional details
> or corrections.
>
> In short:
> Basically we use an automated cyclic test writing many small
> (some kBytes) files with CRC checksums for easy consistency check
> into a separate test partition. Files also contain meta information
> like filename, sequence number and a random number to allow to identify
> from block device image dumps, if we just see a fragment of an old
> deleted file or a still valid one.
>
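A minimal sketch of the kind of self-describing test file described above, assuming a JSON header line followed by the payload and a CRC32 trailer. This is not the attached test code; the layout and the names build_record, verify_record and "nonce" are hypothetical and only illustrate how filename, sequence number and a random number can be checked together with the checksum.

```python
import json
import os
import zlib


def build_record(filename: str, sequence: int, payload: bytes) -> bytes:
    """Return header line + payload + 4-byte CRC32 trailer (hypothetical layout)."""
    header = {
        "filename": filename,
        "sequence": sequence,
        # Random value so a stale fragment of an old file with the same
        # name and sequence number can be told apart in an image dump.
        "nonce": os.urandom(8).hex(),
        "payload_len": len(payload),
    }
    header_bytes = json.dumps(header, sort_keys=True).encode("ascii") + b"\n"
    body = header_bytes + payload
    crc = zlib.crc32(body) & 0xFFFFFFFF
    return body + crc.to_bytes(4, "big")


def verify_record(record: bytes) -> dict:
    """Check the CRC32 trailer and return the parsed header; raise ValueError on damage."""
    if len(record) < 5:
        raise ValueError("record too short")
    body, trailer = record[:-4], record[-4:]
    if zlib.crc32(body) & 0xFFFFFFFF != int.from_bytes(trailer, "big"):
        raise ValueError("CRC mismatch")
    header_bytes, sep, payload = body.partition(b"\n")
    if not sep:
        raise ValueError("missing header terminator")
    header = json.loads(header_bytes)
    if header["payload_len"] != len(payload):
        raise ValueError("payload length mismatch")
    return header
```

A checker run after reboot could call verify_record on every file from the previous loop and stop the test on the first ValueError.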
> Each test loop looks like this:
>0) mkfs the filesystem - with what options? How big?
I used the default options, like this: mkfs.ext4 -E nodiscard /dev/$PAR
We added "-E nodiscard" because formatting the disk takes a long time without it.
> 1) Boot the device after power on or reset
> 2) Do fsck -n BEFORE mounting
> 2 a) (optional) binary dump of the journal
> 3) Mount test partition
>Again with what options, if any?
Normally, I used one of the following option sets:
- ext4 default options: rw,relatime,data=ordered,barrier=1
- rw,relatime,data=ordered,barrier=1,journal_checksum
The test partition size is about 6G, but I filled it so that only 700M of free space is left.
During the test, I used the tool stress to generate CPU load. As far as I remember, the CPU load was around 70%. Not all tests were run with stress: we also ran the test without stress and still reproduced the issue.
> 4) File content check for all files from prev. loop
> 5) erase all files from previous loop
> 6) start writing hundreds/thousands of test files
> in multiple directories with several threads
>I guess this is where we might need more details in order,
>to try to recreate the failure, but perhaps
>this is not a case where you can simply share the IO
>generation utility...?
I have attached my test code, scripts, and an introductory document to this mail. Please don't laugh at me if some of the code is ugly :-)
>Thanks,
>-Eric
> 7) after random time cut the power or do soft reset
>
> If 2), 3), 4) or 5) fails, stop test.
>
> We are running the test usually with kind of transaction
> safe handling, i.e. use fsync/rename, to avoid zero length files
> or file fragments.
>
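For illustration, the fsync/rename handling mentioned above is commonly done as in the sketch below. It assumes a POSIX system, where rename atomically replaces the target; the function name write_file_atomically and the ".tmp" suffix are hypothetical, and this is not taken from the attached test code.

```python
import os


def write_file_atomically(path: str, data: bytes) -> None:
    """Write data to a temporary file, fsync it, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"  # hypothetical temporary name in the same directory

    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Flush file contents to stable storage before the rename.
        os.fsync(fd)
    finally:
        os.close(fd)

    # On POSIX, rename replaces the target atomically: after a power cut a
    # reader should see either the old file or the complete new one.
    os.rename(tmp_path, path)

    # Sync the directory so the rename itself is persisted.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

With this pattern, a zero-length or partially written file under the final name after a reset would point at the filesystem or the storage device rather than at the test program.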
>>
>> I've used a test framework in the past to simulate resets w/o needing
>> to reset the box, and do many journal replays very quickly. It'd be
>> interesting to run it using your testcase.
>>
>> Thanks,
>> -Eric
>
> Mit freundlichen Grüßen / Best regards
>
> Dirk Juergens
>
> Robert Bosch Car Multimedia GmbH
>
Download attachment "code_out.tar.gz" of type "application/x-gzip" (48715 bytes)