Message-ID: <alpine.LFD.2.00.1306171435301.3270@localhost.localdomain>
Date: Mon, 17 Jun 2013 14:46:29 +0200 (CEST)
From: Lukáš Czerner <lczerner@...hat.com>
To: "Theodore Ts'o" <tytso@....edu>
cc: linux-ext4@...r.kernel.org
Subject: Re: [PATCH v4 15/20] ext4: use ext4_zero_partial_blocks in punch_hole
On Mon, 17 Jun 2013, Theodore Ts'o wrote:
> Date: Mon, 17 Jun 2013 08:25:18 -0400
> From: Theodore Ts'o <tytso@....edu>
> To: Lukáš Czerner <lczerner@...hat.com>
> Cc: linux-ext4@...r.kernel.org
> Subject: Re: [PATCH v4 15/20] ext4: use ext4_zero_partial_blocks in punch_hole
>
> On Mon, Jun 17, 2013 at 11:08:32AM +0200, Lukáš Czerner wrote:
> > > Correction... reverting patches #15 through #19 (which is what I did in
> > > the dev-with-revert branch found on ext4.git) causes the problem to go
> > > away in the nojournal case, but it causes a huge number of other
> > > problems. Some of the reverts weren't clean, so it's possible I
> > > screwed up one of the reverts. It's also possible that only applying
> > > part of this series leaves the tree in an unstable state.
> > >
> > > I'd much rather figure out how to fix the problem on the dev branch,
> > > so thank you for looking into this!
> >
> > Wow, this looks bad. Theoretically reverting patches #15 through
> > #19 should not have any real impact. So far I do not see what is
> > causing that, but I am looking into this.
>
> I've been looking into this more intensively over the weekend. I'm
> now beginning to think we have had a pre-existing race, and the
> changes in question have simply changed the timing. I tried a version
> of the dev branch (you can find it as the branch dev2 in my
> kernel.org's ext4.git tree) which only had patches 1 through 10 of the
> invalidate page range patches (dropping patches 11 through 20), and I
> found that generic/300 was failing in the configuration ext3 (a file
> system with nodelalloc, no flex_bg, and no extents). I also found
> the same failure with a 3.10-rc2 configuration.
>
> Your changes seem to make generic/300 fail consistently for me
> using the nojournal configuration, but looking at the patches in question,
> I don't think they could have directly caused the problem. Instead, I
> think they just changed the timing to unmask the problem.
Ok, I thought that there was something weird because patches #1-#14
should not cause anything like that and from my testing (see my
previous mail) it really seems it does not cause it, at least not
directly.
>
> Given that I've seen generic/300 test failures in various different
> baselines going all the way back to 3.9-rc4, this isn't a recent
> regression. And given that it does seem to be timing sensitive,
> bisecting it is going to be difficult. On the other hand, given that
> using the dev (or master) branch, generic/300 is failing with a
> greater than 70% probability using kvm with 2 cpu's, 2 megs of RAM and
> 5400 rpm laptop drives in nojournal mode, the fact that it's
> reproducing relatively reliably hopefully will make it easier to find
> the problem.
As mentioned in my previous email, test generic/300 runs without any
problems (even in a loop) without journal with patches #1 through
#14 applied on 3.10-rc2 (c7788792a5e7b0d5d7f96d0766b4cb6112d47d75).
This is on kvm with 24 cpu's, 8GB of RAM (I suppose you're not using
2MB of ram in your setup, but rather 2GB :) and server drives with
linear lvm on top of it.
-Lukas
>
> > I see that there are problems in other modes, not just nojournal. Are
> > those caused by this as well, or are you seeing those even without
> > the patchset?
>
> I think the other problems in my dev-with-revert branch were caused by
> some screw up on my part when I did the revert using git. I found that
> dropping the patches from a copy of the guilt patch stack, and then
> applying all of the patches except for the last half of the invalidate
> page range patch series, resulted in a clean branch that didn't have
> any of these failures. It's what I should have done late last week,
> instead of trying to use "git revert".
>
> Cheers,
>
> - Ted
>
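
Ted's point that a failure rate above 70% makes the problem easier to chase can be made concrete with a little arithmetic. The sketch below is only an illustration: it assumes each generic/300 run fails independently with a fixed probability, which a timing-sensitive race may well not satisfy, and the function name passes_needed is made up for this example. It computes how many consecutive clean runs are needed before a "good" verdict on a kernel is trustworthy at a given confidence level.

```python
def passes_needed(fail_prob: float, confidence: float) -> int:
    """Smallest n such that a buggy kernel would survive n runs in a row
    with probability at most (1 - confidence).

    Assumption (not from the thread): runs are independent and each one
    fails with the same probability fail_prob when the bug is present.
    """
    if not 0.0 < fail_prob < 1.0:
        raise ValueError("fail_prob must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")

    pass_prob = 1.0 - fail_prob
    n = 1
    # A buggy kernel passes n runs in a row with probability pass_prob ** n.
    while pass_prob ** n > 1.0 - confidence:
        n += 1
    return n


if __name__ == "__main__":
    # 70% is the lower bound Ted quotes for his 2-cpu kvm nojournal setup.
    print(passes_needed(0.7, 0.95))
    # A hypothetical setup where the race triggers only 10% of the time.
    print(passes_needed(0.1, 0.95))
```

Under these assumptions a handful of clean runs is enough at a 70% failure rate, while a setup that rarely triggers the race needs far more runs per verdict. That is one way to read why a single clean loop on a different machine does not by itself rule out a pre-existing race.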