Message-ID: <20091031091528.GO18464@mit.edu>
Date: Sat, 31 Oct 2009 05:15:28 -0400
From: Theodore Tso <tytso@....edu>
To: Andreas Dilger <adilger@....com>
Cc: Eric Sandeen <sandeen@...hat.com>,
Parag Warudkar <parag.lkml@...il.com>,
"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
LKML <linux-kernel@...r.kernel.org>, linux-ext4@...r.kernel.org,
bugzilla-daemon@...zilla.kernel.org
Subject: Re: [Bug 14354] Re: ext4 increased intolerance to unclean shutdown?
On Fri, Oct 30, 2009 at 01:56:27PM -0600, Andreas Dilger wrote:
> I wonder if there are multiple problems involved here? Eric, it seems
> possible that your reproducer is exercising a similar, though unrelated
> codepath.
Note that Aneesh has published two patches which insert a call to
ext4_discard_preallocations(). One inserts it into fs/inode.c's
truncate path (for direct/indirect-mapped inodes), and the other
inserts it into fs/extents.c's truncate path (for extent-mapped
inodes). As near as I can tell both patches are necessary, and it
looks to me like they should be combined into a single patch, since
commit 487caeef9 affects both truncate paths. Aneesh, do you concur?
Like Andreas, I am suspicious that there may be multiple problems
occurring here, so here is a proposed plan of attack.
Step 1) Sanity check that commit 0a80e986 shows the problem. This is
immediately after the first batch of ext4 patches which I sent to
Linus during the post-2.6.31 merge window. Avery has reported that
patches in the middle of this first batch show the problem. While we
may have some "git bisect good" revisions that were really bad, in
general if a revision is reported bad, the problem is probably there
at that version and successive versions. Hence, I'm _pretty_ sure
that 0a80e986 should demonstrate the problem.
Step 2) Sanity check that commit ab86e576 does _not_ show the problem.
This commit corresponds to 2.6.31-git6, and there are no ext4 patches
that I pushed before that point. There are three commits that show up
in response to the command "git log v2.6.31..v2.6.31-git6 -- fs/ext4
fs/jbd2", but they weren't pushed by me. Although come to think of
it, Jan Kara's commit 0d34ec62, "ext4: Remove syncing logic from
ext4_file_write" is one we might want to look at very carefully if
commit ab86e576 also shows the problem....
Step 3) Assuming that Step 1 and Step 2 are as I expect, with commit
ab86e576 being "good", and commit 0a80e986 being "bad", we will have
localized the problem commit(s) to the 63 commits that were initially
pushed to Linus during the merge window. One of the commits is
487caeef9, which Aneesh has argued convincingly is problematic.
Reverting it seems to solve at least one or two reporters' problems,
but the revert clearly isn't a complete solution. So let's try to
narrow things down further by testing this branch:
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git test-history
This branch corresponds to commit ab86e576 (from Step 2), but with the
problematic commit 487caeef9 removed. It was generated by applying
the following guilt patch series to v2.6.31-git6:
git://repo.or.cz/ext4-patch-queue.git test-history
The advantage of starting with the head of test-history is that if
there are multiple problematic commits, this should show the problem
(just as reverting 487caeef9 would) --- but since 487caeef9 is
actually removed, we can now do a "git bisect start test-history
v2.6.31-git6" and hopefully be able to localize whatever additional
commits might be bad.
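To give a rough sense of the effort Step 3 implies, here is a minimal
sketch of the binary search that "git bisect" performs. It assumes a
strictly linear history of 63 candidate commits (the count quoted in
Step 3) with exactly one first-bad commit; the real test-history branch
may differ in length and shape, and the function name is hypothetical.

```python
def bisect(first_bad: int, candidates: int) -> tuple[int, int]:
    """Locate the first bad commit among commits 1..candidates.

    Index 0 stands for the known-good base and index `candidates` for
    the known-bad tip. Returns (index found, number of kernels tested).
    """
    lo, hi = 0, candidates  # lo is known good, hi is known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1  # one build-and-reproduce cycle
        if mid >= first_bad:
            hi = mid  # problem reproduced: first bad commit is at or before mid
        else:
            lo = mid  # problem not seen: first bad commit is after mid
    return hi, tests


if __name__ == "__main__":
    worst = max(bisect(k, 63)[1] for k in range(1, 64))
    print(f"worst case for 63 commits: {worst} test kernels")
```

Under these assumptions no placement of the bad commit needs more than
six test kernels, which is why a reliable "good"/"bad" verdict at each
step matters more than the raw number of commits.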
(We could also keep applying and unapplying the patch corresponding to
the revert of 487caeef9 while doing a bisection, but that tends to be
error prone.)
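The asymmetry noted in Step 1 (a "bad" report is trustworthy, a "good"
report less so) can be made concrete with a small calculation. This is
only an illustration: it assumes each unclean-shutdown trial on a
genuinely bad kernel triggers the corruption independently with some
probability p_trigger, and that value is hypothetical, not something
measured in this thread.

```python
def false_good_probability(p_trigger: float, trials: int) -> float:
    """Chance that a truly bad kernel survives `trials` clean runs.

    Assumes independent trials, each reproducing the problem with
    probability `p_trigger` (a made-up parameter for illustration).
    """
    if not 0.0 <= p_trigger <= 1.0:
        raise ValueError("p_trigger must be between 0 and 1")
    if trials < 0:
        raise ValueError("trials must be non-negative")
    return (1.0 - p_trigger) ** trials


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, false_good_probability(0.3, n))
```

With a reproducer that only fires some of the time, a handful of clean
runs still leaves a real chance of marking a bad revision "good", so it
is worth repeating the test several times before telling git bisect
that a revision is good.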
Does that sound like a plan?
- Ted