[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090515101146.GA6816@mit.edu>
Date: Fri, 15 May 2009 06:11:46 -0400
From: Theodore Tso <tytso@....edu>
To: "Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>
Cc: Kevin Shanahan <kmshanah@...b.org.au>,
Andreas Dilger <adilger@....com>, Alex Tomas <bzzz@....com>,
linux-ext4@...r.kernel.org
Subject: Re: More ext4 acl/xattr corruption - 4th occurence now
On Fri, May 15, 2009 at 10:25:08AM +0530, Aneesh Kumar K.V wrote:
> > commit 039ed7a483fdcb2dbbc29f00cd0d74c101ab14c5
> > Author: Theodore Ts'o <tytso@....edu>
> > Date: Thu May 14 17:09:37 2009 -0400
> >
> > ext4: Fix race in ext4_inode_info.i_cached_extent
> >
> > If one CPU is reading from a file while another CPU is writing to the
> > same file different locations, there is nothing protecting the
> > i_cached_extent structure from being used and updated at the same
> > time. This could potentially cause the wrong location on disk to be
> > read or written to, including potentially causing the corruption of
> > the block group descriptors and/or inode table.
>
>
> It should be multiple readers. We don't allow read/write or multiple
> writers via ext4_ext_get_blocks. &EXT4_I(inode)->i_data_sem is supposed
> to protect read/write and multiple writers. What it allowed was
> multiple readers(get_block call with create = 0). And readers did cache
> the extent information which it read from the disk. So the fix is
> correct, but we need to update the commit message.
Yes, you're right. The i_data_sem would have protected us from the
worst of these problems. The commit message isn't totally wrong,
though, since even a writer will call ext4_ext_get_blocks() with
create == 0. But we should describe the failure mode much more
accurately, and say that it is two callers of ext4_ext_get_blocks()
with create=0 racing against one another. It's possible, perhaps even
likely, that one of those callers came from a call of
ext4_get_blocks() with create=1 --- given that the people who were
reporting this were doing so when doing backup jobs.
However, one could imagine multiple mysql (or DB2) threads writing
into random parts of a database, and if two CPU's try to bmap already
allocated (and initialized) blocks out of the database file, so that
ext4_ext_get_blocks() is always being called with create=0, you could
very easily end up with a situation where blocks would be written to
the wrong location on disk. <<shiver>>
We should make sure that all of the enterprise distributions that are
shipping ext4 as a "technology preview" get a copy of this patch as a
fixing a high-priority bug, since a database workload should trip this
pretty easily.
- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists