[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <A5667440-009D-4ACB-9EA9-7462341DD704@zabbo.net>
Date: Wed, 7 Mar 2007 14:48:19 -0800
From: Zach Brown <zab@...bo.net>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Leonid Ananiev <leonid.i.ananiev@...el.com>,
Leonid Ananiev <leonid.i.ananiev@...ux.intel.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-aio <linux-aio@...ck.org>,
Chris Mason <chris.mason@...cle.com>
Subject: Re: [PATCH 1/3] aio: fix oops because of extra IO control block freeing.
On Mar 7, 2007, at 2:14 PM, Andrew Morton wrote:
> On Mon, 05 Mar 2007 17:23:33 +0300
> Leonid Ananiev <leonid.i.ananiev@...ux.intel.com> wrote:
>
>> From Leonid Ananiev
>>
>> The patch fixes oops because of extra IO control block freeing.
>> IO is retried if page could not be invalidated.
This patch is incorrect and shouldn't be merged.
>>
>> Signed-off-by: Leonid Ananiev <leonid.i.ananiev@...el.com>
>>
>> The patch fixes oops "Kernel BUG at fs/aio.c:509" archived at
>> http://lkml.org/lkml/2007/2/8/337
>> The number of IO control block (iocb)users < 0.
>> If page could not be invalidated by invalidate_inode_pages2_range()
>> than EIO is returned. It happens if journal_try_to_free_buffers()
>> fails
>> to drop_buffers().
>> This EIO is not differing from real IO competition with EIO and
>> aio_complete() is called.
(I'm going to read this as "This EIO is misinterpreted as real IO
completion with -EIO and aio_complete() is called.")
>> Later aio_complete() is called from dio_bio_end_aio() and frees iocb
>> once more.
This analysis is correct. Nothing can clobber -EIOCBQUEUED as it is
returned up from fs/direct-io.c to fs/aio.c. This -EIO from
invalidation is one of the two places that currently break this rule.
The fix I had hoped for, invalidating down in fs/direct-io.c before
returning -EIOCBQUEUED, doesn't work because it ends up getting the
ordering between the journal lock and the page lock backwards.
Sigh. (note to self: help lockdep warn us about that ordering)
>> After patch generic_file_direct_IO() sets PgBusy flag in iocb
>> if page could not be invalidated. iocb is retried after IO
>> competition.
>> The process is waked up if IO is SYNC else iocb is kicked.
>> The lines ___if (ret != -EIOCBRETRY)___ is deleted because
>> nothing set to EIOCBRETRY.
True, but this is gratuitously cruel to external users of -
EIOCBRETRY. Silently breaking them doesn't sound like a great plan.
If we really decide to remove EIOCBRETRY support we'd get rid of all
the retry infrastructure and remove the EIOCBRETRY errno so their
builds failed. That's a separate issue that shouldn't be confused
with this EIOCBQUEUED clobbering. Just removing some pieces of the
infrastructure willy-nilly isn't acceptable.
>
> Please copy linux-aio@...ck.org on AIO patches.
>
>> Next patches 2/3 and 3/3 do cleanup only.
>
> I cannot find those patches.
Me either. I was waiting for them to arrive before responding.
>> diff -upr linux-2.6.20/mm/filemap.c linux-2.6.20-aio2/mm/filemap.c
>> --- linux-2.6.20/mm/filemap.c 2007-02-04 21:44:54.000000000 +0300
>> +++ linux-2.6.20-aio2/mm/filemap.c 2007-03-04 21:46:10.000000000
>> +0300
>> @@ -2413,10 +2413,9 @@ generic_file_direct_IO(int rw, struct ki
>> if (rw == WRITE && mapping->nrpages) {
>> pgoff_t end = (offset + write_len - 1)
>> >> PAGE_CACHE_SHIFT;
>> - int err = invalidate_inode_pages2_range(mapping,
>> - offset >> PAGE_CACHE_SHIFT, end);
>> - if (err)
>> - retval = err;
>> + if (invalidate_inode_pages2_range(mapping,
>> + offset >> PAGE_CACHE_SHIFT, end))
>> + kiocbSetPgBusy(iocb);
There are two problems behind this bug:
- ext3_releasepage() returns failure if it races with kjournald
holding a reference while it waits for a transaction to commit.
- generic_file_direct_IO() causes an oops if it clobbers -EIOCBQUEUED
with the return code from invalidate_inode_pages2_range()..-
>releasepage().
This fix makes the incorrect assertion that *any* failure from
invalidate_inode_pages2_range(), which might not have anything to do
with this race you're currently seeing, is transitory and should
trigger a retry. That's wrong, it should be returning the error.
Now, getting ext3_releasepage() to not fail if this race hits to
begin with is another story. Chris has some ideas about reworking
the page laundering helper to make that more reliable.
- z
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists