Message-ID: <46437857.9070403@cosmosbay.com>
Date: Thu, 10 May 2007 21:53:59 +0200
From: Eric Dumazet <dada1@...mosbay.com>
To: Fengguang Wu <fengguang.wu@...il.com>
CC: Andi Kleen <andi@...stfloor.org>, Andrew Morton <akpm@...l.org>,
Oleg Nesterov <oleg@...sign.ru>,
Steven Pratt <slpratt@...tin.ibm.com>,
Ram Pai <linuxram@...ibm.com>, linux-kernel@...r.kernel.org,
Ingo Molnar <mingo@...e.hu>
Subject: Re: [RFC] splice() and readahead interaction
Fengguang Wu wrote:
> 2007/5/2, Eric Dumazet <dada1@...mosbay.com>:
>
> Since you work on readahead, could you please find out why the
> following program triggers a problem in the splice() syscall?
>
> Description :
>
> I tried to use splice(SPLICE_F_NONBLOCK) in a non-blocking
> environment, in an attempt to implement cheap AIO and use the
> zero-copy splice() feature.
>
> I quickly found that readahead in splice() is not really working.
>
> To demonstrate the problem, just compile the attached program, and
> use it to pipe a big file (not yet in cache) to /dev/null :
>
> $ gcc -o spliceout spliceout.c
> $ spliceout -d BIGFILE | cat >/dev/null
> offset=49152 ret=49152
> offset=65536 ret=16384
> offset=131072 ret=65536
> ...no more progress... (splice() returns -1 and EAGAIN)
>
> Reading the splice(SPLICE_F_NONBLOCK) syscall implementation, I expected
> to exploit its ability to call readahead(), and make some progress if
> pages are ready in cache.
>
> But apparently, even on an idle machine, it is not working as expected.
>
>
>
> Eric Dumazet, thank you for disclosing this bug.
>
> Readahead logic somehow fails to populate the page range with data.
> It can be because
> 1) the readahead routine is not always called in the following lines of
> fs/splice.c:
> if (!loff || nr_pages > 1)
> page_cache_readahead(mapping, &in->f_ra, in, index,
> nr_pages);
> 2) even when called, page_cache_readahead() won't guarantee the pages are
> there. It won't submit readahead I/O for pages already in the radix tree,
> or when (ra_pages == 0), or after 256 cache hits.
>
> In your case, it is most likely because of the retried reads, which lead
> to excessive cache hits and eventually disable readahead.
>
> And that _one_ failure of readahead blocks the whole read process.
> The application receives EAGAIN and retries the read, but
> __generic_file_splice_read() refuses to make progress:
> - in the previous invocation, it allocated a blank page and inserted
> it into the radix tree, but never had the chance to start I/O for it:
> the test of SPLICE_F_NONBLOCK comes before that.
> - in the retried invocation, the readahead code will neither get out of
> the cache hit mode, nor will it submit I/O for an already existing page.
>
> The attached patch should fix the critical splice bug. Sorry for not
> being able to test it locally for now - I'm at home and running knoppix.
> And the readahead bug will be fixed by the upcoming on-demand readahead
> patch. I should be back to submit it in about a week.
>
> Thank you,
> Fengguang Wu
>
>
> ------------------------------------------------------------------------
>
> --- linux-2.6.21.1/fs/splice.c.old 2007-05-05 04:40:38.000000000 -0400
> +++ linux-2.6.21.1/fs/splice.c 2007-05-05 04:41:59.000000000 -0400
> @@ -378,10 +378,11 @@
> * If in nonblock mode then dont block on waiting
> * for an in-flight io page
> */
> - if (flags & SPLICE_F_NONBLOCK)
> - break;
> -
> - lock_page(page);
> + if (flags & SPLICE_F_NONBLOCK) {
> + if (TestSetPageLocked(page))
> + break;
> + } else
> + lock_page(page);
>
> /*
> * page was truncated, stop here. if this isn't the
Sorry for the delay.
This patch solves the problem, thank you!
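
For anyone following the thread, here is a minimal sketch of the kind of
non-blocking splice() loop discussed above. It is not the spliceout.c
attachment, which is not reproduced in this message: the function name
splice_nonblock(), the max_retries parameter and the 10 ms poll() timeout
are assumptions made for illustration only. EAGAIN is treated as "no
progress right now" (pipe full, or page not yet ready on kernels that
behave as described above) and retried a bounded number of times.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from spliceout.c): move up to `len` bytes from
 * the regular file `in_fd` into the pipe write end `pipe_fd`, passing
 * SPLICE_F_NONBLOCK on every call. Returns the number of bytes moved,
 * or -1 on a hard error or when EAGAIN persists past `max_retries`.
 */
static ssize_t splice_nonblock(int in_fd, int pipe_fd, size_t len,
                               int max_retries)
{
    loff_t off = 0;          /* explicit file offset, advanced by splice() */
    size_t total = 0;
    int retries = 0;

    while (total < len) {
        ssize_t n = splice(in_fd, &off, pipe_fd, NULL, len - total,
                           SPLICE_F_NONBLOCK);
        if (n > 0) {         /* progress: reset the retry budget */
            total += (size_t)n;
            retries = 0;
            continue;
        }
        if (n == 0)          /* end of file */
            break;
        if (errno == EAGAIN && retries++ < max_retries) {
            /* Wait briefly for pipe space (or simply back off). */
            struct pollfd p = { .fd = pipe_fd, .events = POLLOUT };
            poll(&p, 1, 10);
            continue;
        }
        return -1;           /* hard error, or EAGAIN never cleared */
    }
    return (ssize_t)total;
}
```

With the unpatched 2.6.21.1 behaviour described above, a loop like this
would be expected to exhaust its retries on a file that is not yet cached,
because the retried call never starts I/O on the blank page.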