Date:	Thu, 10 May 2007 21:53:59 +0200
From:	Eric Dumazet <dada1@...mosbay.com>
To:	Fengguang Wu <fengguang.wu@...il.com>
CC:	Andi Kleen <andi@...stfloor.org>, Andrew Morton <akpm@...l.org>,
	Oleg Nesterov <oleg@...sign.ru>,
	Steven Pratt <slpratt@...tin.ibm.com>,
	Ram Pai <linuxram@...ibm.com>, linux-kernel@...r.kernel.org,
	Ingo Molnar <mingo@...e.hu>
Subject: Re: [RFC] splice() and readahead interaction

Fengguang Wu wrote:
> 2007/5/2, Eric Dumazet <dada1@...mosbay.com>:
> 
>     Since you work on readahead, could you please find out why the
>     following program triggers a problem in the splice() syscall?
> 
>     Description:
> 
>     I tried to use splice(SPLICE_F_NONBLOCK) in a non-blocking
>     environment, in an attempt to implement cheap AIO on top of the
>     zero-copy splice() feature.
> 
>     I quickly found that readahead in splice() is not really working.
> 
>     To demonstrate the problem, just compile the attached program and
>     use it to pipe a big file (not yet in the page cache) to /dev/null:
> 
>     $ gcc -o spliceout spliceout.c
>     $ spliceout -d BIGFILE | cat >/dev/null
>     offset=49152 ret=49152
>     offset=65536 ret=16384
>     offset=131072 ret=65536
>     ...no more progress...   (splice() returns -1 and EAGAIN)
> 
>     Reading the splice(SPLICE_F_NONBLOCK) syscall implementation, I
>     expected to exploit its ability to call readahead() and make some
>     progress whenever pages are ready in the cache.
> 
>     But apparently, even on an idle machine, it is not working as expected.
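The attached spliceout.c is not included in this archive. What follows is only a sketch of the non-blocking pattern described above, assuming Linux with glibc (`_GNU_SOURCE` exposes splice() and loff_t). The helper `splice_until_eagain()` is hypothetical and not taken from Eric's program. It moves file data into a pipe with SPLICE_F_NONBLOCK and records why it stopped: an EAGAIN stop is where a real program would poll() and retry, and those retries are what stopped making progress in the report above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not the original spliceout.c: move data from the
 * regular file `in_fd`, starting at *off, into the pipe write end
 * `pipe_wr` in chunks of at most `chunk` bytes, never blocking.
 * splice() advances *off by the number of bytes it moves.
 *
 * Returns the total number of bytes moved. *stop_errno is 0 if the loop
 * reached end of file, otherwise the errno that stopped it. EAGAIN means
 * the operation would have blocked (pipe full, or the needed file pages
 * not ready), so the caller should poll() and try again later.
 */
static ssize_t splice_until_eagain(int in_fd, loff_t *off, int pipe_wr,
                                   size_t chunk, int *stop_errno)
{
    ssize_t total = 0;

    assert(chunk > 0); /* a zero-length splice() returns 0, which looks like EOF */
    *stop_errno = 0;

    for (;;) {
        ssize_t ret = splice(in_fd, off, pipe_wr, NULL, chunk,
                             SPLICE_F_NONBLOCK);
        if (ret > 0) {
            total += ret;
            continue;
        }
        if (ret == 0)
            break;              /* end of the input file */
        if (errno == EINTR)
            continue;           /* interrupted by a signal: just retry */
        *stop_errno = errno;    /* EAGAIN or a real error */
        break;
    }
    return total;
}
```

A caller would typically drain the pipe, then call the helper again from the saved offset whenever the stop reason is EAGAIN.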
> 
> 
> 
> Eric Dumazet, thank you for reporting this bug.
> 
> The readahead logic somehow fails to populate the page range with data.
> Possible causes:
> 1) the readahead routine is not always called, because of these lines in
> fs/splice.c:
>         if (!loff || nr_pages > 1)
>                 page_cache_readahead(mapping, &in->f_ra, in, index, nr_pages);
> 2) even when it is called, page_cache_readahead() does not guarantee that
> the pages are there. It won't submit readahead I/O for pages already in
> the radix tree, when ra_pages == 0, or after 256 cache hits.
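To see when the guard in case 1) skips readahead, here is a small worked model of the page arithmetic. It assumes 4 KiB pages and the usual derivation (index is the first page touched, loff the byte offset inside it, nr_pages the number of pages the request spans). The names `span_of()` and `readahead_called()` are hypothetical, and any extra clamping the kernel may apply to nr_pages is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)

struct splice_span {
    unsigned long index;    /* first page index touched by the request */
    unsigned long loff;     /* byte offset of the request inside that page */
    unsigned long nr_pages; /* number of pages the request spans */
};

/* Hypothetical: map a splice request of `len` bytes at file offset `pos` onto pages. */
static struct splice_span span_of(unsigned long long pos, size_t len)
{
    struct splice_span s;

    assert(len > 0); /* the model only handles non-empty requests */
    s.index = (unsigned long)(pos >> MODEL_PAGE_SHIFT);
    s.loff = (unsigned long)(pos & (MODEL_PAGE_SIZE - 1));
    s.nr_pages = (unsigned long)((len + s.loff + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT);
    return s;
}

/* The guard quoted from fs/splice.c: readahead is attempted only when the
 * request starts on a page boundary or spans more than one page. */
static bool readahead_called(const struct splice_span *s)
{
    return s->loff == 0 || s->nr_pages > 1;
}
```

For example, a 100-byte request at offset 100 gives loff = 100 and nr_pages = 1, so the guard skips page_cache_readahead() entirely, while a 200-byte request at offset 4000 crosses into a second page and does trigger it.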
> 
> In your case, it is probably caused by the retried reads, which lead to
> excessive cache hits and eventually disable readahead.
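The three "no I/O" cases in 2) can be illustrated with a small state struct. This is a deliberately simplified sketch, not the kernel's file_ra_state logic: the 256-hit threshold comes from the text above, while counting hits per request, resetting the counter on a miss, and never leaving cache-hit mode once entered are assumptions of this model. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Threshold taken from the explanation above ("after 256 cache hits"). */
#define MODEL_MAX_CACHE_HIT 256u

/* Hypothetical, simplified readahead state; not the kernel's struct file_ra_state. */
struct ra_model {
    unsigned long ra_pages; /* max readahead window; 0 means readahead disabled */
    unsigned int cache_hit; /* requests that found their page already cached */
    bool off;               /* "cache hit mode": set at the threshold, never cleared here */
};

/*
 * One readahead request for a single page. `page_cached` is true when the
 * page is already in the radix tree (even a blank, not-yet-read page).
 * Returns true only if readahead I/O would be submitted.
 */
static bool ra_model_request(struct ra_model *ra, bool page_cached)
{
    assert(ra->cache_hit <= MODEL_MAX_CACHE_HIT);

    if (ra->ra_pages == 0 || ra->off)
        return false;                 /* readahead disabled: no I/O at all */
    if (page_cached) {
        if (++ra->cache_hit >= MODEL_MAX_CACHE_HIT)
            ra->off = true;           /* too many hits: stop doing readahead */
        return false;                 /* no I/O for pages already in the tree */
    }
    ra->cache_hit = 0;                /* assumed: a miss resets the hit counter */
    return true;                      /* missing page: submit readahead I/O */
}
```

Because every EAGAIN retry finds the same page again, a tight retry loop reaches the threshold after a few hundred attempts, after which even a genuine miss gets no readahead in this model.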
> 
> That _one_ readahead failure then blocks the whole read process.
> The application receives EAGAIN and retries the read, but
> __generic_file_splice_read() refuses to make progress:
> - in the previous invocation, it allocated a blank page and inserted
> it into the radix tree, but never got the chance to start I/O for it,
> because the SPLICE_F_NONBLOCK test comes before that.
> - in the retried invocation, the readahead code neither leaves
> cache-hit mode nor submits I/O for the already existing page.
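The sequence above can be replayed with a toy single-page model of the pre-patch non-blocking path. It is an illustration under stated assumptions, not the kernel code: `struct page_model`, `old_nonblock_attempt()` and `model_io_completes()` are hypothetical, 4 KiB pages are assumed, and time passing between retries is simulated by completing any read that was started.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_PAGE_BYTES 4096 /* assumed page size */

/* Hypothetical single-page state for this model only (not struct page). */
struct page_model {
    bool present;    /* page is in the radix tree */
    bool io_started; /* a read has been submitted for it */
    bool uptodate;   /* the read finished and the data is valid */
};

/* Simulate time passing between retries: any submitted read completes. */
static void model_io_completes(struct page_model *pg)
{
    if (pg->io_started)
        pg->uptodate = true;
}

/*
 * One pre-patch SPLICE_F_NONBLOCK attempt on the page. `ra_active` says
 * whether readahead would still submit I/O for a missing page.
 * Returns the bytes delivered, or -EAGAIN.
 */
static int old_nonblock_attempt(struct page_model *pg, bool ra_active)
{
    assert(!pg->uptodate || pg->io_started); /* data only appears after a read */

    /* Readahead runs first, but only reads pages not yet in the tree. */
    if (ra_active && !pg->present) {
        pg->present = true;
        pg->io_started = true;
    }
    /* Still missing: allocate a blank page and insert it, without I/O. */
    if (!pg->present)
        pg->present = true;
    /* The SPLICE_F_NONBLOCK test comes before the code that would start I/O. */
    if (!pg->uptodate)
        return -EAGAIN;
    return MODEL_PAGE_BYTES;
}
```

If readahead is active when the page is first looked up, a retry succeeds once the read completes. If the first lookup inserts a blank page instead, no later attempt ever starts I/O for it, so every retry returns EAGAIN, which is the livelock described above.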
> 
> The attached patch should fix the critical splice bug. Sorry for not
> being able to test it locally for now; I'm at home running Knoppix.
> The readahead bug will be fixed by the upcoming on-demand readahead
> patch, which I should be back to submit in about a week.
> 
> Thank you,
> Fengguang Wu
> 
> 
> ------------------------------------------------------------------------
> 
> --- linux-2.6.21.1/fs/splice.c.old	2007-05-05 04:40:38.000000000 -0400
> +++ linux-2.6.21.1/fs/splice.c	2007-05-05 04:41:59.000000000 -0400
> @@ -378,10 +378,11 @@
>  			 * If in nonblock mode then dont block on waiting
>  			 * for an in-flight io page
>  			 */
> -			if (flags & SPLICE_F_NONBLOCK)
> -				break;
> -
> -			lock_page(page);
> +			if (flags & SPLICE_F_NONBLOCK) {
> +				if (TestSetPageLocked(page))
> +					break;
> +			} else
> +				lock_page(page);
>  
>  			/*
>  			 * page was truncated, stop here. if this isn't the
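The patch replaces the unconditional `break` with a trylock: in non-blocking mode the splice path now gives up only if the page lock is already held (I/O in flight); otherwise it takes the lock and continues to the code that starts the read. Below is a user-space model of that decision, assuming C11 `<stdatomic.h>`: `atomic_flag_test_and_set()` plays the role of `TestSetPageLocked()` (both set the bit and return its previous state). The type and function names are hypothetical, and the truncation check that follows the lock in fs/splice.c is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a page and its lock bit. */
struct model_page {
    atomic_flag locked;  /* plays the role of PG_locked */
    bool uptodate;       /* data is valid */
    bool read_submitted; /* stands in for a call to ->readpage() */
};

enum try_result {
    TRY_BUSY,     /* lock already held: I/O in flight, caller gives up (EAGAIN) */
    TRY_READY,    /* page became uptodate meanwhile: use it */
    TRY_SUBMITTED /* we took the lock and started the read */
};

static void model_page_init(struct model_page *pg)
{
    atomic_flag_clear(&pg->locked);
    pg->uptodate = false;
    pg->read_submitted = false;
}

/*
 * Patched non-blocking decision for a page that was not uptodate.
 * atomic_flag_test_and_set() returns true if the flag was already set,
 * i.e. someone else holds the lock, so we must not wait for it.
 */
static enum try_result nonblock_try_start_read(struct model_page *pg)
{
    if (atomic_flag_test_and_set(&pg->locked))
        return TRY_BUSY;
    if (pg->uptodate) {                  /* read finished before we got the lock */
        atomic_flag_clear(&pg->locked);
        return TRY_READY;
    }
    pg->read_submitted = true;           /* lock stays held until the read completes */
    return TRY_SUBMITTED;
}

/* Simulated I/O completion: mark the page valid and release the lock. */
static void model_read_done(struct model_page *pg)
{
    assert(pg->read_submitted); /* only a submitted read can complete */
    pg->uptodate = true;
    atomic_flag_clear(&pg->locked);
}
```

Taking the lock only when nobody else holds it keeps the call non-blocking, while a blank page left behind by an earlier attempt gets its read started on the next retry.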

Sorry for the delay.

This patch solves the problem, thank you!

