lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 4 Oct 2006 10:25:22 -0700
From:	Andrew Morton <akpm@...l.org>
To:	Jeff Moyer <jmoyer@...hat.com>
Cc:	linux-kernel@...r.kernel.org, zach.brown@...cle.com
Subject: Re: [patch] call truncate_inode_pages in the DIO fallback to
 buffered I/O path

On Wed, 04 Oct 2006 13:04:42 -0400
Jeff Moyer <jmoyer@...hat.com> wrote:

> Hi,
> 
> It seems that Oracle creates sparse files when doing table creates, and
> then populates those files using O_DIRECT I/O.  That means that every I/O to
> the sparse file falls back to buffered I/O.  Currently, such a sequential
> O_DIRECT write to a sparse file will end up populating the page cache.  The
> problem is that we don't invalidate the page cache pages used to perform
> the buffered fallback.

Why is this a problem?  It's just like someone did a write(), and we'll
invalidate the pagecache on the next direct-io operation.

>  After talking this over with Zach, we agreed that
> there should be a call to truncate_inode_pages_range after the buffered I/O
> fallback.
> 
> Attached is a patch which addresses the problem in my testing.  I wrote a
> simple test program that creates a sparse file and issues sequential DIO
> writes to it.  Before the patch, the page cache would grow as the file was
> written.  With the patch, the page cache does not grow.
> 
> Comments welcome.
> 
> Cheers,
> 
> Jeff
> 
> Signed-off-by: Jeff Moyer <jmoyer@...hat.com>
> 
> --- linux-2.6.18.i686/mm/filemap.c.orig	2006-10-02 12:59:25.000000000 -0400
> +++ linux-2.6.18.i686/mm/filemap.c	2006-10-04 12:54:51.000000000 -0400
> @@ -2350,7 +2350,7 @@ __generic_file_aio_write_nolock(struct k
>  				unsigned long nr_segs, loff_t *ppos)
>  {
>  	struct file *file = iocb->ki_filp;
> -	const struct address_space * mapping = file->f_mapping;
> +	struct address_space * mapping = file->f_mapping;
>  	size_t ocount;		/* original count */
>  	size_t count;		/* after file limit checks */
>  	struct inode 	*inode = mapping->host;
> @@ -2417,6 +2417,15 @@ __generic_file_aio_write_nolock(struct k
>  
>  	written = generic_file_buffered_write(iocb, iov, nr_segs,
>  			pos, ppos, count, written);
> +
> +	/*
> +	 *  When falling through to buffered I/O, we need to ensure that the
> +	 *  page cache pages are written to disk and invalidated to preserve
> +	 *  the expected O_DIRECT semantics.
> +	 */
> +	if (unlikely(file->f_flags & O_DIRECT))
> +		truncate_inode_pages_range(mapping, pos, pos + count - 1);
> +
>  out:
>  	current->backing_dev_info = NULL;
>  	return written ? written : err;

eek.  truncate_inode_pages() will throw away dirty data.  Very dangerous,
much chin-scratching needed.

It also has security implications: if a user can force this shootdown to
happen against dirty data at the right time (perhaps with multiple
processes) then a subsequent read of that part of the file will yield
uninitialised disk data.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ