linux-ext4 - Re: [PATCH] direct-io: fix stale data exposure from concurrent buffered read

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160506104622.GB10350@eguan.usersys.redhat.com>
Date:	Fri, 6 May 2016 18:46:22 +0800
From:	Eryu Guan <guaneryu@...il.com>
To:	Jeff Moyer <jmoyer@...hat.com>
Cc:	Jan Kara <jack@...e.cz>, Theodore T'so <tytso@...gle.com>,
	linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org
Subject: Re: [PATCH] direct-io: fix stale data exposure from concurrent
 buffered read

On Thu, May 05, 2016 at 03:39:29PM -0400, Jeff Moyer wrote:
> Hi, Eryu,
> 
> Thanks for the great description of the problem!  I have some comments
> below.
> 
> Eryu Guan <guaneryu@...il.com> writes:
> 
> > Direct writes inside i_size on a DIO_SKIP_HOLES filesystem are not
> > allowed to allocate blocks(get_more_blocks() sets 'create' to 0 before
> > calling get_bkicl() callback), if it's a sparse file, direct writes fall
> > back to buffered writes to avoid stale data exposure from concurrent
> > buffered read.
> >
> > But the detection for "writing inside i_size" is not correct, writes can
> > be treated as "extending writes" wrongly, which results in block
> > allocation for holes and could expose stale data. This is because we're
> > checking on the fs blocks not the actual offset and i_size in bytes.
> >
> > For example, direct write 1FSB to a 1FSB(or smaller) sparse file on
> > ext2/3/4, starting from offset 0, in this case it's writing inside
> > i_size, but 'create' is non-zero, because 'sdio->block_in_file' and
> > '(i_size_read(dio->inode) >> sdio->blkbits' are both zero.
> >
> > This can be demonstrated by running ltp-aiodio test ADSP045 many times.
> > When testing on extN filesystems, I see test failures occasionally,
> > buffered read could read non-zero (stale) data.
> >
> > ADSP045: dio_sparse -a 4k -w 4k -s 2k -n 2
> >
> > dio_sparse    0  TINFO  :  Dirtying free blocks
> > dio_sparse    0  TINFO  :  Starting I/O tests
> > non zero buffer at buf[0] => 0xffffffaa,ffffffaa,ffffffaa,ffffffaa
> > non-zero read at offset 0
> > dio_sparse    0  TINFO  :  Killing childrens(s)
> > dio_sparse    1  TFAIL  :  dio_sparse.c:191: 1 children(s) exited abnormally
> 
> OK, so in this case, block_in_file is 0, i_size_read(inode) is 2048, and
> i_blkbits is 12.
> 
> > Fix it by checking on the actual offset and i_size in bytes, not the fs
> > blocks where offset and i_size are in, to make sure it's really writing
> > into the file.
> 
> I think this code operates on blocks for a reason: we're trying to
> determine if we'll trigger block allocation, right?  For example,
> consider a sparse file with i_size of 2k, and a write to offset 2k into
> the file, with a file system block size of 4k.  Should that have create
> set or not?

Thanks for pointing this out! I think 'create' should be 0 in this case,
my test failed in this case, with both 4.6-rc6 stock kernel and my
patched kernel.

I'm testing an updated patch now, hopefully it's doing the right thing.
It's basiclly something like:

if (offset < i_size)
	create = 0;
else if ((block_in_file >> blkfactor) == (i_size >> (blkbits + blkfactor)) &&
	 (i_size & ((1 << (blkbits + blkfactor)) - 1)))
	create = 0;

So in addition to the (offset < i_size) check, it also checks that
block_in_file and i_size are falling into the same fs block && i_size is
not fs block size aligned.

> 
> Ted or Jan, can you answer that question?
> 
> > Also introduce some local variables to make the code
> > easier to read a little bit.
> 
> Please don't do this.  You're only making the change harder to review.
> Just submit the minimal fix.  You can submit cleanups as a follow-on.

I think it's not a pure cleanup, it's needed as things like
'sdio->block_in_file' are referenced multiple times in the function, and
they are making the lines too long to read/write. Maybe I should have
made it clear in the first place.

Thanks for the review!

Eryu
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html