linux-kernel - Re: [PATCH/RFC] Is it OK for 'read' to return nuls for a file that never had nuls in it?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Tue, 29 May 2007 16:39:37 +1000
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	Neil Brown <neilb@...e.de>
CC:	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	Peter Linich <plinich@....unsw.edu.au>
Subject: Re: [PATCH/RFC] Is it OK for 'read' to return nuls for a file   that
 never had nuls in it?

Neil Brown wrote:
> [resending with correct To address - please reply to this one]
> 
> It appears that there is a race when reading from a file that is
> concurrently being truncated.  It is possible to read a number of
> bytes that matches the size of the file before the truncate, but the
> actual bytes are all nuls - values that had never been in the file.
> 
> Below is a simple C program to demonstrate this, and a patch that
> might fix it (initial testing is positive, but it might just make the
> window smaller).
> To trial the program run two instances, naming the same file as the
> only argument.  Every '!' indicates a read that found a nul.  I get
> one every few minutes.
> e.g.  cc -o race race.c ; ./race /tmp/testfile & ./race /tmp/tracefile
> 
> My exploration suggests the problem is in do_generic_mapping_read in
> mm/filemap.c. 
> This code:
>    gets the size of the file
>    triggers readahead
>    gets the appropriate page
>    If the page is up-to-date, return data.
> 
> If a truncate happens just before readahead is triggered, then
> the size will be the pre-truncate size of the file, while the page
> could have been read by the readahead and so will be up-to-date and
> full of nuls.
> 
> Note that if do_generic_mapping_read calls readpage explicitly, it
> samples the value of inode->i_size again after the read.  However if
> the readpage is called by the readahead code, i_size is not
> re-sampled.
> 
> I am not 100% confident of every aspect of this explanation (I haven't
> traced all the way through the read-ahead code) but it seems to fit
> the available data including the fact that if I disable read-ahead
> (blockdev --setra 0) then the apparent problem goes away.
> 
> The patch below moves the code for re-sampling i_size from after the
> readpage call to before the "actor" call.
> 
> Questions:
>   - Is this a problem, and should it be fixed (I think "yes").

I think you are right.

>   - Is the patch appropriate, and does it have no negative
>     consequences?.
>     (Obviously some comments should be tidied up to reflect the new
>     reality).

Would it be better (and closer to following the existing logic) if
we sampled i_size before testing each page for uptodateness? It might
also cost a little less in the fastpath case of finding an uptodate
page.

-- 
SUSE Labs, Novell Inc.

View attachment "mm-ra-zerofill-fix.patch" of type "text/plain" (1236 bytes)