[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <465BCAA9.3070707@yahoo.com.au>
Date: Tue, 29 May 2007 16:39:37 +1000
From: Nick Piggin <nickpiggin@...oo.com.au>
To: Neil Brown <neilb@...e.de>
CC: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Peter Linich <plinich@....unsw.edu.au>
Subject: Re: [PATCH/RFC] Is it OK for 'read' to return nuls for a file that
never had nuls in it?
Neil Brown wrote:
> [resending with correct To address - please reply to this one]
>
> It appears that there is a race when reading from a file that is
> concurrently being truncated. It is possible to read a number of
> bytes that matches the size of the file before the truncate, but the
> actual bytes are all nuls - values that had never been in the file.
>
> Below is a simple C program to demonstrate this, and a patch that
> might fix it (initial testing is positive, but it might just make the
> window smaller).
> To trial the program run two instances, naming the same file as the
> only argument. Every '!' indicates a read that found a nul. I get
> one every few minutes.
> e.g. cc -o race race.c ; ./race /tmp/testfile & ./race /tmp/tracefile
>
> My exploration suggests the problem is in do_generic_mapping_read in
> mm/filemap.c.
> This code:
> gets the size of the file
> triggers readahead
> gets the appropriate page
> If the page is up-to-date, return data.
>
> If a truncate happens just before readahead is triggered, then
> the size will be the pre-truncate size of the file, while the page
> could have been read by the readahead and so will be up-to-date and
> full of nuls.
>
> Note that if do_generic_mapping_read calls readpage explicitly, it
> samples the value of inode->i_size again after the read. However if
> the readpage is called by the readahead code, i_size is not
> re-sampled.
>
> I am not 100% confident of every aspect of this explanation (I haven't
> traced all the way through the read-ahead code) but it seems to fit
> the available data including the fact that if I disable read-ahead
> (blockdev --setra 0) then the apparent problem goes away.
>
> The patch below moves the code for re-sampling i_size from after the
> readpage call to before the "actor" call.
>
> Questions:
> - Is this a problem, and should it be fixed (I think "yes").
I think you are right.
> - Is the patch appropriate, and does it have no negative
> consequences?.
> (Obviously some comments should be tidied up to reflect the new
> reality).
Would it be better (and closer to following the existing logic) if
we sampled i_size before testing each page for uptodateness? It might
also cost a little less in the fastpath case of finding an uptodate
page.
--
SUSE Labs, Novell Inc.
View attachment "mm-ra-zerofill-fix.patch" of type "text/plain" (1236 bytes)
Powered by blists - more mailing lists