Message-ID: <20120501235113.GC22923@redhat.com>
Date: Wed, 2 May 2012 01:51:13 +0200
From: Andrea Arcangeli <aarcange@...hat.com>
To: Nick Piggin <npiggin@...il.com>
Cc: Jeff Moyer <jmoyer@...hat.com>,
KOSAKI Motohiro <kosaki.motohiro@...il.com>,
Jan Kara <jack@...e.cz>,
Michael Kerrisk <mtk.manpages@...il.com>,
LKML <linux-kernel@...r.kernel.org>, linux-man@...r.kernel.org,
linux-mm@...ck.org, mgorman@...e.de, Woodman <lwoodman@...hat.com>
Subject: Re: [PATCH] Describe race of direct read and fork for unaligned
buffers
Hi Nick!
On Wed, May 02, 2012 at 01:50:46AM +1000, Nick Piggin wrote:
> KOSAKI-san is correct, I think.
>
> The race is something like this:
>
> DIO-read
>     page = get_user_pages()
>                                     fork()
>                                         COW(page)
>                                     touch(page)
>     DMA(page)
>     page_cache_release(page);
Yes. More generally, this race happens every time the kernel
wrprotects a writable anon pte while get_user_pages holds a pin on the
page.
fork can't just abort (like KSM does) when it notices mapcount <
page_count.
The only way to avoid this is for the GUP-pinned page to remain pointed
to at all times by the pte of the process that pinned it (no matter the
COWs), and that's not happening.
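
As a rough illustration of the sequence quoted above, here is a toy
userspace model. It is not kernel code: the toy_* names, the 16-byte
"page" and the single-pte "process" are all made up for the sketch, and
it assumes the simple COW policy where the first writer after fork gets
the private copy. It only shows that the pin keeps pointing at the
original page while the toucher's pte moves to the copy.

```c
#include <assert.h>
#include <string.h>

enum { PAGE_BYTES = 16, DMA_OFF = 8, DMA_LEN = 8 };

/* Toy stand-ins for a physical page and a process's single pte. */
struct toy_page { char data[PAGE_BYTES]; int mapcount; };
struct toy_proc { struct toy_page *pte; int writable; };

/* fork(): the child maps the same page and both ptes become read-only. */
static void toy_fork(struct toy_proc *parent, struct toy_proc *child)
{
    child->pte = parent->pte;
    child->pte->mapcount++;
    parent->writable = 0;
    child->writable = 0;
}

/* Write access: a write-protected pte on a shared page gets a private
 * copy (COW); on an unshared page the pte is simply made writable. */
static void toy_write(struct toy_proc *p, struct toy_page *spare,
                      int off, char val)
{
    if (!p->writable) {
        if (p->pte->mapcount > 1) {
            memcpy(spare->data, p->pte->data, PAGE_BYTES);
            spare->mapcount = 1;
            p->pte->mapcount--;
            p->pte = spare;
        }
        p->writable = 1;
    }
    p->pte->data[off] = val;
}

/* Returns 1 if the parent can see the DMA'd bytes afterwards, else 0. */
static int toy_parent_sees_dma(int parent_touches_first)
{
    struct toy_page orig = { .mapcount = 1 };
    struct toy_page copy = { .mapcount = 0 };
    struct toy_proc parent = { &orig, 1 };
    struct toy_proc child = { NULL, 0 };
    struct toy_page *pinned;

    pinned = parent.pte;            /* get_user_pages(): pin by address */
    toy_fork(&parent, &child);      /* wrprotects the pinned page */
    /* touch(page): a write outside the I/O range, inside the same page */
    toy_write(parent_touches_first ? &parent : &child, &copy, 0, 'x');
    assert(pinned == &orig);        /* the pin never follows the COW */
    memset(pinned->data + DMA_OFF, 'D', DMA_LEN);    /* DMA(page) */
    return parent.pte->data[DMA_OFF] == 'D';
}
```

In this model toy_parent_sees_dma(1) returns 0: the parent's touch moved
its pte to the copy, so the DMA landed in a page the parent no longer
maps. toy_parent_sees_dma(0) returns 1, because the child took the copy
and the parent kept the pinned page.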
> So whether parent or child touches the page, determines who gets the
> actual DMA target, and who gets the copy.
Correct. So far there are two reproducers, triggering two different
kinds of corruption.
The corruption may appear in different ways:
1) we could lose the direct-io read in the parent (if the forked child
does nothing and just quits). That was the basic case in dma_thread.c,
where a dummy fork was run just to mark the pte wrprotected
2) the destination of the direct-io read may also become visible to the
child if the child wrote to the page before the I/O completed,
leading to random mm corruption in the child
3) if it's a direct-io write, the child could write random data to
disk by accident without noticing, if the DMA hasn't started yet and
the child got the pinned page mapped in its pte
We had two working fixes for this, and personally I'd prefer to apply
them rather than document the bug. The probability that whoever writes
code that can hit the bug will read the note in the manpage seems
pretty small, especially in the short/mid term. This lkml thread, as a
reminder, may actually have a higher chance of being noticed than the
manpage. Nevertheless, documenting it is better than nothing if the
fixes aren't applied :). However, I'm afraid that once we officially
document it, the chances of fixing it become zero.
> 2 threads are not required, but it makes the race easier to code and a
> larger window, I suspect.
>
> It can also be hit with a single thread, using AIO.
Yes, it requires running fork in the same process that pinned a page
with GUP, and then writing to a buffer in the same page that is under
the GUP pin before the GUP pin is released.
It's not just direct-io, and not just direct-io read (see point 3).
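
Until a fix is applied, userspace can at least avoid sharing the pinned
page with unrelated data. The sketch below is one possible way to do
that, not something proposed in this thread: alloc_io_buffer() is a
hypothetical helper that hands out whole, page-aligned pages from
mmap(), so no other buffer lives in a page that may be under a GUP pin.
The madvise(MADV_DONTFORK) call is an extra, Linux-specific assumption
on top of that; it is best effort here, and it keeps the range out of
any child created by fork().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns a page-aligned buffer covering at least
 * len bytes in whole pages, or NULL on failure. *alloc_len receives the
 * rounded-up size, which is what must later be passed to munmap(). */
static void *alloc_io_buffer(size_t len, size_t *alloc_len)
{
    long ps = sysconf(_SC_PAGESIZE);
    size_t page, rounded;
    void *buf;

    if (ps <= 0 || len == 0 || alloc_len == NULL)
        return NULL;
    page = (size_t)ps;
    if (len > SIZE_MAX - (page - 1))
        return NULL;                    /* rounding would overflow */
    rounded = (len + page - 1) / page * page;

    buf = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;
    assert(((uintptr_t)buf & (page - 1)) == 0);  /* mmap is page-aligned */

    /* Assumption: best effort, the return value is deliberately ignored.
     * On success the child never maps these pages at all. */
    (void)madvise(buf, rounded, MADV_DONTFORK);

    *alloc_len = rounded;
    return buf;
}
```

A caller would pass the returned pointer to the direct-io read or write
and release it with munmap(buf, alloc_len) once the I/O has completed.
With MADV_DONTFORK in effect the child must not touch the buffer, since
the range is simply absent from its address space.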