linux-kernel - Re: [PATCH 2/4] Add replace_page(), change the mapping of pte from one page into another

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20081112022701.GT10818@random.random>
Date:	Wed, 12 Nov 2008 03:27:01 +0100
From:	Andrea Arcangeli <aarcange@...hat.com>
To:	Christoph Lameter <cl@...ux-foundation.org>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Izik Eidus <ieidus@...hat.com>, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, kvm@...r.kernel.org, chrisw@...hat.com,
	avi@...hat.com, izike@...ranet.com
Subject: Re: [PATCH 2/4] Add replace_page(), change the mapping of pte from
	one page into another

On Tue, Nov 11, 2008 at 06:27:09PM -0600, Christoph Lameter wrote:
> Then page migration will not occur because there is an unresolved
> reference.

So are you checking if there's an unresolved reference only in the
very place I just quoted in the previous email? If answer is yes: what
should prevent get_user_pages from running in parallel from another
thread? get_user_pages will trigger a minor fault and get the elevated
reference just after you read page_count. To you it looks like there
is no o_direct in progress when you proceed to the core of migration
code, but in effect o_direct just started a moment after you read the
page count.

What can protect you is PG lock or mmap_sem in _write_ mode (and
they've to be hold for the whole duration of the migration). I don't
see any of the two being hold while you read the page count... You
don't seem to be using stop_machine either (stop_machine pretty
expensive on the 4096 way I guess).

This wasn't reproduced in practice but it should be possible to
reproduce it by just writing a testcase with three threads, one forks
in a loop (child just quit) the other memset 0 the first 512bytes of a
page, and then o_direct read from a 0xff 512byte region and checks
that the first 512bytes are all non-zero in a loop, and the third
writes 1 byte to the last 512bytes of the page in a loop. Eventually
the comparison should show zero data in the page.

To reproduce with migration just start the thread that memset 0, reads
a 0xff region with o_direct, and checks it's all 0xff in a loop, and
then migrate the memory of this thread back and forth between two
nodes with the sys_move_pages (mpol is safe by luck because it
surrounds migrate_pages with the mmap_sem in write mode). Eventually
you should see zero bytes despite I/O is complete.

Reproducing this is normal life would take time and for the fork bug
it may not be reproducible depending of what the app is doing. Mixing
sys_move_pages with o_direct in the same process with on two different
threads, instead should eventually eventually reproduce it. And with
gup_fast is now unfixable until more infrastructure is added to
slowdown gup_fast a bit (unless Nick finds an RCU way of doing it).
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/