[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220121085917.GA22849@worktop.programming.kicks-ass.net>
Date: Fri, 21 Jan 2022 09:59:17 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: David Hildenbrand <david@...hat.com>
Cc: mingo@...hat.com, tglx@...utronix.de, juri.lelli@...hat.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
bristot@...hat.com, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, linux-api@...r.kernel.org, x86@...nel.org,
pjt@...gle.com, posk@...gle.com, avagin@...gle.com,
jannh@...gle.com, tdelisle@...terloo.ca, mark.rutland@....com,
posk@...k.io
Subject: Re: [RFC][PATCH v2 1/5] mm: Avoid unmapping pinned pages
On Fri, Jan 21, 2022 at 08:51:57AM +0100, Peter Zijlstra wrote:
> On Thu, Jan 20, 2022 at 07:25:08PM +0100, David Hildenbrand wrote:
> > On 20.01.22 16:55, Peter Zijlstra wrote:
> > > Add a guarantee for Anon pages that pin_user_page*() ensures the
> > > user-mapping of these pages stay preserved. In order to ensure this
> > > all rmap users have been audited:
> > >
> > > vmscan: already fails eviction due to page_maybe_dma_pinned()
> > >
> > > migrate: migration will fail on pinned pages due to
> > > expected_page_refs() not matching, however that is
> > > *after* try_to_migrate() has already destroyed the
> > > user mapping of these pages. Add an early exit for
> > > this case.
> > >
> > > numa-balance: as per the above, pinned pages cannot be migrated,
> > > however numa balancing scanning will happily PROT_NONE
> > > them to get usage information on these pages. Avoid
> > > this for pinned pages.
> >
> > page_maybe_dma_pinned() can race with GUP-fast without
> > mm->write_protect_seq. This is a real problem for vmscan() with
> > concurrent GUP-fast as it can result in R/O mappings of pinned pages and
> > GUP will lose synchronicity to the page table on write faults due to
> > wrong COW.
>
> Urgh, so yeah, that might be a problem. Follow up code uses it like
> this:
>
> +/*
> + * Pinning a page inhibits rmap based unmap for Anon pages. Doing a load
> + * through the user mapping ensures the user mapping exists.
> + */
> +#define umcg_pin_and_load(_self, _pagep, _member) \
> +({ \
> + __label__ __out; \
> + int __ret = -EFAULT; \
> + \
> + if (pin_user_pages_fast((unsigned long)(_self), 1, 0, &(_pagep)) != 1) \
> + goto __out; \
> + \
> + if (!PageAnon(_pagep) || \
> + get_user(_member, &(_self)->_member)) { \
> + unpin_user_page(_pagep); \
> + goto __out; \
> + } \
> + __ret = 0; \
> +__out: __ret; \
> +})
>
> And after that hard assumes (on the penalty of SIGKILL) that direct user
> access works. Specifically it does RmW ops on it. So I suppose I'd
> better upgrade that load to a RmW at the very least.
>
> But is that sufficient? Let me go find that race you mention...
OK, so copy_page_range() vs lockless_pages_from_mm(). Since I use
FOLL_PIN that should be sorted, it'll fall back the slow path and use
mmap_sem and serialize against the fork().
(Also, can I express my hate for __gup_longterm_unlocked(), that
function name is utter garbage)
However, I'm not quite sure what fork() does with pages that have a pin.
There's been a number of GUP vs fork() problems over the years, but I'm
afraid I have lost track of that and I can't quickly find anything in
the code..
Naively, a page that has async DMA activity should not be CoW'ed, or if
it is, care must be taken to ensure the original pages stays in the
original process, but I realize that's somewhat hard.
Let me dig in a bit more.
Powered by blists - more mailing lists