linux-kernel - Re: [PATCH v8 3/3] mm/gup: disallow FOLL_LONGTERM GUP-fast writing to file-backed mappings

Message-ID: <c2a6311c-7fdc-4d12-9a3f-d2eed954c468@lucifer.local>
Date:   Thu, 4 May 2023 16:48:20 +0100
From:   Lorenzo Stoakes <lstoakes@...il.com>
To:     David Hildenbrand <david@...hat.com>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Jason Gunthorpe <jgg@...pe.ca>, Jens Axboe <axboe@...nel.dk>,
        Matthew Wilcox <willy@...radead.org>,
        Dennis Dalessandro <dennis.dalessandro@...nelisnetworks.com>,
        Leon Romanovsky <leon@...nel.org>,
        Christian Benvenuti <benve@...co.com>,
        Nelson Escobar <neescoba@...co.com>,
        Bernard Metzler <bmt@...ich.ibm.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Ian Rogers <irogers@...gle.com>,
        Adrian Hunter <adrian.hunter@...el.com>,
        Bjorn Topel <bjorn@...nel.org>,
        Magnus Karlsson <magnus.karlsson@...el.com>,
        Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
        Jonathan Lemon <jonathan.lemon@...il.com>,
        "David S . Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Christian Brauner <brauner@...nel.org>,
        Richard Cochran <richardcochran@...il.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        John Fastabend <john.fastabend@...il.com>,
        linux-fsdevel@...r.kernel.org, linux-perf-users@...r.kernel.org,
        netdev@...r.kernel.org, bpf@...r.kernel.org,
        Oleg Nesterov <oleg@...hat.com>,
        Jason Gunthorpe <jgg@...dia.com>,
        John Hubbard <jhubbard@...dia.com>, Jan Kara <jack@...e.cz>,
        "Kirill A . Shutemov" <kirill@...temov.name>,
        Pavel Begunkov <asml.silence@...il.com>,
        Mika Penttila <mpenttil@...hat.com>,
        Dave Chinner <david@...morbit.com>,
        Theodore Ts'o <tytso@....edu>, Peter Xu <peterx@...hat.com>,
        Matthew Rosato <mjrosato@...ux.ibm.com>,
        "Paul E . McKenney" <paulmck@...nel.org>,
        Christian Borntraeger <borntraeger@...ux.ibm.com>
Subject: Re: [PATCH v8 3/3] mm/gup: disallow FOLL_LONGTERM GUP-fast writing
 to file-backed mappings

On Thu, May 04, 2023 at 05:04:34PM +0200, David Hildenbrand wrote:
> [...]
>
> > +static bool folio_fast_pin_allowed(struct folio *folio, unsigned int flags)
> > +{
> > +	struct address_space *mapping;
> > +	unsigned long mapping_flags;
> > +
> > +	/*
> > +	 * If we aren't pinning then no problematic write can occur. A long term
> > +	 * pin is the most egregious case so this is the one we disallow.
> > +	 */
> > +	if ((flags & (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE)) !=
> > +	    (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE))
> > +		return true;
> > +
> > +	/* The folio is pinned, so we can safely access folio fields. */
> > +
> > +	/* Neither of these should be possible, but check to be sure. */
>
> You can easily have anon pages that are at the swapcache at this point
> (especially, because this function is called before our unsharing checks),
> the comment is misleading.

Ack, will update.

>
> And there is nothing wrong about pinning an anon page that's still in the
> swapcache. The following folio_test_anon() check will allow them.
>
> The check made sense in page_mapping(), but here it's not required.

Waaaaaaaaaait a second, you were saying before:-

  "Folios in the swap cache return the swap mapping" -- you might disallow
  pinning anonymous pages that are in the swap cache.

  I recall that there are corner cases where we can end up with an anon
  page that's mapped writable but still in the swap cache ... so you'd
  fallback to the GUP slow path (acceptable for these corner cases, I
  guess), however especially the comment is a bit misleading then.

So are we allowing or disallowing pinning anon swap cache pages? :P

I mean, the slow path would allow them if they are just marked anon, so I'm
inclined to allow them.
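
To make the question concrete, here is a minimal userspace sketch of the two
behaviours being discussed. It is not kernel code: struct swap_model_folio and
both function names are hypothetical, and the real check looks at far more than
two booleans. It only shows what happens to an anon folio that is still in the
swap cache.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two folio properties under discussion. */
struct swap_model_folio {
	bool anon;
	bool swapcache;
};

/* v8 ordering: the swap cache test runs first and bounces to the slow path. */
static bool fast_pin_with_swapcache_test(const struct swap_model_folio *f)
{
	if (f->swapcache)
		return false;
	return f->anon;
}

/* Without the swap cache test: being anon alone decides the outcome. */
static bool fast_pin_without_swapcache_test(const struct swap_model_folio *f)
{
	return f->anon;
}
```

In this model an anon folio with swapcache set is rejected by the first
function and accepted by the second, which is the same answer the slow path
would give for a folio that is simply marked anon.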

>
> I do agree regarding folio_test_slab(), though. Should we WARN in case we
> would have one?
>
> if (WARN_ON_ONCE(folio_test_slab(folio)))
> 	return false;
>

God help us if we have a slab page at this point, so agreed, it's worth doing;
it would surely have to arise from some dreadful bug/memory corruption.
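
For anyone following along, a rough userspace sketch of the warn-once-and-bail
pattern suggested above. MODEL_WARN_ON_ONCE(), warn_count, struct model_folio
and slab_check() are all hypothetical stand-ins, not the kernel's definitions.
The macro relies on GNU statement expressions, so it assumes gcc or clang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int warn_count;	/* hypothetical counter, for illustration only */

/*
 * Userspace stand-in for WARN_ON_ONCE(): evaluates to the truth value of
 * cond on every call, but reports only the first hit at this call site.
 */
#define MODEL_WARN_ON_ONCE(cond) ({				\
	static bool warned_;					\
	bool ret_ = !!(cond);					\
	if (ret_ && !warned_) {					\
		warned_ = true;					\
		warn_count++;					\
		fprintf(stderr, "WARNING: %s\n", #cond);	\
	}							\
	ret_;							\
})

/* Hypothetical model of the one folio property this check cares about. */
struct model_folio {
	bool slab;
};

/* Returns false (fall back to the slow path) if a slab folio shows up. */
static bool slab_check(const struct model_folio *folio)
{
	if (MODEL_WARN_ON_ONCE(folio->slab))
		return false;
	return true;
}
```

The point of the pattern is that every slab folio is still refused, while the
log only gets a single report however many times the condition fires.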

> > +	if (unlikely(folio_test_slab(folio) || folio_test_swapcache(folio)))
> > +		return false;
> > +
> > +	/* hugetlb mappings do not require dirty-tracking. */
> > +	if (folio_test_hugetlb(folio))
> > +		return true;
> > +
> > +	/*
> > +	 * GUP-fast disables IRQs. When IRQS are disabled, RCU grace periods
> > +	 * cannot proceed, which means no actions performed under RCU can
> > +	 * proceed either.
> > +	 *
> > +	 * inodes and thus their mappings are freed under RCU, which means the
> > +	 * mapping cannot be freed beneath us and thus we can safely dereference
> > +	 * it.
> > +	 */
> > +	lockdep_assert_irqs_disabled();
> > +
> > +	/*
> > +	 * However, there may be operations which _alter_ the mapping, so ensure
> > +	 * we read it once and only once.
> > +	 */
> > +	mapping = READ_ONCE(folio->mapping);
> > +
> > +	/*
> > +	 * The mapping may have been truncated, in any case we cannot determine
> > +	 * if this mapping is safe - fall back to slow path to determine how to
> > +	 * proceed.
> > +	 */
> > +	if (!mapping)
> > +		return false;
> > +
> > +	/* Anonymous folios are fine, other non-file backed cases are not. */
> > +	mapping_flags = (unsigned long)mapping & PAGE_MAPPING_FLAGS;
> > +	if (mapping_flags)
> > +		return mapping_flags == PAGE_MAPPING_ANON;
>
> KSM pages are also (shared) anonymous folios, and that check would fail --
> which is ok (the following unsharing checks reject long-term pinning them),
> but a bit inconsistent with your comment and folio_test_anon().
>
> It would be more consistent (with your comment and also the folio_test_anon
> implementation) to have here:
>
> 	return mapping_flags & PAGE_MAPPING_ANON;
>

I explicitly excluded KSM out of fear that there could be some breakage, given
they're wrprotect'd + expected to CoW. But I guess you mean they'd get picked up
by the unshare check and so it doesn't matter + we wouldn't want to exclude a
PG_anon_exclusive case?

I'll make the change in any case given the unshare check!
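
A small userspace sketch of the difference between the two forms, in case it
helps. The flag values are assumed to mirror the low-bit encoding in
include/linux/page-flags.h at the time of this thread (KSM being ANON plus
MOVABLE); the two helper names are hypothetical and exist only for this
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model only. These values are assumed to match the kernel's
 * encoding of the low bits of folio->mapping; do not rely on them.
 */
#define PAGE_MAPPING_ANON	0x1UL
#define PAGE_MAPPING_MOVABLE	0x2UL
#define PAGE_MAPPING_KSM	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
#define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)

/* v8 form: exact match, so KSM (both bits set) is rejected. */
static bool anon_check_exact(uintptr_t mapping)
{
	return (mapping & PAGE_MAPPING_FLAGS) == PAGE_MAPPING_ANON;
}

/* Suggested form: bit test, accepts anything with the ANON bit set. */
static bool anon_check_bit(uintptr_t mapping)
{
	return (mapping & PAGE_MAPPING_FLAGS) & PAGE_MAPPING_ANON;
}
```

With these definitions a plain anon mapping passes both checks, a KSM mapping
passes only the bit test, and a movable non-anon mapping passes neither.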

I notice that gup_huge_pgd() doesn't do an unshare check, but I mean, a
PGD-sized huge page probably isn't going to be CoW'd :P


> > +
> > +	/*
> > +	 * At this point, we know the mapping is non-null and points to an
> > +	 * address_space object. The only remaining whitelisted file system is
> > +	 * shmem.
> > +	 */
> > +	return shmem_mapping(mapping);
> > +}
> > +
>
> In general, LGTM
>
> Acked-by: David Hildenbrand <david@...hat.com>
>

Thanks!

Will respin, addressing your comments and the issue the kernel bot picked up
(placement within the appropriate #ifdefs), and send out a v9 shortly.


> --
> Thanks,
>
> David / dhildenb
>