Message-ID: <CADrL8HUwxLU-UvTLbzp-JM5EqQ2u-91UU4VfAhRrPiu7i3Jhkg@mail.gmail.com>
Date: Thu, 29 Jun 2023 21:50:55 -0400
From: James Houghton <jthoughton@...gle.com>
To: John Hubbard <jhubbard@...dia.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>, linux-mm@...ck.org,
Adrian Hunter <adrian.hunter@...el.com>,
Al Viro <viro@...iv.linux.org.uk>,
Alex Williamson <alex.williamson@...hat.com>,
Alexander Potapenko <glider@...gle.com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Andrey Konovalov <andreyknvl@...il.com>,
Andrey Ryabinin <ryabinin.a.a@...il.com>,
Christian Brauner <brauner@...nel.org>,
Christoph Hellwig <hch@...radead.org>,
Daniel Vetter <daniel@...ll.ch>,
Dave Airlie <airlied@...il.com>,
Dimitri Sivanich <dimitri.sivanich@....com>,
Dmitry Vyukov <dvyukov@...gle.com>,
Ian Rogers <irogers@...gle.com>,
Jason Gunthorpe <jgg@...pe.ca>, Jiri Olsa <jolsa@...nel.org>,
Johannes Weiner <hannes@...xchg.org>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
Lorenzo Stoakes <lstoakes@...il.com>,
Mark Rutland <mark.rutland@....com>,
Matthew Wilcox <willy@...radead.org>,
Miaohe Lin <linmiaohe@...wei.com>,
Michal Hocko <mhocko@...nel.org>,
Mike Kravetz <mike.kravetz@...cle.com>,
Mike Rapoport <rppt@...nel.org>,
Muchun Song <muchun.song@...ux.dev>,
Namhyung Kim <namhyung@...nel.org>,
Naoya Horiguchi <naoya.horiguchi@....com>,
Oleksandr Tyshchenko <oleksandr_tyshchenko@...m.com>,
Pavel Tatashin <pasha.tatashin@...een.com>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Ryan Roberts <ryan.roberts@....com>,
SeongJae Park <sj@...nel.org>,
Shakeel Butt <shakeelb@...gle.com>,
Uladzislau Rezki <urezki@...il.com>,
Vincenzo Frascino <vincenzo.frascino@....com>,
Yu Zhao <yuzhao@...gle.com>
Subject: Re: [PATCH] mm/hugetlb.c: fix a bug within a BUG(): inconsistent pte comparison
On Thu, Jun 29, 2023 at 9:32 PM John Hubbard <jhubbard@...dia.com> wrote:
>
> The following crash happens for me when running the -mm selftests
> (below). Specifically, it happens while running the uffd-stress
> subtests:
>
> kernel BUG at mm/hugetlb.c:7249!
> invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 0 PID: 3238 Comm: uffd-stress Not tainted 6.4.0-hubbard-github+ #109
> Hardware name: ASUS X299-A/PRIME X299-A, BIOS 1503 08/03/2018
> RIP: 0010:huge_pte_alloc+0x12c/0x1a0
> ...
> Call Trace:
> <TASK>
> ? __die_body+0x63/0xb0
> ? die+0x9f/0xc0
> ? do_trap+0xab/0x180
> ? huge_pte_alloc+0x12c/0x1a0
> ? do_error_trap+0xc6/0x110
> ? huge_pte_alloc+0x12c/0x1a0
> ? handle_invalid_op+0x2c/0x40
> ? huge_pte_alloc+0x12c/0x1a0
> ? exc_invalid_op+0x33/0x50
> ? asm_exc_invalid_op+0x16/0x20
> ? __pfx_put_prev_task_idle+0x10/0x10
> ? huge_pte_alloc+0x12c/0x1a0
> hugetlb_fault+0x1a3/0x1120
> ? finish_task_switch+0xb3/0x2a0
> ? lock_is_held_type+0xdb/0x150
> handle_mm_fault+0xb8a/0xd40
> ? find_vma+0x5d/0xa0
> do_user_addr_fault+0x257/0x5d0
> exc_page_fault+0x7b/0x1f0
> asm_exc_page_fault+0x22/0x30
>
> That happens because a BUG() statement in huge_pte_alloc() attempts to
> check that a pte, if present, is a hugetlb pte, but it does so in a
> non-lockless-safe manner that leads to a false BUG() report.
>
> We got here due to a couple of bugs, each of which by itself was not
> quite enough to cause a problem:
>
> First of all, before commit c33c794828f2 ("mm: ptep_get() conversion"),
> the BUG() statement in huge_pte_alloc() was itself fragile: it relied
> upon compiler behavior to only read the pte once, despite using it twice
> in the same conditional.
>
> Next, commit c33c794828f2 ("mm: ptep_get() conversion") broke that
> delicate situation, by causing all direct pte reads to be done via
> READ_ONCE(). And so READ_ONCE() got called twice within the same BUG()
> conditional, leading to comparing (potentially, occasionally) different
> versions of the pte, and thus to false BUG() reports.
>
> Fix this by taking a single snapshot of the pte before using it in the
> BUG conditional.
>
> Now, that commit is only partially to blame here, but people doing
> bisections will invariably land there, so this will help them find a fix
> for a real crash. And also, the previous behavior was unlikely to ever
> expose this bug--it was fragile, yet not actually broken.
>
> So that's why I chose this commit for the Fixes tag, rather than the
> commit that created the original BUG() statement.
>
> Fixes: c33c794828f2 ("mm: ptep_get() conversion")

Hi John,

Good catch, and thanks for the detailed explanation. It looks like
riscv and powerpc have equivalent problems in their huge_pte_alloc()
implementations; it may be worth taking a look at those. (riscv
appears to have precisely the same problem, except that it's a WARN,
but powerpc looks more interesting.)

Either way,

Acked-by: James Houghton <jthoughton@...gle.com>
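
For readers following the thread, the double-read hazard described in
the quoted commit message can be modelled in userspace. The sketch
below is illustrative only: the toy_* names, the bit values, and the
scripted sequence of slot values (standing in for a concurrent update
from another CPU) are assumptions made for this example. It is not
kernel code and not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; NOT the real page-table layout of any arch. */
#define TOY_PRESENT 0x1u
#define TOY_HUGE    0x2u

typedef struct { uint64_t val; } toy_pte_t;

/*
 * A toy page-table slot. Instead of a real racing writer, the slot is
 * scripted: values[i] is what the i-th read observes, which makes the
 * "pte changed between two reads" case deterministic.
 */
struct toy_slot {
    const uint64_t *values;
    size_t n;      /* number of scripted values, must be > 0 */
    size_t reads;  /* how many reads have happened so far */
};

/* Stand-in for a READ_ONCE()-style accessor: every call is a fresh read. */
static toy_pte_t toy_ptep_get(struct toy_slot *s)
{
    assert(s->n > 0);
    size_t i = s->reads < s->n ? s->reads : s->n - 1;
    s->reads++;
    return (toy_pte_t){ s->values[i] };
}

static bool toy_present(toy_pte_t p) { return p.val & TOY_PRESENT; }
static bool toy_huge(toy_pte_t p)    { return p.val & TOY_HUGE; }

/*
 * Fragile form: the slot is read twice inside one condition, so the
 * "present" test and the "huge" test may see different values.
 * Returns true when the (toy) BUG condition would fire.
 */
static bool check_two_reads(struct toy_slot *s)
{
    return toy_present(toy_ptep_get(s)) && !toy_huge(toy_ptep_get(s));
}

/*
 * Snapshot form: read once, then test the local copy, so both tests
 * are guaranteed to look at the same value.
 */
static bool check_snapshot(struct toy_slot *s)
{
    toy_pte_t snap = toy_ptep_get(s);
    return toy_present(snap) && !toy_huge(snap);
}
```

With a slot scripted as { TOY_PRESENT | TOY_HUGE, 0 } (a huge mapping
that is torn down between the two reads), check_two_reads() reports a
violation even though neither observed value is a present, non-huge
entry, while check_snapshot() does not.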