Message-ID: <CA+CK2bAmaweQoiBmo_igEzeKdsPmT-xzCtar36iNzaiFMEJB+w@mail.gmail.com>
Date: Wed, 20 Apr 2022 13:08:50 -0400
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: Anshuman Khandual <anshuman.khandual@....com>
Cc: Tong Tiangen <tongtiangen@...wei.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>,
"H. Peter Anvin" <hpa@...or.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>,
Paul Walmsley <paul.walmsley@...ive.com>,
Palmer Dabbelt <palmer@...belt.com>,
Albert Ou <aou@...s.berkeley.edu>,
LKML <linux-kernel@...r.kernel.org>,
linux-mm <linux-mm@...ck.org>,
Linux ARM <linux-arm-kernel@...ts.infradead.org>,
linux-riscv@...ts.infradead.org,
Kefeng Wang <wangkefeng.wang@...wei.com>,
Guohanjun <guohanjun@...wei.com>
Subject: Re: [PATCH -next v4 3/4] arm64: mm: add support for page table check
On Wed, Apr 20, 2022 at 1:05 AM Anshuman Khandual
<anshuman.khandual@....com> wrote:
>
>
>
> On 4/19/22 18:49, Pasha Tatashin wrote:
> > On Tue, Apr 19, 2022 at 6:22 AM Anshuman Khandual
> > <anshuman.khandual@....com> wrote:
> >>
> >>
> >> On 4/18/22 09:14, Tong Tiangen wrote:
> >>> +#ifdef CONFIG_PAGE_TABLE_CHECK
> >>> +static inline bool pte_user_accessible_page(pte_t pte)
> >>> +{
> >>> + return pte_present(pte) && (pte_user(pte) || pte_user_exec(pte));
> >>> +}
> >>> +
> >>> +static inline bool pmd_user_accessible_page(pmd_t pmd)
> >>> +{
> >>> + return pmd_present(pmd) && (pmd_user(pmd) || pmd_user_exec(pmd));
> >>> +}
> >>> +
> >>> +static inline bool pud_user_accessible_page(pud_t pud)
> >>> +{
> >>> + return pud_present(pud) && pud_user(pud);
> >>> +}
> >>> +#endif
> >> Wondering why check for these page table entry states when init_mm
> >> has already been excluded? Should not user page tables be checked
> >> in their entirety for all updates? What is the rationale for filtering
> >> out only pxx_user_accessible_page entries?
> >
> > The point is to prevent false sharing and memory corruption issues.
> > The idea is for PTC to be a simple checker, relatively independent from
> > the MM state machine, that catches invalid page sharing. I.e. if an R/W anon
>
> Right, this mechanism here is truly independent validation, which is
> orthogonal to other MM states. Although I was curious, if mm_struct is
> not 'init_mm', what percentage of its total page table mapped entries
> will be user accessible? These new helpers only filter out entries that
> could potentially create false sharing leading up to memory corruption?
Yes, the intention is to filter out the false sharing scenarios.
This allows crashing the system before memory is corrupted or
leaked.
>
> I am wondering if there is any other way such filtering could have been
> applied without adding all these new page table helpers just for page
> table check purpose.
>
> > page is accessible by user land, that page can never be mapped into
> > another process (internally shared anons are treated as named
> > mappings).
>
> Right.
>
> >
> > Therefore, we try not to rely on MM states, and ensure that when a
> > page-table entry is accessible by user it meets the required
> > assumptions: no false sharing, etc.
>
> Right, filtering reduces the page table entries that need interception
> during update (set/clear), but I was just curious whether there is another
> way of doing it, without adding page table check specific helpers on
> platforms subscribing to PAGE_TABLE_CHECK?
>
It makes sense to limit the scope of PTC to user accessible pages
only, and not try to catch other bugs. This keeps it reasonably
small, and also lowers runtime overhead so it can be used in
production as well. IMO the extra helpers are not very intrusive, and
are generic enough that they might be used elsewhere in the future.
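
To illustrate the scoping argument, here is a toy userspace model, not
kernel code. The flag bits and every toy_* name are hypothetical and do
not match the real arm64 descriptor layout; only the shape of the
predicate follows pte_user_accessible_page() from the patch. It shows
that only present, user-reachable entries would need accounting.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for a toy entry; NOT the real arm64 layout. */
#define TOY_PRESENT   (1u << 0)
#define TOY_USER      (1u << 1)
#define TOY_USER_EXEC (1u << 2)

typedef uint32_t toy_pte_t;

/* Same shape as pte_user_accessible_page() in the patch:
 * present AND (user accessible OR user executable). */
static bool toy_user_accessible(toy_pte_t pte)
{
	return (pte & TOY_PRESENT) && (pte & (TOY_USER | TOY_USER_EXEC));
}

/* Count the entries a checker scoped this way would have to track. */
static unsigned int toy_count_checked(const toy_pte_t *ptes, unsigned int n)
{
	unsigned int checked = 0;

	for (unsigned int i = 0; i < n; i++)
		if (toy_user_accessible(ptes[i]))
			checked++;
	return checked;
}
```

Entries that are not present, or present but kernel-only, are skipped,
which is where the reduced overhead would come from in this model.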
> >
> > For example, one bug that was caught with PTC was where a driver on an
> > unload would put memory on a freelist but memory is still mapped in
> > user page table.
>
> Should not the page's refcount (showing that it is being used elsewhere)
> have prevented its release onto the free list? But page table check here
> might just detect such scenarios even before the page gets released.
Usually yes. However, there have been a number of recent bugs related to
refcount [1][2][3]. This is why we need a stronger checker.
The particular bug, however, did not rely on refcount. The driver
allocated a kernel page for a ringbuffer, upon request shared it with
userspace by mapping it into the user address space, and later, when
the driver was unloaded, it never removed the mapping from the user
address space. Thus, even though the page was freed when the driver
was unloaded, the mapping stayed in the user page table.
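
The sequence above can be sketched with a toy userspace model. This is
an assumption-laden illustration, not the PTC implementation: struct
toy_page and the toy_* functions are hypothetical, and the real checker
keeps its own per-page counters and crashes the system rather than
returning an error.

```c
#include <stdbool.h>

/* Hypothetical per-page tracking state for the model. */
struct toy_page {
	int user_map_count;	/* user page-table entries pointing here */
	bool allocated;
};

static void toy_alloc(struct toy_page *p)
{
	p->allocated = true;
	p->user_map_count = 0;
}

/* Called whenever a user-accessible entry is set or cleared. */
static void toy_map_user(struct toy_page *p)   { p->user_map_count++; }
static void toy_unmap_user(struct toy_page *p) { p->user_map_count--; }

/* Refuse the free while a user mapping still exists. Returns false in
 * the case where, in this model, the checker would stop the system. */
static bool toy_free(struct toy_page *p)
{
	if (p->user_map_count != 0)
		return false;
	p->allocated = false;
	return true;
}
```

In this model the buggy driver corresponds to toy_alloc(), toy_map_user()
and then toy_free() with no toy_unmap_user() in between, so the free is
flagged regardless of what the refcount says.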
[1] https://lore.kernel.org/all/xr9335nxwc5y.fsf@gthelen2.svl.corp.google.com
[2] https://lore.kernel.org/all/1582661774-30925-2-git-send-email-akaher@vmware.com
[3] https://lore.kernel.org/all/20210622021423.154662-3-mike.kravetz@oracle.com