[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMj1kXEVf1m4hVXORc6t9ytAOb75KZLcW-OJ6999VaKbkVdQ3A@mail.gmail.com>
Date: Tue, 13 Feb 2024 14:33:34 +0100
From: Ard Biesheuvel <ardb@...nel.org>
To: Ryan Roberts <ryan.roberts@....com>
Cc: David Hildenbrand <david@...hat.com>, Mark Rutland <mark.rutland@....com>,
Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>, Marc Zyngier <maz@...nel.org>,
James Morse <james.morse@....com>, Andrey Ryabinin <ryabinin.a.a@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>, Matthew Wilcox <willy@...radead.org>,
Kefeng Wang <wangkefeng.wang@...wei.com>, John Hubbard <jhubbard@...dia.com>,
Zi Yan <ziy@...dia.com>, Barry Song <21cnbao@...il.com>, Alistair Popple <apopple@...dia.com>,
Yang Shi <shy828301@...il.com>, Nicholas Piggin <npiggin@...il.com>,
Christophe Leroy <christophe.leroy@...roup.eu>, "Aneesh Kumar K.V" <aneesh.kumar@...nel.org>,
"Naveen N. Rao" <naveen.n.rao@...ux.ibm.com>, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>,
linux-arm-kernel@...ts.infradead.org, x86@...nel.org,
linuxppc-dev@...ts.ozlabs.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 19/25] arm64/mm: Wire up PTE_CONT for user mappings
On Tue, 13 Feb 2024 at 14:21, Ryan Roberts <ryan.roberts@....com> wrote:
>
> On 13/02/2024 13:13, David Hildenbrand wrote:
> > On 13.02.24 14:06, Ryan Roberts wrote:
> >> On 13/02/2024 12:19, David Hildenbrand wrote:
> >>> On 13.02.24 13:06, Ryan Roberts wrote:
> >>>> On 12/02/2024 20:38, Ryan Roberts wrote:
> >>>>> [...]
> >>>>>
> >>>>>>>>> +static inline bool mm_is_user(struct mm_struct *mm)
> >>>>>>>>> +{
> >>>>>>>>> + /*
> >>>>>>>>> + * Don't attempt to apply the contig bit to kernel mappings, because
> >>>>>>>>> + * dynamically adding/removing the contig bit can cause page faults.
> >>>>>>>>> + * These racing faults are ok for user space, since they get
> >>>>>>>>> serialized
> >>>>>>>>> + * on the PTL. But kernel mappings can't tolerate faults.
> >>>>>>>>> + */
> >>>>>>>>> + return mm != &init_mm;
> >>>>>>>>> +}
> >>>>>>>>
> >>>>>>>> We also have the efi_mm as a non-user mm, though I don't think we
> >>>>>>>> manipulate
> >>>>>>>> that while it is live, and I'm not sure if that needs any special handling.
> >>>>>>>
> >>>>>>> Well we never need this function in the hot (order-0 folio) path, so I
> >>>>>>> think I
> >>>>>>> could add a check for efi_mm here with performance implication. It's
> >>>>>>> probably
> >>>>>>> safest to explicitly exclude it? What do you think?
> >>>>>>
> >>>>>> Oops: This should have read "I think I could add a check for efi_mm here
> >>>>>> *without* performance implication"
> >>>>>
> >>>>> It turns out that efi_mm is only defined when CONFIG_EFI is enabled I can do
> >>>>> this:
> >>>>>
> >>>>> return mm != &init_mm && (!IS_ENABLED(CONFIG_EFI) || mm != &efi_mm);
> >>>>>
> >>>>> Is that acceptable? This is my preference, but nothing else outside of efi
> >>>>> references this symbol currently.
> >>>>>
> >>>>> Or perhaps I can convince myself that its safe to treat efi_mm like userspace.
> >>>>> There are a couple of things that need to be garanteed for it to be safe:
> >>>>>
> >>>>> - The PFNs of present ptes either need to have an associated struct
> >>>>> page or
> >>>>> need to have the PTE_SPECIAL bit set (either pte_mkspecial() or
> >>>>> pte_mkdevmap())
> >>>>>
> >>>>> - Live mappings must either be static (no changes that could cause
> >>>>> fold/unfold
> >>>>> while live) or the system must be able to tolerate a temporary fault
> >>>>>
> >>>>> Mark suggests efi_mm is not manipulated while live, so that meets the latter
> >>>>> requirement, but I'm not sure about the former?
> >>>>
> >>>> I've gone through all the efi code, and conclude that, as Mark suggests, the
> >>>> mappings are indeed static. And additionally, the ptes are populated using only
> >>>> the _private_ ptep API, so there is no issue here. As just discussed with Mark,
> >>>> my prefereence is to not make any changes to code, and just add a comment
> >>>> describing why efi_mm is safe.
> >>>>
> >>>> Details:
> >>>>
> >>>> * Registered with ptdump
> >>>> * ptep_get_lockless()
> >>>> * efi_create_mapping -> create_pgd_mapping … -> init_pte:
> >>>> * __ptep_get()
> >>>> * __set_pte()
> >>>> * efi_memattr_apply_permissions -> efi_set_mapping_permissions … ->
> >>>> set_permissions
> >>>> * __ptep_get()
> >>>> * __set_pte()
> >>>
> >>> Sound good. We could add some VM_WARN_ON if we ever get the efi_mm via the
> >>> "official" APIs.
> >>
> >> We could, but that would lead to the same linkage issue, which I'm trying to
> >> avoid in the first place:
> >>
> >> VM_WARN_ON(IS_ENABLED(CONFIG_EFI) && mm == efi_mm);
> >>
> >> This creates new source code dependencies, which I would rather avoid if
> >> possible.
> >
> > Just a thought, you could have a is_efi_mm() function that abstracts all that.
> >
> > diff --git a/include/linux/efi.h b/include/linux/efi.h
> > index c74f47711f0b..152f5fa66a2a 100644
> > --- a/include/linux/efi.h
> > +++ b/include/linux/efi.h
> > @@ -692,6 +692,15 @@ extern struct efi {
> >
> > extern struct mm_struct efi_mm;
> >
> > +static inline void is_efi_mm(struct mm_struct *mm)
> > +{
> > +#ifdef CONFIG_EFI
> > + return mm == &efi_mm;
> > +#else
> > + return false;
> > +#endif
> > +}
> > +
> > static inline int
> > efi_guidcmp (efi_guid_t left, efi_guid_t right)
> > {
> >
> >
>
> That would definitely work, but in that case, I might as well just check for it
> in mm_is_user() (and personally I would change the name to mm_is_efi()):
>
>
> static inline bool mm_is_user(struct mm_struct *mm)
> {
> return mm != &init_mm && !mm_is_efi(mm);
> }
>
> Any objections?
>
Any reason not to use IS_ENABLED(CONFIG_EFI) in the above? The extern
declaration is visible to the compiler, and any references should
disappear before the linker could notice that efi_mm does not exist.
In any case, feel free to add
Acked-by: Ard Biesheuvel <ardb@...nel.org>
when you roll a patch based on the above, with or without IS_ENABLED().
And as you concluded, efi_mm is indeed set in stone once the runtime
regions described by the firmware have been mapped, although this may
happen in two passes depending on how the runtime regions are
described. But by the time user MMs might exist, efi_mm should
effectively be immutable.
Powered by blists - more mailing lists