[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKv+Gu_ZYpey0dWYebFgCaziyJ-_x+KbCmOegWqFjwC0U-5QaA@mail.gmail.com>
Date: Fri, 7 Jun 2019 14:29:12 +0200
From: Ard Biesheuvel <ard.biesheuvel@...aro.org>
To: Dan Williams <dan.j.williams@...el.com>
Cc: Mike Rapoport <rppt@...ux.ibm.com>,
linux-efi <linux-efi@...r.kernel.org>,
"the arch/x86 maintainers" <x86@...nel.org>,
Borislav Petkov <bp@...en8.de>, Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>,
Darren Hart <dvhart@...radead.org>,
Andy Shevchenko <andy@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
kbuild test robot <lkp@...el.com>,
Vishal L Verma <vishal.l.verma@...el.com>,
Linux-MM <linux-mm@...ck.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-nvdimm <linux-nvdimm@...ts.01.org>
Subject: Re: [PATCH v2 4/8] x86, efi: Reserve UEFI 2.8 Specific Purpose Memory
for dax
On Sat, 1 Jun 2019 at 06:26, Dan Williams <dan.j.williams@...el.com> wrote:
>
> On Fri, May 31, 2019 at 8:30 AM Ard Biesheuvel
> <ard.biesheuvel@...aro.org> wrote:
> >
> > On Fri, 31 May 2019 at 17:28, Dan Williams <dan.j.williams@...el.com> wrote:
> > >
> > > On Fri, May 31, 2019 at 1:30 AM Ard Biesheuvel
> > > <ard.biesheuvel@...aro.org> wrote:
> > > >
> > > > (cc Mike for memblock)
> > > >
> > > > On Fri, 31 May 2019 at 01:13, Dan Williams <dan.j.williams@...el.com> wrote:
> > > > >
> > > > > UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the
> > > > > interpretation of the EFI Memory Types as "reserved for a special
> > > > > purpose".
> > > > >
> > > > > The proposed Linux behavior for specific purpose memory is that it is
> > > > > reserved for direct-access (device-dax) by default and not available for
> > > > > any kernel usage, not even as an OOM fallback. Later, through udev
> > > > > scripts or another init mechanism, these device-dax claimed ranges can
> > > > > be reconfigured and hot-added to the available System-RAM with a unique
> > > > > node identifier.
> > > > >
> > > > > This patch introduces 3 new concepts at once given the entanglement
> > > > > between early boot enumeration relative to memory that can optionally be
> > > > > reserved from the kernel page allocator by default. The new concepts
> > > > > are:
> > > > >
> > > > > - E820_TYPE_SPECIFIC: Upon detecting the EFI_MEMORY_SP attribute on
> > > > > EFI_CONVENTIONAL memory, update the E820 map with this new type. Only
> > > > > perform this classification if the CONFIG_EFI_SPECIFIC_DAX=y policy is
> > > > > enabled, otherwise treat it as typical ram.
> > > > >
> > > >
> > > > OK, so now we have 'special purpose', 'specific' and 'app specific'
> > > > [below]. Do they all mean the same thing?
> > >
> > > I struggled with separating the raw-EFI-type name from the name of the
> > > Linux specific policy. Since the reservation behavior is optional I
> > > was thinking there should be a distinct Linux kernel name for that
> > > policy. I did try to go back and change all occurrences of "special"
> > > to "specific" from the RFC to this v2, but seems I missed one.
> > >
> >
> > OK
>
> I'll go ahead and use "application reserved" terminology consistently
> throughout the code to distinguish that Linux translation from the raw
> "EFI specific purpose" attribute.
>
OK
> >
> > > >
> > > > > - IORES_DESC_APPLICATION_RESERVED: Add a new I/O resource descriptor for
> > > > > a device driver to search iomem resources for application specific
> > > > > memory. Teach the iomem code to identify such ranges as "Application
> > > > > Reserved".
> > > > >
> > > > > - MEMBLOCK_APP_SPECIFIC: Given the memory ranges can fallback to the
> > > > > traditional System RAM pool the expectation is that they will have
> > > > > typical SRAT entries. In order to support a policy of device-dax by
> > > > > default with the option to hotplug later, the numa initialization code
> > > > > is taught to avoid marking online MEMBLOCK_APP_SPECIFIC regions.
> > > > >
> > > >
> > > > Can we move the generic memblock changes into a separate patch please?
> > >
> > > Yeah, that can move to a lead-in patch.
> > >
> > > [..]
> > > > > diff --git a/include/linux/efi.h b/include/linux/efi.h
> > > > > index 91368f5ce114..b57b123cbdf9 100644
> > > > > --- a/include/linux/efi.h
> > > > > +++ b/include/linux/efi.h
> > > > > @@ -129,6 +129,19 @@ typedef struct {
> > > > > u64 attribute;
> > > > > } efi_memory_desc_t;
> > > > >
> > > > > +#ifdef CONFIG_EFI_SPECIFIC_DAX
> > > > > +static inline bool is_efi_dax(efi_memory_desc_t *md)
> > > > > +{
> > > > > + return md->type == EFI_CONVENTIONAL_MEMORY
> > > > > + && (md->attribute & EFI_MEMORY_SP);
> > > > > +}
> > > > > +#else
> > > > > +static inline bool is_efi_dax(efi_memory_desc_t *md)
> > > > > +{
> > > > > + return false;
> > > > > +}
> > > > > +#endif
> > > > > +
> > > > > typedef struct {
> > > > > efi_guid_t guid;
> > > > > u32 headersize;
> > > >
> > > > I'd prefer it if we could avoid this DAX policy distinction leaking
> > > > into the EFI layer.
> > > >
> > > > IOW, I am fine with having a 'is_efi_sp_memory()' helper here, but
> > > > whether that is DAX memory or not should be decided in the DAX layer.
> > >
> > > Ok, how about is_efi_sp_ram()? Since EFI_MEMORY_SP might be applied to
> > > things that aren't EFI_CONVENTIONAL_MEMORY.
> >
> > Yes, that is fine. As long as the #ifdef lives in the DAX code and not here.
>
> We still need some ifdef in the efi core because that is the central
> location to make the policy distinction to identify identify
> EFI_CONVENTIONAL_MEMORY differently depending on whether EFI_MEMORY_SP
> is present. I agree with you that "dax" should be dropped from the
> naming. So how about:
>
> #ifdef CONFIG_EFI_APPLICATION_RESERVED
> static inline bool is_efi_application_reserved(efi_memory_desc_t *md)
> {
> return md->type == EFI_CONVENTIONAL_MEMORY
> && (md->attribute & EFI_MEMORY_SP);
> }
> #else
> static inline bool is_efi_application_reserved(efi_memory_desc_t *md)
> {
> return false;
> }
> #endif
I think this policy decision should not live inside the EFI subsystem.
EFI just gives you the memory map, and mangling that information
depending on whether you think a certain memory attribute should be
ignored is the job of the MM subsystem.
Powered by blists - more mailing lists