Message-ID: <ZUUcQexwj90u+Mll@swahl-home.5wahls.com>
Date: Fri, 3 Nov 2023 11:13:53 -0500
From: Steve Wahl <steve.wahl@....com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: Steve Wahl <steve.wahl@....com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Andy Lutomirski <luto@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86/mm/ident_map: Use gbpages only where full GB page
should be mapped.
Dave,
Thank you for taking the time to review my patch. (More below.)
On Thu, Nov 02, 2023 at 09:02:44AM -0700, Dave Hansen wrote:
> On 10/31/23 12:50, Steve Wahl wrote:
> > Instead of using gbpages for all memory regions, use them only when
> > map creation requests include the full GB page of space; descend to
> > using smaller 2M pages when only portions of a GB page are requested.
> ...
>
> The logic here is sound: we obviously don't want systems rebooting
> because speculation went wonky.
>
> > diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
> > index 968d7005f4a7..b63a1ffcfe9f 100644
> > --- a/arch/x86/mm/ident_map.c
> > +++ b/arch/x86/mm/ident_map.c
> > @@ -31,18 +31,26 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
> >  		if (next > end)
> >  			next = end;
> >  
> > -		if (info->direct_gbpages) {
> > +		/*
> > +		 * if gbpages allowed, this entry not yet present, and
> > +		 * the full gbpage range is requested (both ends are
> > +		 * correctly aligned), create a gbpage.
> > +		 */
> > +		if (info->direct_gbpages
> > +		    && !pud_present(*pud)
> > +		    && !(addr & ~PUD_MASK)
> > +		    && !(next & ~PUD_MASK)) {
>
> This is a _bit_ too subtle for my taste.
>
> Let's say someone asks for mapping of 2MB@5G, then later asks for 1G@5G.
> The first call in here will do the 2MB mapping with small (pud)
> entries. The second call will see the new pud_present() check and
> *also* skip large pud creation.
>
> That's a regression. It might not matter _much_, but it's a place where
> the old code creates large PUDs and the new code creates small ones.
> It's the kind of behavior change that at least needs to be noted in the
> changelog.
I will add a note to the changelog that once a request creates a small
page mapping in a region, that small mapping persists even if
subsequent requests enlarge the mapped area to cover the whole 1G
segment.
> Off the top of my head, I can't think of why we'd get overlapping
> requests in here, though. Did you think through that case? Is it common?
Yes, I had thought about the overlaps.  Of the choices available here,
continuing to use the already allocated PMD page and filling in the
map with 2M pages seemed the best option.
Existing users (the kernel decompression stub and kexec) start with
huge tracts of memory and then add smaller pieces that may or may not
already reside in the map created so far.  (See, for example, the
comment around line 231 in arch/x86/kernel/machine_kexec_64.c.)
Early private versions of the patch with printks added confirmed this
behavior, though my testing was limited to UV systems.
In short, with current usage overlap is expected, but it would be rare
for small pieces that require PMD mapping to be followed by large
pieces that cover the whole PUD-level region.
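To make that concrete, the call pattern looks roughly like this (the
addresses are made up for illustration, not taken from a real trace;
info and pgd are assumed to be set up the way the callers normally do):

	/* Hypothetical example; addresses are illustrative only. */
	/* Map a large region first; 1G-aligned portions become gbpages. */
	kernel_ident_mapping_init(&info, pgd, 0x100000000UL, 0x180000000UL);

	/* Later, add a small piece that already lies inside that region; */
	/* it finds the gbpage already present and is simply skipped.     */
	kernel_ident_mapping_init(&info, pgd, 0x140000000UL, 0x140200000UL);

The reverse order (a small 2M piece first, then a request covering the
full 1G range) is the case discussed above where the small mappings
persist.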
> >  			pud_t pudval;
> >  
> > -			if (pud_present(*pud))
> > -				continue;
> > -
> > -			addr &= PUD_MASK;
> >  			pudval = __pud((addr - info->offset) | info->page_flag);
> >  			set_pud(pud, pudval);
> >  			continue;
> >  		}
> >  
> > +		/* if this is already a gbpage, this portion is already mapped */
> > +		if (pud_large(*pud))
> > +			continue;
>
> I'd probably move this check up to above the large PUD creation code.
> It would make it more obvious that any PUD that's encountered down below
> is a small PUD.
That makes sense. I will change this.
> >  		if (pud_present(*pud)) {
> >  			pmd = pmd_offset(pud, 0);
> >  			ident_pmd_init(info, pmd, addr, next);
>
> That would end up looking something like this:
>
> 	bool do_gbpages = true;
> 	...
>
> 	// Is the whole range already mapped?
> 	if (pud_large(*pud))
> 		continue;
>
> 	/* PUD is either empty or small */
>
> 	// GB pages allowed?
> 	do_gbpages &= info->direct_gbpages;
> 	// Addresses aligned for GB pages?
> 	do_gbpages &= !(addr & ~PUD_MASK);
> 	do_gbpages &= !(next & ~PUD_MASK);
> 	// Check for existing mapping using a small PUD
> 	do_gbpages &= !pud_present(*pud);
>
> 	if (do_gbpages) {
> 		set_pud(pud, __pud((addr - info->offset) |
> 				   info->page_flag));
> 		continue;
> 	}
I tried coding it up with the bool instead of the single if statement,
and to me it did not read as easily as the single if statement
version.  So unless you firmly object, I'm leaving it the original
way, but improving the comment above the if statement to give a
one-for-one explanation of each condition.
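Something like this is what I have in mind for v2 (just a sketch, so
the exact comment wording may still change; the rest of
ident_pud_init() stays as in the patch):

	/* A large pud at this entry means the range is already mapped. */
	if (pud_large(*pud))
		continue;

	/*
	 * Create a gbpage only if all of the following hold:
	 *  - gbpages are allowed for this mapping (info->direct_gbpages),
	 *  - nothing is mapped at this entry yet (!pud_present()), and
	 *  - the full gbpage range is requested, i.e. both addr and
	 *    next are 1G aligned.
	 * Otherwise fall through and map the range with 2M pages.
	 */
	if (info->direct_gbpages
	    && !pud_present(*pud)
	    && !(addr & ~PUD_MASK)
	    && !(next & ~PUD_MASK)) {
		pud_t pudval;

		pudval = __pud((addr - info->offset) | info->page_flag);
		set_pud(pud, pudval);
		continue;
	}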
Thanks,
--> Steve Wahl
--
Steve Wahl, Hewlett Packard Enterprise