Message-ID: <f7667988-4d6c-461e-901d-a6c3612b2f0f@intel.com>
Date: Tue, 10 Aug 2021 13:50:57 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: "Kirill A. Shutemov" <kirill@...temov.name>,
Borislav Petkov <bp@...en8.de>,
Andy Lutomirski <luto@...nel.org>,
Sean Christopherson <seanjc@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Joerg Roedel <jroedel@...e.de>
Cc: Andi Kleen <ak@...ux.intel.com>,
Kuppuswamy Sathyanarayanan
<sathyanarayanan.kuppuswamy@...ux.intel.com>,
David Rientjes <rientjes@...gle.com>,
Vlastimil Babka <vbabka@...e.cz>,
Tom Lendacky <thomas.lendacky@....com>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Paolo Bonzini <pbonzini@...hat.com>,
Ingo Molnar <mingo@...hat.com>,
Varad Gautam <varad.gautam@...e.com>,
Dario Faggioli <dfaggioli@...e.com>, x86@...nel.org,
linux-mm@...ck.org, linux-coco@...ts.linux.dev,
linux-kernel@...r.kernel.org,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Subject: Re: [PATCH 1/5] mm: Add support for unaccepted memory
On 8/10/21 11:13 AM, Dave Hansen wrote:
>> @@ -1001,6 +1004,9 @@ static inline void del_page_from_free_list(struct page *page, struct zone *zone,
>> if (page_reported(page))
>> __ClearPageReported(page);
>>
>> + if (PageOffline(page))
>> + clear_page_offline(page, order);
>> +
>> list_del(&page->lru);
>> __ClearPageBuddy(page);
>> set_page_private(page, 0);
> So, this is right in the fast path of the page allocator. It's a
> one-time thing per 2M page, so it's not permanent.
>
> *But* there's both a global spinlock and a firmware call hidden in
> clear_page_offline(). That's *GOT* to hurt if you were, for instance,
> running a benchmark while this code path is being tickled.
>
> That could be just downright catastrophic for scalability, albeit
> temporarily.
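
To make that concern concrete, here's roughly the shape I'm picturing
inside clear_page_offline() -- a sketch only; the lock and helper names
(unaccepted_memory_lock, range_is_unaccepted(), accept_memory(),
mark_range_accepted()) are my stand-ins, not the patch's actual
identifiers:

	/* Hypothetical sketch, not the patch's code. */
	static DEFINE_SPINLOCK(unaccepted_memory_lock);	/* global, cross-node */

	static void clear_page_offline(struct page *page, unsigned int order)
	{
		phys_addr_t start = page_to_phys(page);

		/* Global lock taken while zone->lock is already held: */
		spin_lock(&unaccepted_memory_lock);
		if (range_is_unaccepted(start, order)) {
			/* Firmware calls (SEAMCALLs on TDX), under both locks: */
			accept_memory(start, start + (PAGE_SIZE << order));
			mark_range_accepted(start, order);
		}
		spin_unlock(&unaccepted_memory_lock);
	}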
One more thing...
How long are these calls? You have to make at least 512 calls into the
SEAM module. Assuming they're syscall-ish at ~1,000 cycles each, that's
~500,000 cycles, even before counting the time it takes to zero that 2MB
of memory and all the other overhead inside the SEAM module.
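
Spelling that out (the 4k accept granularity and the per-call cost are
my assumptions, not measurements):

	2MB / 4kB           =  512 accept calls
	512 * ~1,000 cycles ~= 512,000 cycles
	at 2GHz             ~= 0.25ms per 2M page, as a floor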
So, we're sitting on one CPU with interrupts off, blocking all the other
CPUs from doing page allocation in this zone. Then, we're holding a
global lock which prevents any other NUMA nodes from accepting pages.
If the other node happens to *try* to do an accept, it will sit with its
zone lock held waiting for this one.
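
To spell out the nesting I mean (assuming the bitmap lock really is
global):

	CPU 0, node 0                        CPU 1, node 1
	-------------                        -------------
	spin_lock_irqsave(&zone->lock)       spin_lock_irqsave(&zone->lock)
	clear_page_offline()                 clear_page_offline()
	  spin_lock(&unaccepted_memory_lock)
	  ... 512+ SEAMCALLs ...               spin_lock(&unaccepted_memory_lock)
	                                       <spins with its own zone->lock
	                                        held until CPU 0 is done>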
Maybe nobody will ever notice. But, it seems like an awfully big risk
to me. I'd at least *try* to do these calls outside of the zone lock.
Then the collateral damage will at least be limited to things doing
accepts rather than all zone->lock users.
Couldn't we delay the acceptance to, say, the point where we've dropped
the zone->lock and are doing the __GFP_ZERO memset(), like in
prep_new_page()?
Or is there some concern that the page has been split at that point?
I guess that makes it more complicated because you might have a 4k page
but you need to go accept a 2M page. You might end up having to check
the bitmap 511 more times because you might see 511 more PageOffline()
pages come through.
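
Something like this is what I have in mind -- again just a sketch, with
maybe_accept_page() and accept_memory() being hypothetical names:

	/* Hypothetically called from prep_new_page(), after zone->lock
	 * has been dropped, near where __GFP_ZERO clearing happens. */
	static void maybe_accept_page(struct page *page, unsigned int order)
	{
		/* Cheap, common case: page was accepted long ago. */
		if (!PageOffline(page))
			return;

		/*
		 * Accept the whole enclosing 2M region, even for a 4k
		 * page.  Later 4k pages from the same 2M region will
		 * still show up here as PageOffline() and re-check.
		 */
		accept_memory(ALIGN_DOWN(page_to_phys(page), PMD_SIZE),
			      PMD_SIZE);
		__ClearPageOffline(page);
	}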
You shouldn't even need the bitmap lock for reads, since it's a one-way
trip from unaccepted->accepted.
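
i.e. the read side could look something like this (the helper and
bitmap names are hypothetical):

	static void accept_range_once(unsigned long range_nr)
	{
		/*
		 * Unlocked read is fine: bits only ever go from
		 * unaccepted (set) to accepted (clear), never back.
		 */
		if (!test_bit(range_nr, unaccepted_bitmap))
			return;

		spin_lock(&unaccepted_memory_lock);
		/* Re-check under the lock in case we raced. */
		if (test_bit(range_nr, unaccepted_bitmap)) {
			accept_memory_range(range_nr);	/* the SEAMCALLs */
			clear_bit(range_nr, unaccepted_bitmap);
		}
		spin_unlock(&unaccepted_memory_lock);
	}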