Message-ID: <Zw2dwG_rp7Hg-vIa@PC2K9PVX.TheFacebook.com>
Date: Mon, 14 Oct 2024 18:40:00 -0400
From: Gregory Price <gourry@...rry.net>
To: David Hildenbrand <david@...hat.com>
Cc: linux-cxl@...r.kernel.org, x86@...nel.org, linux-mm@...ck.org,
linux-acpi@...r.kernel.org, linux-kernel@...r.kernel.org,
dave.hansen@...ux.intel.com, luto@...nel.org, peterz@...radead.org,
tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, hpa@...or.com,
osalvador@...e.de, gregkh@...uxfoundation.org, rafael@...nel.org,
akpm@...ux-foundation.org, dan.j.williams@...el.com,
Jonathan.Cameron@...wei.com, alison.schofield@...el.com,
rrichter@....com, terry.bowman@....com, lenb@...nel.org,
dave.jiang@...el.com, ira.weiny@...el.com
Subject: Re: [PATCH 1/3] memory: extern memory_block_size_bytes and
set_memory_block_size_order
On Mon, Oct 14, 2024 at 10:32:36PM +0200, David Hildenbrand wrote:
> On 14.10.24 16:25, Gregory Price wrote:
> > On Mon, Oct 14, 2024 at 01:54:27PM +0200, David Hildenbrand wrote:
> > > On 08.10.24 17:21, Gregory Price wrote:
> > > > On Tue, Oct 08, 2024 at 05:02:33PM +0200, David Hildenbrand wrote:
> > > > > On 08.10.24 16:51, Gregory Price wrote:
> > > > > > > > +int __weak set_memory_block_size_order(unsigned int order)
> > > > > > > > +{
> > > > > > > > + return -ENODEV;
> > > > > > > > +}
> > > > > > > > +EXPORT_SYMBOL_GPL(set_memory_block_size_order);
> > > > > > >
> > > > > > > I can understand what you are trying to achieve, but letting arbitrary
> > > > > > > modules mess with this sounds like a bad idea.
> > > > > > >
> > > > > >
> > > > > > I suppose the alternative is trying to scan the CEDT from inside each
> > > > > > machine, rather than the ACPI driver? Seems less maintainable.
> > > > > >
> > > > > > > I don't entirely disagree with your comment. I hummed and hawed over
> > > > > > externing this - hence the warning in the x86 machine.
> > > > > >
> > > > > > Open to better answers.
> > > > >
> > > > > Maybe an interface to add more restrictions on the maximum size might be
> > > > > better (instead of setting the size/order, you would impose another upper
> > > > > limit).
> > > >
> > > > That is effectively what set_memory_block_size_order is, though. Once
> > > > > blocks are exposed to the allocators, it's no longer safe to change the
> > > > size (in part because it was built assuming it wouldn't change, but I
> > > > imagine there are other dragons waiting in the shadows to bite me).
> > >
> > > Yes, we must run very early.
> > >
> > > How is this supposed to interact with code like
> > >
> > > set_block_size()
> > >
> > > that also calls set_memory_block_size_order() on UV systems (assuming there
> > > will be CXL support sooner or later?)?
> > >
> > >
> >
> > Tying the other email to this one - just clarifying the way forward here.
> >
> > It sounds like you're saying at a minimum drop EXPORT tags to prevent
> > modules from calling it - but it also sounds like built-ins need to be
> > prevented from touching it as well after a certain point in early boot.
>
> Right, at least the EXPORT is not required.
>
> >
> > Do you think I should go down the advise() path as suggested by Ira,
> > just adding an arch_lock_blocksize() bit and have set_..._order check it,
> > or should we just move towards each architecture having to go through
> > the ACPI:CEDT itself?
>
> Let's summarize what we currently have on x86:
>
> 1) probe_memory_block_size()
>
> Triggered on first memory_block_size_bytes() invocation. Makes a decision
> based on:
>
> a) Already set size using set_memory_block_size_order()
> b) RAM size
> c) Bare metal vs. virt (bare metal -> use max)
> d) Virt: largest block size aligned to memory end
>
>
> 2) set_memory_block_size_order()
>
> Triggered by set_block_size() on UV systems.
>
>
> I don't think set_memory_block_size_order() is the right tool to use. We
> just want to leave that alone I think -- it's a direct translation of a
> kernel cmdline parameter that should win.
>
> You essentially want to tweak the b)->d) logic to take other alignment into
> consideration.
>
> Maybe have some simple callback mechanism in probe_memory_block_size() that can
> consult other sources for alignment requirements?
>
Thanks for this - I'll cobble something together.
Probably this ends up falling out similar to what Ira suggested.

drivers/acpi/numa/srat.c
    acpi_numa_init():
        order = parse_cfwm(...)
        memblock_advise_size(order);

drivers/base/memory.c
    static atomic_t memblock_size_order = ATOMIC_INIT(0); /* 0: let arch choose */
    int memblock_advise_size(int order)
        int old_order;
        int new_order;
        if (order <= 0)
            return -EINVAL;
        do {
            old_order = atomic_read(&memblock_size_order);
            /* 0 means no advice yet, so take the first advised order as-is */
            new_order = old_order ? MIN(old_order, order) : order;
        } while (atomic_cmpxchg(&memblock_size_order, old_order, new_order) != old_order);
        /* memblock_size_order is now <= order, if -1 then the probe won */
        return new_order;
    int memblock_probe_size()
        return atomic_xchg(&memblock_size_order, -1);

drivers/base/memblock.h
    #ifdef HOTPLUG
    export memblock_advise_size()
    export memblock_probe_size()
    #else
    static memblock_advise_size() { return -ENODEV; } /* always fail */
    static memblock_probe_size() { return 0; } /* arch chooses */
    #endif

arch/*/mm/...
    probe_block_size():
        memblock_probe_size();
        /* select minimum across above suggested values */
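
For illustration, the advise/probe handshake sketched above could be written
out as self-contained userspace C, with C11 <stdatomic.h> standing in for the
kernel's atomic_t helpers. This is only a sketch under assumptions, not the
proposed kernel code: the function names follow the pseudocode, while treating
0 as "no advice yet", returning -EBUSY once the probe has run, and using
atomic_compare_exchange_weak() are choices made for this example.

```c
#include <errno.h>
#include <stdatomic.h>

/*
 * Assumed encoding (not from the original patch):
 *    0  no advice recorded yet, the arch chooses freely
 *   >0  smallest order advised so far (an upper bound for the arch)
 *   -1  the arch probe already ran; further advice is rejected
 */
static atomic_int memblock_size_order = 0;

/* Record an upper bound on the memory block size order. */
int memblock_advise_size(int order)
{
	int old_order, new_order;

	if (order <= 0)
		return -EINVAL;

	old_order = atomic_load(&memblock_size_order);
	do {
		if (old_order < 0)
			return -EBUSY;	/* the probe won the race */
		/* First advice is taken as-is, later advice can only lower it. */
		new_order = (old_order == 0 || order < old_order) ? order : old_order;
		/* On failure, old_order is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&memblock_size_order,
					       &old_order, new_order));

	return new_order;
}

/* Consume the advice and lock out any later callers. */
int memblock_probe_size(void)
{
	return atomic_exchange(&memblock_size_order, -1);
}
```

With this shape, advice that arrives after the probe has run is rejected
rather than silently ignored, and an arch probe would treat a return of 0 as
"no constraint" and otherwise clamp its own choice to the returned order.
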
> If that's not an option, then another way to set further min-alignment
> requirements (whereby we take MIN(old_align, new_align))?
>
> --
> Cheers,
>
> David / dhildenb
>