Message-ID: <20260206110305.00001fbb@huawei.com>
Date: Fri, 6 Feb 2026 11:03:05 +0000
From: Jonathan Cameron <jonathan.cameron@...wei.com>
To: Gregory Price <gourry@...rry.net>
CC: Andrew Morton <akpm@...ux-foundation.org>, Cui Chao
	<cuichao1753@...tium.com.cn>, <dan.j.williams@...el.com>, Mike Rapoport
	<rppt@...nel.org>, Wang Yinfeng <wangyinfeng@...tium.com.cn>,
	<linux-cxl@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	<linux-mm@...ck.org>, <qemu-devel@...gnu.org>
Subject: Re: [PATCH v2 1/1] mm: numa_memblks: Identify the accurate NUMA ID
 of CFMW

On Thu, 5 Feb 2026 18:10:55 -0500
Gregory Price <gourry@...rry.net> wrote:

> On Thu, Feb 05, 2026 at 02:58:42PM -0800, Andrew Morton wrote:
> > On Mon, 26 Jan 2026 17:06:52 +0800 Cui Chao <cuichao1753@...tium.com.cn> wrote:
> >   
> > > > All that said, this does look harmless, and seems reasonable - but the
> > > > changelog should reflect what the hardware is doing above.  
> > > This issue was discovered on the QEMU platform. I need to apologize for 
> > > my earlier imprecise statement (claiming it was hardware instead of 
> > > QEMU). My core point at the time was to emphasize that this is a problem 
> > > in the general code path when facing this scenario, not a QEMU-specific 
> > > emulation issue, and therefore it could theoretically affect real 
> > > hardware as well. I apologize for any confusion this may have caused.  
> > 
> > This patch doesn't sounds very urgent.  Perhaps we should do a v3 with
> > updated changelog and handle that in the next -rc cycle?  
> 
> Mostly QEMU just needs to add SRAT entries associated with the
> CEDT/CFMWS it adds.

Hi Gregory,

I got a bit carried away - but the following basically says: No, QEMU should
not add SRAT Memory Affinity Structures. As to Andrew's question: I'm fine with
this fix taking a little longer.

I disagree. There is nothing in the specification to say it should do that and
we have very intentionally not done so in QEMU - this is far from the first
time this has come up! We won't be doing so any time soon unless someone
convinces me with clear spec references and tight reasoning for why it is the
right thing to do.

The only time providing SRAT Memory Affinity Structures for CEDT CXL Fixed
Memory Window Structures (CFMWSs) is definitely the right thing to do is
if the BIOS has also programmed the full set of decoders etc. That is something
we could do in QEMU as an option. Only if we do that would it be valid
to provide SRAT Memory structures for the CXL memory. I'd suggest that's
probably a job for EDK2 rather than QEMU but that's an implementation detail
and there is a dance between EDK2 and QEMU for creating some of the tables
anyway. This configuration reflects the pre-hotplug / early CXL deployment
situation. Now that we have proper support in Linux, we have moved beyond that.
We do need to solve the dynamic NUMA node cases though and I'm hoping your
current work will make that a bit easier.

Note that I give the same advice to our firmware folk I talk to. This stuff
is policy - it belongs in OS control, not in a bunch of config menus in the
BIOS or output of some unknown heuristic. BIOS authors are not clairvoyant.
They have no way to know (in a non-trivial topology) what makes sense for a
given use case or what devices are going to be hotplugged later.
I'd increasingly expect shipping BIOSes to have a "hands off" option in which
they make no attempt to guess about what is beyond the host bridges.

One argument I have heard for why a BIOS could know an appropriate CFMWS to
SRAT memory structure mapping is the CFMWS / QTG (QoS Throttling Group)
mapping implying a consistency of performance expectations in a given CFMWS.
However, that's very specific to particular designs. For others, PA space is
expensive, thus they use one large CFMWS for everything and QoS handling in
the uarch relies on information derived from the host bridge decoders. Often
no one cares about cross host bridge interleave, for the same reason full
system DRAM interleave is a niche thing. PA space is too expensive to provide
the extra CFMWS to support it.

If we are looking at forward-looking systems that are built to work with the
full gamut of CXL, then all that should be in SRAT for the CXL topology is the
Generic Port Structures (which provide a handle for perf data in HMAT to the
edge of the 'known world' - the host bridge / root ports). Nothing else.

If we do have BIOSes that are guessing what to put in SRAT and associated
HMAT etc., then there is a fairly strong argument that a good OS
implementation should at most take such structures as a hint, not a
rule (though obviously we don't do that today).

> 
> A system providing a CEDT/CFMWS entry without an SRAT entry is arguably
> bad BIOS.

I'd argue that if you aren't programming the decoder topology (and probably
locking everything down) and are providing SRAT then you are providing
a guess at best and that's a bad BIOS - not the other way around.

Note we've supported this from the start in Linux, so it's not like there
is anything missing, just a corner case to tidy up.

> 
> But yeah, this is not urgent.

I'd like to see it fixed, but given we don't know of a system where
this applies today, it doesn't need to be super rushed!

Jonathan

> 
> ~Gregory

