lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <672c13b19a530_10bc6294bd@dwillia2-xfh.jf.intel.com.notmuch>
Date: Wed, 6 Nov 2024 17:11:13 -0800
From: Dan Williams <dan.j.williams@...el.com>
To: "Zhijian Li (Fujitsu)" <lizhijian@...itsu.com>, Huaisheng Ye
	<huaisheng.ye@...el.com>, "linux-cxl@...r.kernel.org"
	<linux-cxl@...r.kernel.org>, "dan.j.williams@...el.com"
	<dan.j.williams@...el.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"pei.p.jia@...el.com" <pei.p.jia@...el.com>
Subject: Re: [PATCH] [RFC] cxl/region: Fix region creation for greater than
 x2 switches

[ add Dave so you 

Zhijian Li (Fujitsu) wrote:
> 
> 
> On 27/10/2024 15:57, Huaisheng Ye wrote:
> > The cxl_port_setup_targets() algorithm fails to identify valid target list
> > ordering in the presence of 4-way and above switches resulting in
> > 'cxl create-region' failures of the form:
> > 
> >    # cxl create-region -d decoder0.0 -g 1024 -s 2G -t ram -w 8 -m mem4 mem1 mem6 mem3 mem2 mem5 mem7 mem0
> >    cxl region: create_region: region0: failed to set target7 to mem0
> >    cxl region: cmd_create_region: created 0 regions
> > 
> >    [kernel debug message]
> >    check_last_peer:1213: cxl region0: pci0000:0c:port1: cannot host mem6:decoder7.0 at 2
> >    bus_remove_device:574: bus: 'cxl': remove device region0
> > 
> > QEMU can create this failing topology:
> > 
> >                         ACPI0017:00 [root0]
> >                             |
> >                           HB_0 [port1]
> >                          /             \
> >                       RP_0             RP_1
> >                        |                 |
> >                  USP [port2]           USP [port3]
> >              /    /    \    \        /   /    \    \
> >            DSP   DSP   DSP   DSP   DSP  DSP   DSP  DSP
> >             |     |     |     |     |    |     |    |
> >            mem4  mem6  mem2  mem7  mem1 mem3  mem5  mem0
> >   Pos:      0     2     4     6     1    3     5    7
> 
> Yeah, I tried this topology long long ago, it didn't work. At that time, I thought it
> might be just like that. Until recently that I saw this [1] in section
> 2.13.15.1 Region Spanning 2 HB Root Ports Example Configuration Checks
> 
> I once tried to understand why the code used "distance" to determine the order of the target,
> but in the end, I still couldn't figure it out (and I still don't understand it now).
> IIRC, neither the CXL spec nor this document mentioned the keyword "distance" at all.

Oh, that means this needs a comment or a better variable name.

In this patch discussion [1] Jim came up with the term "ancestral_ways"
to describe the same concept of what is the offset ("distance") between
consecutive indices in the target list.

[1]: http://lore.kernel.org/ZUHeTLZb+od8q4EE@ubuntu

Does "ancestral_ways" more clearly convey the math that is being
performed at each level of the hierarchy?

Now, "ancestral_ways" also does not show up in the CXL specification,
but that is because the CXL specification leaves at as an exercise for
software to figure out an algorithm to validate that a proposed ordering
of memory-device-decoders in a region can be supported by the given CXL
topology.

In the meantime I have flagged this patch to Dave for consideration in
the next cxl-fixes pull request, but I suspect it will need to be
updated with a comment and/or rename to resovle the "distance"
confusion.

https://patchwork.kernel.org/bundle/cxllinux/cxl-fixes/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ