netdev - Re: [PATCH v17 10/22] cx/memdev: Indicate probe deferral

Message-ID: <6887b72724173_11968100cb@dwillia2-mobl4.notmuch>
Date: Mon, 28 Jul 2025 10:45:11 -0700
From: <dan.j.williams@...el.com>
To: Alejandro Lucero Palau <alucerop@....com>, Dave Jiang
	<dave.jiang@...el.com>, <alejandro.lucero-palau@....com>,
	<linux-cxl@...r.kernel.org>, <netdev@...r.kernel.org>,
	<dan.j.williams@...el.com>, <edward.cree@....com>, <davem@...emloft.net>,
	<kuba@...nel.org>, <pabeni@...hat.com>, <edumazet@...gle.com>
Subject: Re: [PATCH v17 10/22] cx/memdev: Indicate probe deferral

Alejandro Lucero Palau wrote:
[..]
> > Can you please explain how the accelerator driver init path is
> > different in this instance that it requires cxl_mem driver to defer
> > probing? Currently with a type3, the cxl_acpi driver will setup the
> > CXL root, hostbridges and PCI root ports. At that point the memdev
> > driver will enumerate the rest of the ports and attempt to establish
> > the hierarchy. However if cxl_acpi is not done, the mem probe will
> > fail. But, the cxl_acpi probe will trigger a re-probe sequence at
> > the end when it is done. At that point, the mem probe should
> > discover all the necessary ports if things are correct. If the
> > accelerator init path is different, can we introduce some
> > documentation to explain the difference?

The biggest difference is that devm_cxl_add_memdev() is "hopeful" in the
cxl_pci case. I.e. cxl_pci_probe() does not fail if the memory device it
registered never passes cxl_mem_probe().

Accelerators are different. They want to know that the CXL side of the
house is up and running before enabling driver features that depend on
it. They also want to safely teardown driver functionality if CXL
capabilities disappear.

cxl_pci does not know or care if or when cxl_mem::probe() succeeds and
cxl_mem::remove() is invoked.
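
To make the contrast concrete, here is a minimal user-space sketch of the
two registration policies. It is only a model: struct model_memdev,
hopeful_add_memdev() and strict_add_memdev() are hypothetical names invented
for illustration, not kernel APIs, and the cxl_side_up flag stands in for
"cxl_mem probe succeeded".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct model_memdev {
	bool registered;	/* the memdev object was created */
	bool cxl_bound;		/* stand-in for cxl_mem probe having succeeded */
};

/*
 * "Hopeful" policy, as described for cxl_pci: registration succeeds
 * whether or not the CXL side ever binds.
 */
static int hopeful_add_memdev(struct model_memdev *md, bool cxl_side_up)
{
	assert(md != NULL);
	md->registered = true;
	md->cxl_bound = cxl_side_up;
	return 0;
}

/*
 * "Strict" policy, as an accelerator driver would want: registration
 * only reports success once the CXL side is known to be up, and it is
 * undone otherwise.
 */
static int strict_add_memdev(struct model_memdev *md, bool cxl_side_up)
{
	assert(md != NULL);
	md->registered = true;
	md->cxl_bound = cxl_side_up;
	if (!cxl_side_up) {
		md->registered = false;	/* unwind the registration */
		return -1;
	}
	return 0;
}
```

With the hopeful variant the caller gets 0 even when cxl_bound stays false,
so it cannot gate features on CXL. With the strict variant a zero return is
the signal that CXL-dependent features may be enabled.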

> > Also, it seems as long as port topology is not found, it will always
> > go to deferred probing. At what point do we conclude that things may
> > be missing/broken and we need to fail?

Right, at some point the driver needs to give up on CXL ever arriving.

> Hi Dave,
> 
> 
> The patch commit comes from Dan's original one, so I'm afraid I can not 
> explain it better myself.
> 
> 
> I added this patch again after Dan suggesting with cxl_acquire_endpoint 
> the initialization by a Type2 can obtain some protection against cxl_mem 
> or cxl_acpi being removed. I added later protection or handling against 
> this by the sfc driver after initialization. So this is the main reason 
> for this patch at least to me.
> 
> 
> Regarding the goal from the original patch, being honest, I can not see 
> the cxl_acpi problem, although I'm not saying it does not exist. But it 
> is quite confusing to me and as I said in another patch regarding probe 
> deferral, supporting that option would add complexity to the current sfc 
> driver probing. If there exists another workaround for avoiding it, that 
> would be the way I prefer to follow.

The problem is how to handle the "CXL device in PCIe-only mode" case.
Even with a CXL endpoint directly attached to a CXL host there is no
guarantee that the device trains the link in CXL mode. So in addition to
the software-dynamic problems of module loading and asynchronous driver
bind/unbind, there is this hardware-dynamic problem.

I am losing my nerve with the cxl_acquire_endpoint() approach. Now that
I see how this driver tried to use it and the questions it generated, it
pushes too much complexity to leaf drivers. In the end, inspired by
faux_device, I want to get to the point where the caller can assume
that a successful devm_cxl_add_memdev() means that CXL is operational and
that any non-interleaved CXL regions have finished auto-assembly/creation.

To get there this needs Terry's patches that set pdev->is_cxl on all
ancestor devices in order to make a determination that the hardware-CXL
link is up before going to flush software CXL-link establishment.

> Adding documentation about all this would definitely help, even without 
> the Type2 case.

I would ask that you help Terry get the protocol error handling series
in shape, as part of the dependency here is to make sure that there is a
capable error model for CXL link events.

Meanwhile, I am going to rework devm_cxl_add_memdev() to make it report
when CXL port arrival is deferred, permanently failed, or successful.
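
As a rough illustration of what a three-way report could look like, here is
a user-space sketch. Everything in it is an assumption for discussion, not
the planned implementation: the enum, struct model_link_state and both
helpers are hypothetical names, MODEL_EPROBE_DEFER mirrors the value of the
kernel-internal EPROBE_DEFER (517), and treating a PCIe-only link as a
permanent failure reported as -ENXIO is a guess at one possible policy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Mirrors the kernel-internal EPROBE_DEFER value; not in user-space errno.h. */
#define MODEL_EPROBE_DEFER 517

/* Hypothetical three-way outcome for CXL port arrival. */
enum model_port_status {
	MODEL_PORT_READY,	/* CXL port hierarchy is established */
	MODEL_PORT_DEFERRED,	/* hardware link is CXL, software not done yet */
	MODEL_PORT_FAILED,	/* CXL is never going to arrive */
};

/* Hypothetical inputs to the decision. */
struct model_link_state {
	int hw_link_is_cxl;	/* stand-in for is_cxl set on all ancestors */
	int ports_enumerated;	/* stand-in for software port setup finished */
};

/* Check the hardware link first, then the software state. */
static enum model_port_status
model_port_status(const struct model_link_state *s)
{
	assert(s != NULL);
	if (!s->hw_link_is_cxl)
		return MODEL_PORT_FAILED;	/* PCIe-only: waiting cannot help */
	if (!s->ports_enumerated)
		return MODEL_PORT_DEFERRED;
	return MODEL_PORT_READY;
}

/* Map the outcome to a probe-style return code (assumed mapping). */
static int model_status_to_errno(enum model_port_status st)
{
	switch (st) {
	case MODEL_PORT_READY:
		return 0;
	case MODEL_PORT_DEFERRED:
		return -MODEL_EPROBE_DEFER;
	case MODEL_PORT_FAILED:
	default:
		return -ENXIO;
	}
}
```

The point of ordering the checks this way is that a deferral is only
reported when the hardware link gives a reason to expect CXL to show up,
which bounds the "defer forever" case raised earlier in the thread.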