Message-ID: <3d4bc1f2-1763-426f-a881-911be1d5128e@amd.com>
Date: Mon, 22 Sep 2025 16:10:25 -0500
From: "Cheatham, Benjamin" <benjamin.cheatham@....com>
To: <alejandro.lucero-palau@....com>
CC: <linux-cxl@...r.kernel.org>, <netdev@...r.kernel.org>,
<dan.j.williams@...el.com>, <edward.cree@....com>, <davem@...emloft.net>,
<kuba@...nel.org>, <pabeni@...hat.com>, <edumazet@...gle.com>,
<dave.jiang@...el.com>
Subject: Re: [PATCH v18 00/20] Type2 device basic support
On 9/18/2025 4:17 AM, alejandro.lucero-palau@....com wrote:
> From: Alejandro Lucero <alucerop@....com>
>
> First of all, the patchset should be applied on top of the described base
> commit plus Terry's v11 on CXL error handling and the last four
> patches from Dan's for-6.18/cxl-probe-order branch.
>
> Secondly, this is another attempt, and I am aware it will not be the last,
> since there are major aspects still to agree on. The most important one is
> deciding how to solve the problem of type2 stability under CXL core events.
> Let me start by defining that problem, listing the events or situations
> that have been pointed out but, I think, not clearly defined, which has
> created confusion, at least for me.
>
> We have different situations to be aware of:
>
>
> 1) CXL topology not there or not properly configured yet.
>
> 2) accelerator relying on pcie instead of CXL.io
>
> 3) potential removal of cxl_mem, cxl_acpi or cxl_port
>
> 4) cxl initialization failing due to dynamic module dependencies
>
> 5) CXL errors
>
>
> Dan's patches from the cxl-probe-order branch will hopefully fix the
> module-dependency situation. I have tested this and it works as expected:
> type2 driver initialization cannot start because of module dependencies.
> This solves #4.
>
> Using Terry's patchset, specifically the pcie_is_cxl function, solves #2.
>
> Regarding #5, I think Terry's patchset introduces the proper handling for
> this, or at least some initial work which will surely require adjustments.
> Type2 of course needs to use it, but that is not covered in v18 and I
> prefer to add it in follow-up work.
>
> About #3, the only way to be protected is partially during initialization
> with cxl_acquire (patch 8), and after initialization with a callback to the
> driver when cxl objects are removed (the callback is given when creating the
> cxl region in patch 16 and used by the sfc driver in patch 18). Initially,
> cxl_acquire was implemented with another goal (next point), but as it can
> give protection during initialization, I kept it. About the callback, Dan
> does not like it, and Jonathan is not keen on it. I think we agreed the
> right solution is that those modules should not be allowed to be removed if
> there are dependencies, and that requires work in the cxl core to
> support it as a follow-up. Therefore, either someone suggests another
> idea for how to handle this now, temporarily, without that proper
> solution, or I should delay this full patchset until that is done.
>
> Then we have #1 which I admit is the most confusing (at least to me).
> If we can not solve the problem of the proper initialization based on the
> probe() calls for those cxl devices to work with, then an explanation
> about this case is needed and, I guess, to document it.
>
> AFAIK, the BIOS will perform a good bunch of CXL initialization (BTW, I
> think we should discuss this as well at some point, to have the same
> expectations about what is done, how, and when); then the
> kernel CXL initialization will perform its own bunch based on what the
> BIOS has provided.

I would assume that anything that is addressed in
Documentation/driver-api/cxl/ is fair game for assumptions. I only read the
original docs when Gregory (?) posted them on the list, but they do cover
some BIOS expectations IIRC.

As for the other stuff, I don't think I understand the problems well enough
to comment :/.

Thanks,
Ben