[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ab182638-3693-422b-a7b6-b3630a35a0be@amd.com>
Date: Thu, 27 Nov 2025 09:08:43 +0000
From: Alejandro Lucero Palau <alucerop@....com>
To: PJ Waskiewicz <ppwaskie@...nel.org>, alejandro.lucero-palau@....com,
linux-cxl@...r.kernel.org, netdev@...r.kernel.org, dan.j.williams@...el.com,
edward.cree@....com, davem@...emloft.net, kuba@...nel.org,
pabeni@...hat.com, edumazet@...gle.com, dave.jiang@...el.com
Cc: Martin Habets <habetsm.xilinx@...il.com>,
Edward Cree <ecree.xilinx@...il.com>,
Jonathan Cameron <Jonathan.Cameron@...wei.com>,
Ben Cheatham <benjamin.cheatham@....com>
Subject: Re: [PATCH v21 15/23] sfc: get endpoint decoder
On 11/26/25 18:35, PJ Waskiewicz wrote:
> Hi Alejandro,
>
> On Wed, 2025-11-26 at 09:09 +0000, Alejandro Lucero Palau wrote:
>> On 11/26/25 01:27, PJ Waskiewicz wrote:
>>> Hi Alejandro,
>>>
>>> On Wed, 2025-11-19 at 19:22 +0000, alejandro.lucero-palau@....com
>>> wrote:
>>>> From: Alejandro Lucero <alucerop@....com>
>>>>
>>>> Use cxl api for getting DPA (Device Physical Address) to use
>>>> through
>>>> an
>>>> endpoint decoder.
>>>>
>>>> Signed-off-by: Alejandro Lucero <alucerop@....com>
>>>> Reviewed-by: Martin Habets <habetsm.xilinx@...il.com>
>>>> Acked-by: Edward Cree <ecree.xilinx@...il.com>
>>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@...wei.com>
>>>> Reviewed-by: Ben Cheatham <benjamin.cheatham@....com>
>>>> Reviewed-by: Dave Jiang <dave.jiang@...el.com>
>>>> ---
>>>> drivers/net/ethernet/sfc/efx_cxl.c | 12 +++++++++++-
>>>> 1 file changed, 11 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c
>>>> b/drivers/net/ethernet/sfc/efx_cxl.c
>>>> index d7c34c978434..1a50bb2c0913 100644
>>>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>>>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>>>> @@ -108,6 +108,14 @@ int efx_cxl_init(struct efx_probe_data
>>>> *probe_data)
>>>> return -ENOSPC;
>>>> }
>>>>
>>>> + cxl->cxled = cxl_request_dpa(cxl->cxlmd,
>>>> CXL_PARTMODE_RAM,
>>>> + EFX_CTPIO_BUFFER_SIZE);
>>> I've been really struggling to get this flow working in my
>>> environment.
>>> The above function call has a call-chain like this:
>>>
>>> - cxl_request_dpa()
>>> => cxl_dpa_alloc()
>>> => __cxl_dpa_alloc()
>>> => __cxl_dpa_reserve()
>>> => __request_region()
>>>
>>> That last call to __request_region() is not handling a Type2 device
>>> that has its mem region defined as EFI Special Purpose memory.
>>> Basically if the underlying hardware has the memory region marked
>>> that
>>> way, it's still getting mapped into the host's physical address
>>> map,
>>> but it's explicitly telling the OS to bugger off and not try to map
>>> it
>>> as system RAM, which is what we want. Since this is being used as
>>> an
>>> acceleration path, we don't want the OS to muck about with it.
>>>
>>> The issue here is now that I have to build CXL into the kernel
>>> itself
>>> to get around the circular dependency issue with depmod, I see this
>>> when my kernel boots and the device trains, but *before* my driver
>>> loads:
>>>
>>> # cat /proc/iomem
>>> [...snip...]
>>> c050000000-c08fffffff : CXL Window 0
>>> c050000000-c08fffffff : Soft Reserved
>>>
>>> That right there is my device. And it's being presented correctly
>>> that
>>> it's reserved so the OS doesn't mess with it. However, that call
>>> to
>>> __request_region() fails with -EBUSY since it can't take ownership
>>> of
>>> that region since it's already owned by the core.
>>>
>>> I can't just skip over this flow for DPA init, so I'm at a bit of a
>>> loss how to proceed. How is your device presenting the .mem region
>>> to
>>> the host?
>>
>> Hi PJ,
>>
>>
>> My work is based on the device not using EFI_CONVENTIONAL_MEMORY +
>> EFI_MEMORY_SP but EFI_RESERVED_TYPE. In the first case the kernel can
>> try to use that memory and the BIOS goes through default
>> initialization,
> I'm not sure I follow. Your device is based on using
> EFI_RESERVED_TYPE? Or is it based on the former? My device is based
> on EFI_RESERVED_TYPE, which translates into the Soft Reserved status as
> a result of BIOS enumeration and the CXL core enumerating that memory
> resource.
My device will be based on EFI_RESERVED_TYPE but it has no that special
flag with the devices I got for testing, so the BIOS passes it to the
kernel in the HMAT as EFI_CONVENTIONAL_MEMORY + EFI_MEMORY_SP.
Not sure what you mean with that CXL core enumeration. With a type2,
supposedly not a PCI MEMORY CLASS, the CXL PCI driver can not attach to
it. AFAIK, that memory region can only be used, or what I think is
problem you have, reserved/allocated in terms of resources, by DAX/HMEM.
However, I thought with EFI_RESERVED_TYPE that could not happen at all.
If so, I would say it is fully wrong. If not, what is the meaning of
this EFI memory type from the kernel point of view?
>> the latter will avoid BIOS or kernel to mess with such a memory.
>> Because
>> there is no BIOS yet supporting this I had to remove DAX support from
>> the kernel and deal (for testing) with some BIOS initialization we
>> will
>> not have in production.
> Can you elaborate what you mean here? Do you mean the proposed patches
> here are trying to work around this BIOS limitation?
Apologies for not being clearer here. The proposed patches discussed
here are the right ones for our device ... once we got all the pieces
together, chiefly our UEFI driver advertising that special flag and the
(AMD) BIOS supporting it. If you got such a BIOS from AMD, lucky you!
>
> I'm not sure I understand what BIOS limitations you mean though. I see
> on both an AMD and Intel host (CXL 2.0-capable) the same behavior that
> I'd expect of EFI_RESERVED_TYPE getting set aside so the OS doesn't
> mess with it. This is on CRB-level stuff plus production-level
> platforms.
From your previous emails, the systems seems to detect the memory as
Soft Reserved ... which implies DAX can use it.
Not sure if you confirmed the flag being used by the kernel from the
HMAT table, but worth to double check if not.
>>
>> For your case I thought this work
>> https://lore.kernel.org/linux-cxl/20251120031925.87762-1-Smita.KoralahalliChannabasappa@amd.com/T/#me2bc0d25a2129993e68df444aae073addf886751
>>
>> was solving your problem but after looking at it now, I think that
>> will
>> only be useful for Type3 and the hotplug case. Maybe it is time to
>> add
>> Type2 handling there. I'll study that patchset with more detail and
>> comment for solving your case.
> I just looked through that, and I might be able to cherry-pick some
> stuff. I'll do the same offline and see if I can come up with a
> workable solution to get past this wall for now.
>
> That said though, I don't really want or care about DAX. I can already
> find and map the underlying CXL.mem accelerated region through other
> means (RCRB, DVSEC, etc.).
>
> What I'm trying to do is get the regionX object to instantiate on my
> CXL.mem memory block, so that I can remove the region, ultimately
> tearing down the decoders, and allowing me to hotplug the device. The
> patches here seem to still assume a Type3-ish device where there's DPA
> needing to get mapped into HPA, which our devices are already allocated
> in the decoders due to the EFI_RESERVED_TYPE enumeration. But the
> patches aren't seeing that firmware already set them up, since the
> decoders haven't been committed yet.
If you mean the HDM decoders are configured by the BIOS and the CXL Host
Bridge is also with also the right configuration for redirecting to the
CXL Root Port your device is attached to, the (AMD) BIOS is doing so
without the EFI_RESERVED_TYPE as well. So apart from that potential
conflict with DAX/HMEM, which I'm not sure it is happening, you could be
facing here the problem of the current patches not supporting a Type2
device with already committed HDM, but you are saying yours not having
it ... Annyways, Benjamin Cheatham pointed out this other problem which
I was also aware of due to my testing, but as I said when he brought it,
I would prefer to support that as a follow-up work as the client behind
this initial (and basic) Type2 support, the sfc driver, not requiring it.
>
> My root decoder has 1GB of space, which is the size of my endpoint
> device's memory size (1GB). There is no DPA to map, and the HPA
> already appears "full" since the device is already configured in the
> decoder.
This makes me to think it is weird your device HDM not committed. BTW,
is the Root decoder CFMWS size the same in Intel and AMD systems? I bet
it is not from discussing this with Dan and cia, but curious to know in
your case.
>
> TL;DR: if your device you're testing with presents the CXL.mem region
> as EFI_RESERVED_TYPE, I don't see how these patches are working.
> Unless you're doing something extra outside of the patches, which isn't
> obvious to me.
Yes, sorry, that is the case. I'm applying some dirty changes to these
patches for testing with my current testing devices, including the BIOS
and the Host.
>>
>> FWIW, last year in Vienna I raised the concern of the kernel doing
>> exactly what you are witnessing, and I proposed having a way for
>> taking
>> the device/memory from DAX but I was told unanimously that was not
>> necessary and if the BIOS did the wrong thing, not fixing that in the
>> kernel. In hindsight I would say this conflict was not well
>> understood
>> then (me included) with all the details, so maybe it is time to have
>> this capacity, maybe from user space or maybe specific kernel param
>> triggering the device passing from DAX.
> I do recall this. Unfortunately I brought up similar concerns way back
> in Dublin in 2021 regarding all of this flow well before 2.0-capable
> hosts arrived. I think I started asking the questions way too early,
> since this was of little to no concern at the time (nor was Type2
> device support).
Maybe we can make the case now. I'll seize LPC to discuss this further.
Will you be there?
Thank you
>
> Cheers,
> -PJ
Powered by blists - more mailing lists