lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7f1e56067bdc46195a9e36f914aa103dc76d4f7f.camel@kernel.org>
Date: Wed, 26 Nov 2025 10:35:38 -0800
From: PJ Waskiewicz <ppwaskie@...nel.org>
To: Alejandro Lucero Palau <alucerop@....com>,
 alejandro.lucero-palau@....com, 	linux-cxl@...r.kernel.org,
 netdev@...r.kernel.org, dan.j.williams@...el.com, 	edward.cree@....com,
 davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com, 
	edumazet@...gle.com, dave.jiang@...el.com
Cc: Martin Habets <habetsm.xilinx@...il.com>, Edward Cree
	 <ecree.xilinx@...il.com>, Jonathan Cameron <Jonathan.Cameron@...wei.com>, 
 Ben Cheatham <benjamin.cheatham@....com>
Subject: Re: [PATCH v21 15/23] sfc: get endpoint decoder

Hi Alejandro,

On Wed, 2025-11-26 at 09:09 +0000, Alejandro Lucero Palau wrote:
> 
> On 11/26/25 01:27, PJ Waskiewicz wrote:
> > Hi Alejandro,
> > 
> > On Wed, 2025-11-19 at 19:22 +0000, alejandro.lucero-palau@....com
> > wrote:
> > > From: Alejandro Lucero <alucerop@....com>
> > > 
> > > Use cxl api for getting DPA (Device Physical Address) to use
> > > through
> > > an
> > > endpoint decoder.
> > > 
> > > Signed-off-by: Alejandro Lucero <alucerop@....com>
> > > Reviewed-by: Martin Habets <habetsm.xilinx@...il.com>
> > > Acked-by: Edward Cree <ecree.xilinx@...il.com>
> > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@...wei.com>
> > > Reviewed-by: Ben Cheatham <benjamin.cheatham@....com>
> > > Reviewed-by: Dave Jiang <dave.jiang@...el.com>
> > > ---
> > >   drivers/net/ethernet/sfc/efx_cxl.c | 12 +++++++++++-
> > >   1 file changed, 11 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/net/ethernet/sfc/efx_cxl.c
> > > b/drivers/net/ethernet/sfc/efx_cxl.c
> > > index d7c34c978434..1a50bb2c0913 100644
> > > --- a/drivers/net/ethernet/sfc/efx_cxl.c
> > > +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> > > @@ -108,6 +108,14 @@ int efx_cxl_init(struct efx_probe_data
> > > *probe_data)
> > >   		return -ENOSPC;
> > >   	}
> > >   
> > > +	cxl->cxled = cxl_request_dpa(cxl->cxlmd,
> > > CXL_PARTMODE_RAM,
> > > +				     EFX_CTPIO_BUFFER_SIZE);
> > I've been really struggling to get this flow working in my
> > environment.
> > The above function call has a call-chain like this:
> > 
> > - cxl_request_dpa()
> >    => cxl_dpa_alloc()
> >       => __cxl_dpa_alloc()
> >          => __cxl_dpa_reserve()
> >             => __request_region()
> > 
> > That last call to __request_region() is not handling a Type2 device
> > that has its mem region defined as EFI Special Purpose memory.
> > Basically if the underlying hardware has the memory region marked
> > that
> > way, it's still getting mapped into the host's physical address
> > map,
> > but it's explicitly telling the OS to bugger off and not try to map
> > it
> > as system RAM, which is what we want. Since this is being used as
> > an
> > acceleration path, we don't want the OS to muck about with it.
> > 
> > The issue here is now that I have to build CXL into the kernel
> > itself
> > to get around the circular dependency issue with depmod, I see this
> > when my kernel boots and the device trains, but *before* my driver
> > loads:
> > 
> > # cat /proc/iomem
> > [...snip...]
> > c050000000-c08fffffff : CXL Window 0
> >    c050000000-c08fffffff : Soft Reserved
> > 
> > That right there is my device.  And it's being presented correctly
> > that
> > it's reserved so the OS doesn't mess with it.  However, that call
> > to
> > __request_region() fails with -EBUSY since it can't take ownership
> > of
> > that region since it's already owned by the core.
> > 
> > I can't just skip over this flow for DPA init, so I'm at a bit of a
> > loss how to proceed.  How is your device presenting the .mem region
> > to
> > the host?
> 
> 
> Hi PJ,
> 
> 
> My work is based on the device not using EFI_CONVENTIONAL_MEMORY + 
> EFI_MEMORY_SP but EFI_RESERVED_TYPE. In the first case the kernel can
> try to use that memory and the BIOS goes through default
> initialization, 

I'm not sure I follow.  Your device is based on using
EFI_RESERVED_TYPE?  Or is it based on the former?  My device is based
on EFI_RESERVED_TYPE, which translates into the Soft Reserved status as
a result of BIOS enumeration and the CXL core enumerating that memory
resource.

> the latter will avoid BIOS or kernel to mess with such a memory.
> Because 
> there is no BIOS yet supporting this I had to remove DAX support from
> the kernel and deal (for testing) with some BIOS initialization we
> will 
> not have in production.

Can you elaborate what you mean here?  Do you mean the proposed patches
here are trying to work around this BIOS limitation?

I'm not sure I understand what BIOS limitations you mean though.  I see
on both an AMD and Intel host (CXL 2.0-capable) the same behavior that
I'd expect of EFI_RESERVED_TYPE getting set aside so the OS doesn't
mess with it.  This is on CRB-level stuff plus production-level
platforms.

> 
> 
> For your case I thought this work 
> https://lore.kernel.org/linux-cxl/20251120031925.87762-1-Smita.KoralahalliChannabasappa@amd.com/T/#me2bc0d25a2129993e68df444aae073addf886751
>  
> was solving your problem but after looking at it now, I think that
> will 
> only be useful for Type3 and the hotplug case. Maybe it is time to
> add 
> Type2 handling there. I'll study that patchset with more detail and 
> comment for solving your case.

I just looked through that, and I might be able to cherry-pick some
stuff.  I'll do the same offline and see if I can come up with a
workable solution to get past this wall for now.

That said though, I don't really want or care about DAX.  I can already
find and map the underlying CXL.mem accelerated region through other
means (RCRB, DVSEC, etc.).

What I'm trying to do is get the regionX object to instantiate on my
CXL.mem memory block, so that I can remove the region, ultimately
tearing down the decoders, and allowing me to hotplug the device.  The
patches here seem to still assume a Type3-ish device where there's DPA
needing to get mapped into HPA, which our devices are already allocated
in the decoders due to the EFI_RESERVED_TYPE enumeration.  But the
patches aren't seeing that firmware already set them up, since the
decoders haven't been committed yet.

My root decoder has 1GB of space, which is the size of my endpoint
device's memory size (1GB).  There is no DPA to map, and the HPA
already appears "full" since the device is already configured in the
decoder.

TL;DR: if your device you're testing with presents the CXL.mem region
as EFI_RESERVED_TYPE, I don't see how these patches are working. 
Unless you're doing something extra outside of the patches, which isn't
obvious to me.

> 
> 
> FWIW, last year in Vienna I raised the concern of the kernel doing 
> exactly what you are witnessing, and I proposed having a way for
> taking 
> the device/memory from DAX but I was told unanimously that was not 
> necessary and if the BIOS did the wrong thing, not fixing that in the
> kernel. In hindsight I would say this conflict was not well
> understood 
> then (me included) with all the details, so maybe it is time to have 
> this capacity, maybe from user space or maybe specific kernel param 
> triggering the device passing from DAX.

I do recall this.  Unfortunately I brought up similar concerns way back
in Dublin in 2021 regarding all of this flow well before 2.0-capable
hosts arrived.  I think I started asking the questions way too early,
since this was of little to no concern at the time (nor was Type2
device support).

Cheers,
-PJ

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ