Message-ID: <6892325deccdb_55f09100fb@dwillia2-xfh.jf.intel.com.notmuch>
Date: Tue, 5 Aug 2025 09:33:33 -0700
From: <dan.j.williams@...el.com>
To: <alejandro.lucero-palau@....com>, <linux-cxl@...r.kernel.org>,
	<netdev@...r.kernel.org>, <dan.j.williams@...el.com>, <edward.cree@....com>,
	<davem@...emloft.net>, <kuba@...nel.org>, <pabeni@...hat.com>,
	<edumazet@...gle.com>, <dave.jiang@...el.com>
CC: Alejandro Lucero <alucerop@....com>, Jonathan Cameron
	<Jonathan.Cameron@...wei.com>
Subject: Re: [PATCH v17 18/22] cxl: Allow region creation by type2 drivers

alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@....com>
> 
> Creating a CXL region requires userspace intervention through the cxl
> sysfs files. Type2 support should allow accelerator drivers to create
> such cxl region from kernel code.
> 
> Adding that functionality and integrating it with current support for
> memory expanders.
> 
> Support an action by the type2 driver to be linked to the created region
> for unwinding the resources allocated properly.

The hardest part of CXL is the fact that typical straight-line driver
expectations like "device present == MMIO available" are violated. An
accelerator driver needs to worry about asynchronous region detach and
CXL port detach.

Ideally any event that takes down a CXL port or the region simply
results in the accelerator driver being detached to clean everything up.

The difficult part is that the remove path for regions and CXL ports
holds locks that prevent the accelerator remove path from running.

I do not think it is maintainable for every accelerator driver to invent
its own cleanup scheme like this. The expectation should be that a
region can go into a defunct state if someone triggers removal actions
in the wrong order, but otherwise the accelerator driver should be able
to rely on a detach event to clean everything up.

So opting into CXL operation puts a driver into a situation where it can
be unbound whenever the CXL link goes down logically or physically.
Physical device removal of a CXL port expects that the operator has
first shut down all driver operations, and if they have not, the driver
should at least not crash while awaiting the remove event.

Physical CXL port removal is the "easy" case since that will naturally
result in the accelerator 'struct pci_dev' being removed. The more
difficult cases are the logical removal / shutdown of a CXL port or
region. Those should schedule accelerator detach and put the region into
an error state until that cleanup runs.
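
To make that flow concrete, here is a minimal userspace model of
"schedule detach and park the region in an error state". It is only a
sketch: every name in it (region_logical_shutdown(), detach_worker(),
the state enum, the structs) is hypothetical and does not correspond to
anything in the CXL core. The point it tries to show is that the
shutdown path, which runs with region / port locks held, only records
intent, and the actual driver detach happens later from a context that
does not hold those locks.

```c
/* Userspace model only: none of these names exist in the CXL core. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum region_state { REGION_ACTIVE, REGION_ERROR, REGION_GONE };

struct accel;

struct region {
	enum region_state state;
	bool detach_pending;
	struct accel *owner;	/* accelerator driver instance using the region */
};

struct accel {
	bool bound;		/* models "driver is attached to the device" */
	struct region *region;
};

/*
 * Models logical region / port shutdown. Assumed to run with the region
 * and port locks held, so it must not call into the accelerator remove
 * path. It only marks the region as errored and records that a detach
 * is owed.
 */
static void region_logical_shutdown(struct region *r)
{
	if (r->state != REGION_ACTIVE)
		return;
	r->state = REGION_ERROR;
	r->detach_pending = true;
}

/*
 * Models the deferred work that runs later, outside those locks, and
 * detaches the accelerator driver so that its normal remove path cleans
 * everything up.
 */
static void detach_worker(struct region *r)
{
	if (!r->detach_pending)
		return;
	assert(r->state == REGION_ERROR);
	if (r->owner) {
		r->owner->bound = false;	/* stands in for driver remove */
		r->owner->region = NULL;
	}
	r->detach_pending = false;
	r->state = REGION_GONE;
}
```

Between region_logical_shutdown() and detach_worker() the driver is
still bound but the region is unusable, which is the window the driver
has to survive without crashing.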

So, in summary, do not allow for custom region callbacks, arrange for
accelerator detach and just solve the "fail in-flight operations while
awaiting detach" problem.
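
As a rough illustration of the "fail in-flight operations while
awaiting detach" piece, the sketch below gates every device access on a
region-valid flag and returns an error instead of touching MMIO once the
region has been marked bad. This is a userspace model with hypothetical
names (struct accel_ctx, accel_read32(), accel_mark_region_error()),
plain memory stands in for the MMIO mapping, and a real driver would
need stronger synchronization than a single flag to close the race
between the check and the access.

```c
/* Userspace model only: hypothetical names, plain memory instead of MMIO. */
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct accel_ctx {
	atomic_bool region_ok;		/* cleared once the region is in error */
	volatile uint32_t *mmio;	/* modeled CXL-backed mapping */
};

/* Accessor used by all driver operations: fail fast if the region is gone. */
static int accel_read32(struct accel_ctx *ctx, size_t idx, uint32_t *val)
{
	if (!atomic_load(&ctx->region_ok))
		return -ENXIO;	/* region in error state, do not touch MMIO */
	*val = ctx->mmio[idx];
	return 0;
}

/* Called when the region enters its error state, before detach runs. */
static void accel_mark_region_error(struct accel_ctx *ctx)
{
	atomic_store(&ctx->region_ok, false);
}
```

With something along these lines, operations issued between the region
going into error and the detach event see -ENXIO rather than faulting.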
