[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220503171413.00001297@Huawei.com>
Date: Tue, 3 May 2022 17:14:13 +0100
From: Jonathan Cameron <Jonathan.Cameron@...wei.com>
To: Dan Williams <dan.j.williams@...el.com>
CC: "Weiny, Ira" <ira.weiny@...el.com>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Alison Schofield <alison.schofield@...el.com>,
Vishal Verma <vishal.l.verma@...el.com>,
Ben Widawsky <ben.widawsky@...el.com>,
"Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>,
<linux-cxl@...r.kernel.org>, Linux PCI <linux-pci@...r.kernel.org>
Subject: Re: [PATCH V8 04/10] cxl/pci: Create auxiliary devices for each DOE
mailbox
On Fri, 29 Apr 2022 10:01:09 -0700
Dan Williams <dan.j.williams@...el.com> wrote:
> On Fri, Apr 29, 2022 at 9:39 AM Jonathan Cameron
> <Jonathan.Cameron@...wei.com> wrote:
> >
> > On Thu, 28 Apr 2022 14:09:38 -0700
> > ira.weiny@...el.com wrote:
> >
> > > On Wed, Apr 27, 2022 at 06:19:42PM +0100, Jonathan Cameron wrote:
> > > > On Thu, 14 Apr 2022 13:32:31 -0700
> > > > ira.weiny@...el.com wrote:
> > > >
> > > > > From: Ira Weiny <ira.weiny@...el.com>
> > > > >
> > > > > CXL kernel drivers optionally need to access DOE mailbox capabilities.
> > > > > Access to mailboxes for things such as CDAT, SPDM, and IDE are needed by
> > > > > the kernel while other access is designed towards user space usage. An
> > > > > example of this is for CXL Compliance Testing (see CXL 2.0 14.16.4
> > > > > Compliance Mode DOE) which offers a mechanism to set different test
> > > > > modes for a device.
> > > > >
> > > > > There is no anticipated need for the kernel to share an individual
> > > > > mailbox with user space. Thus developing an interface to marshal access
> > > > > between the kernel and user space for a single mailbox is unnecessary
> > > > > overhead. However, having the kernel relinquish some mailboxes to be
> > > > > controlled by user space is a reasonable compromise to share access to
> > > > > the device.
> > > > >
> > > > > The auxiliary bus provides an elegant solution for this. Each DOE
> > > > > capability is given its own auxiliary device. This device is controlled
> > > > > by a kernel driver by default which restricts access to the mailbox.
> > > > > Unbinding the driver from a single auxiliary device (DOE mailbox
> > > > > capability) frees the mailbox for user space access. This architecture
> > > > > also allows a clear picture on which mailboxes are kernel controlled vs
> > > > > not.
> > > > >
> > > > > Iterate each DOE mailbox capability and create auxiliary bus devices.
> > > > > Follow on patches will define a driver for the newly created devices.
> > > > >
> > > > > sysfs shows the devices.
> > > > >
> > > > > $ ls -l /sys/bus/auxiliary/devices/
> > > > > total 0
> > > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.0 -> ../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.0
> > > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.1 -> ../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.1
> > > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.2 -> ../../../devices/pci0000:35/0000:35:00.0/0000:36:00.0/cxl_pci.doe.2
> > > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.3 -> ../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.3
> > > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.4 -> ../../../devices/pci0000:35/0000:35:00.0/0000:36:00.0/cxl_pci.doe.4
> > > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.5 -> ../../../devices/pci0000:bf/0000:bf:00.0/0000:c0:00.0/cxl_pci.doe.5
> > > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.6 -> ../../../devices/pci0000:35/0000:35:01.0/0000:37:00.0/cxl_pci.doe.6
> > > > > lrwxrwxrwx 1 root root 0 Mar 24 10:47 cxl_pci.doe.7 -> ../../../devices/pci0000:bf/0000:bf:01.0/0000:c1:00.0/cxl_pci.doe.7
> > > > >
> > > > > Signed-off-by: Ira Weiny <ira.weiny@...el.com>
> > > >
> > > > I'm not 100% happy with effectively having one solution for CXL
> > > > and probably a different one for DOEs on switch ports
> > > > (which I just hacked into a port service driver to convince
> > > > myself there was at least one plausible way of doing that) but if
> > > > this effectively separates the two discussions then I guess I can
> > > > live with it for now ;)
> > >
> > > I took some time this morning to mull this over and talk to Dan...
> > >
> > > :-(
> > >
> > > Truthfully the aux driver does very little except provide a way for admins to
> > > trigger the driver to stop/start accessing the Mailbox.
> > >
> > > I suppose a simple sysfs interface could be done to do the same?
> > >
> > > I'll let Dan weigh in here.
> >
> > I wonder if best short term option is to not provide a means of
> > removing it at all (separate from the PCI driver that is).
> > Then we can take our time to decide on the interface if we ever
> > get much demand for one.
> >
> > >
> > > >
> > > > Once this is merged we can start the discussion about how to
> > > > handle switch ports with DOEs both for CDAT and SPDM.
> > >
> > > I'm ok with that too. However, I was thinking that this was not a user ABI.
> > > But it really is. If user space starts writing script to unbind drivers and
> > > then we drop the aux driver support it will break them...
> > >
> > > >
> > > > I'll send out an RFC that is so hideous it will get people to
> > > > suggestion how to do it better!
> > >
> > > I think I'd like to see that.
> >
> > Fair enough. It may muddy the waters a bit :( I'll send an RFC
> > next week. I've not looked at how the CXL region code etc would
> > actually get to the latency / bandwidth info from the driver yet
> > it just goes as far as reading a CDAT length. I also want to actually
> > hook up some decent switch CDAT emulation in the QEMU code
> > (right now it's giving the same default table as for a type 3 device).
> >
> > I just hope we don't bikeshed around the RFC in a fashion that slows
> > this series moving forwards.
>
> I think we have time in the sense that the worst that happens is that
> tooling picks the wrong CFMWS to dynamically create a region and the
> performance ends up being sub-optimal. That's tolerable to work around
> in userspace in the near term. I want to get some wider confidence in
> the DOE ABI with respect to the known protocols and what to do about
> the vendor-specific protocols that may conflict and will be driven
> from userspace issued config-cycles.
Hi Dan,
Ok, though I would like to try to push the conversation forwards beyond where
we got to last year. IIRC the general conclusion was that all protocols
sharing a DOE instance (and more generally because they all share
discovery) must be mediated by the kernel and that we would not be
supporting a generic access interface (there was one in a much earlier
version of this patch set). Whilst it's far from the only thing that needs
resolving, this DOE question is also a blocker on getting anywhere with
CMA/SPDM. Unbinding a driver as the means to stop the kernel accessing
a DOE is reliant on the host driver not deciding to do any more protocol
discovery - we can probably rely on that but it's not pretty.
> That likely means that no DOE ABI
> is the best ABI to start which means not moving forward with
> aux-devices so scripts do not become attached to something that is not
> fully committed to being carried forward.
This isn't really DOE ABI we are currently discussing, it's a CXL subsystem ABI.
If we decided to only expose the DOE as an internal implementation
detail (calls made directly from appropriate existing CXL driver) then
there wouldn't be an ABI question. We would be limiting DOE access
for other protocols but personally I don't see that as a problem
in the short to medium term.
Things may be more 'interesting' for the PCIe port services driver though
(see RFC just sent out)
https://lore.kernel.org/linux-cxl/20220503153449.4088-1-Jonathan.Cameron@huawei.com/T/#t
Perhaps if we can resolve what that should look like, the CXL EP side
of things will be much easier to figure out?
>
> I still want to refresh the request_config_region() support for at
> least having the kernel warn on userspace conflicting configuration
> writes to config areas claimed by a driver.
Great, that seems like a sensible step to do in parallel.
Jonathan
Powered by blists - more mailing lists