Message-ID: <662aae2fe4887_a96f294bf@dwillia2-mobl3.amr.corp.intel.com.notmuch>
Date: Thu, 25 Apr 2024 12:25:35 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Jonathan Cameron <Jonathan.Cameron@...wei.com>, Dan Williams
<dan.j.williams@...el.com>
CC: <linux-cxl@...r.kernel.org>, Sreenivas Bagalkote
<sreenivas.bagalkote@...adcom.com>, Brett Henning
<brett.henning@...adcom.com>, Harold Johnson <harold.johnson@...adcom.com>,
Sumanesh Samanta <sumanesh.samanta@...adcom.com>,
<linux-kernel@...r.kernel.org>, Davidlohr Bueso <dave@...olabs.net>, "Dave
Jiang" <dave.jiang@...el.com>, Alison Schofield <alison.schofield@...el.com>,
Vishal Verma <vishal.l.verma@...el.com>, Ira Weiny <ira.weiny@...el.com>,
<linuxarm@...wei.com>, <linux-api@...r.kernel.org>, Lorenzo Pieralisi
<lpieralisi@...nel.org>, "Natu, Mahesh" <mahesh.natu@...el.com>,
<gregkh@...uxfoundation.org>
Subject: Re: RFC: Restricting userspace interfaces for CXL fabric management
Jonathan Cameron wrote:
> Dan Williams <dan.j.williams@...el.com> wrote:
> > The minimum requirement for justifying an in kernel driver is that
> > something else in the kernel consumes that facility. So, again, I want
> > to get back to specifics: what else in the kernel is going to leverage
> > the Switch CCI mailbox?
>
> Why? I've never heard of such a requirement and numerous drivers
> provide fairly direct access to hardware. Sometimes there is a subsystem
> aiding the data formatting etc, but fundamentally that's a convenience.
>
> Taking this to a silly level, on this basis all networking drivers would
> not be in the kernel. They are there mainly to provide userspace access to
> a network.
Networking is an odd choice to bring into this discussion because that
subsystem has a long history of wrestling with the "kernel bypass"
concern. It has largely been able to weather the storm of calls to get
out of the way and let vendor drivers have free rein.

The AF_XDP socket family was the result of finding a path to let
userspace networking stacks build functionality without forfeiting the
relevance and ongoing collaboration on the in-kernel stack.
> Any of the hardware access subsystems such as hwmon, input, IIO
> etc are primarily about providing a convenient way to get data to/from
> a device. They are kernel drivers because that is the cleaner path
> for data marshaling, interrupt handling etc.
Those are drivers supporting a subsystem to bring a sane kernel
interface to front potentially multiple vendor implementations of
similar functionality.

They are not asking for kernel bypass facilities that defeat the purpose
of ever talking to the kernel community again for potentially
system-integrity-violating functionality behind disparate vendor
interfaces.
> In kernel users are a perfectly valid reason to have a kernel driver,
> but it's far from the only one. None of the AI accelerators have in kernel
> users today (maybe they will in future). Sure there are other arguments
> that mean only a few such devices have been upstreamed, but it's not
> that they need in kernel users. If it's really an issue I'll just submit
> it to driver/misc and Greg can take a view on whether it's an acceptable
> device to have driver for... (after he's asked the obvious question of
> why aren't the CXL folk taking it!) +cc Greg to save providing info later.
AI accelerators are heavy consumers of the core-mm, and you cannot
reasonably coordinate with the core-mm from userspace.

If the proposal is to build a new CXL Fabric Management subsystem with
proper ABIs and openly defined command sets that will sit behind
thought-out kernel interfaces, then I can get on board with that.

Where I am stuck currently is the assertion that step 1 is "build ioctl
passthrough tunnels with 'do anything you want and get away with it'
semantics".

Recall that the current restriction for raw commands was to encourage
vendor collaboration and building sane kernel interfaces, and that
distros would enable it in their "debug" kernels to enable hardware
validation test benches. If the assertion is "that's too restrictive,
enable a vendor ecosystem based on kernel bypass" that goes too far.
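
To make the shape of that restriction concrete, here is a toy model in
Python. It is only a sketch of the policy as described in this thread
(audited commands pass, raw passthrough needs a debug-style build option,
is refused under lockdown, and taints when used). It is not the CXL
driver code, and every name and opcode in it is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Verdict(Enum):
    ALLOW = auto()          # audited command, normal path
    ALLOW_TAINTED = auto()  # raw passthrough permitted, kernel marked tainted
    DENY = auto()           # raw passthrough refused


@dataclass(frozen=True)
class Policy:
    raw_enabled: bool  # hypothetical stand-in for a "debug kernel" build option
    locked_down: bool  # hypothetical stand-in for kernel lockdown being active


# Hypothetical opcodes standing in for commands with a reviewed kernel interface.
AUDITED_OPCODES = frozenset({0x0001, 0x0002})


def check_command(opcode: int, policy: Policy) -> Verdict:
    """Decide how a mailbox command submitted from userspace is handled."""
    if opcode in AUDITED_OPCODES:
        return Verdict.ALLOW
    # Anything else is an opaque blob as far as the kernel is concerned.
    if not policy.raw_enabled or policy.locked_down:
        return Verdict.DENY
    return Verdict.ALLOW_TAINTED
```

In this model a production kernel (raw_enabled=False) only ever returns
ALLOW or DENY. The debate in this thread is over how large the audited
set should be, versus widening the ALLOW_TAINTED path into the default.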
> For background this is a PCI function with a mailbox used for switch
> configuration. The mailbox is identical to the one found on CXL type3
> devices. Whole thing defined in the CXL spec. It gets a little complex
> because you can tunnel commands to devices connected to the switch,
> potentially affecting other hosts. Typical Linux device doing this
> would be a BMC, but there have been repeated questions about providing
> a subset of access to any Linux system (avoiding the foot guns)
> Whole thing fully discoverable - proposal is a standard PCI driver.
>
> > The generic-Type-3-device mailbox has an in kernel driver because the
> > kernel has need to send mailbox commands internally and it is
> > fundamental to RAS and provisioning flows that the kernel have this
> > coordination. What are the motivations for an in-band Switch CCI command
> > submission path?
> >
> > It could be the case that you have a self-evident example in mind that I
> > have thus far failed to realize.
> >
>
> There are possibilities, but for now it's a transport driver just like
> MCTP etc with a well defined chardev interface, with documented ioctl
> interface etc (which I'd keep in line with the one the CXL mailbox uses
> just to avoid reinventing the wheel - I'd prefer to use that directly
> to avoid divergence but I don't care that much).
>
> As far as I can see, with the security / blast radius concern alleviated
> by disabling this if lockdown is in use + taint for unaudited commands
> (and a nasty sounding config similar to the cxl mailbox one)
> there is little reason not to take such a driver into the kernel.
> It has next to no maintenance impact outside of itself and a bit of
> library code which I've proposed pushing down to the level of MMPT
> (so PCI not CXL) if you think that is necessary.
>
> We want interrupt handling and basic access controls / command
> interface to userspace.
>
> Apologies if I'm grumpy - several long days of battling cpu hotplug code.
Again, can we please get back to the specifics of the commands to be
enabled here? I am open to CXL Fabric Management as a first-class
citizen. I am not currently open to CXL Fabric Management getting to
live in a corner of the kernel that is unreviewable because all it does
is take opaque ioctl blobs and marshal them to hardware.