Message-ID: <20240214175735.GG1088888@nvidia.com>
Date: Wed, 14 Feb 2024 13:57:35 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: Andy Gospodarek <andrew.gospodarek@...adcom.com>
Cc: Christoph Hellwig <hch@...radead.org>,
Saeed Mahameed <saeed@...nel.org>, Arnd Bergmann <arnd@...db.de>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Leon Romanovsky <leonro@...dia.com>, Jiri Pirko <jiri@...dia.com>,
Leonid Bloch <lbloch@...dia.com>, Itay Avraham <itayavr@...dia.com>,
Jakub Kicinski <kuba@...nel.org>,
Saeed Mahameed <saeedm@...dia.com>,
David Ahern <dsahern@...nel.org>,
Aron Silverton <aron.silverton@...cle.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH V4 0/5] mlx5 ConnectX control misc driver
On Wed, Feb 14, 2024 at 11:17:58AM -0500, Andy Gospodarek wrote:
> 1. How someone working at a distro would be able to help/understand if
> a tool like this was run and may have programmed their hardware
> differently than a default driver or FW.
This is stepping a bit beyond the debug-focused interfaces mlx5ctl
currently has and into configuration.
Obviously a generic FW RPC can do configuration too.
We do have a lot of industry experience with configuration already.
Realistically every complex device already has on-device FLASH that
has on-device FW configuration. Formalizing what already exists and is
widely used isn't going to make the world worse.
I'm also very certain there are actual problems with FW configuration
incompatibility. E.g. there are PCI related configurables on mlx5 you
can set that will break everything if you are not The Special User
that the configuration was created for.
Today everyone already deals with FW version specific behavioral
differences and bugs.
FW Vers + FW Vers&Config is also the current state of affairs, and is
dealt with in about the same way, IMHO.
> due to out-of-band configuration. One thought I had was some sort of
> journal to note that config happened from outside, but I'm not sure
> there is much value there.
I think devices using this kind of approach need to be well behaved -
like no wild (production) access to random bits of HW. Things need to
be structured and reportable.
There is a clear split in my mind between:
- inspection debugging
- invasive mutating debugging
- configuration
And maybe "invasive mutating debugging" taints the kernel or something
like that.
> With the ability to dump regs with devlink health it's possible to
> know that values may have changed, so I'm not concerned about this
> since that infra exists.
Distros need good tooling to report the state of HW in distro problem
reports. If something is lacking we should improve on it.
TBH I'd pick "report current status" over "journaling" because devices
have FLASH and configuration changes can typically be permanent across
reboots.
Also, I would generally frown on designs changing OS visible behavior
without running the device through an FLR.
> 2. If one can make configuration changes to hardware without kernel
> APIs (devlink et al), will people still develop new kernel APIs?
Sure? Why not? Every vendor has existing tooling to do this
configuration stuff. mlx5's existing tooling runs in userspace and
uses /sys/../resource. *Everyone* has been using this for the last 15
years. So whatever impact it has is long since baked in.
> I think the answer to this is 'yes' as realistically using default tools
> is much better than using vendor tools for regular configuration.
Personally I think the answer is more nuanced. Users have problems
they want to solve. If your problem is "provision my HW from the
factory" you may be perfectly happy with a shared userspace program
that can do that for all the HW vendors at the site. No artificial
kernel-enforced commonality required.
IMHO as kernel people we often look at the hammer we have and fall
into these patterns where "only the kernel can do abstractions!" but
it isn't true, we can very effectively make abstractions in userspace
too.
> if vendors provide shortcuts to program hardware for eval/testing/debug
> my experience is that these are not acceptable long-term. Requests are always
> made to include this type of changes in future releases. So I'm not too
> concerned about the ossification of kernel APIs due to this being included.
Nor am I. Users will ask for the things that work best for them and
vendors have been historically good at responding.
I also look at RDMA, which is deeply down this path of not doing
common interfaces in the kernel. We have a lot of robust open source
now! The common interfaces that the userspace community developed are
wildly different and much more useful than what the prior effort to
build kernel common interfaces was creating. In fact there are now
competing ideas on what those interfaces should be and a lot of
interesting development.
The kernel common APIs that were developed before turned out to be
such a straitjacket that they held back so much stuff, forced
people out-of-tree, and badly harmed community forming on the
userspace side.
In other words, allowing userspace to have freedom has really pushed
things forward in good ways.
> So if there is general agreement that this is acceptable (especially
> compared to other out-of-tree drivers, I think a few who find this
> useful should sync on the best way forward; I'm not sure a separate
> driver for each vendor is the right approach.
I also like this. I don't want the outcome of this discussion to be
that only mlx5ctl gets merged. I want all the HW that has this problem
to have support in the mainline kernel.
I want to see a robust multi-vendor userspace community that houses
a lot of stuff and is working to solve user problems. To me this point
is the big win for the community.
If we need a formally named kernel subsystem to pull that community
together then let's do that. We can probably have some common ioctls
and shared uABI marshaling/discovery/etc like nvme does. In the end
it can't really be abstracted too much; the user-facing API is really
going to be "send a FW RPC" like mlx5/nvme-vendor does.
Honestly it will be easier to discuss and document the overall design
and what devices have to do to be compliant within a tidy subsystem
with a name that people can identify as a thing. Naming things and
having spaces is really important to build community around them.
Certainly, I am up for this. If you have similar usage flows I'm
quite happy to work with everyone to launch a new mini-subsystem. If
other people are reading this and think they want to take part too
please speak up.
> If upstream (and therefore distros) are going to accept this we probably
> owe it to them to not have misc drivers for every different flavor of
> hardware out there when it might be possible to add a generic driver
> that can connect to a PCI device via new (auxiliary bus?) API.
Distros often prefer if they have less packaging, versioning and
community to deal with :)
I'm inspired by things like nvme-cli that show a nice way forward
where there can be vendor plugins and a shared GitHub repository with
a common umbrella of CLI, documentation and packaging.
With some co-operation from the distros we can push vendors to
participate in the shared userspace, and we can push vendors from the
kernel by denying kernel driver support without basic integration with
the userspace (like DRM does with mesa).
Thanks,
Jason