Message-ID: <20240404193817.500523aa@kernel.org>
Date: Thu, 4 Apr 2024 19:38:17 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Alexander Duyck <alexander.duyck@...il.com>
Cc: John Fastabend <john.fastabend@...il.com>, Jiri Pirko
<jiri@...nulli.us>, netdev@...r.kernel.org, bhelgaas@...gle.com,
linux-pci@...r.kernel.org, Alexander Duyck <alexanderduyck@...com>,
davem@...emloft.net, pabeni@...hat.com
Subject: Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta
Platforms Host Network Interface
On Thu, 4 Apr 2024 17:11:47 -0700 Alexander Duyck wrote:
> > Open-sourcing is just one push to GitHub.
> > There are guarantees we give to upstream drivers.
>
> Are there? Do we have them documented somewhere?
I think they are somewhere in Documentation/
To some extent this question in itself supports my point that written
down rules, as out of date as they may be, seem to carry more respect
than what a maintainer says :S
> > > Eventually they need some kernel changes and then we block those too
> > > because we didn't allow the driver that was the use case? This seems
> > > wrong to me.
> >
> > The flip side of the argument is, what if we allow some device we don't
> > have access to to make changes to the core for its benefit. Owner
> > reports that some changes broke the kernel for them. Kernel rules,
> > regression, we have to revert. This is not hypothetical: "less than
> > cooperative users" demanding reverts and "reporting us to Linus"
> > are a reality :(
> >
> > Technical solution? Maybe if it's not a public device, regression rules
> > don't apply? Seems fairly reasonable.
>
> This is a hypothetical. This driver currently isn't changing anything
> outside of itself. At this point the driver would only be build tested
> by everyone else. They could just not include it in their Kconfig and
> then out-of-sight, out-of-mind.
Not changing does not mean not depending on existing behavior.
Investigating and fixing properly even the hardest regressions in
the stack is a bar that Meta can so easily clear. I don't understand
why you are arguing.
> > > Anyways we have zero ways to enforce such a policy. Have vendors
> > > ship a NIC to somebody with the v0 of the patch set? Attach a picture?
> >
> > GenAI world, pictures mean nothing :) We do have a CI in netdev, which
> > is all ready to ingest external results, and a (currently tiny?) set
> > of tests for NICs. Prove that you care about the device by running the
> > upstream tests and reporting results? Seems fairly reasonable.
>
> That seems like an opportunity to be exploited, though. Are the
> results going to be verified in any way? Maybe cryptographically
> signed? Seems like it would be easy enough to fake the results.
I think it's much easier to just run the tests than write a system
which will competently lie. But even if we completely suspend trust,
someone lying is of no cost to the community in this case.
> > > Even if vendor X claims they will have a product in N months and
> > > then only sells it to qualified customers, what do we do then?
> > > The driver author could even believe the hardware will be available
> > > when they post the driver, but business plans may change out of the
> > > hands of the developer.
> > >
> > > I'm 100% for letting this through, assuming Alex is on top of feedback
> > > and the code is good.
> >
> > I'd strongly prefer if we detach our trust and respect for Alex
> > from whatever precedent we make here. I can't stress this enough.
> > IDK if I'm exaggerating or it's hard to appreciate the challenges
> > of maintainership without living it, but I really don't like being
> > accused of playing favorites or big companies buying their way in :(
>
> Again, I would say we look at the blast radius. That is how we should
> be measuring any change. At this point the driver is self-contained
> in /drivers/net/ethernet/meta/fbnic/. It isn't exporting anything
> outside that directory, and it can be switched off via Kconfig.
It is not practical to ponder every change case by case. Maintainers
are overworked. How long until we send the uAPI patch for RSS on the
flow label? I'd rather not re-litigate this every time someone posts
a slightly different feature. Let's cover the obvious points from
the beginning while everyone is paying attention. We can amend later
as need be.
> When the time comes to start adding new features we can probably start
> by looking at how to add either generic offloads like was done for
> GSO, CSO, etc., or how it can also be implemented on another vendor's
> NIC.
>
> At this point the only risk the driver presents is that it is yet
> another driver, done in the same style I did the other Intel drivers,
> and so any kernel API changes will end up needing to be applied to it
> just like the other drivers.
The risk is we'll have a fight every time there is a disagreement about
the expectations.
> > > I think any other policy would be very ugly to enforce, prove, and
> > > even understand. Obviously code and architecture debates I'm all for.
> > > Ensuring we have a trusted, experienced person signed up to review
> > > code, address feedback, fix whatever syzbot finds and so on is also a
> > > must I think. I'm sure Alex will take care of it.
> >
> > "Whatever syzbot finds" may be slightly moot for a private device ;)
> > but otherwise 100%! These are exactly the kind of points I think we
> > should enumerate. I started writing a list of expectations a while back:
> >
> > Documentation/maintainer/feature-and-driver-maintainers.rst
> >
> > I think we just need something like this, maybe just a step up, for
> > non-public devices.
>
> I honestly think we are getting the cart ahead of the horse. When we
> start talking about kernel API changes then we can probably get into
> the whole "private" versus "publicly available" argument. A good
> example of the kind of thing I am thinking of is GSO partial where I
> ended up with Mellanox and Intel sending me 40G and 100G NICs and
> cables to implement it on their devices as all I had was essentially
> igb and ixgbe based NICs.
That'd be great. Maybe even more than I'd expect. So why not write
it down? In case the person doing the coding is not Alex Duyck, and
just wants to get it done for their narrow use case, get a promo,
go work on something else?
> Odds are when we start getting to those kind of things maybe we need
> to look at having a few systems available for developer use, but until
> then I am not sure it makes sense to focus on if the device is
> publicly available or not.
Developer access would be huge.
A mirage of developer access? immaterial :)