[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240902174944.293dfe4b@kernel.org>
Date: Mon, 2 Sep 2024 17:49:44 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Joe Damato <jdamato@...tly.com>
Cc: netdev@...r.kernel.org, edumazet@...gle.com, amritha.nambiar@...el.com,
sridhar.samudrala@...el.com, sdf@...ichev.me, bjorn@...osinc.com,
hch@...radead.org, willy@...radead.org, willemdebruijn.kernel@...il.com,
skhawaja@...gle.com, Martin Karsten <mkarsten@...terloo.ca>, Donald Hunter
<donald.hunter@...il.com>, "David S. Miller" <davem@...emloft.net>, Paolo
Abeni <pabeni@...hat.com>, Jesper Dangaard Brouer <hawk@...nel.org>, Xuan
Zhuo <xuanzhuo@...ux.alibaba.com>, Daniel Jurgens <danielj@...dia.com>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net-next 5/5] netdev-genl: Support setting per-NAPI
config values
On Sat, 31 Aug 2024 18:27:45 +0100 Joe Damato wrote:
> > How do you feel about making this configuration opt-in / require driver
> > changes? What I'm thinking is that having the new "netif_napi_add()"
> > variant (or perhaps extending netif_napi_set_irq()) to take an extra
> > "index" parameter would make the whole thing much simpler.
>
> I think if we are going to go this way, then opt-in is probably the
> way to go. This series would include the necessary changes for mlx5,
> in that case (because that's what I have access to) so that the new
> variant has a user?
SG! FWIW for bnxt the "id" is struct bnxt_napi::index (I haven't looked
at bnxt before writing the suggestion :))
> > Index would basically be an integer 0..n, where n is the number of
> > IRQs configured for the driver. The index of a NAPI instance would
> > likely match the queue ID of the queue the NAPI serves.
> >
> > We can then allocate an array of "napi_configs" in net_device -
> > like we allocate queues, the array size would be max(num_rx_queue,
> > num_tx_queues). We just need to store a couple of ints so it will
> > be tiny compared to queue structs, anyway.
>
> I assume napi_storage exists for both combined RX/TX NAPIs (i.e.
> drivers that multiplex RX/TX on a single NAPI like mlx5) as well
> as drivers which create NAPIs that are RX or TX-only, right?
Hm.
> If so, it seems like we'd either need to:
> - Do something more complicated when computing how much NAPI
> storage to make, or
> - Provide a different path for drivers which don't multiplex and
> create some number of (for example) TX-only NAPIs ?
>
> I guess I'm just imagining a weird case where a driver has 8 RX
> queues but 64 TX queues. max of that is 64, so we'd be missing 8
> napi_storage ?
>
> Sorry, I'm probably just missing something about the implementation
> details you summarized above.
I wouldn't worry about it. We can added a variant of alloc_netdev_mqs()
later which takes the NAPI count explicitly. For now we can simply
assume max(rx, tx) is good enough, and maybe add a WARN_ON_ONCE() to
the set function to catch drivers which need something more complicated.
Modern NICs have far more queues than IRQs (~NAPIs).
> > The NAPI_SET netlink op can then work based on NAPI index rather
> > than the ephemeral NAPI ID. It can apply the config to all live
> > NAPI instances with that index (of which there really should only
> > be one, unless driver is mid-reconfiguration somehow but even that
> > won't cause issues, we can give multiple instances the same settings)
> > and also store the user config in the array in net_device.
>
> I understand what you are proposing. I suppose napi-get could be
> extended to include the NAPI index, too?
Yup!
> Then users could map queues to NAPI indexes to queues (via NAPI ID)?
Yes.
> > When new NAPI instance is associate with a NAPI index it should get
> > all the config associated with that index applied.
> >
> > Thoughts? Does that makes sense, and if so do you think it's an
> > over-complication?
>
> It feels a bit tricky, to me, as it seems there are some edge cases
> to be careful with (queue count change). I could probably give the
> implementation a try and see where I end up.
>
> Having these settings per-NAPI would be really useful and being able
> to support IRQ suspension would be useful, too.
>
> I think being thoughtful about how we get there is important; I'm a
> little wary of getting side tracked, but I trust your judgement and
> if you think this is worth exploring I'll think on it some more.
I understand, we can abandon it if the implementation drags out due to
various nit picks and back-and-forths. But I don't expect much of that
🤞️
Powered by blists - more mailing lists