Message-ID: <CAAywjhTG+2BmoN76kaEmWC=J0BBvnCc7fUhAwjbSX5xzSvtGXw@mail.gmail.com>
Date: Tue, 3 Sep 2024 12:04:52 -0700
From: Samiullah Khawaja <skhawaja@...gle.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Joe Damato <jdamato@...tly.com>, netdev@...r.kernel.org, edumazet@...gle.com,
amritha.nambiar@...el.com, sridhar.samudrala@...el.com, sdf@...ichev.me,
bjorn@...osinc.com, hch@...radead.org, willy@...radead.org,
willemdebruijn.kernel@...il.com, Martin Karsten <mkarsten@...terloo.ca>,
Donald Hunter <donald.hunter@...il.com>, "David S. Miller" <davem@...emloft.net>,
Paolo Abeni <pabeni@...hat.com>, Jesper Dangaard Brouer <hawk@...nel.org>,
Xuan Zhuo <xuanzhuo@...ux.alibaba.com>, Daniel Jurgens <danielj@...dia.com>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net-next 5/5] netdev-genl: Support setting per-NAPI config values
This is great, Joe.
On Mon, Sep 2, 2024 at 6:02 PM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Mon, 2 Sep 2024 18:56:07 +0200 Joe Damato wrote:
> > > How do you feel about making this configuration opt-in / require driver
> > > changes? What I'm thinking is that having the new "netif_napi_add()"
> > > variant (or perhaps extending netif_napi_set_irq()) to take an extra
> > > "index" parameter would make the whole thing much simpler.
> >
> > What about extending netif_queue_set_napi instead? That function
> > takes a napi and a queue index.
>
> This may become problematic, since there may be more queues than NAPIs.
> Either with multiple combined queues mapped to a single NAPI, or having
> separate Rx/Tx NAPIs.
>
> No strong preference on which function we modify (netif_napi_add vs the
> IRQ setting helper) but I think we need to take the index as an
> explicit param, rather than try to guess it based on another ID.
> The queue ID will match 95% of the time, today. But Intel was
> interested in "remapping" queues. And we all think about a queue API...
> So using queue ID is going to cause problems down the road.
Do we need a queue-to-napi association at all to set/persist napi
configurations? Could a new index param be added to netif_napi_add()
and the configurations be persisted in napi_storage? I guess the
problem would be the size of napi_storage.
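Just so we're talking about the same thing, here is a purely
hypothetical sketch of the index + napi_storage idea (none of these
names or fields exist today, and the exact signature would depend on
which helper we end up extending):

/* hypothetical per-index slot; the array could be sized
 * max(txqs, rxqs) in alloc_netdev_mqs()
 */
struct napi_storage {
        u32     defer_hard_irqs;
        u64     gro_flush_timeout;
};

/* in alloc_netdev_mqs(), assuming a dev->napi_storage pointer is added: */
dev->napi_storage = kcalloc(max(txqs, rxqs),
                            sizeof(*dev->napi_storage), GFP_KERNEL);

/* hypothetical netif_napi_add() variant taking an explicit index */
void netif_napi_add_storage(struct net_device *dev,
                            struct napi_struct *napi,
                            int (*poll)(struct napi_struct *, int),
                            int index)
{
        netif_napi_add(dev, napi, poll);
        napi->storage = &dev->napi_storage[index];
}

The driver would pass the same index across ifdown/ifup, so the slot
(and whatever the user configured through netlink) survives the napi
being freed and re-added.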
I'm also wondering whether persistence could be problematic for some
use cases when the napis are recreated, since the new napi instances
might not represent the same context. For example, if I resize the dev
from 16 rx/tx queues to 8 rx/tx queues, a napi index that was used by
a TX queue might now poll an RX queue.
>
> > Locally I kind of hacked up something simple that:
> > - Allocates napi_storage in net_device in alloc_netdev_mqs
> > - Modifies netif_queue_set_napi to:
> >     if (napi)
> >         napi->storage = dev->napi_storage[queue_index];
> >
> > I think I'm still missing the bit about the
> > max(rx_queues,tx_queues), though :(
>
> You mean whether it's enough to do
>
> napi_cnt = max(txqs, rxqs)
>
> ? Or some other question?
>
> AFAIU mlx5 can only have as many NAPIs as it has IRQs (256?).
> Look at ip -d link to see how many queues it has.
> We could do txqs + rxqs to be safe, if you prefer, but it will
> waste a bit of memory.
>
> > > Index would basically be an integer 0..n, where n is the number of
> > > IRQs configured for the driver. The index of a NAPI instance would
> > > likely match the queue ID of the queue the NAPI serves.
> >
> > Hmmm. I'm hesitant about the "number of IRQs" part. What if there
> > are NAPIs for which no IRQ is allocated ~someday~ ?
>
> A device NAPI without an IRQ? Whoever does something of such silliness
> will have to deal with consequences. I think it's unlikely.
>
> > It seems like (I could totally be wrong) that netif_queue_set_napi
> > can be called and work and create the association even without an
> > IRQ allocated.
> >
> > I guess the issue is mostly the queue index question above: combined
> > rx/tx vs drivers having different numbers of rx and tx queues.
>
> Yes, and in the future the ability to allocate more queues than NAPIs.
> netif_queue_set_napi() was expected to cover such cases.
>
> > > We can then allocate an array of "napi_configs" in net_device -
> > > like we allocate queues, the array size would be max(num_rx_queue,
> > > num_tx_queues). We just need to store a couple of ints so it will
> > > be tiny compared to queue structs, anyway.
> > >
> > > The NAPI_SET netlink op can then work based on NAPI index rather
> > > than the ephemeral NAPI ID. It can apply the config to all live
> > > NAPI instances with that index (of which there really should only
> > > be one, unless driver is mid-reconfiguration somehow but even that
> > > won't cause issues, we can give multiple instances the same settings)
> > > and also store the user config in the array in net_device.
> > >
> > > When a new NAPI instance is associated with a NAPI index it should get
> > > all the config associated with that index applied.
> > >
> > > Thoughts? Does that make sense, and if so do you think it's an
> > > over-complication?
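Makes sense to me. To check my understanding of the apply-on-associate
part, I imagine something like this (again hypothetical, building on
the napi_storage sketch above; the per-napi field names are assumed
from this series):

/* hypothetical: called whenever a napi instance is bound to its index */
static void napi_restore_storage(struct napi_struct *napi)
{
        struct napi_storage *s = napi->storage;

        napi->defer_hard_irqs = s->defer_hard_irqs;
        napi->gro_flush_timeout = s->gro_flush_timeout;
}

and NAPI_SET would update both the live napi(s) carrying that index and
the napi_storage slot, so the values survive the next recreate.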
> >
> > I think what you are proposing seems fine; I'm just working out the
> > implementation details and making sure I understand before sending
> > another revision.
>
> If you get stuck feel free to send a link to a GH or post RFC.
> I'm adding extra asks, so whatever form of review helps.. :)