Message-ID: <Zc4Pa4QWGQegN4mI@nanopsycho>
Date: Thu, 15 Feb 2024 14:19:40 +0100
From: Jiri Pirko <jiri@...nulli.us>
To: Jakub Kicinski <kuba@...nel.org>
Cc: William Tu <witu@...dia.com>, Jacob Keller <jacob.e.keller@...el.com>,
bodong@...dia.com, jiri@...dia.com, netdev@...r.kernel.org,
saeedm@...dia.com,
"aleksander.lobakin@...el.com" <aleksander.lobakin@...el.com>
Subject: Re: [RFC PATCH v3 net-next] Documentation: devlink: Add devlink-sd
Fri, Feb 09, 2024 at 02:26:33AM CET, kuba@...nel.org wrote:
>On Fri, 2 Feb 2024 08:46:56 +0100 Jiri Pirko wrote:
>> Fri, Feb 02, 2024 at 05:00:41AM CET, kuba@...nel.org wrote:
>> >On Thu, 1 Feb 2024 11:13:57 +0100 Jiri Pirko wrote:
>> >> Wait a sec.
>> >
>> >No, you wait a sec ;) Why do you think this belongs to devlink?
>> >Two months ago you were complaining bitterly when people were
>> >considering using devlink rate to control per-queue shapers.
>> >And now it's fine to add queues as a concept to devlink?
>>
>> Do you have a better suggestion how to model common pool object for
>> multiple netdevices? This is the reason why devlink was introduced to
>> provide a platform for common/shared things for a device that contains
>> multiple netdevs/ports/whatever. But I may be missing something here,
>> for sure.
>
>devlink just seems like the lowest common denominator, but the moment
>we start talking about multi-PF devices it also gets wobbly :(
You mean you see it as realistic to have a multi-PF device that allows
sharing the pools between the PFs? If, in theory, that exists, could this
just be a limitation perhaps?
>I think it's better to focus on the object, without scoping it to some
>ancestor which may not be sufficient tomorrow (meaning its own family
>or a new object in netdev like page pool).
Ok.
>
>> >> With this API, user can configure sharing of the descriptors.
>> >> So there would be a pool (or multiple pools) of descriptors and the
>> >> descriptors could be used by many queues/representors.
>> >>
>> >> So in the example above, for 1k representors you have only 1k
>> >> descriptors.
>> >>
>> >> The infra allows great flexibility in terms of configuring multiple
>> >> pools of different sizes and assigning queues from representors to
>> >> different pools. So you can have multiple "classes" of representors.
>> >> For example the ones you expect heavy traffic on could have a separate pool,
>> >> the rest can share another pool together, etc.
>> >
>> >Well, it does not extend naturally to the design described in that blog
>> >post. There I only care about a netdev level pool, but every queue can
>> >bind multiple pools.
>> >
>> >It also does not cater naturally to a very interesting application
>> >of such tech to lightweight container interfaces, macvlan-offload style.
>> >As I said at the beginning, why is the pool a devlink thing if the only
>> >objects that connect to it are netdevs?
>>
>> Okay. Let's model it differently, no problem. I find the devlink device
>> a good fit for an object to contain shared things like pools.
>> But perhaps there could be something else. Something new?
>
>We need something new for more advanced memory providers, anyway.
>The huge page example I posted a year ago needs something to get
>a huge page from CMA and slice it up for the page pools to draw from.
>That's very similar, also not really bound to a netdev. I don't think
>the cross-netdev aspect is the most important aspect of this problem.
Well, in our case, the shared entity is not floating, it is bound to a
device related to netdev.
>
>> >Another netdev thing where this will be awkward is page pool
>> >integration. It lives in netdev genl, are we going to add devlink pool
>> >reference to indicate which pool a pp is feeding?
>>
>> Page pool is per-netdev, isn't it? It could be extended to be bound per
>> devlink-pool as you suggest. It is a bit awkward, I agree.
>>
>> So instead of devlink, should we add the descriptor-pool object into
>> netdev genl and make it possible for multiple netdevs to use it there?
>> I would still miss the namespace of the pool, as it naturally aligns
>> with devlink device. IDK :/
>
>Maybe the first thing to iron out is the life cycle. Right now we
>throw all configuration requests at the driver which ends really badly
>for those of us who deal with heterogeneous environments. Applications
>which try to do advanced stuff like pinning and XDP break because of
>all the behavior differences between drivers. So I don't think we
>should expose configuration of unstable objects (those which user
>doesn't create explicitly - queues, irqs, page pools etc) to the driver.
>The driver should get or read the config from the core when the object
>is created.
I see. For global objects, I understand. But this is a device-specific
object and configuration. How do you tie the two together?
>
>This gets back to the proposed descriptor pool because there's a
>chicken and an egg problem between creating the representors and
>creating the descriptor pool, right? Either:
> - create reprs first with individual queues, reconfigure them to bind
>   them to a pool
> - create pool first, bind the reprs which don't exist to them,
>   assuming the driver somehow maintains the mapping, pretty weird
>   to configure objects which don't exist
> - create pool first, add an extra knob elsewhere (*cough* "shared-descs
>   enable") which produces somewhat loosely defined reasonable behavior
>
>Because this is a general problem (again, any queue config needs it)
>I think we'll need to create some sort of a rule engine in netdev :(
>Instead of configuring a page pool you'd add a configuration rule
>which can match on netdev and queue id and gives any related page pool
>some parameters. NAPI is another example of something user can't
>reasonably configure directly. And if we create such a rule engine
>it should probably be shared...
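
To make the rule-engine idea above a bit more concrete, here is a minimal
sketch in Python. It is not kernel code and not a proposed uAPI. All names
(Rule, RuleEngine, add_rule, resolve, and the "pool" and "ring_size"
parameters) are hypothetical and only illustrate the described behavior:
rules match on netdev and queue id, and the object picks up its config
from the rules at creation time instead of being configured directly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """One configuration rule. None in a match field acts as a wildcard."""
    netdev: Optional[str]
    queue_id: Optional[int]
    params: dict

    def matches(self, netdev: str, queue_id: int) -> bool:
        # A field matches if it is a wildcard or equals the queried value.
        return (self.netdev in (None, netdev)
                and self.queue_id in (None, queue_id))


class RuleEngine:
    """Hypothetical core-side store of rules (assumption, not an existing API)."""

    def __init__(self) -> None:
        self._rules = []

    def add_rule(self, params: dict, netdev: Optional[str] = None,
                 queue_id: Optional[int] = None) -> None:
        # Rules can be added before the matching netdev or queue exists.
        self._rules.append(Rule(netdev, queue_id, dict(params)))

    def resolve(self, netdev: str, queue_id: int) -> dict:
        # Called when an object (e.g. a page pool) is created.
        # Later rules override earlier ones for the same parameter.
        merged = {}
        for rule in self._rules:
            if rule.matches(netdev, queue_id):
                merged.update(rule.params)
        return merged


engine = RuleEngine()
engine.add_rule({"pool": "shared", "ring_size": 256})            # default
engine.add_rule({"pool": "heavy"}, netdev="repr0")               # per netdev
engine.add_rule({"ring_size": 1024}, netdev="repr0", queue_id=0)  # per queue

cfg_heavy = engine.resolve("repr0", 0)
cfg_default = engine.resolve("repr1", 3)
```

In this sketch the rules for "repr0" can be installed before that
representor exists, which is one possible way around the ordering problem
listed above. The driver would only call something like resolve() when it
creates the queue.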