Message-ID: <7ce70b9f-23dc-03c9-f83a-4b620cdc8a7d@intel.com>
Date: Mon, 19 Sep 2022 15:12:27 +0200
From: "Wilczynski, Michal" <michal.wilczynski@...el.com>
To: Edward Cree <ecree.xilinx@...il.com>, <netdev@...r.kernel.org>
CC: <alexandr.lobakin@...el.com>, <dchumak@...dia.com>,
<maximmi@...dia.com>, <jiri@...nulli.us>,
<simon.horman@...igine.com>, <jacob.e.keller@...el.com>,
<jesse.brandeburg@...el.com>, <przemyslaw.kitszel@...el.com>
Subject: Re: [RFC PATCH net-next v4 2/6] devlink: Extend devlink-rate api with
queues and new parameters
On 9/15/2022 11:01 PM, Edward Cree wrote:
> On 15/09/2022 19:41, Wilczynski, Michal wrote:
>> Hi,
>> Previously we discussed adding queues to devlink-rate in this thread:
>> https://lore.kernel.org/netdev/20220704114513.2958937-1-michal.wilczynski@intel.com/T/#u
>> In our use case we are trying to find a way to expose the hardware Tx scheduler tree that is defined
>> per port to the user. Obviously, if the tree is defined per physical port, all the scheduling nodes will
>> reside on the same tree.
>>
>> Our customer is trying to send different types of traffic that require different QoS levels on the same
>> VM, but on different queues. This requires completely different rate setups for those queues - in the
>> implementation that you're mentioning we wouldn't be able to arbitrarily reassign a queue to any node.
> I'm not sure I 100% understand what you're describing, but I get the
> impression it's maybe a layering violation — the hypervisor should only
> be responsible for shaping the VM's overall traffic, it should be up to
> the VM to decide how to distribute that bandwidth between traffic types.
Maybe the switchdev case would be a good parallel here. When you enable
switchdev, you get port representors on the host for each VF that is
already attached to a VM - something that gives the host the power to
configure a netdev that it doesn't 'own'. So it seems to me that giving
the user more power to configure things from the host is acceptable.
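For illustration, this is roughly the kind of host-side control I have
in mind (the PCI address is just an example):

  # put the PF's eSwitch into switchdev mode; the host then gets a port
  # representor netdev for every VF, even though the VFs themselves are
  # attached to VMs, and can configure/shape VF traffic through them
  devlink dev eswitch set pci/0000:4b:00.0 mode switchdev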
> But if it's what your customer needs then presumably there's some reason
> for it that I'm not seeing. I'm not a QoS expert by any means — I just
> get antsy that every time I look at devlink it's gotten bigger and keeps
> escaping further out of the "device-wide configuration" concept it was
> originally sold as :(
I understand the concern, and I sympathize with the desire to keep
things small, but this is the least evil method I've found that would
enable the customer to achieve an optimal configuration. I experimented
with tc-htb in the previous thread, but there are multiple problems with
that approach - I tried to describe them there.
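For context, by the tc-htb approach I mean the offloaded HTB qdisc,
roughly along these lines (the device name is just a placeholder):

  # offloaded HTB is attached per netdev, so every queue of that netdev
  # ends up under this single root - we can't hang individual queues off
  # arbitrary nodes of the port-wide scheduler tree
  tc qdisc replace dev eth0 root handle 1: htb offload
  tc class add dev eth0 parent 1: classid 1:1 htb rate 100mbit ceil 200mbit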
In my mind this is a device-wide configuration, since the ice driver
registers each port as a separate PCI device, and each of these devices
has its own hardware Tx scheduler tree that is global to that port. The
queues we're discussing are actual hardware queues, identified by a
hardware-assigned txq_id.
The use case is basically enabling the user to fully utilize the
hardware Hierarchical QoS, accounting for every queue in the system.
The current kernel interfaces don't allow us to do that, so we figured
the least amount of duplication would be to teach devlink about queues
and let the user configure the desired tree using devlink-rate.
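As a rough sketch of how this could look from the user's side - the
node commands are the existing devlink-rate interface, while the queue
syntax is purely illustrative, since its exact form is what this RFC is
trying to settle:

  # rate nodes and parent assignment already exist in devlink-rate
  devlink port function rate add pci/0000:4b:00.0/vm1_rt tx_share 200mbit
  devlink port function rate add pci/0000:4b:00.0/vm1_be tx_max 500mbit

  # proposed (hypothetical syntax): attach individual hardware queues,
  # identified by their txq_id, to arbitrary nodes of the port's tree,
  # so two queues of the same VF/netdev need not share a parent
  devlink port function rate set pci/0000:4b:00.0/txq/25 parent vm1_rt
  devlink port function rate set pci/0000:4b:00.0/txq/26 parent vm1_be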
>
>> Those queues would still need to share a single parent - their netdev. This wouldn't allow us to fully take
>> advantage of the HQoS and would introduce arbitrary limitations.
> Oh, so you need a hierarchy within which the VF's queues don't form a
> clade (subtree)? That sounds like something worth calling out in the
> commit message as the reason why you've designed it this way.
This is one of the possible supported scenarios. I will include it in
the commit message, thanks for the tip.
>
>> Regarding the documentation, sure. I just wanted to get all the feedback from the mailing list and arrive at the final
>> solution before writing the docs.
> Fair. But you might get better feedback on the code if people have the
> docs to better understand the intent; just a suggestion.
Thanks for the advice :)
BR,
Michał
>
> -ed