Message-ID: <bee0d31e-bd3e-b96a-dd98-7b7bf5b087dc@nvidia.com>
Date: Mon, 14 Dec 2020 22:30:17 +0200
From: Maxim Mikityanskiy <maximmi@...dia.com>
To: Cong Wang <xiyou.wangcong@...il.com>
CC: Maxim Mikityanskiy <maximmi@...lanox.com>,
"David S. Miller" <davem@...emloft.net>,
Jamal Hadi Salim <jhs@...atatu.com>,
Jiri Pirko <jiri@...nulli.us>,
Saeed Mahameed <saeedm@...dia.com>,
Jakub Kicinski <kuba@...nel.org>,
Tariq Toukan <tariqt@...lanox.com>,
Dan Carpenter <dan.carpenter@...cle.com>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
Tariq Toukan <tariqt@...dia.com>,
Yossi Kuperman <yossiku@...dia.com>
Subject: Re: [PATCH net-next v2 2/4] sch_htb: Hierarchical QoS hardware
offload

On 2020-12-14 21:35, Cong Wang wrote:
> On Mon, Dec 14, 2020 at 7:13 AM Maxim Mikityanskiy <maximmi@...dia.com> wrote:
>>
>> On 2020-12-11 21:16, Cong Wang wrote:
>>> On Fri, Dec 11, 2020 at 7:26 AM Maxim Mikityanskiy <maximmi@...lanox.com> wrote:
>>>>
>>>> HTB doesn't scale well because of contention on a single lock, and it
>>>> also consumes CPU. This patch adds support for offloading HTB to
>>>> hardware that supports hierarchical rate limiting.
>>>>
>>>> This solution addresses two main problems of scaling HTB:
>>>>
>>>> 1. Contention by flow classification. Currently the filters are attached
>>>> to the HTB instance as follows:
>>>
>>> I do not think this is the reason, tcf_classify() has been called with RCU
>>> only on the ingress side for a rather long time. What contentions are you
>>> talking about here?
>>
>> When one attaches filters to HTB, tcf_classify is called from
>> htb_classify, which is called from htb_enqueue, which is called with the
>> root spinlock of the qdisc taken.
>
> So it has nothing to do with tcf_classify() itself... :-/
>
> [...]
>
>>> And doesn't TBF already work with mq? I mean you can attach it as
>>> a leaf to each mq so that the tree lock will not be shared either, but you'd
>>> lose the benefits of a global rate limit too.
>>
>> Yes, I'd lose not only the global rate limit, but also multi-level
>> hierarchical limits, which are all provided by this HTB offload - that's
>> why TBF is not really a replacement for this feature.
>
> Interesting, please explain how your HTB offload still has a global rate
> limit and borrowing across queues?

Sure, I will explain that.

> I simply can't see it, all I can see
> is you offload HTB into each queue in ->attach(),

In the non-offload mode, the same HTB instance would be attached to all
queues. In the offload mode, HTB behaves like MQ: there is a root
instance of HTB, but each queue gets a separate simple qdisc (pfifo).
Only the root qdisc (HTB) gets offloaded, and when that happens, the NIC
creates an object for the QoS root.
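To make that shape concrete, here is a rough self-contained sketch in
plain user-space C (purely illustrative; the structs and names are made
up and are not the kernel or driver code). The point is only that, in
this model, the root just marks itself offloaded while each TX queue
enqueues into its own pfifo, so the data path doesn't go through a
single root lock:

/* Illustrative model only (not kernel code): root HTB is offloaded,
 * and each TX queue has its own pfifo, as with mq. */
#include <stdio.h>

#define NUM_TXQS 4

struct toy_pfifo    { unsigned int qlen; };  /* per-queue leaf qdisc */
struct toy_htb_root { int offloaded; };      /* root keeps only offload state */

int main(void)
{
    struct toy_htb_root root = { .offloaded = 1 };
    struct toy_pfifo txq[NUM_TXQS] = { { 0 } };

    /* Enqueue goes straight to the per-queue pfifo: no shared root lock. */
    for (int q = 0; q < NUM_TXQS; q++)
        txq[q].qlen++;

    printf("root offloaded=%d, qlen on txq 0=%u\n",
           root.offloaded, txq[0].qlen);
    return 0;
}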
Then all configuration changes are sent to the driver, and it issues the
corresponding firmware commands to replicate the whole hierarchy in the
NIC. Leaf classes correspond to queue groups (in this implementation a
queue group contains only one queue, but this can be extended), and inner
classes correspond to entities called TSARs (transmit scheduling arbiters).
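As a rough illustration of that mapping (a toy user-space model, not the
actual firmware interface; all structures, names and numbers below are
invented for the example), each class becomes an object that records its
rate/ceil and its parent, with inner classes becoming TSARs and leaf
classes becoming queue groups:

/* Toy model of replicating the HTB tree in the NIC: inner class -> TSAR,
 * leaf class -> queue group (one queue per group in this example). */
#include <stdint.h>
#include <stdio.h>

struct toy_nic_node {
    uint32_t classid;   /* HTB classid this object mirrors */
    uint64_t rate;      /* bytes per second */
    uint64_t ceil;      /* bytes per second */
    int parent;         /* index of the parent TSAR, -1 for the root */
    int is_leaf;        /* leaf -> queue group, inner -> TSAR */
    int txq;            /* TX queue of a queue group, -1 for TSARs */
};

int main(void)
{
    /* Example tree: a root TSAR (1:1) with two leaf queue groups. */
    struct toy_nic_node nic[] = {
        { 0x10001, 125000000, 125000000, -1, 0, -1 }, /* 1:1  -> TSAR  */
        { 0x10010,  12500000, 125000000,  0, 1,  0 }, /* 1:10 -> group */
        { 0x10020,  25000000, 125000000,  0, 1,  1 }, /* 1:20 -> group */
    };

    for (unsigned int i = 0; i < sizeof(nic) / sizeof(nic[0]); i++)
        printf("classid %x: %s, rate %llu B/s, parent %d\n",
               (unsigned int)nic[i].classid,
               nic[i].is_leaf ? "queue group" : "TSAR",
               (unsigned long long)nic[i].rate, nic[i].parent);
    return 0;
}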
Rate-limit information is stored inside the TSARs and queue groups.
Queues know which groups they belong to, and groups and TSARs know which
TSAR is their parent. A queue is picked in ndo_select_queue by looking at
the classification result of clsact. So, when a packet is put onto a
queue, the NIC can track the whole hierarchy and run the HTB algorithm.
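Here is a toy sketch of that last step too (again just illustrative, not
the real ndo_select_queue code; the lookup table and names are invented):
the classid returned by the clsact filter is mapped to the TX queue of
the matching queue group, so the packet lands on a queue that the NIC
already associates with the right group and its chain of TSARs.

/* Toy model of picking a TX queue from the clsact classification result. */
#include <stdint.h>
#include <stdio.h>

struct toy_qgroup {
    uint32_t classid;   /* HTB leaf classid selected by the filter */
    int txq;            /* TX queue belonging to this queue group */
};

static int toy_select_queue(const struct toy_qgroup *groups, int n,
                            uint32_t classid, int fallback)
{
    for (int i = 0; i < n; i++)
        if (groups[i].classid == classid)
            return groups[i].txq;
    return fallback; /* unclassified traffic uses a default queue */
}

int main(void)
{
    struct toy_qgroup groups[] = {
        { 0x10010, 0 },  /* 1:10 -> txq 0 */
        { 0x10020, 1 },  /* 1:20 -> txq 1 */
    };

    printf("classid 1:20 -> txq %d\n", toy_select_queue(groups, 2, 0x10020, 3));
    printf("unknown classid -> txq %d\n", toy_select_queue(groups, 2, 0x10099, 3));
    return 0;
}

In this toy model, unclassified traffic just falls back to a fixed
default queue; that choice is an assumption of the example, not a
statement about the driver.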
> where I assume the
> hardware will do rate limit on each queue,

So, it's not flat in the NIC: rate limiting is done hierarchically.
> if the hardware also has a
> global control, why it is not reflected on the root qdisc?

I'm not sure I understood this last question correctly. The root qdisc is
HTB, and the whole configuration of the HTB tree gets reflected in the
NIC, as I just explained. I hope it's clearer now, but if you still have
questions, I'm happy to explain in more detail (also, I'm ready to respin
with the minor fixes for the CI build issue on parisc).

Thanks,
Max
> Thanks!
>