Message-ID: <7f4b1039-b1be-b8a4-2659-a2b848120f67@nvidia.com>
Date:   Mon, 14 Dec 2020 17:12:50 +0200
From:   Maxim Mikityanskiy <maximmi@...dia.com>
To:     Cong Wang <xiyou.wangcong@...il.com>,
        Maxim Mikityanskiy <maximmi@...lanox.com>
CC:     "David S. Miller" <davem@...emloft.net>,
        Jamal Hadi Salim <jhs@...atatu.com>,
        Jiri Pirko <jiri@...nulli.us>,
        Saeed Mahameed <saeedm@...dia.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Tariq Toukan <tariqt@...lanox.com>,
        Dan Carpenter <dan.carpenter@...cle.com>,
        "Linux Kernel Network Developers" <netdev@...r.kernel.org>,
        Tariq Toukan <tariqt@...dia.com>,
        Yossi Kuperman <yossiku@...dia.com>
Subject: Re: [PATCH net-next v2 2/4] sch_htb: Hierarchical QoS hardware
 offload

On 2020-12-11 21:16, Cong Wang wrote:
> On Fri, Dec 11, 2020 at 7:26 AM Maxim Mikityanskiy <maximmi@...lanox.com> wrote:
>>
>> HTB doesn't scale well because of contention on a single lock, and it
>> also consumes CPU. This patch adds support for offloading HTB to
>> hardware that supports hierarchical rate limiting.
>>
>> This solution addresses two main problems of scaling HTB:
>>
>> 1. Contention by flow classification. Currently the filters are attached
>> to the HTB instance as follows:
> 
> I do not think this is the reason, tcf_classify() has been called with RCU
> only on the ingress side for a rather long time. What contentions are you
> talking about here?

When one attaches filters to HTB, tcf_classify is called from 
htb_classify, which is called from htb_enqueue, which is called with the 
root spinlock of the qdisc taken.

>>
>>      # tc filter add dev eth0 parent 1:0 protocol ip flower dst_port 80
>>      classid 1:10
>>
>> It's possible to move classification to clsact egress hook, which is
>> thread-safe and lock-free:
>>
>>      # tc filter add dev eth0 egress protocol ip flower dst_port 80
>>      action skbedit priority 1:10
>>
>> This way classification still happens in software, but the lock
>> contention is eliminated, and it happens before selecting the TX queue,
>> allowing the driver to translate the class to the corresponding hardware
>> queue.
> 
> Sure, you can use clsact with HTB, or any combinations you like, but you
> can't assume your HTB only works with clsact, can you?

The goal is to eliminate the root lock from the datapath, and the 
traditional filters attached to the HTB itself are handled under that 
lock. I believe this is a sane limitation: the offloaded mode is a new, 
opt-in mode of operation, and it may have additional hardware-imposed 
limitations anyway.
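For illustration, opting in could look something like the following 
sketch (the `offload` parameter name is taken from this patch series; 
the device name is arbitrary):

    # Create HTB as the root qdisc in the new offload mode
    tc qdisc replace dev eth0 root handle 1: htb offload
    # Classification moves to the lock-free clsact egress hook
    tc qdisc add dev eth0 clsact
    tc filter add dev eth0 egress protocol ip flower dst_port 80 \
        action skbedit priority 1:10
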

> 
>>
>> Note that this is already compatible with non-offloaded HTB and doesn't
>> require changes to the kernel nor iproute2.
>>
>> 2. Contention by handling packets. HTB is not multi-queue, it attaches
>> to a whole net device, and handling of all packets takes the same lock.
>> When HTB is offloaded, its algorithm is done in hardware. HTB registers
>> itself as a multi-queue qdisc, similarly to mq: HTB is attached to the
>> netdev, and each queue has its own qdisc. The control flow is still done
>> by HTB: it calls the driver via ndo_setup_tc to replicate the hierarchy
>> of classes in the NIC. Leaf classes are presented by hardware queues.
>> The data path works as follows: a packet is classified by clsact, the
>> driver selects a hardware queue according to its class, and the packet
>> is enqueued into this queue's qdisc.
> 
> I do _not_ read your code, from what you describe here, it sounds like
> you just want a per-queue rate limit, instead of a global one. So why
> bothering HTB whose goal is a global rate limit?

I would disagree. HTB's goal is hierarchical rate limits with borrowing. 
Sure, it can be used just to set a global limit, but its main purpose 
is creating a hierarchy of classes.

And yes, we are talking about the whole netdevice here, not about 
per-queue limits (we already support per-queue rate limits by means of 
tx_maxrate, so we wouldn't need any new code for that). The tree of 
classes is global for the whole netdevice, with hierarchy and borrowing 
supported. The additional send queues can be considered as hardware 
objects that represent offloaded leaf traffic classes (which can be 
extended to multiple queues per class).

So, we are really offloading HTB functionality here, not just reusing 
the HTB interface for something simpler. I hope that makes the intent 
clearer.
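As a sketch, a hierarchy with borrowing could be expressed like this 
(class IDs and rates are hypothetical):

    # Parent class caps the whole device at 10gbit
    tc class add dev eth0 parent 1: classid 1:1 htb rate 10gbit ceil 10gbit
    # Leaf 1:10 is guaranteed 4gbit but may borrow up to the parent's ceil
    tc class add dev eth0 parent 1:1 classid 1:10 htb rate 4gbit ceil 10gbit
    # Leaf 1:11 is guaranteed 6gbit, likewise borrowing up to 10gbit
    tc class add dev eth0 parent 1:1 classid 1:11 htb rate 6gbit ceil 10gbit
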

> And doesn't TBF already work with mq? I mean you can attach it as
> a leaf to each mq so that the tree lock will not be shared either, but you'd
> lose the benefits of a global rate limit too.

Yes, I'd lose not only the global rate limit, but also multi-level 
hierarchical limits, which are all provided by this HTB offload - that's 
why TBF is not really a replacement for this feature.
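For comparison, attaching TBF under mq would give independent per-queue 
limits only (a hypothetical two-queue setup):

    tc qdisc add dev eth0 root handle 1: mq
    # Each queue gets its own token bucket: no hierarchy, no borrowing,
    # and no single device-wide cap
    tc qdisc add dev eth0 parent 1:1 handle 10: tbf rate 1gbit burst 128k latency 1ms
    tc qdisc add dev eth0 parent 1:2 handle 20: tbf rate 1gbit burst 128k latency 1ms
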

> EDT does basically the same,
> but it never claims to completely replace HTB. ;)
> 
> Thanks.
> 
