Message-ID: <CAM_iQpUSY2Oxy2umgM5-DwMg9Y9UXX-Gkf=O4StPJFVz-N7PzA@mail.gmail.com>
Date: Tue, 7 Jul 2020 23:44:38 -0700
From: Cong Wang <xiyou.wangcong@...il.com>
To: Maxim Mikityanskiy <maximmi@...lanox.com>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>,
Yossi Kuperman <yossiku@...lanox.com>,
Jamal Hadi Salim <jhs@...atatu.com>,
John Fastabend <john.fastabend@...il.com>,
Toke Høiland-Jørgensen <toke@...hat.com>,
Dave Taht <dave.taht@...il.com>,
Jiri Pirko <jiri@...lanox.com>,
Rony Efraim <ronye@...lanox.com>,
Eran Ben Elisha <eranbe@...lanox.com>
Subject: Re: [RFC PATCH] sch_htb: Hierarchical QoS hardware offload
On Fri, Jun 26, 2020 at 3:46 AM Maxim Mikityanskiy <maximmi@...lanox.com> wrote:
>
> HTB doesn't scale well because of contention on a single lock, and it
> also consumes CPU. Mellanox hardware supports hierarchical rate limiting
> that can be leveraged by offloading the functionality of HTB.
True, essentially because it has to enforce a global rate limit with
link sharing.
There is a proposal to add a new lockless shaping qdisc, which
you can find on the netdev list.
>
> Our solution addresses two problems of HTB:
>
> 1. Contention by flow classification. Currently the filters are attached
> to the HTB instance as follows:
>
> # tc filter add dev eth0 parent 1:0 protocol ip flower dst_port 80
> classid 1:10
>
> It's possible to move classification to clsact egress hook, which is
> thread-safe and lock-free:
>
> # tc filter add dev eth0 egress protocol ip flower dst_port 80
> action skbedit priority 1:10
>
> This way, classification still happens in software, but the lock
> contention is eliminated, and classification now happens before the
> TX queue is selected, allowing the driver to translate the class to
> the corresponding hardware queue.
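For illustration, the driver side of this could look like the sketch
below. This is my own sketch, not code from the patch: example_priv
and example_classid_to_qid() are hypothetical, standing in for
whatever classid-to-queue table the driver builds from the offload
callbacks. Once skbedit has stored the classid in skb->priority, the
driver can resolve the hardware queue in its ->ndo_select_queue():

#include <linux/kernel.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/pkt_sched.h>

/* Hypothetical per-netdev state: a classid -> hw queue table that the
 * driver would populate from the HTB offload callbacks.
 */
struct example_priv {
	u16 qid_by_class[256];	/* indexed by classid minor number */
	bool qid_valid[256];
};

static bool example_classid_to_qid(struct example_priv *priv, u32 classid,
				   u16 *qid)
{
	u32 minor = TC_H_MIN(classid);

	if (minor >= ARRAY_SIZE(priv->qid_by_class) || !priv->qid_valid[minor])
		return false;
	*qid = priv->qid_by_class[minor];
	return true;
}

static u16 example_select_queue(struct net_device *dev, struct sk_buff *skb,
				struct net_device *sb_dev)
{
	struct example_priv *priv = netdev_priv(dev);
	u16 qid;

	/* skb->priority carries the classid written by
	 * "action skbedit priority 1:10" on the clsact egress hook.
	 */
	if (example_classid_to_qid(priv, skb->priority, &qid))
		return qid;

	/* Unclassified traffic falls back to the stock mapping. */
	return netdev_pick_tx(dev, skb, sb_dev);
}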
>
> Note that this is already compatible with non-offloaded HTB and doesn't
> require changes to the kernel nor iproute2.
>
> 2. Contention by handling packets. HTB is not multi-queue: it attaches
> to a whole net device, and handling of all packets takes the same lock.
> Our solution offloads the logic of HTB to the hardware and registers HTB
> as a multi-queue qdisc, similarly to how the mq qdisc does, i.e. HTB is
> attached to the netdev, and each queue has its own qdisc. The control
> flow is performed by HTB: it replicates the hierarchy of classes in
> hardware by calling callbacks of the driver. Leaf classes are represented
> by hardware queues. The data path works as follows: a packet is
> classified by clsact, the driver selects the hardware queue according
> to its class, and the packet is enqueued into this queue's qdisc.
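If I understand the control path correctly, it would be something like
the sketch below. The structure and function names here are my own
illustration, not taken from the patch:

#include <linux/netdevice.h>

/* Hypothetical offload request that HTB could pass to the driver for
 * each class operation; a leaf allocation reports back the hardware
 * queue that backs the class.
 */
struct example_htb_offload {
	u32 classid;		/* class being created/changed/deleted */
	u32 parent_classid;	/* position in the hierarchy */
	u64 rate;		/* guaranteed rate, bytes/sec */
	u64 ceil;		/* borrowing ceiling, bytes/sec */
	u16 qid;		/* out: hw queue backing a leaf class */
};

static int example_htb_leaf_alloc(struct net_device *dev,
				  struct example_htb_offload *opt)
{
	/* A real driver would program rate/ceil into the hardware
	 * scheduler tree under parent_classid, then report which TX
	 * queue backs this leaf so the clsact classification can
	 * steer packets to it in ->ndo_select_queue().
	 */
	opt->qid = 0;	/* placeholder: pick a free hardware queue */
	return 0;
}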
Are you sure the HTB algorithm can still work after you effectively
make each HTB class separate? I think they must still share some state
when they borrow bandwidth from each other. This is why I doubt you
can simply add a ->attach() without touching the core algorithm.
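To make the concern concrete, here is a stripped-down illustration
(not the kernel code verbatim) of why the classes are coupled: a leaf
that exhausts its guaranteed rate has to climb the shared tree to find
an ancestor with spare tokens, so siblings necessarily contend on the
same ancestors:

#include <stdbool.h>

/* Toy model of HTB's borrowing decision. Each class tracks tokens at
 * its guaranteed rate and at its ceiling, as HTB does.
 */
enum toy_mode { TOY_CAN_SEND, TOY_MAY_BORROW, TOY_CANNOT_SEND };

struct toy_class {
	struct toy_class *parent;
	long tokens;	/* tokens left at the guaranteed rate */
	long ctokens;	/* tokens left at the ceiling */
};

static enum toy_mode toy_class_mode(const struct toy_class *cl)
{
	if (cl->ctokens <= 0)
		return TOY_CANNOT_SEND;	/* over ceil: must wait */
	if (cl->tokens <= 0)
		return TOY_MAY_BORROW;	/* over rate: may borrow */
	return TOY_CAN_SEND;
}

/* A leaf may send if it is within its rate, or if some ancestor still
 * has guaranteed bandwidth to lend -- state shared across the whole
 * hierarchy, not local to the leaf.
 */
static bool toy_may_send(const struct toy_class *leaf)
{
	const struct toy_class *cl;

	for (cl = leaf; cl; cl = cl->parent) {
		switch (toy_class_mode(cl)) {
		case TOY_CANNOT_SEND:
			return false;
		case TOY_CAN_SEND:
			return true;
		case TOY_MAY_BORROW:
			continue;	/* keep climbing */
		}
	}
	return false;	/* even the root is out of tokens */
}

Any per-queue split has to keep this cross-class accounting consistent
somewhere, whether in software or in the hardware scheduler.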
Thanks.