lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 16 Dec 2020 11:01:34 -0800
From:   Cong Wang <xiyou.wangcong@...il.com>
To:     Maxim Mikityanskiy <maximmi@...dia.com>
Cc:     Maxim Mikityanskiy <maximmi@...lanox.com>,
        "David S. Miller" <davem@...emloft.net>,
        Jamal Hadi Salim <jhs@...atatu.com>,
        Jiri Pirko <jiri@...nulli.us>,
        Saeed Mahameed <saeedm@...dia.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Tariq Toukan <tariqt@...lanox.com>,
        Dan Carpenter <dan.carpenter@...cle.com>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Tariq Toukan <tariqt@...dia.com>,
        Yossi Kuperman <yossiku@...dia.com>
Subject: Re: [PATCH net-next v2 2/4] sch_htb: Hierarchical QoS hardware offload

On Mon, Dec 14, 2020 at 12:30 PM Maxim Mikityanskiy <maximmi@...dia.com> wrote:
>
> On 2020-12-14 21:35, Cong Wang wrote:
> > On Mon, Dec 14, 2020 at 7:13 AM Maxim Mikityanskiy <maximmi@...dia.com> wrote:
> >>
> >> On 2020-12-11 21:16, Cong Wang wrote:
> >>> On Fri, Dec 11, 2020 at 7:26 AM Maxim Mikityanskiy <maximmi@...lanox.com> wrote:
> >>>>
> >>>> HTB doesn't scale well because of contention on a single lock, and it
> >>>> also consumes CPU. This patch adds support for offloading HTB to
> >>>> hardware that supports hierarchical rate limiting.
> >>>>
> >>>> This solution addresses two main problems of scaling HTB:
> >>>>
> >>>> 1. Contention by flow classification. Currently the filters are attached
> >>>> to the HTB instance as follows:
> >>>
> >>> I do not think this is the reason, tcf_classify() has been called with RCU
> >>> only on the ingress side for a rather long time. What contentions are you
> >>> talking about here?
> >>
> >> When one attaches filters to HTB, tcf_classify is called from
> >> htb_classify, which is called from htb_enqueue, which is called with the
> >> root spinlock of the qdisc taken.
> >
> > So it has nothing to do with tcf_classify() itself... :-/
> >
> > [...]
> >
> >>> And doesn't TBF already work with mq? I mean you can attach it as
> >>> a leaf to each mq so that the tree lock will not be shared either, but you'd
> >>> lose the benefits of a global rate limit too.
> >>
> >> Yes, I'd lose not only the global rate limit, but also multi-level
> >> hierarchical limits, which are all provided by this HTB offload - that's
> >> why TBF is not really a replacement for this feature.
> >
> > Interesting, please explain how your HTB offload still has a global rate
> > limit and borrowing across queues?
>
> Sure, I will explain that.
>
> > I simply can't see it, all I can see
> > is you offload HTB into each queue in ->attach(),
>
> In the non-offload mode, the same HTB instance would be attached to all
> queues. In the offload mode, HTB behaves like MQ: there is a root
> instance of HTB, but each queue gets a separate simple qdisc (pfifo).
> Only the root qdisc (HTB) gets offloaded, and when that happens, the NIC
> creates an object for the QoS root.

Please add this to your changelog.

And why is the offloaded root qdisc not visible to software? All you add to
root HTB are pointers of direct qdisc's and a boolean, this is what I meant
by "not reflected". I expect the hardware parameters/stats are exposed to
software too, but I can't find any.

>
> Then all configuration changes are sent to the driver, and it issues the
> corresponding firmware commands to replicate the whole hierarchy in the
> NIC. Leaf classes correspond to queue groups (in this implementation
> queue groups contain only one queue, but it can be extended), and inner
> classes correspond to entities called TSARs.
>
> The information about rate limits is stored inside TSARs and queue
> groups. Queues know what groups they belong to, and groups and TSARs
> know what TSAR is their parent. A queue is picked in ndo_select_queue by
> looking at the classification result of clsact. So, when a packet is put
> onto a queue, the NIC can track the whole hierarchy and do the HTB
> algorithm.

Glad to know hardware still keeps HTB as a hierarchy.

Please also add this either to source code as comments or in your
changelog, it is very important to understand what is done by hardware.

Thanks.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ