Message-ID: <ae876912-dda3-057c-ac29-c472ce94a8d0@mojatatu.com>
Date:   Mon, 14 Mar 2022 12:38:10 -0400
From:   Jamal Hadi Salim <jhs@...atatu.com>
To:     xiangxia.m.yue@...il.com, netdev@...r.kernel.org
Cc:     Cong Wang <xiyou.wangcong@...il.com>,
        Jiri Pirko <jiri@...nulli.us>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Jonathan Lemon <jonathan.lemon@...il.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Alexander Lobakin <alobakin@...me>,
        Paolo Abeni <pabeni@...hat.com>,
        Talal Ahmad <talalahmad@...gle.com>,
        Kevin Hao <haokexin@...il.com>,
        Ilias Apalodimas <ilias.apalodimas@...aro.org>,
        Kees Cook <keescook@...omium.org>,
        Kumar Kartikeya Dwivedi <memxor@...il.com>,
        Antoine Tenart <atenart@...nel.org>,
        Wei Wang <weiwan@...gle.com>, Arnd Bergmann <arnd@...db.de>
Subject: Re: [net-next v10 1/2] net: sched: use queue_mapping to pick tx queue

On 2022-03-14 10:15, xiangxia.m.yue@...il.com wrote:
> From: Tonghao Zhang <xiangxia.m.yue@...il.com>
> 
> This patch fixes the following issue:
> * If we install tc filters with act_skbedit in the clsact hook,
>    they do not work, because netdev_core_pick_tx() overwrites
>    queue_mapping.
> 
>    $ tc filter ... action skbedit queue_mapping 1
> 
> This patch is also useful for the following:
> * We can use FQ + EDT to implement efficient policies. Tx queues
>    are picked by XPS, by the netdev driver's ndo_select_queue, or by the
>    skb hash in netdev_core_pick_tx(). In practice, the netdev driver and
>    the skb hash are _not_ under our control. XPS uses the CPU map to select
>    Tx queues, but in most cases we cannot tell which pod/container
>    task_struct is running on a given CPU. Instead, we can use clsact filters
>    to classify one pod/container's traffic to one Tx queue. Why?
> 
>    In a container networking environment, there are two kinds of pods/
>    containers/net-namespaces. For one kind (e.g. P1, P2), high throughput
>    is key. To avoid exhausting network resources, the outbound traffic of
>    these pods is limited: they use or share dedicated Tx queues assigned an
>    HTB/TBF/FQ Qdisc. For the other kind of pods (e.g. Pn), low data-access
>    latency is key, and their traffic is not limited. These pods use or share
>    other dedicated Tx queues assigned a FIFO Qdisc.
>    This choice provides two benefits. First, contention on the HTB/FQ Qdisc
>    lock is significantly reduced since fewer CPUs contend for the same queue.
>    More importantly, Qdisc contention can be eliminated completely if each
>    CPU has its own FIFO Qdisc for the second kind of pods.
> 
>    There must be a mechanism in place to support classifying traffic from
>    different pods/containers to different Tx queues. Note that clsact runs
>    outside of the Qdisc, while a Qdisc can run a classifier to select a
>    sub-queue only under the Qdisc lock.
> 
>    In general, recording the decision in the skb seems a little heavy-handed.
>    Instead, this patch introduces a per-CPU variable, as suggested by Eric.
> 
>    The xmit.skip_txqueue flag is first cleared in __dev_queue_xmit().
>    A Tx Qdisc may install skbedit actions; the xmit.skip_txqueue flag is
>    then set in qdisc->enqueue() even though the Tx queue has already been
>    selected in netdev_tx_queue_mapping() or netdev_core_pick_tx(). Clearing
>    the flag first in __dev_queue_xmit() is useful to:
>    - Avoid picking the Tx queue with netdev_tx_queue_mapping() in the next
>      netdev of a stack such as: eth0 macvlan - eth0.3 vlan - eth0 ixgbe-phy.
>      For example, eth0 (a macvlan in a pod), whose root Qdisc installs
>      skbedit queue_mapping, sends packets to eth0.3 (a vlan in the host).
>      In __dev_queue_xmit() of eth0.3 the flag is cleared, so the Tx queue is
>      not selected according to skb->queue_mapping, because there are no
>      filters in the clsact or Tx Qdisc of this netdev. The same happens in
>      eth0 (ixgbe in the host).
>    - Avoid picking the Tx queue for the next packet. If we set
>      xmit.skip_txqueue in the Tx Qdisc (qdisc->enqueue()), the proper place
>      to clear it is __dev_queue_xmit(), when processing the next packet.
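The flag lifecycle described above can be illustrated with a small userspace model. This is a hypothetical sketch, not kernel code: a single global stands in for the per-CPU xmit.skip_txqueue flag, `struct pkt` stands in for the skb, the default pick is reduced to the skb hash (XPS and ndo_select_queue are omitted), and the clsact and Tx Qdisc cases are collapsed into one `egress_filter` hook. All function names here are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-CPU model: a plain global stands in for the
 * per-CPU xmit.skip_txqueue flag described in the patch. */
static bool skip_txqueue;

/* Hypothetical packet; fields stand in for skb->queue_mapping and the skb hash. */
struct pkt {
    uint16_t queue_mapping;
    uint32_t hash;
};

/* Stand-in for an "action skbedit queue_mapping N" firing:
 * record the queue in the packet and raise the flag. */
static void skbedit_queue_mapping(struct pkt *p, uint16_t queue)
{
    p->queue_mapping = queue;
    skip_txqueue = true;
}

/* Stand-in for the default pick: hash only (XPS and ndo_select_queue omitted). */
static uint16_t pick_by_hash(const struct pkt *p, uint16_t num_tx_queues)
{
    assert(num_tx_queues > 0);
    return (uint16_t)(p->hash % num_tx_queues);
}

/* One device's transmit path. egress_filter says whether this device has an
 * skbedit filter; filter_queue is the queue that filter assigns. */
static uint16_t xmit_on_device(struct pkt *p, uint16_t num_tx_queues,
                               bool egress_filter, uint16_t filter_queue)
{
    /* Cleared first, so a flag raised on an upper device or for an
     * earlier packet cannot leak into this decision. */
    skip_txqueue = false;

    if (egress_filter)
        skbedit_queue_mapping(p, filter_queue);

    /* Honour the recorded mapping; this model falls back to queue 0
     * when the mapping is out of range (an assumption of the sketch). */
    if (skip_txqueue)
        return p->queue_mapping < num_tx_queues ? p->queue_mapping : 0;

    return pick_by_hash(p, num_tx_queues);
}
```

For instance, a packet with hash 7 sent through a pod-side device whose filter assigns queue 1 gets queue 1; when the same packet then traverses a host device with four queues and no filter, the flag is cleared again and the hash pick yields queue 3, even though `queue_mapping` still holds 1.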
> 
>    For performance reasons, a static key is used. If the user does not enable
>    NET_EGRESS, this code is not compiled in.
> 
>    +----+      +----+      +----+
>    | P1 |      | P2 |      | Pn |
>    +----+      +----+      +----+
>      |           |           |
>      +-----------+-----------+
>                  |
>                  | clsact/skbedit
>                  |      MQ
>                  v
>      +-----------+-----------+
>      | q0        | q1        | qn
>      v           v           v
>    HTB/FQ      HTB/FQ  ...  FIFO
> 
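One way to read the diagram is as a classification policy mapping each kind of pod onto the q0..qn layout. The sketch below is a hypothetical policy, not something the patch defines: the enum, the `struct txq_layout` split between shaped (HTB/FQ) and FIFO queues, and the choice of FIFO queue by CPU index are all assumptions made to illustrate the two pod kinds and the "one FIFO Qdisc per CPU" point.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pod classes, matching the two kinds described above. */
enum pod_kind {
    POD_RATE_LIMITED,      /* e.g. P1, P2: throughput-heavy, shaped */
    POD_LATENCY_SENSITIVE, /* e.g. Pn: unshaped, latency matters */
};

/* Hypothetical queue layout: queues [0, num_shaped) carry HTB/FQ,
 * queues [num_shaped, num_shaped + num_fifo) carry FIFO. */
struct txq_layout {
    uint16_t num_shaped;
    uint16_t num_fifo;
};

/* Returns the Tx queue this hypothetical policy assigns. */
static uint16_t pick_queue_for_pod(const struct txq_layout *l,
                                   enum pod_kind kind,
                                   uint32_t pod_id, uint32_t cpu)
{
    assert(l->num_shaped > 0 && l->num_fifo > 0);

    if (kind == POD_RATE_LIMITED)
        /* Rate-limited pods share the shaped queues. */
        return (uint16_t)(pod_id % l->num_shaped);

    /* Latency-sensitive traffic goes to a FIFO queue chosen by CPU, so
     * with num_fifo >= number of CPUs no two CPUs share one FIFO Qdisc. */
    return (uint16_t)(l->num_shaped + cpu % l->num_fifo);
}
```

With two shaped queues and four FIFO queues, a rate-limited pod with id 5 lands on queue 1, while latency-sensitive traffic sent from CPU 6 lands on queue 4 (the third FIFO queue).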
> Cc: Jamal Hadi Salim <jhs@...atatu.com>
> Cc: Cong Wang <xiyou.wangcong@...il.com>
> Cc: Jiri Pirko <jiri@...nulli.us>
> Cc: "David S. Miller" <davem@...emloft.net>
> Cc: Jakub Kicinski <kuba@...nel.org>
> Cc: Jonathan Lemon <jonathan.lemon@...il.com>
> Cc: Eric Dumazet <edumazet@...gle.com>
> Cc: Alexander Lobakin <alobakin@...me>
> Cc: Paolo Abeni <pabeni@...hat.com>
> Cc: Talal Ahmad <talalahmad@...gle.com>
> Cc: Kevin Hao <haokexin@...il.com>
> Cc: Ilias Apalodimas <ilias.apalodimas@...aro.org>
> Cc: Kees Cook <keescook@...omium.org>
> Cc: Kumar Kartikeya Dwivedi <memxor@...il.com>
> Cc: Antoine Tenart <atenart@...nel.org>
> Cc: Wei Wang <weiwan@...gle.com>
> Cc: Arnd Bergmann <arnd@...db.de>
> Suggested-by: Eric Dumazet <edumazet@...gle.com>
> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@...il.com>

Acked-by: Jamal Hadi Salim <jhs@...atatu.com>

cheers,
jamal