Date:   Tue, 15 Feb 2022 19:17:36 -0500
From:   Jamal Hadi Salim <jhs@...atatu.com>
To:     Tonghao Zhang <xiangxia.m.yue@...il.com>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Jiri Pirko <jiri@...nulli.us>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Jonathan Lemon <jonathan.lemon@...il.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Alexander Lobakin <alobakin@...me>,
        Paolo Abeni <pabeni@...hat.com>,
        Talal Ahmad <talalahmad@...gle.com>,
        Kevin Hao <haokexin@...il.com>,
        Ilias Apalodimas <ilias.apalodimas@...aro.org>,
        Kees Cook <keescook@...omium.org>,
        Kumar Kartikeya Dwivedi <memxor@...il.com>,
        Antoine Tenart <atenart@...nel.org>,
        Wei Wang <weiwan@...gle.com>, Arnd Bergmann <arnd@...db.de>
Subject: Re: [net-next v8 2/2] net: sched: support hash/classid/cpuid
 selecting tx queue

On 2022-02-14 20:40, Tonghao Zhang wrote:
> On Tue, Feb 15, 2022 at 8:22 AM Jamal Hadi Salim <jhs@...atatu.com> wrote:
>>
>> On 2022-01-26 09:32, xiangxia.m.yue@...il.com wrote:
>>> From: Tonghao Zhang <xiangxia.m.yue@...il.com>
>>>

>
>> So while I don't agree that eBPF is the solution, for reasons I mentioned
>> earlier, after looking at the details I think I am confused by this change
>> and maybe I didn't fully understand the use case.
>>
>> What is the driver that would work with this?
>> You said earlier that packets are coming out of some pods and then heading
>> to the wire, and that you are looking to balance and isolate between bulk
>> and latency-sensitive traffic. How are any of these metadata useful for
>> that? skb->priority seems more natural for that.

Quote from your other email:

 > In our production env, we use the ixgbe, i40e and mlx nic which
 > support multi tx queue.

Please bear with me.
The part I was wondering about is how these drivers would use queue
mapping to select their hardware queues.
Maybe you meant the software queues (in the qdiscs?), but even then,
how does queue mapping select which queue is to be used?

> Hi,
> Let me try to explain. There are two tx-queue ranges, e.g. A(Q0-Qn),
> B(Qn+1-Qm).
> A is used for latency sensitive traffic. B is used for bulk sensitive
> traffic. A may be shared by Pods/Containers whose key is
> high throughput. B may be shared by Pods/Containers whose key is low
> latency. So we can do the balancing in range A for latency sensitive
> traffic.
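
To make the range idea above concrete, here is a minimal sketch of how a
flow hash could be spread over one tx-queue range. It is only an
illustration under stated assumptions: pick_queue and the example range
bounds are hypothetical names and values, and the multiply-and-shift scaling
is one common way to map a 32-bit hash onto a range, not necessarily what
the patch under discussion does.

```python
def pick_queue(flow_hash: int, first: int, last: int) -> int:
    """Map a 32-bit flow hash onto the inclusive tx-queue range [first, last].

    Hypothetical helper for illustration; not taken from the patch.
    """
    if last < first:
        raise ValueError("empty queue range")
    span = last - first + 1
    # Scale the 32-bit hash into [0, span) with a multiply and shift,
    # which avoids a modulo and keeps the distribution even.
    offset = ((flow_hash & 0xFFFFFFFF) * span) >> 32
    return first + offset


if __name__ == "__main__":
    # Assumed example: range A is queues 0-3, range B is queues 4-7.
    for h in (0x00000000, 0x80000000, 0xFFFFFFFF):
        print(hex(h), "A:", pick_queue(h, 0, 3), "B:", pick_queue(h, 4, 7))
```

All flows with the same hash land on the same queue, so a pod with several
flows would be spread across the range while each flow stays on one queue.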

So far this makes sense. I am not sure if you get better performance, but
that's unrelated to this discussion. I am just trying to understand your
setup first in order to understand the use case. IIUC:
You have packets coming out of the pods and hitting the host stack,
where you are applying these rules on the egress qdisc of one of these
ixgbe, i40e and mlx NICs, correct?
And that egress qdisc then ends up selecting a queue based on queue
mapping?

Can you paste a more complete example of a sample setup on some egress
port, including what the classifier would be looking at?
Your diagram was unclear on how the load balancing was going to be
achieved using the qdiscs (or was it the hardware?).

> So we can use the skb->hash or CPUID or classid to classify the
> packets in range A or B. The balance policies are used for different
> use cases.
> For skb->hash, the packets from Pods/Containers will share the range.
> Note that one Pod/Container may use multiple TCP/UDP flows.
> Those flows share the tx queue range.
> For CPUID, while a Pod/Container uses multiple flows, a pod pinned on one
> CPU will use one tx-queue in range A or B.
> For CLASSID, the Pod may contain multiple containers.
>
> skb->priority may be used by applications. We can't require
> application developers to change them.

It can also be set by skbedit.
Note also: other than the user specifying it via setsockopt and skbedit,
DSCP/TOS/COS are all translated into skb->priority. Most of those
L3/L2 fields are intended to map to either bulk or latency-sensitive
traffic.
More importantly:
At the s/w level, most if not _all_ classful qdiscs look at skb->priority
to decide where to enqueue.
At the h/w level, skb->priority is typically mapped to a QoS hardware level
(example: 802.1Q).
In fact, skb->priority could be translated by the qdisc layer into a
classid if you set the 32-bit value to be the major:minor number for
a specific configured classid.
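
As a rough illustration of that last point, the sketch below packs a tc
"major:minor" handle into the 32-bit value that would go into
skb->priority: major in the upper 16 bits, minor in the lower 16 bits, both
written in hex as tc does. The helper names (make_classid, parse_classid,
format_classid) are made up for this example; they follow the layout behind
the TC_H_MAJ/TC_H_MIN masks in the kernel's pkt_sched.h but are not kernel
or iproute2 APIs.

```python
TC_H_MAJ_MASK = 0xFFFF0000  # upper 16 bits: major number
TC_H_MIN_MASK = 0x0000FFFF  # lower 16 bits: minor number


def make_classid(major: int, minor: int) -> int:
    """Pack unshifted major and minor numbers into one 32-bit classid."""
    return ((major & 0xFFFF) << 16) | (minor & 0xFFFF)


def parse_classid(text: str) -> int:
    """Parse a tc-style 'major:minor' string; both parts are hexadecimal."""
    major_s, minor_s = text.split(":")
    return make_classid(int(major_s or "0", 16), int(minor_s or "0", 16))


def format_classid(value: int) -> str:
    """Render a 32-bit classid back into tc-style 'major:minor' hex."""
    major = (value & TC_H_MAJ_MASK) >> 16
    minor = value & TC_H_MIN_MASK
    return f"{major:x}:{minor:x}"


if __name__ == "__main__":
    prio = parse_classid("1:10")
    print(hex(prio), format_classid(prio))
```

A classful qdisc whose handle matches the major part can then treat the
minor part as the class to enqueue to, without a separate classifier.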

cheers,
jamal
