lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Mon, 10 Jan 2022 15:52:41 +0900
From:   Toshiaki Makita <>
To:     Daniel Borkmann <>
Subject: Re: [PATCH net-next] veth: Do not record rx queue hint in veth_xmit

On 2022/01/07 22:56, Daniel Borkmann wrote:
> On 1/6/22 1:57 PM, Toshiaki Makita wrote:
>> On 2022/01/06 9:46, Daniel Borkmann wrote:
>> Thank you for the fix.
>>> (unless configured in non-default manner).
>> Just curious, is there any special configuration to record rx queue index on veth 
>> after your patch?
> Right now skb->queue_mapping represents the tx queue number. So assuming
> dev->real_num_tx_queues == dev->real_num_rx_queues for the veth creation,
> then veth_xmit() picks the rx queue via rcv_priv->rq[rxq] based on tx queue
> number. So, by default veth is created with dev->real_num_tx_queues ==
> dev->real_num_rx_queues == 1, in which case the queue_mapping stays 0.
> Checking with e.g. ...
>    ip link add foo numtxqueues 64 numrxqueues 64 type veth peer numtxqueues 64 
> numrxqueues 64
> ... then stack inside the netns on the veth dev picks a TX queue via
> netdev_core_pick_tx(). Due to the skb_record_rx_queue() / skb_get_rx_queue()
> it is off by one in generic XDP [in hostns] at bpf_prog_run_generic_xdp() for
> the case where veth has more than single queue (Single queue case netif_get_rxqueue()
> will see that skb_rx_queue_recorded() is false and just pick the first queue
> so at least there it's correct).
> Not sure if there is a good way to detect inside bpf_prog_run_generic_xdp()
> that skb was looped to host from netns and then fix up the offset .. other
> option could potentially be an extra device parameter and when enabled only
> then do the skb_record_rx_queue() so it's an explicit opt-in where admin needs
> to be aware of potential implications.
> My worry is that if the skb ends up being pushed out the phys dev, then 1)
> if veth was provisioned with >1 TX queues the admin must also take into
> account the TX queues e.g. on phys devices like ena, so that we're not under-
> utilizing it (or have something like BPF clear the queue mapping before
> it's forwarded). And if we do skb_record_rx_queue() and, say, TX queue numbers
> of veth and phys dev are provisioned to be the same, then we jump +1 on the
> phys dev with a skb_record_rx_queue() which may be unintentional. Hence maybe
> explicit opt-in could be better option?

Thank you for the detailed explanation!
I agree with you on that an opt-in would be better.
I thought you might mean there is already some way to do that by "configured in
non-default manner", so I asked it. Now I understand we need an additional knob
if we really want to extract the veth rx queue number from bpf.

Toshiaki Makita

Powered by blists - more mailing lists