Date:   Fri, 7 Jan 2022 14:56:11 +0100
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Toshiaki Makita <toshiaki.makita1@...il.com>
Cc:     netdev@...r.kernel.org, laurent.bernaille@...adoghq.com,
        maciej.fijalkowski@...el.com, eric.dumazet@...il.com,
        pabeni@...hat.com, john.fastabend@...il.com, willemb@...gle.com,
        davem@...emloft.net, kuba@...nel.org, magnus.karlsson@...il.com
Subject: Re: [PATCH net-next] veth: Do not record rx queue hint in veth_xmit

On 1/6/22 1:57 PM, Toshiaki Makita wrote:
> On 2022/01/06 9:46, Daniel Borkmann wrote:
> 
> Thank you for the fix.
> 
>> (unless configured in non-default manner).
> 
> Just curious, is there any special configuration to record rx queue index on veth after your patch?

Right now skb->queue_mapping represents the tx queue number. So assuming
dev->real_num_tx_queues == dev->real_num_rx_queues at veth creation time,
veth_xmit() picks the rx queue via rcv_priv->rq[rxq] based on the tx queue
number. By default, veth is created with dev->real_num_tx_queues ==
dev->real_num_rx_queues == 1, in which case queue_mapping stays 0.
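
To illustrate, here is a simplified sketch of the relevant veth_xmit() queue
selection (paraphrased from drivers/net/veth.c; error handling, stats and the
NAPI/GRO details are omitted):

   static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
   {
           struct veth_priv *rcv_priv, *priv = netdev_priv(dev);
           struct net_device *rcv = rcu_dereference(priv->peer);
           struct veth_rq *rq = NULL;
           /* queue_mapping holds the tx queue picked by netdev_core_pick_tx() */
           int rxq = skb_get_queue_mapping(skb);

           rcv_priv = netdev_priv(rcv);
           if (rxq < rcv->real_num_rx_queues) {
                   /* peer rx queue is chosen from the sender's tx queue */
                   rq = &rcv_priv->rq[rxq];
                   /* before this patch, skb_record_rx_queue(skb, rxq) was
                    * also called in this branch
                    */
           }
           /* ... hand the skb to the peer, using rq for NAPI if available ... */
   }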
Checking with e.g. ...

   ip link add foo numtxqueues 64 numrxqueues 64 type veth peer numtxqueues 64 numrxqueues 64

... then the stack inside the netns picks a TX queue on the veth dev via
netdev_core_pick_tx(). Due to the skb_record_rx_queue() / skb_get_rx_queue()
encoding, the queue index is off by one in generic XDP [in hostns] at
bpf_prog_run_generic_xdp() when the veth has more than a single queue. (In the
single-queue case netif_get_rxqueue() sees that skb_rx_queue_recorded() is
false and just picks the first queue, so at least there it's correct.)
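
For reference, the +1/-1 encoding behind that off-by-one, as the skbuff
helpers define it (simplified from include/linux/skbuff.h):

   /* skb->queue_mapping == 0 means "no rx queue recorded" */
   static inline void skb_record_rx_queue(struct sk_buff *skb, u16 rx_queue)
   {
           skb->queue_mapping = rx_queue + 1;
   }

   static inline u16 skb_get_rx_queue(const struct sk_buff *skb)
   {
           return skb->queue_mapping - 1;
   }

   static inline bool skb_rx_queue_recorded(const struct sk_buff *skb)
   {
           return skb->queue_mapping != 0;
   }

So with the raw tx queue number N left in queue_mapping, netif_get_rxqueue()
on the host side ends up treating it as recorded rx queue N - 1.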

Not sure if there is a good way to detect inside bpf_prog_run_generic_xdp()
that the skb was looped to the host from a netns and then fix up the offset.
Another option could be an extra device parameter: only do the
skb_record_rx_queue() when it is enabled, so it becomes an explicit opt-in
where the admin needs to be aware of the potential implications.
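
Roughly along these lines (purely hypothetical sketch; the record_rx_queue
knob and where it would be wired up are made up for illustration):

   if (rxq < rcv->real_num_rx_queues) {
           rq = &rcv_priv->rq[rxq];
           /* hypothetical opt-in knob, off by default */
           if (READ_ONCE(rcv_priv->record_rx_queue))
                   skb_record_rx_queue(skb, rxq);
   }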

My worry is about the skb ending up being pushed out the phys dev: 1) if veth
was provisioned with >1 TX queues, the admin must also take into account the
TX queue count on phys devices like ena so that we're not under-utilizing them
(or have something like BPF clear the queue mapping before it's forwarded, see
the sketch below), and 2) if we do skb_record_rx_queue() and, say, the TX
queue numbers of veth and phys dev are provisioned to be the same, then we
jump +1 on the phys dev due to the skb_record_rx_queue(), which may be
unintentional. Hence maybe explicit opt-in would be the better option?
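
For the "clear the queue mapping" part, a minimal tc/clsact sketch could look
like this (assuming a kernel where queue_mapping is writable from sched_cls
programs; object and section names are just for illustration):

   // SPDX-License-Identifier: GPL-2.0
   #include <linux/bpf.h>
   #include <linux/pkt_cls.h>
   #include <bpf/bpf_helpers.h>

   SEC("tc")
   int clear_queue_mapping(struct __sk_buff *skb)
   {
           /* 0 == "no rx queue recorded", so the phys dev's tx queue
            * selection falls back to its normal flow hashing
            */
           skb->queue_mapping = 0;
           return TC_ACT_OK;
   }

   char _license[] SEC("license") = "GPL";

... attached e.g. on the phys dev's egress hook via
tc qdisc add dev <phys> clsact && tc filter add dev <phys> egress bpf da obj clear_qm.o sec tc.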

Thanks,
Daniel
