netdev - Re: [PATCH net-next] veth: Do not record rx queue hint in veth

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <cd9631c7-ad91-b232-8c98-6d163582288a@gmail.com>
Date:   Mon, 10 Jan 2022 15:52:41 +0900
From:   Toshiaki Makita <toshiaki.makita1@...il.com>
To:     Daniel Borkmann <daniel@...earbox.net>
Cc:     netdev@...r.kernel.org, laurent.bernaille@...adoghq.com,
        maciej.fijalkowski@...el.com, eric.dumazet@...il.com,
        pabeni@...hat.com, john.fastabend@...il.com, willemb@...gle.com,
        davem@...emloft.net, kuba@...nel.org, magnus.karlsson@...il.com
Subject: Re: [PATCH net-next] veth: Do not record rx queue hint in veth_xmit

On 2022/01/07 22:56, Daniel Borkmann wrote:
> On 1/6/22 1:57 PM, Toshiaki Makita wrote:
>> On 2022/01/06 9:46, Daniel Borkmann wrote:
>>
>> Thank you for the fix.
>>
>>> (unless configured in non-default manner).
>>
>> Just curious, is there any special configuration to record rx queue index on veth 
>> after your patch?
> 
> Right now skb->queue_mapping represents the tx queue number. So assuming
> dev->real_num_tx_queues == dev->real_num_rx_queues for the veth creation,
> then veth_xmit() picks the rx queue via rcv_priv->rq[rxq] based on tx queue
> number. So, by default veth is created with dev->real_num_tx_queues ==
> dev->real_num_rx_queues == 1, in which case the queue_mapping stays 0.
> Checking with e.g. ...
> 
>    ip link add foo numtxqueues 64 numrxqueues 64 type veth peer numtxqueues 64 
> numrxqueues 64
> 
> ... then stack inside the netns on the veth dev picks a TX queue via
> netdev_core_pick_tx(). Due to the skb_record_rx_queue() / skb_get_rx_queue()
> it is off by one in generic XDP [in hostns] at bpf_prog_run_generic_xdp() for
> the case where veth has more than single queue (Single queue case netif_get_rxqueue()
> will see that skb_rx_queue_recorded() is false and just pick the first queue
> so at least there it's correct).
> 
> Not sure if there is a good way to detect inside bpf_prog_run_generic_xdp()
> that skb was looped to host from netns and then fix up the offset .. other
> option could potentially be an extra device parameter and when enabled only
> then do the skb_record_rx_queue() so it's an explicit opt-in where admin needs
> to be aware of potential implications.
> 
> My worry is that if the skb ends up being pushed out the phys dev, then 1)
> if veth was provisioned with >1 TX queues the admin must also take into
> account the TX queues e.g. on phys devices like ena, so that we're not under-
> utilizing it (or have something like BPF clear the queue mapping before
> it's forwarded). And if we do skb_record_rx_queue() and, say, TX queue numbers
> of veth and phys dev are provisioned to be the same, then we jump +1 on the
> phys dev with a skb_record_rx_queue() which may be unintentional. Hence maybe
> explicit opt-in could be better option?

Thank you for the detailed explanation!
I agree with you on that an opt-in would be better.
I thought you might mean there is already some way to do that by "configured in
non-default manner", so I asked it. Now I understand we need an additional knob
if we really want to extract the veth rx queue number from bpf.

Toshiaki Makita