Date: Fri, 01 Sep 2023 15:32:42 +0200
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Jesper Dangaard Brouer <hawk@...nel.org>, netdev@...r.kernel.org
Cc: hawk@...nel.org, pabeni@...hat.com, kuba@...nel.org,
 davem@...emloft.net, lorenzo@...nel.org, Ilias Apalodimas
 <ilias.apalodimas@...aro.org>, mtahhan@...hat.com,
 huangjie.albert@...edance.com, Yunsheng Lin <linyunsheng@...wei.com>,
 edumazet@...gle.com, Liang Chen <liangchen.linux@...il.com>
Subject: Re: [PATCH net-next RFC v1 2/4] veth: use generic-XDP functions
 when dealing with SKBs

Jesper Dangaard Brouer <hawk@...nel.org> writes:

> On 24/08/2023 12.30, Toke Høiland-Jørgensen wrote:
>> Jesper Dangaard Brouer <hawk@...nel.org> writes:
>> 
>>> The root cause of the realloc issue is that the veth_xdp_rcv_skb() code path
>>> (which handles SKBs like generic-XDP) calls the native-XDP function
>>> xdp_do_redirect(), instead of simply using xdp_do_generic_redirect(), which
>>> can handle SKBs.
>>>
>>> The existing code tries to steal the packet data from the SKB (and frees the SKB
>>> itself). This causes issues, as SKBs can have different memory models that are
>>> incompatible with the native-XDP call xdp_do_redirect(). For this reason the checks
>>> in veth_convert_skb_to_xdp_buff() become more strict, which in turn makes this a
>>> bad approach. Simply leveraging generic-XDP helpers, e.g. generic_xdp_tx() and
>>> xdp_do_generic_redirect(), resolves the issue, given that the netstack can handle
>>> these different SKB memory models.
>> 
>> While this does solve the memory issue, it's also a subtle change of
>> semantics. For one thing, generic_xdp_tx() has this comment above it:
>> 
>> /* When doing generic XDP we have to bypass the qdisc layer and the
>>   * network taps in order to match in-driver-XDP behavior. This also means
>>   * that XDP packets are able to starve other packets going through a qdisc,
>>   * and DDOS attacks will be more effective. In-driver-XDP use dedicated TX
>>   * queues, so they do not have this starvation issue.
>>   */
>> 
>> Also, more generally, this means that if you have a setup with
>> XDP_REDIRECT-based forwarding on a host with a mix of physical and
>> veth devices, all the traffic originating from the veth devices will go
>> on different TXQs than that originating from a physical NIC. Or if a
>> veth device has a mix of xdp_frame-backed packets and skb-backed
>> packets, those will also go on different queues, potentially leading to
>> reordering.
>> 
>
> Mixing xdp_frame-backed packets and skb-backed packets (towards veth)
> will naturally come from two different data paths, and the BPF developer
> who redirected the xdp_frame (into veth) will have made this choice,
> including the chance of reordering (given the two data/code paths).

I'm not sure we can quite conclude that this is a choice any XDP
developers will be actively aware of. At best it's a very implicit
choice :)

> I will claim that (for SKBs) the current code causes reordering on TXQs (as
> you explain), and my code changes actually fix this problem.
>
> Consider a userspace app (inside a namespace) sending packets out (to the veth
> peer).  Routing (or bridging) will make the netstack send out device A
> (maybe a physical device).  On the veth peer we have an XDP prog running that
> will XDP-redirect every 2nd packet to device A.  With the current code TXQ
> reordering will occur, as calling "native" xdp_do_redirect() will select
> the TXQ based on the currently running CPU, while normal SKBs will use
> netdev_core_pick_tx().  After my change, using
> xdp_do_generic_redirect(), the code ends up using generic_xdp_tx(), which
> (looking at the code) also uses netdev_core_pick_tx() to select the TXQ.
> Thus, I will claim it is more correct (even though XDP in general
> doesn't give this guarantee).
>
>> I'm not sure exactly how much of an issue this is in practice, but at
>> least from a conceptual PoV it's a change in behaviour that I don't
>> think we should be making lightly. WDYT?
>
> As described above, I think this patchset is an improvement.  It might even
> fix/address the concern that was raised.

Well, you can obviously construct examples in both directions (i.e.,
where the old behaviour leads to reordering but the new one doesn't, and
vice versa). I believe you could also reasonably argue that either
behaviour is more "correct", so if we were just picking between
behaviours I wouldn't be objecting, I think.

However, we're not just picking between two equally good behaviours,
we're changing one long-standing behaviour to a different one, and I
worry this will introduce regressions because there are applications
that (explicitly or implicitly) rely on the old behaviour.
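
To make the two directions concrete, here is a toy model in C. It is not
kernel code, only a sketch under loud assumptions: toy_pick_txq(),
txq_by_cpu(), txq_by_flow(), NUM_TXQ and the enums are hypothetical names,
the native path is reduced to "queue follows the running CPU", and
netdev_core_pick_tx() is reduced to "queue follows the flow hash". The real
selection logic is more involved and partly driver-specific.

```c
#include <stdint.h>

#define NUM_TXQ 4u /* assumed queue count, for illustration only */

/* How a packet reaches the egress device in this toy model. */
enum pkt_kind {
	NETSTACK_SKB,        /* sent by the regular stack, never redirected   */
	VETH_SKB_REDIRECT,   /* skb-backed, XDP_REDIRECTed by the veth prog   */
	VETH_FRAME_REDIRECT, /* xdp_frame-backed, redirected by the veth prog */
};

enum veth_behaviour {
	OLD_BEHAVIOUR, /* veth redirects skbs via xdp_do_redirect()         */
	NEW_BEHAVIOUR, /* veth redirects skbs via xdp_do_generic_redirect() */
};

/* Simplified stand-in for the native-XDP choice: queue follows the CPU. */
static unsigned int txq_by_cpu(unsigned int cpu)
{
	return cpu % NUM_TXQ;
}

/* Simplified stand-in for the netstack choice: queue follows the flow hash. */
static unsigned int txq_by_flow(uint32_t flow_hash)
{
	return flow_hash % NUM_TXQ;
}

/* TX queue used for one packet under the given veth behaviour. */
static unsigned int toy_pick_txq(enum veth_behaviour b, enum pkt_kind kind,
				 unsigned int cpu, uint32_t flow_hash)
{
	switch (kind) {
	case NETSTACK_SKB:
		return txq_by_flow(flow_hash);
	case VETH_FRAME_REDIRECT:
		return txq_by_cpu(cpu);
	case VETH_SKB_REDIRECT:
		/* Only this case differs between the two behaviours. */
		return b == OLD_BEHAVIOUR ? txq_by_cpu(cpu)
					  : txq_by_flow(flow_hash);
	}
	return 0; /* not reached for valid input */
}
```

Take cpu 1 and flow hash 6 as an arbitrary example. If one flow consists of
NETSTACK_SKB and VETH_SKB_REDIRECT packets, the old behaviour splits it
across queues 2 and 1, while the new one keeps it on queue 2. If one flow
consists of VETH_FRAME_REDIRECT and VETH_SKB_REDIRECT packets, the old
behaviour keeps it on queue 1, while the new one splits it across queues 1
and 2.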

Also, there's the starvation issue mentioned in the comment I quoted
above: with this patch it is possible for traffic redirected from a veth
to effectively starve the host TXQ, where before it wouldn't.
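
As a rough illustration of that starvation concern, the following C sketch
counts how many host packets get out in one interval. It is a worst-case toy
model and not a description of any driver: the function names are
hypothetical, a TXQ is reduced to a fixed number of slots per interval, and
redirected packets that bypass the qdisc are assumed to always claim slots
before qdisc traffic does.

```c
/* Smaller of two unsigned values. */
static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Host packets sent when redirected traffic shares the host TXQ and
 * bypasses the qdisc. Assumption: redirected packets win the slots first.
 */
static unsigned int host_sent_shared_txq(unsigned int slots,
					 unsigned int redirected,
					 unsigned int host)
{
	unsigned int used = min_u(redirected, slots);

	return min_u(host, slots - used);
}

/*
 * Host packets sent when redirected traffic has a dedicated TXQ, as the
 * quoted comment describes for in-driver XDP.
 */
static unsigned int host_sent_dedicated_txq(unsigned int slots,
					    unsigned int redirected,
					    unsigned int host)
{
	(void)redirected; /* redirected traffic does not touch this queue */
	return min_u(host, slots);
}
```

With 64 slots, 100 redirected packets and 10 host packets, the shared-queue
model sends none of the host packets, while the dedicated-queue model sends
all 10.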

I don't really have a good answer for how we can establish whether this
will be a problem in practice, but I believe it's cause for concern,
which is really my main reservation with this change :)

-Toke

