Message-ID: <d2dffea2-13d9-72a2-a89c-354b6403da54@gmail.com>
Date:   Fri, 17 Jan 2020 15:00:53 +0900
From:   Toshiaki Makita <toshiaki.makita1@...il.com>
To:     Hanlin Shi <hanlins@...are.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Cc:     Cheng-Chun William Tu <tuc@...are.com>
Subject: Re: Veth pair swallow packets for XDP_TX operation

Please avoid top-posting on the netdev mailing list.

On 2020/01/17 7:54, Hanlin Shi wrote:
> Hi Toshiaki,
> 
> Thanks for your advice, and now it's working as expected in my environment. However I still have concerns on this issue. Is this dummy interface approach is a short-term work around?

This is a long-standing problem and should be fixed in some way, but that is not easy.

Your packets were dropped because the peer device had not prepared the resources
needed to receive XDP frames. That allocation is triggered by attaching a
(possibly dummy) XDP program to the peer, which is unfortunately unintuitive.
Typically this kind of problem happens when other devices redirect frames to
some device with XDP_REDIRECT. If the redirect target has not prepared the
necessary resources, the redirected frames are dropped. This is a common issue
with other XDP drivers too, and the netdev community is still looking for the
right solution.

For veth there may be one more option: attaching an XDP program could also
trigger allocation of the "peer" resources. But this means we need to allocate
resources on both ends even when only one of them attaches XDP. That is
unnecessary when the program only does XDP_DROP or XDP_PASS, so I'm not sure
this is a good idea.

Anyway, with the current behavior the peer (i.e. the container host) needs to explicitly
allow XDP_TX by attaching some program on the host side.
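
For reference, a dummy program of the kind mentioned above can be as small as
the sketch below. It only returns XDP_PASS, so attaching it to the host-side
veth should do nothing except trigger the resource allocation on that device.
The function name xdp_dummy_pass, the file name xdp_dummy.c and the local SEC
macro are placeholders chosen for this example, not something taken from the
veth driver or from the original report.

```c
/* xdp_dummy.c (hypothetical file name): minimal pass-through XDP program. */
#include <linux/bpf.h>

/* Local stand-in for the SEC() macro that libbpf's bpf_helpers.h provides. */
#ifndef SEC
#define SEC(name) __attribute__((section(name), used))
#endif

/* Hand every frame to the normal network stack unchanged. */
SEC("xdp")
int xdp_dummy_pass(struct xdp_md *ctx)
{
	(void)ctx;	/* the context is intentionally unused */
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";
```

Assuming clang with the BPF target and iproute2 built with BPF object support,
it can be compiled with "clang -O2 -target bpf -c xdp_dummy.c -o xdp_dummy.o"
and attached with "ip link set dev <host-side veth> xdp obj xdp_dummy.o sec xdp".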

> The behavior for native XDP is different from generic XDP, which could cause confusions for developers. 

Native XDP is generally hard to set up, which is one of the reasons why generic XDP was introduced.

> Also, I'm planning to load the XDP program in container (specifically, Kubernetes pod), and I'm not sure is it's feasible for me to access the veth peer that is connected to the bridge (Linux bridge or ovs).

So the veth devices will be created by CNI plugins? Then basically your CNI plugin needs to attach
an XDP program on the host side if you want to allow XDP_TX in containers.

> 
> I wonder is that ok to have a fix, that if the XDP program on the peer of veth is not found, then fallback to a dummy XDP_PASS behavior, just like what you demonstrated? If needed I can help on the fix.

I proposed a similar workaround when I introduced veth native XDP, but it was rejected.
If we do not allocate additional resources on the peer, we need to use the legacy data path,
which does not have a bulk interface, and that makes XDP_TX performance lower.
That would be a harder problem to fix than dropping...

Toshiaki Makita