Message-ID: <cdd04862-dafd-080e-e90e-5161e568bac3@gmail.com>
Date: Mon, 18 May 2020 17:52:41 -0600
From: David Ahern <dsahern@...il.com>
To: John Fastabend <john.fastabend@...il.com>,
Toke Høiland-Jørgensen <toke@...hat.com>,
David Ahern <dsahern@...nel.org>, netdev@...r.kernel.org
Cc: davem@...emloft.net, kuba@...nel.org,
prashantbhole.linux@...il.com, brouer@...hat.com,
daniel@...earbox.net, ast@...nel.org, kafai@...com,
songliubraving@...com, yhs@...com, andriin@...com,
David Ahern <dahern@...italocean.com>
Subject: Re: [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path
On 5/18/20 12:10 PM, John Fastabend wrote:
>>
>> host ingress to VM is one use case; VM to VM on the same host is another.
>
> But host ingress to VM would still work with tail calls because the XDP
> packet came from another XDP program. At least that is how I understand
> it.
>
> VM to VM case, again using tail calls on the sending VM ingress hook
> would work also.
Understood. I realize I can attach the program array all around; I just
see that as a more complex control plane and a potential performance hit
depending on how the programs are wired up.
>>
>> With respect to lifecycle management of the programs and the data,
>> putting VM specific programs and maps on VM specific taps simplifies
>> management. VM terminates, taps are deleted, programs and maps
>> disappear. So no validator thread needed to handle stray data / programs
>> from the inevitable cleanup problems when everything is lumped into 1
>> program / map or even array of programs and maps.
>
> OK. Also presumably you already have a hook into this event to insert
> the tc filter programs, so it's probably a natural hook for mgmt.
For VMs there is no reason to have an skb at all, so no tc filter program.
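To make the per-tap management flow concrete, here is a hedged sketch using
standard libbpf calls: load a VM-specific object and attach its XDP program
to that VM's tap, so the program and its maps go away with the tap. The
object path, program name and DRV-mode flag are assumptions, error cleanup
is omitted, and the egress attach point added by this series is not shown.

#include <bpf/bpf.h>
#include <bpf/libbpf.h>
#include <linux/if_link.h>
#include <net/if.h>

int attach_vm_prog(const char *tap_name, const char *obj_path)
{
	struct bpf_object *obj;
	struct bpf_program *prog;
	int ifindex = if_nametoindex(tap_name);

	if (!ifindex)
		return -1;

	obj = bpf_object__open_file(obj_path, NULL);
	if (libbpf_get_error(obj))
		return -1;
	if (bpf_object__load(obj))
		return -1;

	/* "xdp_vm_filter" is a hypothetical program name for the sketch */
	prog = bpf_object__find_program_by_name(obj, "xdp_vm_filter");
	if (!prog)
		return -1;

	/* program and maps live no longer than this tap / attachment */
	return bpf_set_link_xdp_fd(ifindex, bpf_program__fd(prog),
				   XDP_FLAGS_DRV_MODE);
}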
>
>>
>> To me the distributed approach is the simplest and best. The program on
>> the host nics can be stupid simple; no packet parsing beyond the
>> ethernet header. Its job is just that of a traffic demuxer, very much
>> like a switch. All VM logic and data is local to the VM's interfaces.
>
> IMO it seems more natural and efficient to use a tail call. But I
> can see how, if the ingress program is an l2/l3 switch and the VM hook
> is an l2/l3 filter, it feels more like the switch+firewall layout we
> would normally use on a "real" (v)switch. Also I think the above point
> where cleanup is free because of the tap teardown is a win.
Exactly. To the VM, the host is part of the network. The host should be
passing packets from the ingress nic to the VM as fast and as simply as
possible. That can be done entirely as xdp frames, and doing so reduces
the CPU cycles per packet on the host (yes, there are caveats to that
statement).
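As a rough sketch of that "stupid simple" demuxer on the host nic: look at
nothing but the destination MAC, map it to a tap ifindex, and redirect the
frame. The map name and size are invented, and VLAN, broadcast and ARP
handling are deliberately left out.

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 256);
	__type(key, __u64);        /* dest MAC in the low 6 bytes */
	__type(value, __u32);      /* tap ifindex */
} mac_to_tap SEC(".maps");

SEC("xdp")
int xdp_demux(struct xdp_md *ctx)
{
	void *data = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;
	struct ethhdr *eth = data;
	__u64 mac = 0;
	__u32 *ifindex;

	if ((void *)(eth + 1) > data_end)
		return XDP_DROP;

	__builtin_memcpy(&mac, eth->h_dest, ETH_ALEN);

	ifindex = bpf_map_lookup_elem(&mac_to_tap, &mac);
	if (!ifindex)
		return XDP_PASS;   /* not a VM address, let the host stack see it */

	return bpf_redirect(*ifindex, 0);
}

char _license[] SEC("license") = "GPL";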
VM to host nic, and VM to VM have their own challenges which need to be
tackled next.
But the end goal is to have the host handle all VM traffic as xdp
frames, without creating a complex control plane. The distributed
approach is much simpler and cleaner, and it seems to follow what Cilium
is doing to a degree, or that is my interpretation of
"By attaching to the TC ingress hook of the host side of this veth pair
Cilium can monitor and enforce policy on all traffic exiting a
container. By attaching a BPF program to the veth pair associated with
each container and routing all network traffic to the host side virtual
devices with another BPF program attached to the tc ingress hook as well
Cilium can monitor and enforce policy on all traffic entering or exiting
the node."
https://docs.cilium.io/en/v1.7/architecture/