netdev - Re: [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path

Message-ID: <5ec376ff1680_2e852b10123785b4ae@john-XPS-13-9370.notmuch>
Date:   Mon, 18 May 2020 23:04:47 -0700
From:   John Fastabend <john.fastabend@...il.com>
To:     David Ahern <dsahern@...il.com>,
        John Fastabend <john.fastabend@...il.com>,
        Toke Høiland-Jørgensen <toke@...hat.com>,
        David Ahern <dsahern@...nel.org>, netdev@...r.kernel.org
Cc:     davem@...emloft.net, kuba@...nel.org,
        prashantbhole.linux@...il.com, brouer@...hat.com,
        daniel@...earbox.net, ast@...nel.org, kafai@...com,
        songliubraving@...com, yhs@...com, andriin@...com,
        David Ahern <dahern@...italocean.com>
Subject: Re: [PATCH v5 bpf-next 00/11] net: Add support for XDP in egress path

David Ahern wrote:
> On 5/18/20 12:10 PM, John Fastabend wrote:
> >>
> >> host ingress to VM is one use case; VM to VM on the same host is another.
> > 
> > But host ingress to VM would still work with tail calls because the XDP
> > packet came from another XDP program. At least that is how I understand
> > it.
> > 
> > VM to VM case, again using tail calls on the sending VM ingress hook
> > would work also.
> 
> understood. I realize I can attach the program array all around, I just
> see that as complex control plane / performance hit depending on how the
> programs are wired up.
> 

Hard to argue without a specific program. I think it could go either way.
I'll concede the control plane might be more complex, but I'm not so
convinced about performance. Either way, having a program tied to the life
cycle of the VM seems like it would be nice to have. In the tc skb case, if
we attach to a qdisc, it is removed automatically when the device is
removed. Having something similar for XDP is probably a good thing.

Worth following up in Daniel's thread. Another way to do that, instead of
having the program associated with the ifindex, is to have it associated
with the devmap entry. Basically, if a program fd were associated with an
entry when we add it to the devmap, both could be released when the devmap
entry is removed. That happens automatically if the ifindex is removed.
But rather than fragment threads too much, I'll wait for Daniel's reply.
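
To make the devmap idea a bit more concrete, here is a rough user-space
model in plain C. It is only a sketch of the lifecycle being discussed, not
kernel code: the struct layout and every function name (devmap_add,
devmap_del, devmap_ifindex_removed) are hypothetical, and "releasing" a
program is modelled by bumping a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; NOT the kernel's devmap implementation. */
#define DEVMAP_SLOTS 8

struct devmap_entry {
    int in_use;
    int ifindex;  /* device this entry redirects to */
    int prog_fd;  /* program tied to the entry, -1 if none */
};

struct devmap {
    struct devmap_entry slot[DEVMAP_SLOTS];
    int progs_released;  /* how many program references were dropped */
};

static void devmap_init(struct devmap *m)
{
    assert(m != NULL);
    for (size_t i = 0; i < DEVMAP_SLOTS; i++) {
        m->slot[i].in_use = 0;
        m->slot[i].ifindex = 0;
        m->slot[i].prog_fd = -1;
    }
    m->progs_released = 0;
}

/* Add an entry; the program (if any) is owned by the entry from here on. */
static int devmap_add(struct devmap *m, size_t key, int ifindex, int prog_fd)
{
    if (key >= DEVMAP_SLOTS || m->slot[key].in_use)
        return -1;
    m->slot[key].in_use = 1;
    m->slot[key].ifindex = ifindex;
    m->slot[key].prog_fd = prog_fd;
    return 0;
}

/* Remove an entry and release its program in the same step. */
static void devmap_del(struct devmap *m, size_t key)
{
    struct devmap_entry *e;

    if (key >= DEVMAP_SLOTS)
        return;
    e = &m->slot[key];
    if (!e->in_use)
        return;
    if (e->prog_fd >= 0)
        m->progs_released++;
    e->in_use = 0;
    e->ifindex = 0;
    e->prog_fd = -1;
}

/* Device went away: drop every entry that points at it. */
static void devmap_ifindex_removed(struct devmap *m, int ifindex)
{
    for (size_t i = 0; i < DEVMAP_SLOTS; i++)
        if (m->slot[i].in_use && m->slot[i].ifindex == ifindex)
            devmap_del(m, i);
}
```

The point of the model is that nothing outside the map has to track the
program: tearing down the tap ends up in devmap_ifindex_removed(), and the
program goes with the entry.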

> >>
> >> With respect to lifecycle management of the programs and the data,
> >> putting VM specific programs and maps on VM specific taps simplifies
> >> management. VM terminates, taps are deleted, programs and maps
> >> disappear. So no validator thread needed to handle stray data / programs
> >> from the inevitable cleanup problems when everything is lumped into 1
> >> program / map or even array of programs and maps.
> > 
> > OK. Also presumably you already have a hook into this event to insert
> > the tc filter programs so it's probably a natural hook for mgmt.
> 
> For VMs there is no reason to have an skb at all, so no tc filter program.

+1 nice win for sure.

> 
> > 
> >>
> >> To me the distributed approach is the simplest and best. The program on
> >> the host nics can be stupid simple; no packet parsing beyond the
> >> ethernet header. Its job is just a traffic demuxer very much like a
> >> switch. All VM logic and data is local to the VM's interfaces.
> > 
> > IMO it seems more natural and efficient to use a tail call. But, I
> > can see how if the ingress program is a l2/l3 switch and the VM hook
> > is a l2/l3 filter it feels more like a switch+firewall layout we
> > would normally use on a "real" (v)switch. Also I think the above point
> > where cleanup is free because of the tap tear down is a win.
> 
> exactly. To the VM, the host is part of the network. The host should be
> passing the packets as fast and as simply as possible from ingress nic
> to vm. It can be done completely as xdp frames and doing so reduces the
> CPU cycles per packet in the host (yes, there are caveats to that
> statement).
> 
> VM to host nic, and VM to VM have their own challenges which need to be
> tackled next.
> 
> But the end goal is to have all VM traffic touched by the host as xdp
> frames and without creating a complex control plane. The distributed
> approach is much simpler and cleaner - and seems to follow what Cilium
> is doing to a degree, or that is my interpretation of

+1, agreed: everything as an XDP packet is a great goal.

> 
> "By attaching to the TC ingress hook of the host side of this veth pair
> Cilium can monitor and enforce policy on all traffic exiting a
> container. By attaching a BPF program to the veth pair associated with
> each container and routing all network traffic to the host side virtual
> devices with another BPF program attached to the tc ingress hook as well
> Cilium can monitor and enforce policy on all traffic entering or exiting
> the node."
> 
> https://docs.cilium.io/en/v1.7/architecture/

In many configurations there are no egress hooks, though, because policy
(the firewall piece) is implemented as part of the ingress hook. Because
the ingress TC hook "knows" where it will redirect a packet, it can also
run the policy logic for that pod/VM/etc.
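
As a toy illustration of that layout, the sketch below folds the
destination lookup and the policy check into a single ingress function.
It is plain user-space C, not a BPF program, and everything in it is an
assumption made for the example: the endpoint table, the one-port "policy",
and the names ingress_hook and lookup_endpoint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an ingress hook; not Cilium or kernel code. */
enum verdict { VERDICT_DROP = 0, VERDICT_REDIRECT = 1 };

struct endpoint {
    uint8_t mac[6];          /* MAC of the pod/VM behind this entry */
    int ifindex;             /* device to redirect to */
    uint16_t allowed_dport;  /* toy policy: one allowed destination port */
};

struct pkt {
    uint8_t dst_mac[6];
    uint16_t dport;
};

/* Find the endpoint that owns the destination MAC, or NULL. */
static const struct endpoint *lookup_endpoint(const struct endpoint *tbl,
                                              size_t n,
                                              const uint8_t mac[6])
{
    for (size_t i = 0; i < n; i++)
        if (memcmp(tbl[i].mac, mac, 6) == 0)
            return &tbl[i];
    return NULL;
}

/*
 * The hook already knows the target after the lookup, so it can apply
 * that target's policy before redirecting. No separate egress hook runs.
 */
static enum verdict ingress_hook(const struct endpoint *tbl, size_t n,
                                 const struct pkt *p, int *out_ifindex)
{
    const struct endpoint *ep = lookup_endpoint(tbl, n, p->dst_mac);

    if (ep == NULL)
        return VERDICT_DROP;      /* unknown destination */
    if (p->dport != ep->allowed_dport)
        return VERDICT_DROP;      /* policy for this endpoint says no */
    *out_ifindex = ep->ifindex;
    return VERDICT_REDIRECT;
}
```

The trade-off versus the per-tap programs discussed above is that the
policy data for every endpoint lives in one table owned by the ingress
program, so cleanup has to be done explicitly when an endpoint goes away.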