Message-ID: <20180403170624.ojofd737zh5pul5m@ast-mbp.dhcp.thefacebook.com>
Date: Tue, 3 Apr 2018 10:06:26 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: David Ahern <dsahern@...il.com>
Cc: John Fastabend <john.fastabend@...il.com>,
"Md. Islam" <mislam4@...t.edu>, netdev@...r.kernel.org,
David Miller <davem@...emloft.net>, stephen@...workplumber.org,
agaceph@...il.com, Pavel Emelyanov <xemul@...nvz.org>,
Eric Dumazet <edumazet@...gle.com>, brouer@...hat.com
Subject: Re: [PATCH v15 ] net/veth/XDP: Line-rate packet forwarding in kernel
On Tue, Apr 03, 2018 at 11:00:20AM -0600, David Ahern wrote:
> On 4/3/18 10:41 AM, John Fastabend wrote:
> > On 04/03/2018 08:07 AM, David Ahern wrote:
> >> On 4/2/18 12:16 PM, Alexei Starovoitov wrote:
> >>> On Mon, Apr 02, 2018 at 12:09:44PM -0600, David Ahern wrote:
> >>>> On 4/2/18 12:03 PM, John Fastabend wrote:
> >>>>>
> >>>>> Can the above be a normal BPF helper that returns an
> >>>>> ifindex? Then something roughly like this pattern would
> >>>>> work for all drivers with redirect support,
> >>>>>
> >>>>>
> >>>>> route_ifindex = ip_route_lookup(__daddr, ....)
> >>>>> if (!route_ifindex)
> >>>>>     return do_foo()
> >>>>> return xdp_redirect(route_ifindex);
> >>>>>
> >>>>> So my suggestion is,
> >>>>>
> >>>>> 1. enable veth xdp (including redirect support)
> >>>>> 2. add a helper to lookup route from routing table
> >>>>>
> >>>>> Alternatively you can skip step (2) and encode the routing
> >>>>> table in BPF directly. Maybe we need a more efficient data
> >>>>> structure but that should also work.
> >>>>>
> >>>>
> >>>> That's what I have here:
> >>>>
> >>>> https://github.com/dsahern/linux/commit/bab42f158c0925339f7519df7fb2cde8eac33aa8
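
For illustration, roughly what John's pattern would look like as an XDP program against the helper in that commit. This is a minimal sketch, assuming the helper id (BPF_FUNC_ipv4_fwd_lookup), assuming the helper returns the egress ifindex and fills in the MACs in 'eth' on success, and assuming current libbpf include paths; only bpf_redirect() and bpf_htons() are existing mainline pieces.

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

/* Hypothetical: this helper id only exists in the out-of-tree patch.
 * The argument layout mirrors the BPF_CALL_3 quoted later in this mail.
 */
static int (*bpf_ipv4_fwd_lookup)(int ifindex, const struct iphdr *iph,
				  struct ethhdr *eth) =
	(void *) BPF_FUNC_ipv4_fwd_lookup;

SEC("xdp")
int xdp_fwd(struct xdp_md *ctx)
{
	void *data = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;
	struct ethhdr *eth = data;
	struct iphdr *iph = data + sizeof(*eth);
	int ifindex;

	if ((void *)(iph + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
		return XDP_PASS;

	/* FIB lookup keyed on the ingress ifindex; assumed to return the
	 * egress ifindex (and rewrite the MACs in eth) on success.
	 */
	ifindex = bpf_ipv4_fwd_lookup(ctx->ingress_ifindex, iph, eth);
	if (ifindex <= 0)
		return XDP_PASS;	/* the do_foo() fallback: let the stack handle it */

	return bpf_redirect(ifindex, 0);
}

char _license[] SEC("license") = "GPL";
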
> >>>
> >>> was wondering what's up with the delay and when are you going to
> >>> submit them officially...
> >>> The use case came up several times.
> >>>
> >>
> >> I need to find time to come back to that set. As I recall there are a number
> >> of outstanding issues:
> >>
> >> 1. you and Daniel had comments about the bpf_func_proto declarations
> >>
> >> 2. Jesper had concerns about xdp redirect to any netdev. e.g., How does
> >> the lookup know the egress netdev supports xdp? Right now you can try
> >> and the packet is dropped if it is not supported.
> >>
> >
> > There should probably be a tracepoint there if not already. Otherwise
> > I think the orchestration/loader layer should be ensuring that xdp
> > support is sufficient. I don't think we need anything specific in the
> > XDP/BPF code to handle unsupported devices.
>
> ok. I am fine with starting with that.
>
> >
> >> 3. VLAN devices. I suspect these will affect the final bpf function
> >> prototype. It would be awkward to have 1 forwarding API for non-vlan
> >> devices and a second for vlan devices, hence the need to resolve this
> >> before it goes in.
> >>
> >
> > Interesting. Do we need stacked XDP? I could imagine having 802.1Q
> > simply call the lower dev's XDP xmit routine. Possibly adding the 8021q
> > header first.
> >
> > Or alternatively a new dev type could let users query things like
> > vlan-id from the dev rather than automatically doing the tagging. I
> > suspect though if you forward to a vlan device automatically adding
> > the tag is the correct behavior.
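
The tag insertion itself can already be sketched with the existing bpf_xdp_adjust_head() helper. Below is a minimal sketch of pushing an 802.1Q header from a BPF program, i.e. the equivalent of what the 8021q driver path discussed above would do in-kernel, assuming current clang/libbpf; struct vlan_hdr is defined locally since it is not exported via uapi.

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

struct vlan_hdr {			/* not exported via uapi, so defined here */
	__be16 h_vlan_TCI;
	__be16 h_vlan_encapsulated_proto;
};

static __always_inline int xdp_push_vlan(struct xdp_md *ctx, __u16 vlan_id)
{
	/* grow the frame by 4 bytes at the front */
	if (bpf_xdp_adjust_head(ctx, -(int)sizeof(struct vlan_hdr)))
		return -1;

	/* data/data_end must be reloaded after adjust_head */
	void *data = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;
	struct ethhdr *eth = data;				/* new header position */
	struct ethhdr *orig = data + sizeof(struct vlan_hdr);	/* old header, shifted */
	struct vlan_hdr *vhdr = data + sizeof(*eth);
	__u8 macs[2 * ETH_ALEN];

	if ((void *)(orig + 1) > data_end)
		return -1;

	/* move dst+src MACs forward via a bounce buffer to avoid an
	 * overlapping copy
	 */
	__builtin_memcpy(macs, orig->h_dest, sizeof(macs));
	__builtin_memcpy(eth->h_dest, macs, sizeof(macs));

	vhdr->h_vlan_encapsulated_proto = orig->h_proto;	/* original ethertype */
	vhdr->h_vlan_TCI = bpf_htons(vlan_id);
	eth->h_proto = bpf_htons(ETH_P_8021Q);
	return 0;
}
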
> >
> >
> >> 4. What about other stacked devices - bonds and bridges - will those
> >> just work with the bpf helper? VRF is already handled of course. ;-)
> >>
> >
> > So if we simply handle this like other stacked devices and call the
> > lower dev's xdp_xmit routine we should get reasonable behavior. For
> > bonds and bridges I guess some generalization is needed though because
> > everything at the moment is skb centric. I don't think it's necessary
> > in the first series though. It can be added later.
> >
>
> For 3 and 4 above I was referring to the route lookup part of it; sorry
> for not being clear.
>
> For example, eth1 is enslaved to bond1 which is in VRF red. The lookup
> needs to go to the table associated with the VRF. That is not known by
> just looking at eth1. The code exists to walk the upper layers and do
> the effective translations, just need to cover those cases.
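
A minimal sketch of that translation for the simple directly-enslaved case (the stacked bond/VLAN cases still need the full upper-device walk); dev_get_by_index_rcu() and l3mdev_master_ifindex_rcu() are existing kernel helpers, and the caller must hold rcu_read_lock().

#include <linux/netdevice.h>
#include <net/l3mdev.h>

static int effective_fib_iif(struct net *net, int ifindex)
{
	/* if the device is directly enslaved to an L3 master (VRF), do the
	 * FIB lookup against the master's ifindex so the VRF table is used
	 */
	struct net_device *dev = dev_get_by_index_rcu(net, ifindex);
	int master;

	if (!dev)
		return ifindex;

	master = l3mdev_master_ifindex_rcu(dev);
	return master ? master : ifindex;
}
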
>
> The VLAN part of it is a bit more difficult - ingress device for the
> lookup should be eth1.100 for example not eth1, and then if eth1.100 is
> enslaved to a VRF, ...
>
> None of it is that complex, just need to walk through the various use
> cases and make sure bpf_ipv4_fwd_lookup and bpf_ipv6_fwd_lookup can do
> the right thing for these common use cases.
I'm a bit lost here. Why is this a concern?
'index' is an argument that the bpf prog passes into the helper.
The clsbpf program may choose to pass the ifindex of the netdev
it's attached to or some other one.
In your patch you have:
+BPF_CALL_3(bpf_ipv4_fwd_lookup, int, index, const struct iphdr *, iph,
+	   struct ethhdr *, eth)
+{
+	struct flowi4 fl4 = {
+		.daddr = iph->daddr,
+		.saddr = iph->saddr,
+		.flowi4_iif = index,
+		.flowi4_tos = iph->tos & IPTOS_RT_MASK,
+		.flowi4_scope = RT_SCOPE_UNIVERSE,
+	};
Are you saying there is a concern with the .flowi4_iif = index line?
In the above the only thing Daniel and I pointed out is that
passing struct iphdr * like this is not safe.
We either need a size argument, which would be a bit cumbersome, or
extend the verifier a little to specify the size as part of the helper proto,
so that the verifier can enforce it without having the program pass it.
imo that's the only bit missing from that patch to upstream it.
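
For concreteness, a rough sketch of the "size argument" option using the existing ARG_PTR_TO_MEM / ARG_CONST_SIZE pair. The struct name and layout are assumed from the patch, the helper itself would become a BPF_CALL_5, and the program would have to pass sizeof(*iph) / sizeof(*eth) explicitly, which is the cumbersome part.

static const struct bpf_func_proto bpf_ipv4_fwd_lookup_proto = {
	.func		= bpf_ipv4_fwd_lookup,
	.gpl_only	= true,
	.ret_type	= RET_INTEGER,
	.arg1_type	= ARG_ANYTHING,		/* ingress ifindex */
	.arg2_type	= ARG_PTR_TO_MEM,	/* iph */
	.arg3_type	= ARG_CONST_SIZE,	/* sizeof(*iph), enforced by the verifier */
	.arg4_type	= ARG_PTR_TO_MEM,	/* eth */
	.arg5_type	= ARG_CONST_SIZE,	/* sizeof(*eth) */
};
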
Also the helper isn't really related to XDP. It should work as-is
for clsbpf and xdp programs as far as I can tell.