Message-ID: <75228f98-338e-453c-3ace-b6d36b26c51c@redhat.com>
Date: Fri, 20 Dec 2019 11:24:47 +0800
From: Jason Wang <jasowang@...hat.com>
To: Prashant Bhole <prashantbhole.linux@...il.com>,
Toke Høiland-Jørgensen <toke@...hat.com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Jesper Dangaard Brouer <jbrouer@...hat.com>
Cc: "David S . Miller" <davem@...emloft.net>,
"Michael S . Tsirkin" <mst@...hat.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Jesper Dangaard Brouer <hawk@...nel.org>,
David Ahern <dsahern@...il.com>,
Jakub Kicinski <jakub.kicinski@...ronome.com>,
John Fastabend <john.fastabend@...il.com>,
Toshiaki Makita <toshiaki.makita1@...il.com>,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
Andrii Nakryiko <andriin@...com>, netdev@...r.kernel.org,
Ilias Apalodimas <ilias.apalodimas@...aro.org>
Subject: Re: [RFC net-next 11/14] tun: run XDP program in tx path
On 2019/12/20 8:07 AM, Prashant Bhole wrote:
> Note: Resending my last response. It was not delivered to the netdev list
> due to some problem.
>
> On 12/19/19 7:15 PM, Toke Høiland-Jørgensen wrote:
>> Prashant Bhole <prashantbhole.linux@...il.com> writes:
>>
>>> On 12/19/19 3:19 AM, Alexei Starovoitov wrote:
>>>> On Wed, Dec 18, 2019 at 12:48:59PM +0100, Toke Høiland-Jørgensen wrote:
>>>>> Jesper Dangaard Brouer <jbrouer@...hat.com> writes:
>>>>>
>>>>>> On Wed, 18 Dec 2019 17:10:47 +0900
>>>>>> Prashant Bhole <prashantbhole.linux@...il.com> wrote:
>>>>>>
>>>>>>> +static u32 tun_do_xdp_tx(struct tun_struct *tun, struct tun_file *tfile,
>>>>>>> +			 struct xdp_frame *frame)
>>>>>>> +{
>>>>>>> +	struct bpf_prog *xdp_prog;
>>>>>>> +	struct tun_page tpage;
>>>>>>> +	struct xdp_buff xdp;
>>>>>>> +	u32 act = XDP_PASS;
>>>>>>> +	int flush = 0;
>>>>>>> +
>>>>>>> +	xdp_prog = rcu_dereference(tun->xdp_tx_prog);
>>>>>>> +	if (xdp_prog) {
>>>>>>> +		xdp.data_hard_start = frame->data - frame->headroom;
>>>>>>> +		xdp.data = frame->data;
>>>>>>> +		xdp.data_end = xdp.data + frame->len;
>>>>>>> +		xdp.data_meta = xdp.data - frame->metasize;
>>>>>>
>>>>>> You have not configured xdp.rxq, thus a BPF-prog accessing this
>>>>>> will crash.
>>>>>>
>>>>>> For an XDP TX hook, I want us to provide/give BPF-prog access to some
>>>>>> more information about e.g. the current tx-queue length, or TC-q
>>>>>> number.
>>>>>>
>>>>>> Question to Daniel or Alexei, can we do this and still keep
>>>>>> BPF_PROG_TYPE_XDP? Or is it better to introduce a new BPF prog type
>>>>>> (enum bpf_prog_type) for XDP TX-hook?
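For reference, a minimal userspace sketch of filling an xdp_buff from an
xdp_frame that also sets the rxq pointer the quoted patch leaves unset.
The structs are trimmed stand-ins for the kernel ones (the ifindex field
in xdp_rxq_info is a hypothetical replacement for the real dev pointer),
and the pointer arithmetic only mirrors the quoted patch; this is an
illustration, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed stand-in for the kernel's struct xdp_rxq_info.
 * ifindex is hypothetical; the real struct holds a net_device pointer. */
struct xdp_rxq_info {
	int ifindex;
	unsigned int queue_index;
};

/* Trimmed stand-in for struct xdp_frame (only the fields the patch uses). */
struct xdp_frame {
	void *data;
	unsigned short len;
	unsigned short headroom;
	unsigned short metasize;
};

/* Trimmed stand-in for struct xdp_buff. */
struct xdp_buff {
	void *data;
	void *data_end;
	void *data_meta;
	void *data_hard_start;
	struct xdp_rxq_info *rxq;
};

/* Same arithmetic as the quoted tun_do_xdp_tx(), plus the rxq assignment,
 * so a program reading rxq-derived context fields has a valid pointer. */
static void frame_to_buff(struct xdp_frame *frame, struct xdp_rxq_info *rxq,
			  struct xdp_buff *xdp)
{
	assert(frame != NULL && rxq != NULL && xdp != NULL);
	xdp->data_hard_start = (char *)frame->data - frame->headroom;
	xdp->data = frame->data;
	xdp->data_end = (char *)xdp->data + frame->len;
	xdp->data_meta = (char *)xdp->data - frame->metasize;
	xdp->rxq = rxq; /* the field left unset in the quoted patch */
}
```

Which queue/device the rxq should describe on a TX path is exactly the
semantic question raised in the rest of this thread.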
>>>>>
>>>>> I think a new program type would make the most sense. If/when we
>>>>> introduce an XDP TX hook[0], it should have different semantics than
>>>>> the regular XDP hook. I view the XDP TX hook as a hook that executes as
>>>>> the very last thing before packets leave the interface. It should have
>>>>> access to different context data as you say, but also I don't think it
>>>>> makes sense to have XDP_TX and XDP_REDIRECT in an XDP_TX hook. And we
>>>>> may also want to have a "throttle" return code; or maybe that could be
>>>>> done via a helper?
>>>>>
>>>>> In any case, I don't think this "emulated RX hook on the other end of a
>>>>> virtual device" model that this series introduces is the right
>>>>> semantics for an XDP TX hook. I can see what you're trying to do, and
>>>>> for virtual point-to-point links I think it may make sense to emulate
>>>>> the RX hook of the "other end" on TX. However, from a UAPI perspective,
>>>>> I don't think we should be calling this a TX hook; logically, it's
>>>>> still an RX hook on the receive end.
>>>>>
>>>>> If you guys are up for evolving this design into a "proper" TX hook (as
>>>>> outlined above and in [0]), that would be awesome, of course. But I'm
>>>>> not sure what constraints you have on your original problem. Do you
>>>>> specifically need the "emulated RX hook for unmodified XDP programs"
>>>>> semantics, or could your problem be solved with a TX hook with
>>>>> different semantics?
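A small sketch of the verdict check such a TX hook could impose under
the semantics described above: XDP_TX and XDP_REDIRECT are rejected. The
enum values mirror enum xdp_action from the kernel UAPI header; the
"throttle" code (XDP_TX_HOOK_THROTTLE_HYPOTHETICAL) is purely
hypothetical and exists in no released UAPI.

```c
#include <assert.h>
#include <stdbool.h>

/* Same values as enum xdp_action in the kernel UAPI header. */
enum xdp_action {
	XDP_ABORTED = 0,
	XDP_DROP,
	XDP_PASS,
	XDP_TX,
	XDP_REDIRECT,
};

/* Hypothetical "throttle" verdict for a TX hook; not a real UAPI value. */
#define XDP_TX_HOOK_THROTTLE_HYPOTHETICAL 5

/* Return true if a TX-hook program may return this verdict. */
static bool tx_hook_verdict_ok(int act)
{
	switch (act) {
	case XDP_ABORTED:
	case XDP_DROP:
	case XDP_PASS:
	case XDP_TX_HOOK_THROTTLE_HYPOTHETICAL:
		return true;
	case XDP_TX:       /* bouncing a packet back makes no sense on egress */
	case XDP_REDIRECT: /* nor does redirecting it after the qdisc layer */
	default:
		return false;
	}
}
```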
>>>>
>>>> I agree with the above.
>>>> It looks more like the existing BPF_PROG_TYPE_XDP, but attached to the
>>>> egress of a veth/tap interface. I think only the attachment point makes
>>>> a difference. Maybe use expected_attach_type?
>>>> Then there will be no need to create a new program type.
>>>> BPF_PROG_TYPE_XDP will be able to access different fields depending
>>>> on expected_attach_type. For example, the tx-queue length that Jesper
>>>> is suggesting would be available only in that case and not for all
>>>> BPF_PROG_TYPE_XDP progs. Access can be reduced too: if there is no
>>>> xdp.rxq concept for the egress side of a virtual device, access to that
>>>> field can be disallowed by the verifier.
>>>> Could you also call it XDP_EGRESS instead of XDP_TX?
>>>> I would like to reserve the XDP_TX name for what Toke describes as
>>>> XDP_TX.
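A rough sketch of a context-access check that depends on
expected_attach_type, as suggested above. struct xdp_md mirrors the UAPI
fields as of this thread; the attach-type enum and xdp_ctx_access_ok()
are hypothetical. The kernel's real XDP context-access callback does more
(for example, it tags data/data_end/data_meta as packet pointers and
handles writes), which this omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of the UAPI struct xdp_md fields at the time of this thread. */
struct xdp_md {
	uint32_t data;
	uint32_t data_end;
	uint32_t data_meta;
	uint32_t ingress_ifindex;
	uint32_t rx_queue_index;
};

/* Hypothetical attach types; not real UAPI constants. */
enum xdp_attach_hypothetical {
	XDP_ATTACH_RX_HYPOTHETICAL,
	XDP_ATTACH_EGRESS_HYPOTHETICAL,
};

/* Allow aligned 4-byte reads inside xdp_md, but hide RX-only fields from
 * programs attached on the egress side. */
static bool xdp_ctx_access_ok(enum xdp_attach_hypothetical attach,
			      size_t off, size_t size)
{
	if (off >= sizeof(struct xdp_md) || off % sizeof(uint32_t) != 0 ||
	    size != sizeof(uint32_t))
		return false;
	if (attach == XDP_ATTACH_EGRESS_HYPOTHETICAL &&
	    (off == offsetof(struct xdp_md, ingress_ifindex) ||
	     off == offsetof(struct xdp_md, rx_queue_index)))
		return false;
	return true;
}
```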
>>>>
>>>
>>> From the discussion over this set, it makes sense to have a new type of
>>> program. As David suggested, it will make way for changes specific to
>>> the egress path.
>>> On the other hand, the XDP offload with virtio-net implementation is
>>> based on the "emulated RX hook". How about having this special behavior
>>> with expected_attach_type?
>>
>> Another thought I had re: this was that for these "special" virtual
>> point-to-point devices we could extend the API to have an ATTACH_PEER
>> flag. So if you have a pair of veth devices (veth0,veth1) connecting to
>> each other, you could do either of:
>>
>> bpf_set_link_xdp_fd(ifindex(veth0), prog_fd, 0);
>> bpf_set_link_xdp_fd(ifindex(veth1), prog_fd, ATTACH_PEER);
>>
>> to attach to veth0, and:
>>
>> bpf_set_link_xdp_fd(ifindex(veth1), prog_fd, 0);
>> bpf_set_link_xdp_fd(ifindex(veth0), prog_fd, ATTACH_PEER);
>>
>> to attach to veth1.
>>
>> This would allow attaching to a device without having the "other end"
>> visible, and keep the "XDP runs on RX" semantics clear to userspace.
>> Internally in the kernel we could then turn the "attach to peer"
>> operation for a tun device into the "emulate on TX" thing you're already
>> doing?
>>
>> Would this work for your use case, do you think?
>>
>> -Toke
>>
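A toy model of how the proposed ATTACH_PEER flag could be resolved: the
program always lands on the RX path of whichever device ends up as the
target. The flag value, struct vdev and attach_xdp() are hypothetical;
none of them corresponds to an existing libbpf or kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag value; not part of any released UAPI. */
#define XDP_FLAGS_ATTACH_PEER_HYPOTHETICAL (1U << 8)

/* Hypothetical model of one end of a point-to-point virtual link. */
struct vdev {
	int ifindex;
	struct vdev *peer; /* other end of the pair, or NULL */
	int rx_prog_fd;    /* program run on this device's RX path, -1 if none */
};

/* With the peer flag, install the program on the peer's RX path instead
 * of the named device's. Returns 0 on success, -1 if there is no peer. */
static int attach_xdp(struct vdev *dev, int prog_fd, unsigned int flags)
{
	struct vdev *target = dev;

	assert(dev != NULL);
	if (flags & XDP_FLAGS_ATTACH_PEER_HYPOTHETICAL) {
		if (dev->peer == NULL)
			return -1;
		target = dev->peer;
	}
	target->rx_prog_fd = prog_fd;
	return 0;
}
```

With veth1's peer set to veth0, attach_xdp(&veth1, fd,
XDP_FLAGS_ATTACH_PEER_HYPOTHETICAL) installs the program on veth0's RX
path, matching the first pair of calls above. For tun, where the peer is
not a host netdev, the kernel would instead turn this into the
"emulate on TX" path.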
>
> This is nice from a UAPI point of view. It may work for the veth case but
> not for XDP offload with virtio-net. Please see the sequence when
> a user program in the guest wants to offload a program to tun.
>
> * User program wants to load the program by setting the offload flag and
> ifindex:
>
> - map_offload_ops->alloc()
> virtio-net sends map info to qemu, and qemu creates the map on the host.
> - prog_offload_ops->setup()
> New callback just to keep a copy of the unmodified program. It contains
> the original map fds. We replace map fds with fds from the host side.
> Check the program for unsupported helper calls.
> - prog_offload_ops->finalize()
> Send the program to qemu and it loads the program to the host.
>
> * User program calls bpf_set_link_xdp_fd()
> virtio-net handles XDP_PROG_SETUP_HW by sending a request to qemu.
> Qemu then attaches host side program fd to respective tun device by
> calling bpf_set_link_xdp_fd()
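The prog_offload_ops->setup() step above (swapping guest map fds for host
ones and rejecting unsupported helpers) can be sketched as a single pass
over the instruction array. struct bpf_insn and the opcode constants
mirror the UAPI encoding (BPF_LD | BPF_IMM | BPF_DW is 0x18,
BPF_JMP | BPF_CALL is 0x85, BPF_PSEUDO_MAP_FD is 1); the fd table, the
helper allowlist and prepare_offload() are hypothetical illustrations,
not the RFC's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the UAPI struct bpf_insn. */
struct bpf_insn {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

#define BPF_LD_IMM64_CODE 0x18 /* BPF_LD | BPF_IMM | BPF_DW */
#define BPF_CALL_CODE     0x85 /* BPF_JMP | BPF_CALL */
#define BPF_PSEUDO_MAP_FD 1

/* Hypothetical guest-fd -> host-fd translation table entry. */
struct fd_map_entry { int guest_fd; int host_fd; };

static int lookup_host_fd(const struct fd_map_entry *m, size_t n, int guest_fd)
{
	for (size_t i = 0; i < n; i++)
		if (m[i].guest_fd == guest_fd)
			return m[i].host_fd;
	return -1;
}

static bool helper_allowed(const int32_t *allow, size_t n, int32_t id)
{
	for (size_t i = 0; i < n; i++)
		if (allow[i] == id)
			return true;
	return false;
}

/* Rewrite map fds in place. Returns 0 on success, -1 for an unknown map fd,
 * -2 for a helper not in the allowlist. Run it on a copy of the program. */
static int prepare_offload(struct bpf_insn *insns, size_t cnt,
			   const struct fd_map_entry *fds, size_t nfds,
			   const int32_t *allow, size_t nallow)
{
	for (size_t i = 0; i < cnt; i++) {
		struct bpf_insn *insn = &insns[i];

		if (insn->code == BPF_LD_IMM64_CODE) {
			if (insn->src_reg == BPF_PSEUDO_MAP_FD) {
				int host = lookup_host_fd(fds, nfds, insn->imm);
				if (host < 0)
					return -1;
				insn->imm = host;
			}
			i++; /* ld_imm64 occupies two instruction slots */
			continue;
		}
		/* src_reg == 0 means a helper call, not a bpf-to-bpf call. */
		if (insn->code == BPF_CALL_CODE && insn->src_reg == 0 &&
		    !helper_allowed(allow, nallow, insn->imm))
			return -2;
	}
	return 0;
}
```

Since setup() keeps the unmodified program, a pass like this would run on
a separate copy; on error it may leave that copy partially rewritten.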
>
> In the above sequence there is no chance to use ATTACH_PEER.

For VM, I think what Toke meant is to consider virtio-net as the peer of
TAP, so in this case we can do something like the following in qemu:

bpf_set_link_xdp_fd(ifindex(tap0), prog_fd, ATTACH_PEER);

The behavior of XDP_TX could then be kept as if the XDP program were
attached to the peer of TAP (actually the virtio-net device inside the
guest).

Thanks
>
> Here is how other ideas from this discussion can be used:
>
> - Introduce BPF_PROG_TYPE_TX_XDP for the egress path. Have a special
> behavior of emulating RX XDP using the expected_attach_type flag.
> - The emulated RX XDP will be restrictive in terms of helper calls.
> - In the offload case, qemu will load the program as
> BPF_PROG_TYPE_TX_XDP and set expected_attach_type.
>
> What is your opinion about it? Does the driver implementing egress
> XDP need to know what kind of XDP program it is running?
>
> Thanks,
> Prashant
>