Message-ID: <e0266202-5db6-123c-eba6-33e5c5c4ba6d@gmail.com>
Date: Fri, 7 Jun 2019 11:22:00 +0900
From: Toshiaki Makita <toshiaki.makita1@...il.com>
To: Jesper Dangaard Brouer <brouer@...hat.com>
Cc: Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <jakub.kicinski@...ronome.com>,
Jesper Dangaard Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
netdev@...r.kernel.org, xdp-newbies@...r.kernel.org,
bpf@...r.kernel.org,
Toke Høiland-Jørgensen <toke@...hat.com>,
Jason Wang <jasowang@...hat.com>,
Brendan Gregg <brendan.d.gregg@...il.com>
Subject: Re: [PATCH v2 bpf-next 1/2] xdp: Add tracepoint for bulk XDP_TX
On 2019/06/07 4:41, Jesper Dangaard Brouer wrote:
> On Thu, 6 Jun 2019 20:04:20 +0900
> Toshiaki Makita <toshiaki.makita1@...il.com> wrote:
>
>> On 2019/06/05 16:59, Jesper Dangaard Brouer wrote:
>>> On Wed, 5 Jun 2019 14:36:12 +0900
>>> Toshiaki Makita <toshiaki.makita1@...il.com> wrote:
>>>
>>>> This is introduced for admins to check what is happening on XDP_TX when
>>>> bulk XDP_TX is in use, which will first be introduced in veth in the next
>>>> commit.
>>>
>>> Is the plan that this tracepoint 'xdp:xdp_bulk_tx' should be used by
>>> all drivers?
>>
>> I guess you mean all drivers that implement a similar mechanism should use
>> this? Then yes.
>> (I don't think all drivers need a bulk tx mechanism though)
>>
>>> (more below)
>>>
>>>> Signed-off-by: Toshiaki Makita <toshiaki.makita1@...il.com>
>>>> ---
>>>>  include/trace/events/xdp.h | 25 +++++++++++++++++++++++++
>>>>  kernel/bpf/core.c          |  1 +
>>>>  2 files changed, 26 insertions(+)
>>>>
>>>> diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
>>>> index e95cb86..e06ea65 100644
>>>> --- a/include/trace/events/xdp.h
>>>> +++ b/include/trace/events/xdp.h
>>>> @@ -50,6 +50,31 @@
>>>> __entry->ifindex)
>>>> );
>>>>
>>>> +TRACE_EVENT(xdp_bulk_tx,
>>>> +
>>>> + TP_PROTO(const struct net_device *dev,
>>>> + int sent, int drops, int err),
>>>> +
>>>> + TP_ARGS(dev, sent, drops, err),
>>>> +
>>>> + TP_STRUCT__entry(
>>>
>>> All other tracepoints in this file starts with:
>>>
>>> __field(int, prog_id)
>>> __field(u32, act)
>>> or
>>> __field(int, map_id)
>>> __field(u32, act)
>>>
>>> Could you please add those?
>>
>> So... prog_id is the problem. The program can be changed while we are
>> enqueueing packets to the bulk queue, so the prog_id at flush may be an
>> unexpected one.
>
> Hmmm... that sounds problematic, if the XDP bpf_prog for veth can
> change underneath, before the flush. Our redirect system depends on
> things being stable until the xdp_do_flush_map() operation, as it will
> e.g. set the per-CPU (bpf_redirect_info) map_to_flush pointer (which
> depends on the XDP prog), and expects it to be correct/valid.
Sorry, I don't get how maps depend on programs.
At least xdp_do_redirect_map() handles map_to_flush changes during NAPI.
Is there a problem when the map is not changed but the program is changed?
Also I believe this is not veth-specific behavior. Looking at tun and
i40e, they seem to change xdp_prog without stopping the data path.
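
To illustrate (a rough sketch with made-up names, not the actual
veth/tun code), the prog pointer is typically just swapped under RCU
while NAPI keeps polling, e.g.:

	/* sketch only: the datapath re-reads the pointer per packet,
	 * so packets queued for bulk TX and the later flush may see
	 * different programs
	 */
	old = rtnl_dereference(rq->xdp_prog);		/* under rtnl_lock */
	rcu_assign_pointer(rq->xdp_prog, new_prog);
	/* old prog is released later; NAPI is never disabled around the swap */
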
>> It can be fixed by disabling NAPI when changing XDP programs. This stops
>> packet processing while changing XDP programs, but I guess it is an
>> acceptable compromise. Having said that, I'm honestly not so eager to
>> make this change, since it will require refurbishment of one of the
>> most delicate parts of veth XDP, the NAPI disabling/enabling mechanism.
>>
>> WDYT?
>
> Sounds like a bug, if the XDP bpf_prog is not stable within the NAPI poll...
>
>
>>>> + __field(int, ifindex)
>>>> + __field(int, drops)
>>>> + __field(int, sent)
>>>> + __field(int, err)
>>>> + ),
>>>
>>> The reason is that this makes it easier to attach to multiple
>>> tracepoints and extract the same value.
>>>
>>> Example with bpftrace oneliner:
>>>
>>> $ sudo bpftrace -e 'tracepoint:xdp:xdp_* { @action[args->act] = count(); }'
>
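
FWIW, for this new tracepoint a similar one-liner could aggregate the
counters instead, something like (untested):

   $ sudo bpftrace -e 'tracepoint:xdp:xdp_bulk_tx { @sent = sum(args->sent); @drops = sum(args->drops); }'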