Message-ID: <e0266202-5db6-123c-eba6-33e5c5c4ba6d@gmail.com>
Date: Fri, 7 Jun 2019 11:22:00 +0900
From: Toshiaki Makita <toshiaki.makita1@...il.com>
To: Jesper Dangaard Brouer <brouer@...hat.com>
Cc: Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <jakub.kicinski@...ronome.com>,
Jesper Dangaard Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
netdev@...r.kernel.org, xdp-newbies@...r.kernel.org,
bpf@...r.kernel.org,
Toke Høiland-Jørgensen <toke@...hat.com>,
Jason Wang <jasowang@...hat.com>,
Brendan Gregg <brendan.d.gregg@...il.com>
Subject: Re: [PATCH v2 bpf-next 1/2] xdp: Add tracepoint for bulk XDP_TX
On 2019/06/07 4:41, Jesper Dangaard Brouer wrote:
> On Thu, 6 Jun 2019 20:04:20 +0900
> Toshiaki Makita <toshiaki.makita1@...il.com> wrote:
>
>> On 2019/06/05 16:59, Jesper Dangaard Brouer wrote:
>>> On Wed, 5 Jun 2019 14:36:12 +0900
>>> Toshiaki Makita <toshiaki.makita1@...il.com> wrote:
>>>
>>>> This is introduced for admins to check what is happening on XDP_TX when
>>>> bulk XDP_TX is in use, which will first be introduced in veth in the next
>>>> commit.
>>>
>>> Is the plan that this tracepoint 'xdp:xdp_bulk_tx' should be used by
>>> all drivers?
>>
>> I guess you mean all drivers that implement a similar mechanism should use
>> this? Then yes.
>> (I don't think all drivers need a bulk tx mechanism though)
>>
>>> (more below)
>>>
>>>> Signed-off-by: Toshiaki Makita <toshiaki.makita1@...il.com>
>>>> ---
>>>>  include/trace/events/xdp.h | 25 +++++++++++++++++++++++++
>>>>  kernel/bpf/core.c          |  1 +
>>>>  2 files changed, 26 insertions(+)
>>>>
>>>> diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
>>>> index e95cb86..e06ea65 100644
>>>> --- a/include/trace/events/xdp.h
>>>> +++ b/include/trace/events/xdp.h
>>>> @@ -50,6 +50,31 @@
>>>> __entry->ifindex)
>>>> );
>>>>
>>>> +TRACE_EVENT(xdp_bulk_tx,
>>>> +
>>>> + TP_PROTO(const struct net_device *dev,
>>>> + int sent, int drops, int err),
>>>> +
>>>> + TP_ARGS(dev, sent, drops, err),
>>>> +
>>>> + TP_STRUCT__entry(
>>>
>>> All other tracepoints in this file starts with:
>>>
>>> __field(int, prog_id)
>>> __field(u32, act)
>>> or
>>> __field(int, map_id)
>>> __field(u32, act)
>>>
>>> Could you please add those?
>>
>> So... prog_id is the problem. The program can be changed while we are
>> enqueueing packets to the bulk queue, so the prog_id at flush may be an
>> unexpected one.
>
> Hmmm... that sounds problematic, if the XDP bpf_prog for veth can
> change underneath, before the flush. Our redirect system depends on
> things being stable until the xdp_do_flush_map() operation, as it will
> e.g. set the per-CPU (bpf_redirect_info) map_to_flush pointer (which
> depends on the XDP prog), and expects it to be correct/valid.
Sorry, I don't get how maps depend on programs.
At least xdp_do_redirect_map() handles map_to_flush changes during NAPI.
Is there a problem when the map is not changed but the program is changed?
Also I believe this is not veth-specific behavior. Looking at tun and
i40e, they seem to change xdp_prog without stopping the data path.
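
To illustrate (a rough sketch with made-up names, not the actual
veth/tun code), the prog pointer is typically just swapped under RCU
while NAPI keeps polling, e.g.:

	/* sketch only: the datapath re-reads the pointer per packet,
	 * so packets queued for bulk TX and the later flush may see
	 * different programs
	 */
	old = rtnl_dereference(rq->xdp_prog);		/* under rtnl_lock */
	rcu_assign_pointer(rq->xdp_prog, new_prog);
	/* old prog is released later; NAPI is never disabled around the swap */
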
>> It can be fixed by disabling NAPI when changing XDP programs. This stops
>> packet processing while changing XDP programs, but I guess it is an
>> acceptable compromise. Having said that, I'm honestly not so eager to
>> make this change, since it will require refurbishment of one of the
>> most delicate parts of veth XDP, the NAPI disabling/enabling mechanism.
>>
>> WDYT?
>
> Sounds like a bug, if the XDP bpf_prog is not stable within the NAPI poll...
>
>
>>>> + __field(int, ifindex)
>>>> + __field(int, drops)
>>>> + __field(int, sent)
>>>> + __field(int, err)
>>>> + ),
>>>
>>> The reason is that this makes it easier to attach to multiple
>>> tracepoints and extract the same value.
>>>
>>> Example with bpftrace oneliner:
>>>
>>> $ sudo bpftrace -e 'tracepoint:xdp:xdp_* { @action[args->act] = count(); }'
>
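
FWIW, for this new tracepoint a similar one-liner could aggregate the
counters instead, something like (untested):

   $ sudo bpftrace -e 'tracepoint:xdp:xdp_bulk_tx { @sent = sum(args->sent); @drops = sum(args->drops); }'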