Message-ID: <878si2h3sb.fsf@toke.dk>
Date:   Fri, 08 May 2020 16:58:28 +0200
From:   Toke Høiland-Jørgensen <toke@...hat.com>
To:     Hangbin Liu <liuhangbin@...il.com>
Cc:     bpf@...r.kernel.org, netdev@...r.kernel.org,
        Jiri Benc <jbenc@...hat.com>,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        Eelco Chaudron <echaudro@...hat.com>, ast@...nel.org,
        Daniel Borkmann <daniel@...earbox.net>,
        Lorenzo Bianconi <lorenzo.bianconi@...hat.com>
Subject: Re: [RFC PATCHv2 bpf-next 1/2] xdp: add a new helper for dev map multicast support

Hangbin Liu <liuhangbin@...il.com> writes:

> On Wed, May 06, 2020 at 12:00:08PM +0200, Toke Høiland-Jørgensen wrote:
>> > No, I haven't tested the performance. Do you have any suggestions about how
>> > to test it? I'd like to try forwarding pkts to 10+ ports. But I don't know
>> > how to test the throughput. I don't think netperf or iperf supports
>> > this.
>> 
>> What I usually do when benchmarking XDP_REDIRECT is to just use pktgen
>> (samples/pktgen in the kernel source tree) on another machine,
>> specifically, like this:
>> 
>> ./pktgen_sample03_burst_single_flow.sh  -i enp1s0f1 -d 10.70.2.2 -m ec:0d:9a:db:11:35 -t 4  -s 64
>> 
>> (adjust iface, IP and MAC address to your system, of course). That'll
>> flood the target machine with small UDP packets. On that machine, I then
>> run the 'xdp_redirect_map' program from samples/bpf. The bpf program
>> used by that sample will update an internal counter for every packet,
>> and the userspace program prints it out, which gives you the performance (in
>> PPS). So just modifying that sample to use your new multicast helper
>> (and comparing it to regular REDIRECT to a single device) would be a
>> first approximation of a performance test.
>
> Thanks for this method. I will update the sample and do some more tests.

Great!

>> You could do something like:
>> 
>> bool first = true;
>> for (;;) {
>> 
>> [...]
>> 
>> 	if (!first) {
>> 		nxdpf = xdpf_clone(xdpf);
>> 		if (unlikely(!nxdpf))
>> 			return -ENOMEM;
>> 		bq_enqueue(dev, nxdpf, dev_rx);
>> 	} else {
>> 		bq_enqueue(dev, xdpf, dev_rx);
>> 		first = false;
>> 	}
>> }
>> 
>> /* didn't find anywhere to forward to, free buf */
>> if (first)
>> 	xdp_return_frame_rx_napi(xdpf);
>
> I think the first xdpf will be consumed by the driver and the later
> xdpf_clone() will fail, won't it?

No, bq_enqueue just sticks the frame on a list, it's not consumed until
after the NAPI cycle ends (and the driver calls xdp_do_flush()).
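
In case a concrete picture helps, here's a tiny userspace model of that
deferred-flush pattern (purely illustrative; bq_enqueue()/bq_flush() and the
structs only mimic the real devmap code, which is more involved): enqueue just
records the frame pointer, and frames are only handed to the "driver" when the
queue is flushed, so cloning the original after it has been enqueued still
works fine.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BULK_MAX 16

struct frame { char data[64]; };	/* stand-in for struct xdp_frame */

struct bulk_queue {
	struct frame *q[BULK_MAX];	/* queued pointers; nothing consumed here */
	int count;
};

/* Like the kernel's bq_enqueue(): just stick the frame on a list. */
static void bq_enqueue(struct bulk_queue *bq, struct frame *f)
{
	if (bq->count < BULK_MAX)
		bq->q[bq->count++] = f;
}

/* Like the flush at the end of the NAPI cycle (xdp_do_flush()): only now
 * are the queued frames handed over (here: just printed). */
static void bq_flush(struct bulk_queue *bq)
{
	for (int i = 0; i < bq->count; i++)
		printf("xmit %s\n", bq->q[i]->data);
	bq->count = 0;
}

int main(void)
{
	struct bulk_queue bq = { .count = 0 };
	struct frame *orig = malloc(sizeof(*orig));
	struct frame *clone = malloc(sizeof(*clone));

	strcpy(orig->data, "original");
	bq_enqueue(&bq, orig);			/* original queued, not consumed */

	memcpy(clone, orig, sizeof(*orig));	/* cloning after enqueue is fine: */
	strcpy(clone->data, "clone");		/* the original is still intact   */
	bq_enqueue(&bq, clone);

	bq_flush(&bq);				/* frames only consumed here */
	free(orig);
	free(clone);
	return 0;
}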

> How about just doing an xdp_return_frame_rx_napi(xdpf) after all the nxdpf are enqueued?

Yeah, that would be the semantically obvious thing to do, but it is
wasteful in that you end up performing one more clone than you strictly
have to :) (with N target devices that's N clones plus a free of the
original, instead of N-1 clones with the original going to the first
device).

>> > @@ -3534,6 +3539,8 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
>> >                   struct bpf_prog *xdp_prog)
>> >  {
>> >       struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
>> > +     bool exclude_ingress = !!(ri->flags & BPF_F_EXCLUDE_INGRESS);
>> > +     struct bpf_map *ex_map = READ_ONCE(ri->ex_map);
>>
>> I don't think you need the READ_ONCE here since there's already one
>> below?
>
> BTW, I forgot to ask: why don't we need the READ_ONCE for ex_map?
> I thought the map and ex_map are two different pointers.

It isn't needed, but not for the reason I thought, so I can understand
why my comment might have been somewhat confusing (I had been confused
by this myself until just now...).

The READ_ONCE() is not needed because the ex_map field is only ever read
from or written to by the CPU owning the per-cpu pointer, whereas the
'map' field is also manipulated by remote CPUs in bpf_clear_redirect_map().
So you need neither READ_ONCE() nor WRITE_ONCE() on ex_map, just like
there are none on tgt_index and tgt_value.
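
If it helps, here is a small userspace model of that asymmetry (purely
illustrative; the field and function names only echo the kernel ones, and C11
atomics stand in for READ_ONCE()/WRITE_ONCE()/cmpxchg()): 'map' is shared with
a remote teardown path and so gets annotated accesses, while 'ex_map' and
'tgt_index' are only ever touched by their owner, so plain accesses suffice.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

struct redirect_info {
	_Atomic(void *) map;	/* also cleared by remote CPUs/threads */
	void *ex_map;		/* owning CPU only: plain accesses */
	int tgt_index;		/* owning CPU only: plain accesses */
};

static struct redirect_info ri;	/* stands in for one per-CPU instance */

/* Remote thread tearing down a map: clear it from the redirect info
 * (the kernel does a cmpxchg() on ri->map for each CPU). */
static void *clear_redirect_map(void *dying_map)
{
	void *expected = dying_map;

	atomic_compare_exchange_strong(&ri.map, &expected, (void *)NULL);
	return NULL;
}

int main(void)
{
	int dummy_map, dummy_ex_map;
	pthread_t t;

	/* Owning CPU sets up a redirect: ex_map/tgt_index need no annotation... */
	ri.ex_map = &dummy_ex_map;
	ri.tgt_index = 3;
	/* ...but 'map' is visible to the teardown path, so store it atomically
	 * (the WRITE_ONCE() analogue). */
	atomic_store(&ri.map, (void *)&dummy_map);

	pthread_create(&t, NULL, clear_redirect_map, &dummy_map);
	pthread_join(&t, NULL);

	/* xdp_do_redirect() analogue: atomic load (READ_ONCE()) for map,
	 * plain reads for ex_map and tgt_index. */
	printf("map=%p ex_map=%p tgt_index=%d\n",
	       atomic_load(&ri.map), ri.ex_map, ri.tgt_index);
	return 0;
}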

-Toke
