Message-ID: <yzda66wro5twmzpmjoxvy4si5zvkehlmgtpi6brheek3sj73tj@o7kd6nurr3o6>
Date: Thu, 5 Dec 2024 17:41:27 -0700
From: Daniel Xu <dxu@...uu.xyz>
To: Alexander Lobakin <aleksander.lobakin@...el.com>
Cc: Jakub Kicinski <kuba@...nel.org>, 
	Lorenzo Bianconi <lorenzo.bianconi@...hat.com>, Lorenzo Bianconi <lorenzo@...nel.org>, 
	"bpf@...r.kernel.org" <bpf@...r.kernel.org>, Alexei Starovoitov <ast@...nel.org>, 
	Daniel Borkmann <daniel@...earbox.net>, Andrii Nakryiko <andrii@...nel.org>, 
	John Fastabend <john.fastabend@...il.com>, Jesper Dangaard Brouer <hawk@...nel.org>, 
	Martin KaFai Lau <martin.lau@...ux.dev>, David Miller <davem@...emloft.net>, 
	Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org
Subject: Re: [RFC/RFT v2 0/3] Introduce GRO support to cpumap codebase

On Thu, Dec 05, 2024 at 12:06:29PM GMT, Alexander Lobakin wrote:
> From: Alexander Lobakin <aleksander.lobakin@...el.com>
> Date: Thu, 5 Dec 2024 11:38:11 +0100
> 
> > From: Daniel Xu <dxu@...uu.xyz>
> > Date: Wed, 04 Dec 2024 13:51:08 -0800
> > 
> >>
> >>
> >> On Wed, Dec 4, 2024, at 8:42 AM, Alexander Lobakin wrote:
> >>> From: Jakub Kicinski <kuba@...nel.org>
> >>> Date: Tue, 3 Dec 2024 16:51:57 -0800
> >>>
> >>>> On Tue, 3 Dec 2024 12:01:16 +0100 Alexander Lobakin wrote:
> >>>>>>> @ Jakub,  
> >>>>>>
> >>>>>> Context? What doesn't work and why?  
> >>>>>
> >>>>> My tests show the same perf as on Lorenzo's series, but I test with UDP
> >>>>> trafficgen. Daniel tests TCP and the results are much worse than with
> >>>>> Lorenzo's implementation.
> >>>>> I suspect this is related to how NAPI performs flushes / decides
> >>>>> whether to repoll again or exit vs how the kthread does that (even
> >>>>> though I also try to flush only every 64 frames or when the ring is
> >>>>> empty). Or maybe to the fact that the kthread runs in process context
> >>>>> outside any softirq, while with NAPI the whole loop runs inside the
> >>>>> RX softirq.
> >>>>>
> >>>>> Jesper said that he'd like to see cpumap still using own kthread, so
> >>>>> that its priority can be boosted separately from the backlog. That's why
> >>>>> we asked you whether it would be fine to have cpumap as threaded NAPI in
> >>>>> regards to all this :D
> >>>>
> >>>> Certainly not without a clear understanding what the problem with 
> >>>> a kthread is.
> >>>
> >>> Yes, sure thing.
> >>>
> >>> The bad thing is that I can't reproduce Daniel's problem >_< Previously,
> >>> I was testing with the UDP trafficgen and got up to 80% improvement over
> >>> the baseline. Now I tested TCP and got up to 70% improvement, no
> >>> regressions whatsoever =\
> >>>
> >>> I don't know where this regression on Daniel's setup comes from. Is it
> >>> a multi-thread or single-thread test?
> >>
> >> 8 threads with 16 flows over them (-T8 -F16)
> >>
> >>> What app do you use: iperf, netperf,
> >>> neper, Microsoft's app (forgot the name)?
> >>
> >> neper, tcp_stream.
> > 
> > Let me recheck with neper -T8 -F16, I'll post my results soon.
> 
> kernel     direct T1    direct T8F16    cpumap    cpumap T8F16
> clean      28           51              13        9               Gbps
> GRO        28           51              26        18              Gbps
> 
> 100% gain, no regressions =\
> 
> My XDP prog is simple (upstream xdp-tools repo with no changes):
> 
> numactl -N 0 xdp-tools/xdp-bench/xdp-bench redirect-cpu -c 23 -s -p
> no-touch ens802f0np0
> 
> IOW, it simply redirects everything to CPU 23 (same NUMA node) from any
> Rx queue without looking into headers or packet contents.
> Do you test with more sophisticated XDP prog?
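
For context, redirect-cpu boils down to a bpf_redirect_map() into a
CPUMAP entry. A minimal sketch, assuming libbpf conventions (not the
actual xdp-bench source):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_CPUMAP);
	__uint(max_entries, 64);
	__type(key, __u32);
	__type(value, struct bpf_cpumap_val);
} cpu_map SEC(".maps");

SEC("xdp")
int redirect_cpu(struct xdp_md *ctx)
{
	/* -c 23: always steer to CPU 23 (map key == CPU index here),
	 * falling back to XDP_PASS if the map entry is unset. */
	return bpf_redirect_map(&cpu_map, 23, XDP_PASS);
}

char _license[] SEC("license") = "GPL";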

Great reminder... my prog is a bit more sophisticated. I forgot we were
doing latency tracking by inserting a timestamp into the frame metadata,
but never clearing it after it was read on the remote CPU. That disables
GRO: a per-packet timestamp makes every skb's metadata differ, and GRO
won't coalesce skbs with differing metadata. So the previous test was
paying the fixed GRO overhead without getting any packet merges.

Once I fixed up the prog to reset the metadata pointer, I could see the
wins: throughput went from 21621.126 Mbps to 25546.47 Mbps, a ~18%
improvement, with no latency changes.
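
For reference, the cpumap-side fix was along these lines. struct meta
and the bpf_printk() bookkeeping are stand-ins for what my prog actually
does, so treat it as a sketch:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct meta {
	__u64 tstamp;	/* set by the rx-side prog via bpf_xdp_adjust_meta() */
};

SEC("xdp/cpumap")
int cpumap_rx(struct xdp_md *ctx)
{
	void *data = (void *)(long)ctx->data;
	struct meta *m = (void *)(long)ctx->data_meta;

	if ((void *)(m + 1) <= data) {
		__u64 lat = bpf_ktime_get_ns() - m->tstamp;

		bpf_printk("cpumap latency: %llu ns", lat);

		/* The actual fix: shrink the metadata area back to zero
		 * once the timestamp is consumed, so the resulting skbs
		 * carry no differing metadata and GRO can merge them. */
		bpf_xdp_adjust_meta(ctx, sizeof(*m));
	}

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";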

Sorry about the churn.

Daniel
