Message-ID: <012d8975-13a4-4056-a6bf-f9140878cbdb@intel.com>
Date: Fri, 6 Dec 2024 16:06:48 +0100
From: Alexander Lobakin <aleksander.lobakin@...el.com>
To: Daniel Xu <dxu@...uu.xyz>
CC: Jakub Kicinski <kuba@...nel.org>, Lorenzo Bianconi
<lorenzo.bianconi@...hat.com>, Lorenzo Bianconi <lorenzo@...nel.org>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>, Alexei Starovoitov
<ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, Andrii Nakryiko
<andrii@...nel.org>, John Fastabend <john.fastabend@...il.com>, "Jesper
Dangaard Brouer" <hawk@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>,
David Miller <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
"Paolo Abeni" <pabeni@...hat.com>, <netdev@...r.kernel.org>
Subject: Re: [RFC/RFT v2 0/3] Introduce GRO support to cpumap codebase

From: Daniel Xu <dxu@...uu.xyz>
Date: Thu, 5 Dec 2024 17:41:27 -0700

> On Thu, Dec 05, 2024 at 12:06:29PM GMT, Alexander Lobakin wrote:
>> From: Alexander Lobakin <aleksander.lobakin@...el.com>
>> Date: Thu, 5 Dec 2024 11:38:11 +0100
>>
>>> From: Daniel Xu <dxu@...uu.xyz>
>>> Date: Wed, 04 Dec 2024 13:51:08 -0800
>>>
>>>>
>>>>
>>>> On Wed, Dec 4, 2024, at 8:42 AM, Alexander Lobakin wrote:
>>>>> From: Jakub Kicinski <kuba@...nel.org>
>>>>> Date: Tue, 3 Dec 2024 16:51:57 -0800
>>>>>
>>>>>> On Tue, 3 Dec 2024 12:01:16 +0100 Alexander Lobakin wrote:
>>>>>>>>> @ Jakub,
>>>>>>>>
>>>>>>>> Context? What doesn't work and why?
>>>>>>>
>>>>>>> My tests show the same perf as on Lorenzo's series, but I test with UDP
>>>>>>> trafficgen. Daniel tests TCP and the results are much worse than with
>>>>>>> Lorenzo's implementation.
>>>>>>> I suspect this is related to how NAPI performs flushes / decides
>>>>>>> whether to repoll again or exit vs how the kthread does that (even
>>>>>>> though I also try to flush only every 64 frames or when the ring is
>>>>>>> empty). Or maybe to the fact that part of the kthread runs in process
>>>>>>> context outside any softirq, while with NAPI the whole loop is inside
>>>>>>> the RX softirq.
>>>>>>>
>>>>>>> Jesper said that he'd like to see cpumap still using own kthread, so
>>>>>>> that its priority can be boosted separately from the backlog. That's why
>>>>>>> we asked you whether it would be fine to have cpumap as threaded NAPI in
>>>>>>> regards to all this :D
>>>>>>
>>>>>> Certainly not without a clear understanding what the problem with
>>>>>> a kthread is.
>>>>>
>>>>> Yes, sure thing.
>>>>>
>>>>> Bad thing's that I can't reproduce Daniel's problem >_< Previously, I
>>>>> was testing with the UDP trafficgen and got up to 80% improvement over
>>>>> the baseline. Now I tested TCP and got up to 70% improvement, no
>>>>> regressions whatsoever =\
>>>>>
>>>>> I don't know where this regression on Daniel's setup comes from. Is it
>>>>> multi-thread or single-thread test?
>>>>
>>>> 8 threads with 16 flows over them (-T8 -F16)
>>>>
>>>>> What app do you use: iperf, netperf,
>>>>> neper, Microsoft's app (forgot the name)?
>>>>
>>>> neper, tcp_stream.
>>>
>>> Let me recheck with neper -T8 -F16, I'll post my results soon.
>>
>> kernel  direct T1  direct T8F16  cpumap  cpumap T8F16
>> clean   28         51            13      9             Gbps
>> GRO     28         51            26      18            Gbps
>>
>> 100% gain, no regressions =\
>>
>> My XDP prog is simple (upstream xdp-tools repo with no changes):
>>
>> numactl -N 0 xdp-tools/xdp-bench/xdp-bench redirect-cpu -c 23 -s -p
>> no-touch ens802f0np0
>>
>> IOW it simply redirects everything to CPU 23 (same NUMA node) from any
>> Rx queue without looking into headers or packet.
>> Do you test with more sophisticated XDP prog?
>
> Great reminder... my prog is a bit more sophisticated. I forgot we were
> doing latency tracking by inserting a timestamp into frame metadata, but
> not clearing it after it was read on the remote CPU, which disables GRO.
> So the previous test was paying the fixed GRO overhead without getting
> any packet merges.
>
> Once I fixed up prog to reset metadata pointer I could see the wins.
> Went from 21621.126 Mbps -> 25546.47 Mbps for a ~18% win in tput. No
> latency changes.
>
> Sorry about the churn.

No problem, crap happens sometimes :)

Let me send my implementation on Monday-Wednesday. I'll include my UDP
and TCP test results, as well as yours (+18%).

BTW, it would be great if you could give me a Tested-by tag, as I assume the
tests were fine and it works for you?

Thanks,
Olek
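
For reference, the gains quoted in this thread ("100% gain" for the cpumap
columns, "~18% win" for the neper tcp_stream run) can be rechecked with a few
lines of Python. This is only a sketch of the arithmetic: the helper name
gain_percent and the dictionary labels are made up for illustration, and the
numbers are copied from the messages above, not newly measured.

```python
def gain_percent(before, after):
    """Relative throughput change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Figures quoted in the thread: Gbps for the table rows (clean -> GRO),
# Mbps for Daniel's run (before -> after resetting the metadata pointer).
results = {
    "cpumap": (13, 26),
    "cpumap T8F16": (9, 18),
    "neper tcp_stream (Daniel)": (21621.126, 25546.47),
}

for name, (before, after) in results.items():
    print(f"{name}: {gain_percent(before, after):+.1f}%")
```

Both cpumap columns double, and the tcp_stream figure lands slightly above
18%, consistent with the numbers stated in the thread.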