Date:   Wed, 22 Feb 2023 10:49:40 +0200
From:   Tariq Toukan <ttoukan.linux@...il.com>
To:     Vincent Guittot <vincent.guittot@...aro.org>,
        Tariq Toukan <tariqt@...dia.com>
Cc:     David Chen <david.chen@...anix.com>,
        Zhang Qiao <zhangqiao22@...wei.com>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        Willem de Bruijn <willemdebruijn.kernel@...il.com>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        linux-kernel@...r.kernel.org,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Saeed Mahameed <saeedm@...dia.com>,
        Network Development <netdev@...r.kernel.org>,
        Gal Pressman <gal@...dia.com>, Malek Imam <mimam@...dia.com>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        David Ahern <dsahern@...nel.org>,
        Talat Batheesh <talatb@...dia.com>
Subject: Re: Bug report: UDP ~20% degradation



On 12/02/2023 13:50, Tariq Toukan wrote:
> 
> 
> On 08/02/2023 16:12, Vincent Guittot wrote:
>> Hi Tariq,
>>
>> On Wed, 8 Feb 2023 at 12:09, Tariq Toukan <tariqt@...dia.com> wrote:
>>>
>>> Hi all,
>>>
>>> Our performance verification team spotted a degradation of up to ~20% in
>>> UDP performance, for a specific combination of parameters.
>>>
>>> Our matrix covers several parameter values, such as:
>>> IP version: 4/6
>>> MTU: 1500/9000
>>> Msg size: 64/1452/8952 (only where applicable, avoiding IP
>>> fragmentation).
>>> Num of streams: 1/8/16/24.
>>> Num of directions: unidir/bidir.
>>>
>>> Surprisingly, the issue exists only with this specific combination:
>>> 8 streams,
>>> MTU 9000,
>>> Msg size 8952,
>>> both ipv4/6,
>>> bidir.
>>> (in unidir it repros only with ipv4)
>>>
>>> The reproduction is consistent on all the different setups we tested 
>>> with.
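
(The report above doesn't name the benchmark tool, so purely as an
illustration of the traffic pattern in that combination: an 8952-byte UDP
payload plus the 8-byte UDP header and 40-byte IPv6 header fills a
9000-byte MTU exactly (8980 bytes with the 20-byte IPv4 header), so no IP
fragmentation occurs. The minimal sender below is hypothetical -- the
destination port, loop count, etc. are made up, and it is not the tool
used in the tests; one instance would be run per stream.)

/* Hypothetical minimal UDP sender, for illustration only: one 8952-byte
 * datagram per sendto() call, so each message maps to a single
 * 9000-byte (IPv6) or 8980-byte (IPv4) frame and is never fragmented.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define MSG_SIZE 8952	/* the problematic matrix entry */
#define DST_PORT 5201	/* arbitrary example port */

int main(int argc, char **argv)
{
	const char *dst = argc > 1 ? argv[1] : "127.0.0.1";
	struct sockaddr_in addr = { 0 };
	char buf[MSG_SIZE];
	int fd, i;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0) {
		perror("socket");
		return 1;
	}

	addr.sin_family = AF_INET;
	addr.sin_port = htons(DST_PORT);
	if (inet_pton(AF_INET, dst, &addr.sin_addr) != 1) {
		fprintf(stderr, "bad address: %s\n", dst);
		return 1;
	}

	memset(buf, 0xab, sizeof(buf));

	/* send a burst of full-MTU datagrams; run one instance per stream */
	for (i = 0; i < 100000; i++) {
		if (sendto(fd, buf, sizeof(buf), 0,
			   (struct sockaddr *)&addr, sizeof(addr)) < 0) {
			perror("sendto");
			break;
		}
	}

	close(fd);
	return 0;
}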
>>>
>>> Bisect [2] was done between these two points, v5.19 (good) and
>>> v6.0-rc1 (bad), with a ConnectX-6DX NIC.
>>>
>>> c82a69629c53eda5233f13fc11c3c01585ef48a2 is the first bad commit [1].
>>>
>>> We couldn't come up with a good explanation of how this patch causes
>>> this issue. We also looked for related changes in the networking/UDP
>>> stack, but nothing looked suspicious.
>>>
>>> Maybe someone here can help with this.
>>> We can provide more details or do further tests/experiments to help
>>> progress the debugging.
>>
>> Could you share more details about your system and the CPU topology?
>>
> 
> output for 'lscpu':
> 
> Architecture:                    x86_64
> CPU op-mode(s):                  32-bit, 64-bit
> Address sizes:                   40 bits physical, 57 bits virtual
> Byte Order:                      Little Endian
> CPU(s):                          24
> On-line CPU(s) list:             0-23
> Vendor ID:                       GenuineIntel
> BIOS Vendor ID:                  QEMU
> Model name:                      Intel(R) Xeon(R) Platinum 8380 CPU @ 
> 2.30GHz
> BIOS Model name:                 pc-q35-5.0
> CPU family:                      6
> Model:                           106
> Thread(s) per core:              1
> Core(s) per socket:              1
> Socket(s):                       24
> Stepping:                        6
> BogoMIPS:                        4589.21
> Flags:                           fpu vme de pse tsc msr pae mce cx8 apic 
> sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx 
> pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology 
> cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 
> sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand 
> hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd 
> ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid 
> ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f 
> avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni 
> avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat 
> avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni 
> avx512_bitalg avx512_vpopcntdq rdpid md_clear arch_capabilities
> Virtualization:                  VT-x
> Hypervisor vendor:               KVM
> Virtualization type:             full
> L1d cache:                       768 KiB (24 instances)
> L1i cache:                       768 KiB (24 instances)
> L2 cache:                        96 MiB (24 instances)
> L3 cache:                        384 MiB (24 instances)
> NUMA node(s):                    1
> NUMA node0 CPU(s):               0-23
> Vulnerability Itlb multihit:     Not affected
> Vulnerability L1tf:              Not affected
> Vulnerability Mds:               Not affected
> Vulnerability Meltdown:          Not affected
> Vulnerability Mmio stale data:   Vulnerable: Clear CPU buffers 
> attempted, no microcode; SMT Host state unknown
> Vulnerability Retbleed:          Not affected
> Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass 
> disabled via prctl
> Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers 
> and __user pointer sanitization
> Vulnerability Spectre v2:        Vulnerable: eIBRS with unprivileged eBPF
> Vulnerability Srbds:             Not affected
> Vulnerability Tsx async abort:   Not affected
> 
>> The commit c82a69629c53 migrates a task to an idle CPU when the task
>> is the only one running on the local CPU but the time spent by this
>> local CPU under interrupt or RT context becomes significant (10%-17%).
>> I can imagine that 16/24 streams overload your system, so load_balance
>> doesn't end up in this case and the CPUs are busy with several
>> threads. On the other hand, 1 stream is small enough to keep your
>> system lightly loaded, but 8 streams load your system enough to
>> trigger the reduced-capacity case while still not overloading it.
>>
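
(To make that threshold concrete, below is my own simplified, standalone
sketch of the reduced-capacity condition as I understand it -- not the
kernel code itself; the capacity_reduced() helper and the 900/1024 numbers
are invented for illustration. A CPU counts as capacity-reduced when the
capacity left for fair tasks after irq/RT time drops below roughly
100/imbalance_pct of its original capacity, which for imbalance_pct values
of 110 or 117 roughly matches the 10%-17% mentioned above.)

/* Simplified illustration (not kernel code) of the reduced-capacity check:
 * a CPU is considered capacity-reduced when
 *     cpu_capacity * imbalance_pct < cpu_capacity_orig * 100
 * i.e. when irq/RT time has consumed more than about
 * (1 - 100/imbalance_pct) of the CPU.
 */
#include <stdbool.h>
#include <stdio.h>

static bool capacity_reduced(unsigned long cpu_capacity,      /* capacity left for fair tasks */
			     unsigned long cpu_capacity_orig, /* full capacity, e.g. 1024 */
			     unsigned int imbalance_pct)      /* e.g. 110 or 117 */
{
	return cpu_capacity * imbalance_pct < cpu_capacity_orig * 100;
}

int main(void)
{
	/* example: ~12% of the CPU consumed by irq/softirq, 1024 * 0.88 ~= 900 */
	printf("pct=110: reduced=%d\n", capacity_reduced(900, 1024, 110)); /* 1 */
	printf("pct=117: reduced=%d\n", capacity_reduced(900, 1024, 117)); /* 0 */
	return 0;
}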
> 
> I see. Makes sense.
> 1. How can we check this theory? Any suggested tests/experiments?
> 2. How do you suggest this degradation should be fixed?
> 

Hi,
A kind reminder.
