Message-ID: <38d019dd-b876-4fc1-ba7e-f1eb85ad7360@nvidia.com>
Date: Sun, 12 Jan 2025 08:05:17 +0000
From: Alex Lazar <alazar@...dia.com>
To: Martin Karsten <mkarsten@...terloo.ca>, Stanislav Fomichev
	<stfomichev@...il.com>, Joe Damato <jdamato@...tly.com>,
	"aleksander.lobakin@...el.com" <aleksander.lobakin@...el.com>,
	"almasrymina@...gle.com" <almasrymina@...gle.com>, "bigeasy@...utronix.de"
	<bigeasy@...utronix.de>, "bjorn@...osinc.com" <bjorn@...osinc.com>, Dan
 Jurgens <danielj@...dia.com>, "davem@...emloft.net" <davem@...emloft.net>,
	"donald.hunter@...il.com" <donald.hunter@...il.com>, "dsahern@...nel.org"
	<dsahern@...nel.org>, "edumazet@...gle.com" <edumazet@...gle.com>,
	"hawk@...nel.org" <hawk@...nel.org>, "jiri@...nulli.us" <jiri@...nulli.us>,
	"johannes.berg@...el.com" <johannes.berg@...el.com>, "kuba@...nel.org"
	<kuba@...nel.org>, "leitao@...ian.org" <leitao@...ian.org>, "leon@...nel.org"
	<leon@...nel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "linux-rdma@...r.kernel.org"
	<linux-rdma@...r.kernel.org>, "lorenzo@...nel.org" <lorenzo@...nel.org>,
	"michael.chan@...adcom.com" <michael.chan@...adcom.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>, "pabeni@...hat.com"
	<pabeni@...hat.com>, Saeed Mahameed <saeedm@...dia.com>, "sdf@...ichev.me"
	<sdf@...ichev.me>, "skhawaja@...gle.com" <skhawaja@...gle.com>,
	"sridhar.samudrala@...el.com" <sridhar.samudrala@...el.com>, Tariq Toukan
	<tariqt@...dia.com>, "willemdebruijn.kernel@...il.com"
	<willemdebruijn.kernel@...il.com>, "xuanzhuo@...ux.alibaba.com"
	<xuanzhuo@...ux.alibaba.com>, Gal Pressman <gal@...dia.com>, Nimrod Oren
	<noren@...dia.com>, Dror Tennenbaum <drort@...dia.com>, Dragos Tatulea
	<dtatulea@...dia.com>
Subject: Re: [net-next v6 0/9] Add support for per-NAPI config via netlink



On 10/01/2025 20:58, Martin Karsten wrote:
> On 2025-01-10 13:26, Stanislav Fomichev wrote:
>> On 01/10, Joe Damato wrote:
>>> On Mon, Dec 30, 2024 at 09:31:23AM -0500, Joe Damato wrote:
>>>> On Mon, Dec 23, 2024 at 08:17:08AM +0000, Alex Lazar wrote:
>>>>>
>>>
>>> [...]
>>>
>>>>>
>>>>> Hi Joe,
>>>>>
>>>>> Thanks for the quick response.
>>>>> Comments inline. If you need more details or further clarification,
>>>>> please let me know.
>>>>
>>>> As mentioned above and in my previous emails: please provide a lot
>>>> more detail and make it as easy as possible for me to reproduce this
>>>> issue with the simplest reproducer possible and a much more detailed
>>>> explanation.
>>>>
>>>> Please note: I will be out of the office until Jan 9 so my responses
>>>> will be limited until then.
>>>
>>> Just to follow up on this for anyone who missed the other thread,
>>> Stanislav proposed a patch which _might_ fix the issue being hit
>>> here.
>>>
>>> Please see [1], try that patch, and report back if that patch fixes
>>> the issue.
>>>
>>> Thanks.
>>>
>>> [1]: https://lore.kernel.org/netdev/20250109003436.2829560-1-sdf@...ichev.me/
>>
>> Note that it might help only if xsk is using busy-polling. Not sure
>> that's the case; it's a relatively obscure feature :-)
> 
> I believe I have reproduced Alex' issue using the methodology below and 
> your patch fixes it for me.
> 
> The experiment uses a server (tilly01) with mlx5 and a client (tilly02). 
> In the problem case, the 'response' packet gets stuck, but the next 
> 'request' packet triggers both the stuck and the regular responses. The 
> pattern can also be seen in the tcpdump output at the client. Note that 
> the response packet is not a valid packet (only MAC addresses swapped, 
> not IP addresses), but tcpdump shows it regardless.
> 
> Thanks,
> Martin
> 
> # on server tilly01
> watch -n 0.5 "sudo ethtool -S ens2f1np1 | fgrep tx_xsk_xmit"
> 
> # on client tilly02
> sudo tcpdump -qbi eno3d1 udp
> 
> # on client tilly02
> while true; do
>    ssh tilly01 "sudo ifconfig ens2f1np1 down; sudo modprobe -r mlx5_ib;
>      sleep 1; sudo modprobe mlx5_ib; sudo ifconfig ens2f1np1 up"
>    ssh -f tilly01 "sudo ./bpf-examples/AF_XDP-example/xdpsock \
>      -i ens2f1np1 -N -q 4 --l2fwd -z -B >/dev/null 2>&1"
>    exp=1
>    for ((i=0;i<5;i++)); do
>      ssh tilly01 "sudo ethtool --config-ntuple ens2f1np1 flow-type udp4\
>        dst-port 19017 action 4 >/dev/null 2>&1"
>      for ((j=0;j<10;j++)); do
>        echo -n "$exp "
>        echo 'send(IP(dst="192.168.199.1",src="192.168.199.2")\
>          /UDP(dport=19017))' | sudo ./scapy/run_scapy >/dev/null 2>&1
>        cnt=$(ssh tilly01 ethtool -S ens2f1np1|grep -F tx_xsk_xmit\
>          |cut -f2 -d:)
>        [ $cnt -eq $exp ] || {
>          echo COUNTER WRONG
>          read x
>        }
>        ((exp+=1))
>      done
>      ssh tilly01 sudo ethtool --config-ntuple ens2f1np1 delete 1023
>    done
>    echo reset
>    ssh tilly01 sudo killall xdpsock
> done
> 
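A minimal sketch, not part of the reproducer above: two hypothetical helpers, `counter_value` and `classify_readings`, that show how the failure appears in the `tx_xsk_xmit` counter. `counter_value` reads `ethtool -S` output on stdin and prints the counter whose name matches exactly. `classify_readings` takes the counter sampled after each request (starting from zero, as after the driver reload in the script) and labels each step: a response that stays queued leaves the counter unchanged ("stuck"), and the next request releases it, so the counter advances by two ("flushed"). The names and labels are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; not from the thread.

# counter_value NAME: read `ethtool -S` output on stdin and print the value
# of the statistic whose name is exactly NAME (no substring matches).
counter_value() {
  awk -F: -v name="$1" '
    { gsub(/^[ \t]+|[ \t]+$/, "", $1) }          # trim the stat name
    $1 == name { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# classify_readings R1 R2 ...: Ri is the counter read after request i,
# with the counter starting at 0. Prints one verdict per request:
#   ok      - advanced by exactly one (response sent immediately)
#   stuck   - did not advance (response held back)
#   flushed - advanced by more than one (held response released)
#   reset   - went backwards (e.g. driver reloaded)
classify_readings() {
  local prev=0 cur delta
  for cur in "$@"; do
    delta=$(( cur - prev ))
    if (( delta == 1 )); then
      echo ok
    elif (( delta == 0 )); then
      echo stuck
    elif (( delta > 1 )); then
      echo flushed
    else
      echo reset
    fi
    prev=$cur
  done
}
```

For example, `classify_readings 1 1 3` prints `ok`, `stuck`, and `flushed` on separate lines, which is the pattern Martin describes.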

Thanks to Joe, Martin, and Stanislav for this fix and for your efforts
in solving this issue. I reviewed the patch over the weekend and
verified that it solves the problem.

Thanks,
Alex Lazar

