Message-ID: <50181937-19ea-ccca-057c-eb6931f4b2da@nvidia.com>
Date: Sun, 2 Jul 2023 17:46:42 +0300
From: Gal Pressman <gal@...dia.com>
To: Richard Gobert <richardbgobert@...il.com>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
pabeni@...hat.com, aleksander.lobakin@...el.com, lixiaoyan@...gle.com,
lucien.xin@...il.com, alexanderduyck@...com, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/1] gro: decrease size of CB
On 02/07/2023 17:41, Gal Pressman wrote:
> On 30/06/2023 18:39, Richard Gobert wrote:
>> I haven't been able to reproduce it yet; I tried two different setups:
>> - 2 VMs running locally on my PC, with a geneve interface on each. Over
>> these geneve interfaces, I sent TCP traffic with an iperf command
>> similar to yours.
>> - A geneve tunnel over veth peers inside two separate namespaces as
>> David suggested.
>>
>> The throughput looked fine and identical with and without my patch in both
>> setups.
>>
>> Although I did validate it while working on the patch, a problem may arise
>> from:
>> - Packing CB members into a union, which could corrupt state if two
>> overlapping fields are live at the same time.
>> - Calling `gro_pull_from_frag0` on the current skb before inserting it
>> into `gro_list`.
>>
>> Could I ask you to run some tests:
>> - Running the script I attached here on one machine and checking whether
>> it reproduces the problem.
>> - Reverting part of my commit (a possible workflow is sketched below):
>>   - Reverting the change to the CB struct while keeping the changes to
>>     `gro_pull_from_frag0`.
>>   - Checking whether the regression remains.
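>>
>> A possible workflow for that partial revert (the file paths are my
>> assumption of what the commit touches):
>> ```
>> # revert the whole commit in the working tree, without committing
>> git revert --no-commit 7b355b76e2b3
>> # interactively re-apply only the gro_pull_from_frag0 hunks
>> git checkout -p 7b355b76e2b3 -- include/net/gro.h net/core/gro.c
>> ```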
>>
>> Also, could you give me some more details:
>> - The VMs' NIC and driver. Are you using QEMU?
>> - iperf results.
>> - The exact kernel versions (commit hashes) you are using.
>> - Did you run the commands (sysctl/ethtool) on the receiving VM?
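>>
>> On each VM, something like this could collect those details (the
>> interface name is assumed; run the git command inside the kernel tree):
>> ```
>> ethtool -i eth2          # NIC driver name and version
>> uname -r                 # running kernel version
>> git log -1 --format=%H   # exact commit hash of the checked-out tree
>> ```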
>>
>>
>> Here are the commands I used for the namespace test setup:
>> ```
>> ip netns add ns1
>>
>> ip link add veth0 type veth peer name veth1
>> ip link set veth1 netns ns1
>>
>> ip a add 192.168.1.1/32 dev veth0
>> ip link set veth0 up
>> ip r add 192.168.1.0/24 dev veth0
>>
>> ip netns exec ns1 ip a add 192.168.1.2/32 dev veth1
>> ip netns exec ns1 ip link set veth1 up
>> ip netns exec ns1 ip r add 192.168.1.0/24 dev veth1
>>
>> ip link add name gnv0 type geneve id 1000 remote 192.168.1.2
>> ip a add 10.0.0.1/32 dev gnv0
>> ip link set gnv0 up
>> ip r add 10.0.1.1/32 dev gnv0
>>
>> ip netns exec ns1 ip link add name gnv0 type geneve id 1000 remote 192.168.1.1
>> ip netns exec ns1 ip a add 10.0.1.1/32 dev gnv0
>> ip netns exec ns1 ip link set gnv0 up
>> ip netns exec ns1 ip r add 10.0.0.1/32 dev gnv0
>>
>> ethtool -K veth0 generic-receive-offload off
>> ip netns exec ns1 ethtool -K veth1 generic-receive-offload off
>>
>> # quick way to enable gro on veth devices
>> ethtool -K veth0 tcp-segmentation-offload off
>> ip netns exec ns1 ethtool -K veth1 tcp-segmentation-offload off
>> ```
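>>
>> To drive traffic over this setup, something along these lines should
>> work (the iperf flags are illustrative):
>> ```
>> # server inside the namespace, client in the root namespace;
>> # 10.0.1.1 is the geneve address assigned inside ns1 above
>> ip netns exec ns1 iperf -s &
>> iperf -c 10.0.1.1 -i1 -t1000
>> ```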
>>
>> I'll continue looking into it on Monday. It would be great if someone from
>> your team could write a test that reproduces this issue.
>>
>> Thanks.
>
> Hey,
>
> I don't have an answer to all of your questions yet, but it turns out I
> left out an important detail: the issue reproduces when the outer header
> is IPv6.
>
> I'm using ConnectX-6 Dx, with these scripts:
>
> Server:
> ip addr add 194.236.5.246/16 dev eth2
> ip addr add ::12:236:5:246/96 dev eth2
> ip link set dev eth2 up
>
> ip link add p1_g464 type geneve id 464 remote ::12:236:4:245
> ip link set dev p1_g464 up
> ip addr add 196.236.5.1/16 dev p1_g464
>
> Client:
> ip addr add 194.236.4.245/16 dev eth2
> ip addr add ::12:236:4:245/96 dev eth2
> ip link set dev eth2 up
>
> ip link add p0_g464 type geneve id 464 remote ::12:236:5:246
> ip link set dev p0_g464 up
> ip addr add 196.236.4.2/16 dev p0_g464
>
> Once everything is set up, running iperf -s on the server and
> iperf -c 196.236.5.1 -i1 -t1000
> on the client should do the trick.
>
> Unfortunately, I haven't been able to reproduce the same issue with veth
> interfaces.
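>
> In case it helps, here is how I would adapt your veth script to an IPv6
> outer header (untested sketch, addresses are arbitrary):
> ```
> ip netns add ns1
> ip link add veth0 type veth peer name veth1
> ip link set veth1 netns ns1
>
> # outer IPv6 addresses
> ip -6 a add 2001:db8::1/64 dev veth0
> ip link set veth0 up
> ip netns exec ns1 ip -6 a add 2001:db8::2/64 dev veth1
> ip netns exec ns1 ip link set veth1 up
>
> # geneve with an IPv6 remote, inner IPv4 as before
> ip link add name gnv0 type geneve id 1000 remote 2001:db8::2
> ip a add 10.0.0.1/32 dev gnv0
> ip link set gnv0 up
> ip r add 10.0.1.1/32 dev gnv0
>
> ip netns exec ns1 ip link add name gnv0 type geneve id 1000 remote 2001:db8::1
> ip netns exec ns1 ip a add 10.0.1.1/32 dev gnv0
> ip netns exec ns1 ip link set gnv0 up
> ip netns exec ns1 ip r add 10.0.0.1/32 dev gnv0
> ```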
>
> Reverting the napi_gro_cb part indeed resolves the issue.
>
> Thanks for taking a look!
BTW, all testing was done after checking out your commit:
7b355b76e2b3 ("gro: decrease size of CB")