netdev - Re: [PATCH v3 1/1] gro: decrease size of CB


Message-ID: <20230630153923.GA18237@debian>
Date: Fri, 30 Jun 2023 17:39:25 +0200
From: Richard Gobert <richardbgobert@...il.com>
To: Gal Pressman <gal@...dia.com>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
	pabeni@...hat.com, aleksander.lobakin@...el.com,
	lixiaoyan@...gle.com, lucien.xin@...il.com, alexanderduyck@...com,
	netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/1] gro: decrease size of CB

I haven't been able to reproduce it yet. I tried two different setups:
    - 2 VMs running locally on my PC, with a geneve interface on each. Over
      these geneve interfaces, I sent TCP traffic with an iperf command
      similar to yours.
    - A geneve tunnel over veth peers inside two separate namespaces, as
      David suggested.

The throughput looked fine and identical with and without my patch in both
setups.

Although I did validate it while working on the patch, a problem may arise
from:
    - Packing CB members into a union, which could've led to some sort of
      corruption.
    - Calling `gro_pull_from_frag0` on the current skb before inserting it
      into `gro_list`.

Could I ask you to run some tests:
    - Running the script I attached here on one machine and checking whether
      it reproduces the problem. 
    - Reverting part of my commit: 
        - Reverting the change to CB struct while keeping the changes to
          `gro_pull_from_frag0`.
        - Checking whether the regression remains.

Also, could you give me some more details:
    - The VMs' NIC and driver. Are you using QEMU?
    - iperf results.
    - The exact kernel versions (commit hashes) you are using.
    - Did you run the commands (sysctl/ethtool) on the receiving VM?
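
Most of these details could be collected on the receiving VM with something
like the sketch below. The `report_env` helper and the default interface name
`eth0` are placeholders of mine, not taken from your report. Note that
`uname -r` only prints the release string; the commit hash still has to come
from the tree the kernel was built from.

```shell
# Hypothetical helper: print kernel release, NIC driver, and offload state.
# Usage: report_env <iface>   (interface name is an assumption; pass your own)
report_env() {
    iface="${1:-eth0}"
    echo "== kernel =="
    uname -r
    echo "== driver for $iface =="
    ethtool -i "$iface"
    echo "== offloads for $iface =="
    # Keep only the GRO/TSO lines from the full feature list.
    ethtool -k "$iface" | grep -E 'generic-receive-offload|tcp-segmentation-offload'
}
```

For example, `report_env veth0` on the host, or the same call inside the
guest with its own NIC name.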


Here are the commands I used to set up the namespace test:
```
ip netns add ns1

ip link add veth0 type veth peer name veth1
ip link set veth1 netns ns1

ip a add 192.168.1.1/32 dev veth0
ip link set veth0 up
ip r add 192.168.1.0/24 dev veth0

ip netns exec ns1 ip a add 192.168.1.2/32 dev veth1
ip netns exec ns1 ip link set veth1 up
ip netns exec ns1 ip r add 192.168.1.0/24 dev veth1

ip link add name gnv0 type geneve id 1000 remote 192.168.1.2
ip a add 10.0.0.1/32 dev gnv0
ip link set gnv0 up
ip r add 10.0.1.1/32 dev gnv0

ip netns exec ns1 ip link add name gnv0 type geneve id 1000 remote 192.168.1.1
ip netns exec ns1 ip a add 10.0.1.1/32 dev gnv0
ip netns exec ns1 ip link set gnv0 up
ip netns exec ns1 ip r add 10.0.0.1/32 dev gnv0

ethtool -K veth0 generic-receive-offload off
ip netns exec ns1 ethtool -K veth1 generic-receive-offload off

# quick way to enable gro on veth devices
ethtool -K veth0 tcp-segmentation-offload off
ip netns exec ns1 ethtool -K veth1 tcp-segmentation-offload off
```
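
Once the setup above is applied, the tunnel can be sanity-checked and torn
down roughly as follows. This is only a sketch: it assumes iperf3 is
installed, the function names are made up for illustration, and the iperf3
invocation is a generic one rather than the exact command from your report.

```shell
# Sketch only: assumes the namespace setup above has been applied as root.
check_tunnel() {
    # Confirm the geneve tunnel passes traffic in both directions.
    ping -c 3 10.0.1.1 &&
    ip netns exec ns1 ping -c 3 10.0.0.1
}

run_throughput() {
    # One-off iperf3 server in ns1: -1 exits after a single client,
    # -D runs it as a daemon. Then run a 10 second TCP test against it.
    ip netns exec ns1 iperf3 -s -1 -D &&
    sleep 1 &&
    iperf3 -c 10.0.1.1 -t 10
}

teardown() {
    # Deleting veth0 also removes its peer veth1 inside ns1.
    ip link del gnv0
    ip link del veth0
    ip netns del ns1
}
```

Running `check_tunnel && run_throughput` with and without the patch, then
`teardown`, gives a comparable pair of results.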

I'll continue looking into it on Monday. It would be great if someone from
your team could write a test that reproduces this issue.

Thanks.