lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <33369e7d-4c43-407a-92fd-373a8a7b2160@daynix.com>
Date: Sat, 15 Feb 2025 14:25:42 +0900
From: Akihiko Odaki <akihiko.odaki@...nix.com>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: Jonathan Corbet <corbet@....net>,
 Willem de Bruijn <willemdebruijn.kernel@...il.com>,
 Jason Wang <jasowang@...hat.com>, "David S. Miller" <davem@...emloft.net>,
 Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
 Paolo Abeni <pabeni@...hat.com>, Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
 Shuah Khan <shuah@...nel.org>, linux-doc@...r.kernel.org,
 linux-kernel@...r.kernel.org, netdev@...r.kernel.org, kvm@...r.kernel.org,
 virtualization@...ts.linux-foundation.org, linux-kselftest@...r.kernel.org,
 Yuri Benditovich <yuri.benditovich@...nix.com>,
 Andrew Melnychenko <andrew@...nix.com>,
 Stephen Hemminger <stephen@...workplumber.org>, gur.stavi@...wei.com,
 devel@...nix.com
Subject: Re: [PATCH net-next] tun: Pad virtio headers

On 2025/02/14 0:43, Michael S. Tsirkin wrote:
> On Thu, Feb 13, 2025 at 06:23:55PM +0900, Akihiko Odaki wrote:
>> On 2025/02/13 16:18, Michael S. Tsirkin wrote:
>>>
>>> Commit log needs some work.
>>>
>>> So my understanding is, this patch does not do much functionally,
>>> but makes adding the hash feature easier. OK.
>>>
>>> On Thu, Feb 13, 2025 at 03:54:06PM +0900, Akihiko Odaki wrote:
>>>> tun used to simply advance iov_iter when it needs to pad virtio header,
>>>> which leaves the garbage in the buffer as is. This is especially
>>>> problematic
>>>
>>> I think you mean "this will become especially problematic"
>>>
>>>> when tun starts to allow enabling the hash reporting
>>>> feature; even if the feature is enabled, the packet may lack a hash
>>>> value and may contain a hole in the virtio header because the packet
>>>> arrived before the feature gets enabled or does not contain the
>>>> header fields to be hashed. If the hole is not filled with zero, it is
>>>> impossible to tell if the packet lacks a hash value.
>>>>
>>>> In theory, a user of tun can fill the buffer with zero before calling
>>>> read() to avoid such a problem, but leaving the garbage in the buffer is
>>>> awkward anyway so fill the buffer in tun.
>>>
>>>
>>> What is missing here is description of what the patch does.
>>> I think it is
>>> "Replace advancing the iterator with writing zeros".
>>>
>>> There could be performance cost to the dirtying extra cache lines, though.
>>> Could you try checking that please?
>>
>> It will not dirty extra cache lines; an explanation follows later. Because
>> of that, any benchmark are likely to show only noises, but if you have an
>> idea of workloads that should be tested, please tell me.
> 
> pktgen usually

I tried it but it didn't give meaningful results so I may be doing 
wrong. It didn't show an obvious performance regression at least. I ran 
the following procedure:

1. create a veth pair
2. connect it to macvtap
3. run Fedora 41 on QEMU with vhost=on
4. run samples/pktgen/pktgen_sample01_simple.sh for the other end of veth
5. observe the rx packet rate of macvtap with ifstat for 60 seconds

ifstat showed that it received:
532K packets / 60 seconds without this patch
578K packets / 60 seconds with this patch

This is 8.6 % uplift, not degradation. I guess it's just a noise.

Below are actual commands I ran:

The commands I set up the veth pair and macvtap is as follows:

ip link add veth_host type veth peer name veth_guest
ip link set veth_host up
ip link set veth_guest up
ip link add link macvtap0 link veth_guest type macvtap
ip link set macvtap0 address 02:00:00:01:00:00 mtu 1486 up
ip address add 10.0.2.0 dev veth_host
ip route add 10.0.2.1 dev veth_host

The command for the pktgen is:
samples/pktgen/pktgen_sample01_simple.sh -i veth_host -d 10.0.2.1 -m 
02:00:00:01:00:00 -n 0

After I started pktgen, I ran: ifstat -d 60 macvtap0
I waited 60 seconds, and observed the rx rate with: ifstat -as macvtap0

> 
> 
> 
>>>
>>> I think we should mention the risks of the patch, too.
>>> Maybe:
>>>
>>> 	Also in theory, a user might have initialized the buffer
>>> 	to some non-zero value, expecting tun to skip writing it.
>>> 	As this was never a documented feature, this seems unlikely.
>>>>
>>>>
>>>> The specification also says the device MUST set num_buffers to 1 when
>>>> the field is present so set it when the specified header size is big
>>>> enough to contain the field.
>>>
>>> This part I dislike. tun has no idea what the number of buffers is.
>>> Why 1 specifically?
>>
>> That's a valid point. I rewrote the commit log to clarify, but perhaps we
>> can drop the code to set the num_buffers as "[PATCH] vhost/net: Set
>> num_buffers for virtio 1.0" already landed.
> 
> 
> I think I'd prefer that second option. it allows userspace
> to reliably detect the new behaviour, by setting the value
> to != 0.

I'll leave num_buffers zero in the next version.

> 
> 
>>
>> Below is the rewritten commit log, which incorporates your suggestions and
>> is extended to cover the performance implication and reason the num_buffers
>> initialization:
>>
>> tun simply advances iov_iter when it needs to pad virtio header,
>> which leaves the garbage in the buffer as is. This will become
>> especially problematic when tun starts to allow enabling the hash
>> reporting feature; even if the feature is enabled, the packet may lack a
>> hash value and may contain a hole in the virtio header because the
>> packet arrived before the feature gets enabled or does not contain the
>> header fields to be hashed. If the hole is not filled with zero, it is
>> impossible to tell if the packet lacks a hash value.
>>
>> In theory, a user of tun can fill the buffer with zero before calling
>> read() to avoid such a problem, but leaving the garbage in the buffer is
>> awkward anyway so replace advancing the iterator with writing zeros.
>>
>> A user might have initialized the buffer to some non-zero value,
>> expecting tun to skip writing it. As this was never a documented
>> feature, this seems unlikely. Neither is there a non-zero value that can
>> be determined and set before receiving the packet; the only exception
>> is the num_buffers field, which is expected to be 1 for version 1 when
>> VIRTIO_NET_F_HASH_REPORT is not negotiated.
> 
> you need mergeable buffers instead i presume.
> 
>> This field is specifically
>> set to 1 instead of 0.
>>
>> The overhead of filling the hole in the header is negligible as the
>> entire header is already placed on the cache when a header size defined
> 
> 
> what does this mean?

The current specification says the header size is 20 bytes or less. tun 
already makes all cache lines where the header will be written dirty for 
such a header size so we are not making another cache line dirty.

> 
>> in the current specification is used even if the cache line is small
>> (16 bytes for example).
>>
>> Below are the header sizes possible with the current specification:
>> a) 10 bytes if the legacy interface is used
>> b) 12 bytes if the modern interface is used
>> c) 20 bytes if VIRTIO_NET_F_HASH_REPORT is negotiated
>>
>> a) and b) obviously fit in a cache line. c) uses one extra cache line,
>> but the cache line also contains the first 12 bytes of the packet so
>> it is always placed on the cache.
> 
> 
> Hmm. But it could be clean so shared. write makes it dirty and so
> not shared.

The packet is not just located after the header in the buffer but tun 
writes the packet there. It makes the cache line dirty even without this 
change.

> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ