lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <2ce87e4c-43f4-4931-846b-10ddff4a64dc@tu-dortmund.de>
Date: Wed, 10 Sep 2025 22:22:48 +0200
From: Simon Schippers <simon.schippers@...dortmund.de>
To: Jason Wang <jasowang@...hat.com>,
        Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: mst@...hat.com, eperezma@...hat.com, stephen@...workplumber.org,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
        virtualization@...ts.linux.dev, kvm@...r.kernel.org,
        Tim Gebauer <tim.gebauer@...dortmund.de>
Subject: [PATCH 4/4] netdev queue flow control for vhost_net

Jason Wang wrote:
> On Wed, Sep 3, 2025 at 9:52 PM Willem de Bruijn
> <willemdebruijn.kernel@...il.com> wrote:
>>
>> Jason Wang wrote:
>>> On Wed, Sep 3, 2025 at 5:31 AM Willem de Bruijn
>>> <willemdebruijn.kernel@...il.com> wrote:
>>>>
>>>> Simon Schippers wrote:
>>>>> Stopping the queue is done in tun_net_xmit.
>>>>>
>>>>> Waking the queue is done by calling one of the helpers,
>>>>> tun_wake_netdev_queue and tap_wake_netdev_queue. For that, in
>>>>> get_wake_netdev_queue, the correct method is determined and saved in the
>>>>> function pointer wake_netdev_queue of the vhost_net_virtqueue. Then, each
>>>>> time after consuming a batch in vhost_net_buf_produce, wake_netdev_queue
>>>>> is called.
>>>>>
>>>>> Co-developed-by: Tim Gebauer <tim.gebauer@...dortmund.de>
>>>>> Signed-off-by: Tim Gebauer <tim.gebauer@...dortmund.de>
>>>>> Signed-off-by: Simon Schippers <simon.schippers@...dortmund.de>
>>>>> ---
>>>>>  drivers/net/tap.c      |  6 ++++++
>>>>>  drivers/net/tun.c      |  6 ++++++
>>>>>  drivers/vhost/net.c    | 34 ++++++++++++++++++++++++++++------
>>>>>  include/linux/if_tap.h |  2 ++
>>>>>  include/linux/if_tun.h |  3 +++
>>>>>  5 files changed, 45 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/tap.c b/drivers/net/tap.c
>>>>> index 4d874672bcd7..0bad9e3d59af 100644
>>>>> --- a/drivers/net/tap.c
>>>>> +++ b/drivers/net/tap.c
>>>>> @@ -1198,6 +1198,12 @@ struct socket *tap_get_socket(struct file *file)
>>>>>  }
>>>>>  EXPORT_SYMBOL_GPL(tap_get_socket);
>>>>>
>>>>> +void tap_wake_netdev_queue(struct file *file)
>>>>> +{
>>>>> +     wake_netdev_queue(file->private_data);
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(tap_wake_netdev_queue);
>>>>> +
>>>>>  struct ptr_ring *tap_get_ptr_ring(struct file *file)
>>>>>  {
>>>>>       struct tap_queue *q;
>>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>>>> index 735498e221d8..e85589b596ac 100644
>>>>> --- a/drivers/net/tun.c
>>>>> +++ b/drivers/net/tun.c
>>>>> @@ -3739,6 +3739,12 @@ struct socket *tun_get_socket(struct file *file)
>>>>>  }
>>>>>  EXPORT_SYMBOL_GPL(tun_get_socket);
>>>>>
>>>>> +void tun_wake_netdev_queue(struct file *file)
>>>>> +{
>>>>> +     wake_netdev_queue(file->private_data);
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(tun_wake_netdev_queue);
>>>>
>>>> Having multiple functions with the same name is tad annoying from a
>>>> cscape PoV, better to call the internal functions
>>>> __tun_wake_netdev_queue, etc.
>>>>
>>>>> +
>>>>>  struct ptr_ring *tun_get_tx_ring(struct file *file)
>>>>>  {
>>>>>       struct tun_file *tfile;
>>>>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>>>>> index 6edac0c1ba9b..e837d3a334f1 100644
>>>>> --- a/drivers/vhost/net.c
>>>>> +++ b/drivers/vhost/net.c
>>>>> @@ -130,6 +130,7 @@ struct vhost_net_virtqueue {
>>>>>       struct vhost_net_buf rxq;
>>>>>       /* Batched XDP buffs */
>>>>>       struct xdp_buff *xdp;
>>>>> +     void (*wake_netdev_queue)(struct file *f);
>>>>
>>>> Indirect function calls are expensive post spectre. Probably
>>>> preferable to just have a branch.
>>>>
>>>> A branch in `file->f_op != &tun_fops` would be expensive still as it
>>>> may touch a cold cacheline.
>>>>
>>>> How about adding a bit in struct ptr_ring itself. Pahole shows plenty
>>>> of holes. Jason, WDYT?
>>>>
>>>
>>> I'm not sure I get the idea, did you mean a bit for classifying TUN
>>> and TAP? If this is, I'm not sure it's a good idea as ptr_ring should
>>> have no knowledge of its user.
>>
>> That is what I meant.
>>
>>> Consider there were still indirect calls to sock->ops, maybe we can
>>> start from the branch.
>>
>> What do you mean?
>>
>> Tangential: if indirect calls really are needed in a hot path, e.g.,
>> to maintain this isolation of ptr_ring from its users, then
>> INDIRECT_CALL wrappers can be used to avoid the cost.
>>
>> That too effectively breaks the isolation between caller and callee.
>> But only for the most important N callers that are listed in the
>> INDIRECT_CALL_? wrapper.
> 
> Yes, I mean we can try to store the flag for example vhost_virtqueue struct.
> 
> Thanks
> 

I would just save the flag in the vhost_virtqueue struct as:

enum if_type {IF_NONE = 0, TUN, TAP} type;

Is this how you would implement it?


Apart from that, I found that vhost_net would eventually not wake anymore.
This results in a dead interface. I rewrote my implementation to tackle
this issue apart from the other requested changes. I will test it, run
pktgen benchmarks, and then post a v5, but this will probably take me more
time.

Thanks!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ