[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <2ce87e4c-43f4-4931-846b-10ddff4a64dc@tu-dortmund.de>
Date: Wed, 10 Sep 2025 22:22:48 +0200
From: Simon Schippers <simon.schippers@...dortmund.de>
To: Jason Wang <jasowang@...hat.com>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: mst@...hat.com, eperezma@...hat.com, stephen@...workplumber.org,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
virtualization@...ts.linux.dev, kvm@...r.kernel.org,
Tim Gebauer <tim.gebauer@...dortmund.de>
Subject: [PATCH 4/4] netdev queue flow control for vhost_net
Jason Wang wrote:
> On Wed, Sep 3, 2025 at 9:52 PM Willem de Bruijn
> <willemdebruijn.kernel@...il.com> wrote:
>>
>> Jason Wang wrote:
>>> On Wed, Sep 3, 2025 at 5:31 AM Willem de Bruijn
>>> <willemdebruijn.kernel@...il.com> wrote:
>>>>
>>>> Simon Schippers wrote:
>>>>> Stopping the queue is done in tun_net_xmit.
>>>>>
>>>>> Waking the queue is done by calling one of the helpers,
>>>>> tun_wake_netdev_queue and tap_wake_netdev_queue. For that, in
>>>>> get_wake_netdev_queue, the correct method is determined and saved in the
>>>>> function pointer wake_netdev_queue of the vhost_net_virtqueue. Then, each
>>>>> time after consuming a batch in vhost_net_buf_produce, wake_netdev_queue
>>>>> is called.
>>>>>
>>>>> Co-developed-by: Tim Gebauer <tim.gebauer@...dortmund.de>
>>>>> Signed-off-by: Tim Gebauer <tim.gebauer@...dortmund.de>
>>>>> Signed-off-by: Simon Schippers <simon.schippers@...dortmund.de>
>>>>> ---
>>>>> drivers/net/tap.c | 6 ++++++
>>>>> drivers/net/tun.c | 6 ++++++
>>>>> drivers/vhost/net.c | 34 ++++++++++++++++++++++++++++------
>>>>> include/linux/if_tap.h | 2 ++
>>>>> include/linux/if_tun.h | 3 +++
>>>>> 5 files changed, 45 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/tap.c b/drivers/net/tap.c
>>>>> index 4d874672bcd7..0bad9e3d59af 100644
>>>>> --- a/drivers/net/tap.c
>>>>> +++ b/drivers/net/tap.c
>>>>> @@ -1198,6 +1198,12 @@ struct socket *tap_get_socket(struct file *file)
>>>>> }
>>>>> EXPORT_SYMBOL_GPL(tap_get_socket);
>>>>>
>>>>> +void tap_wake_netdev_queue(struct file *file)
>>>>> +{
>>>>> + wake_netdev_queue(file->private_data);
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(tap_wake_netdev_queue);
>>>>> +
>>>>> struct ptr_ring *tap_get_ptr_ring(struct file *file)
>>>>> {
>>>>> struct tap_queue *q;
>>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>>>> index 735498e221d8..e85589b596ac 100644
>>>>> --- a/drivers/net/tun.c
>>>>> +++ b/drivers/net/tun.c
>>>>> @@ -3739,6 +3739,12 @@ struct socket *tun_get_socket(struct file *file)
>>>>> }
>>>>> EXPORT_SYMBOL_GPL(tun_get_socket);
>>>>>
>>>>> +void tun_wake_netdev_queue(struct file *file)
>>>>> +{
>>>>> + wake_netdev_queue(file->private_data);
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(tun_wake_netdev_queue);
>>>>
>>>> Having multiple functions with the same name is tad annoying from a
>>>> cscape PoV, better to call the internal functions
>>>> __tun_wake_netdev_queue, etc.
>>>>
>>>>> +
>>>>> struct ptr_ring *tun_get_tx_ring(struct file *file)
>>>>> {
>>>>> struct tun_file *tfile;
>>>>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>>>>> index 6edac0c1ba9b..e837d3a334f1 100644
>>>>> --- a/drivers/vhost/net.c
>>>>> +++ b/drivers/vhost/net.c
>>>>> @@ -130,6 +130,7 @@ struct vhost_net_virtqueue {
>>>>> struct vhost_net_buf rxq;
>>>>> /* Batched XDP buffs */
>>>>> struct xdp_buff *xdp;
>>>>> + void (*wake_netdev_queue)(struct file *f);
>>>>
>>>> Indirect function calls are expensive post spectre. Probably
>>>> preferable to just have a branch.
>>>>
>>>> A branch in `file->f_op != &tun_fops` would be expensive still as it
>>>> may touch a cold cacheline.
>>>>
>>>> How about adding a bit in struct ptr_ring itself. Pahole shows plenty
>>>> of holes. Jason, WDYT?
>>>>
>>>
>>> I'm not sure I get the idea, did you mean a bit for classifying TUN
>>> and TAP? If this is, I'm not sure it's a good idea as ptr_ring should
>>> have no knowledge of its user.
>>
>> That is what I meant.
>>
>>> Consider there were still indirect calls to sock->ops, maybe we can
>>> start from the branch.
>>
>> What do you mean?
>>
>> Tangential: if indirect calls really are needed in a hot path, e.g.,
>> to maintain this isolation of ptr_ring from its users, then
>> INDIRECT_CALL wrappers can be used to avoid the cost.
>>
>> That too effectively breaks the isolation between caller and callee.
>> But only for the most important N callers that are listed in the
>> INDIRECT_CALL_? wrapper.
>
> Yes, I mean we can try to store the flag for example vhost_virtqueue struct.
>
> Thanks
>
I would just save the flag in the vhost_virtqueue struct as:
enum if_type {IF_NONE = 0, TUN, TAP} type;
Is this how you would implement it?
Apart from that, I found that vhost_net would eventually not wake anymore.
This results in a dead interface. I rewrote my implementation to tackle
this issue apart from the other requested changes. I will test it, run
pktgen benchmarks, and then post a v5, but this will probably take me more
time.
Thanks!
Powered by blists - more mailing lists