Message-ID: <ua7aoa6yapzzitbg77taspl7h34mmp32lrn6zmr7m6w6xfwk26@w6hheulzftw6>
Date: Mon, 1 Jul 2024 17:32:22 +0200
From: Stefano Garzarella <sgarzare@...hat.com>
To: Arseniy Krasnov <avkrasnov@...utedevices.com>
Cc: Stefan Hajnoczi <stefanha@...hat.com>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
"Michael S. Tsirkin" <mst@...hat.com>, Jason Wang <jasowang@...hat.com>,
Bobby Eshleman <bobby.eshleman@...edance.com>, kvm@...r.kernel.org, virtualization@...ts.linux-foundation.org,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org, kernel@...rdevices.ru,
oxffffaa@...il.com
Subject: Re: [RFC PATCH v1 1/2] virtio/vsock: rework deferred credit update logic

Hi Arseniy,

On Fri, Jun 21, 2024 at 10:25:40PM GMT, Arseniy Krasnov wrote:
>The previous calculation of 'free_space' was wrong (but worked as expected
>in most cases, see below), because it didn't account for the number of
>bytes in the rx queue. Let's rework the 'free_space' calculation in the
>following way: as this value is considered free space at the rx side from
>the tx point of view, it must be equal to the return value of
>'virtio_transport_get_credit()' at the tx side. This function uses the
>'tx_cnt' counter and 'peer_fwd_cnt': the first is the number of
>transmitted bytes (without wrap), the second is the last 'fwd_cnt' value
>received from rx. So let's use the same approach at the rx side during
>the 'free_space' calculation: add an 'rx_cnt' counter, which is the number
>of received bytes (also without wrap), and subtract 'last_fwd_cnt' from it.
>Now we have:
>1) 'rx_cnt' == 'tx_cnt' at both sides.
>2) 'last_fwd_cnt' == 'peer_fwd_cnt' - because first is last 'fwd_cnt'
> sent to tx, while second is last 'fwd_cnt' received from rx.
>
>Now 'free_space' is handled correctly and also we don't need
>'low_rx_bytes' flag - this was more like a hack.
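
The relationship described above can be sketched in plain C. This is a minimal user-space sketch, not kernel code: 'struct credit_state', 'tx_view_credit()', 'rx_free_space()' and 'check_views_agree()' are hypothetical names, and the tx-side formula is assumed from the description of 'virtio_transport_get_credit()' in the quoted text rather than copied from its source. It only illustrates that both sides compute the same value when 'rx_cnt' == 'tx_cnt' and 'last_fwd_cnt' == 'peer_fwd_cnt'.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified mirror of the rx-side counters named in the patch. */
struct credit_state {
	uint32_t buf_alloc;    /* rx buffer size advertised to the peer */
	uint32_t rx_cnt;       /* free-running count of received bytes */
	uint32_t last_fwd_cnt; /* last fwd_cnt value sent to the peer */
};

/* Free space at rx as seen from tx (assumed form, per the description above). */
static uint32_t tx_view_credit(uint32_t peer_buf_alloc, uint32_t tx_cnt,
			       uint32_t peer_fwd_cnt)
{
	return peer_buf_alloc - (tx_cnt - peer_fwd_cnt);
}

/* Free space computed at rx with the reworked expression from the patch. */
static uint32_t rx_free_space(const struct credit_state *s)
{
	return s->buf_alloc - (s->rx_cnt - s->last_fwd_cnt);
}

/* Both views must agree whenever the two invariants from the text hold. */
static void check_views_agree(const struct credit_state *rx, uint32_t tx_cnt,
			      uint32_t peer_fwd_cnt)
{
	assert(tx_cnt == rx->rx_cnt);
	assert(peer_fwd_cnt == rx->last_fwd_cnt);
	assert(tx_view_credit(rx->buf_alloc, tx_cnt, peer_fwd_cnt) ==
	       rx_free_space(rx));
}
```

Because the counters are unsigned 32-bit values, the subtraction 'rx_cnt - last_fwd_cnt' stays correct even after 'rx_cnt' has wrapped past zero, as long as fewer than 2^32 bytes are outstanding.
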
>
>The previous calculation of 'free_space' worked (in 99% of cases), because
>if we look at the behaviour of both expressions (new and previous):
>
>'(rx_cnt - last_fwd_cnt)' and '(fwd_cnt - last_fwd_cnt)'
>
>Both of them always grow, at almost the same "speed": the only difference
>is that 'rx_cnt' is incremented earlier, when a packet is received,
>while 'fwd_cnt' is incremented when the packet is read by the user. So if
>'rx_cnt' grows "faster", the resulting 'free_space' also becomes smaller,
>so we send credit updates a little more often, but:
>
> * 'free_space' calculation based on 'rx_cnt' gives the same value,
> which tx sees as free space at rx side, so original idea of
> 'free_space' is now implemented as planned.
> * Hack with 'low_rx_bytes' now is not needed.
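
The difference between the two expressions can be illustrated with a small user-space sketch. The names 'struct rx_counters', 'free_space_prev()', 'free_space_new()' and 'need_credit_update()' are hypothetical, 'MAX_PKT_BUF_SIZE' is an assumed 64 KiB stand-in for 'VIRTIO_VSOCK_MAX_PKT_BUF_SIZE', and the 'low_rx_bytes' part of the previous condition is deliberately left out, so this is not the exact kernel logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for VIRTIO_VSOCK_MAX_PKT_BUF_SIZE. */
#define MAX_PKT_BUF_SIZE (64u * 1024u)

/* Hypothetical, simplified copy of the rx-side counters. */
struct rx_counters {
	uint32_t buf_alloc;    /* rx buffer size advertised to the peer */
	uint32_t rx_cnt;       /* incremented when a packet is received */
	uint32_t fwd_cnt;      /* incremented when the user reads data */
	uint32_t last_fwd_cnt; /* last fwd_cnt value sent to the peer */
};

/* Previous expression: ignores bytes still sitting in the rx queue. */
static uint32_t free_space_prev(const struct rx_counters *c)
{
	return c->buf_alloc - (c->fwd_cnt - c->last_fwd_cnt);
}

/* Reworked expression: counts everything received since the last update. */
static uint32_t free_space_new(const struct rx_counters *c)
{
	/* Bytes read by the user can never exceed bytes received. */
	assert((c->rx_cnt - c->last_fwd_cnt) >= (c->fwd_cnt - c->last_fwd_cnt));
	return c->buf_alloc - (c->rx_cnt - c->last_fwd_cnt);
}

/* Simplified decision: send an update if something was read and space is low. */
static bool need_credit_update(const struct rx_counters *c, uint32_t free_space)
{
	uint32_t fwd_cnt_delta = c->fwd_cnt - c->last_fwd_cnt;

	return fwd_cnt_delta && free_space < MAX_PKT_BUF_SIZE;
}
```

With a 256 KiB buffer, 'last_fwd_cnt' at 40000, 'fwd_cnt' at 100000 and 'rx_cnt' at 250000, the previous expression reports 202144 bytes free and sends no update, while the reworked one reports 52144 bytes and does, which matches the "a little more often" behaviour described above.
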
>
>Also here is some performance comparison between both versions of
>'free_space' calculation:
>
> *------*----------*----------*
> | | 'rx_cnt' | previous |
> *------*----------*----------*
> |H -> G| 8.42 | 7.82 |
> *------*----------*----------*
> |G -> H| 11.6 | 12.1 |
> *------*----------*----------*
I did some tests on an Intel(R) Xeon(R) Silver 4410Y using iperf-vsock:

- kernel 6.9.0

  pkt_size   G->H   H->G
  4k          4.6    6.4
  64k        13.8   11.5
  128k       13.4   11.7

- kernel 6.9.0 with this series applied

  pkt_size   G->H   H->G
  4k          4.6    8.16
  64k        12.2    8.9
  128k       12.8    8.8

I see a big drop, especially on H->G with big packets. Can you try to
replicate it in your environment?

I'll try to understand more and also test on an i7 in the next few days.

Thanks,
Stefano
>
>'vsock-iperf' with default arguments was used as the benchmark. There is
>no significant performance difference before and after this patch.
>
>Signed-off-by: Arseniy Krasnov <avkrasnov@...utedevices.com>
>---
> include/linux/virtio_vsock.h | 1 +
> net/vmw_vsock/virtio_transport_common.c | 8 +++-----
> 2 files changed, 4 insertions(+), 5 deletions(-)
>
>diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>index c82089dee0c8..3579491c411e 100644
>--- a/include/linux/virtio_vsock.h
>+++ b/include/linux/virtio_vsock.h
>@@ -135,6 +135,7 @@ struct virtio_vsock_sock {
> u32 peer_buf_alloc;
>
> /* Protected by rx_lock */
>+ u32 rx_cnt;
> u32 fwd_cnt;
> u32 last_fwd_cnt;
> u32 rx_bytes;
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index 16ff976a86e3..1d4e2328e06e 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -441,6 +441,7 @@ static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
> return false;
>
> vvs->rx_bytes += len;
>+ vvs->rx_cnt += len;
> return true;
> }
>
>@@ -558,7 +559,6 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> size_t bytes, total = 0;
> struct sk_buff *skb;
> u32 fwd_cnt_delta;
>- bool low_rx_bytes;
> int err = -EFAULT;
> u32 free_space;
>
>@@ -603,9 +603,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> }
>
> fwd_cnt_delta = vvs->fwd_cnt - vvs->last_fwd_cnt;
>- free_space = vvs->buf_alloc - fwd_cnt_delta;
>- low_rx_bytes = (vvs->rx_bytes <
>- sock_rcvlowat(sk_vsock(vsk), 0, INT_MAX));
>+ free_space = vvs->buf_alloc - (vvs->rx_cnt - vvs->last_fwd_cnt);
>
> spin_unlock_bh(&vvs->rx_lock);
>
>@@ -619,7 +617,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> * number of bytes in rx queue is not enough to wake up reader.
> */
> if (fwd_cnt_delta &&
>- (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE || low_rx_bytes))
>+ (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE))
> virtio_transport_send_credit_update(vsk);
>
> return total;
>--
>2.25.1
>
>