Message-ID: <6shb4fowdw43df7pod5kstmtynhrqigd3wdcyrqnni4svgfor2@dgiqw3t2zhfx>
Date: Tue, 1 Jul 2025 12:44:58 +0200
From: Stefano Garzarella <sgarzare@...hat.com>
To: Will Deacon <will@...nel.org>
Cc: linux-kernel@...r.kernel.org, Keir Fraser <keirf@...gle.com>, 
	Steven Moreland <smoreland@...gle.com>, Frederick Mayle <fmayle@...gle.com>, 
	Stefan Hajnoczi <stefanha@...hat.com>, "Michael S. Tsirkin" <mst@...hat.com>, 
	Jason Wang <jasowang@...hat.com>, Eugenio Pérez <eperezma@...hat.com>, 
	netdev@...r.kernel.org, virtualization@...ts.linux.dev
Subject: Re: [PATCH 3/5] vhost/vsock: Allocate nonlinear SKBs for handling
 large receive buffers

On Mon, Jun 30, 2025 at 03:20:57PM +0100, Will Deacon wrote:
>On Fri, Jun 27, 2025 at 12:45:45PM +0200, Stefano Garzarella wrote:
>> On Wed, Jun 25, 2025 at 02:15:41PM +0100, Will Deacon wrote:
>> > When receiving a packet from a guest, vhost_vsock_handle_tx_kick()
>> > calls vhost_vsock_alloc_skb() to allocate and fill an SKB with the
>> > receive data. Unfortunately, these are always linear allocations and can
>> > therefore result in significant pressure on kmalloc() considering that
>> > the maximum packet size (VIRTIO_VSOCK_MAX_PKT_BUF_SIZE +
>> > VIRTIO_VSOCK_SKB_HEADROOM) is a little over 64KiB, resulting in a 128KiB
>> > allocation for each packet.
>> >
>> > Rework the vsock SKB allocation so that, for sizes with page order
>> > greater than PAGE_ALLOC_COSTLY_ORDER, a nonlinear SKB is allocated
>> > instead with the packet header in the SKB and the receive data in the
>> > fragments.
>> >
>> > Signed-off-by: Will Deacon <will@...nel.org>
>> > ---
>> > drivers/vhost/vsock.c        | 15 +++++++++------
>> > include/linux/virtio_vsock.h | 31 +++++++++++++++++++++++++------
>> > 2 files changed, 34 insertions(+), 12 deletions(-)
>> >
>> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> > index 66a0f060770e..cfa4e1bcf367 100644
>> > --- a/drivers/vhost/vsock.c
>> > +++ b/drivers/vhost/vsock.c
>> > @@ -344,11 +344,16 @@ vhost_vsock_alloc_skb(struct vhost_virtqueue *vq,
>> >
>> > 	len = iov_length(vq->iov, out);
>> >
>> > -	if (len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE + VIRTIO_VSOCK_SKB_HEADROOM)
>> > +	if (len < VIRTIO_VSOCK_SKB_HEADROOM ||
>>
>> Why move this check here?
>
>I moved it here because virtio_vsock_alloc_skb_with_frags() does:
>
>+       size -= VIRTIO_VSOCK_SKB_HEADROOM;
>+       return __virtio_vsock_alloc_skb_with_frags(VIRTIO_VSOCK_SKB_HEADROOM,
>+                                                  size, mask);
>
>and so having the check in __virtio_vsock_alloc_skb_with_frags() looks
>strange as, by then, it really only applies to the linear case. It also
>feels weird to me to have the upper-bound of the length checked by the
>caller but the lower-bound checked in the callee. I certainly find it
>easier to reason about if they're in the same place.
>
>Additionally, the lower-bound check is only needed by the vhost receive
>code, as the transmit path uses virtio_vsock_alloc_skb(), which never
>passes a size smaller than VIRTIO_VSOCK_SKB_HEADROOM.
>
>Given all that, moving it to the one place that needs it seemed like the
>best option. What do you think?

Okay, I see now. Yep, it's fine, but please mention it in the commit 
description.
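
(Side note for the archives: the sizes at play here, assuming 4 KiB pages 
and the usual PAGE_ALLOC_COSTLY_ORDER of 3, work out roughly to:

	threshold = SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)
	          = SKB_WITH_OVERHEAD(32 KiB), i.e. a bit under 32 KiB
	max len   = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE + VIRTIO_VSOCK_SKB_HEADROOM
	          = 64 KiB plus headroom for the packet header

so a full-sized packet is well past the threshold and, with the old code, 
kmalloc() had to round the linear buffer up to a 128 KiB allocation.)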

>
>> > +	    len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE + VIRTIO_VSOCK_SKB_HEADROOM)
>> > 		return NULL;
>> >
>> > 	/* len contains both payload and hdr */
>> > -	skb = virtio_vsock_alloc_skb(len, GFP_KERNEL);
>> > +	if (len > SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
>> > +		skb = virtio_vsock_alloc_skb_with_frags(len, GFP_KERNEL);
>> > +	else
>> > +		skb = virtio_vsock_alloc_skb(len, GFP_KERNEL);
>>
>> Can we do this directly in virtio_vsock_alloc_skb() so we don't need
>> to duplicate code in the virtio/vhost code?
>
>We can, but then I think we should do something different for the
>rx_fill() path -- it feels fragile to rely on that using small-enough
>buffers to guarantee linear allocations. How about I:
>
> 1. Add virtio_vsock_alloc_linear_skb(), which always performs a linear
>    allocation.
>
> 2. Change virtio_vsock_alloc_skb() to use nonlinear SKBs for sizes
>    greater than SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)
>
> 3. Use virtio_vsock_alloc_linear_skb() to fill the guest RX buffers
>
> 4. Use virtio_vsock_alloc_skb() for everything else
>
>If you like the idea, I'll rework the series along those lines.
>Diff below... (see end of mail)

I really like it :-) let's go in that direction!
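
Just to restate the mapping as I understand it from the diff at the end 
of your mail (shapes only, not the exact code):

	/* linear: header and payload both in the skb head */
	virtio_vsock_alloc_linear_skb(size, mask)
	  -> __virtio_vsock_alloc_skb_with_frags(size, 0, mask);

	/* virtio_vsock_alloc_skb(): sizes up to
	 * SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER) stay
	 * linear; larger ones keep only the header in the head and put
	 * the payload in page frags */
	virtio_vsock_alloc_skb(size, mask)
	  -> __virtio_vsock_alloc_skb_with_frags(VIRTIO_VSOCK_SKB_HEADROOM,
						 size - VIRTIO_VSOCK_SKB_HEADROOM,
						 mask);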

>
>> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>> > index 67ffb64325ef..8f9fa1cab32a 100644
>> > --- a/include/linux/virtio_vsock.h
>> > +++ b/include/linux/virtio_vsock.h
>> > @@ -51,27 +51,46 @@ static inline void virtio_vsock_skb_rx_put(struct sk_buff *skb)
>> > {
>> > 	u32 len;
>> >
>> > +	DEBUG_NET_WARN_ON_ONCE(skb->len);
>>
>> Should we mention this in the commit message?
>
>Sure, I'll add something. The non-linear handling doesn't accumulate len,
>so it's a debug check to ensure that len hasn't been messed with between
>allocation and here.
>
>> > 	len = le32_to_cpu(virtio_vsock_hdr(skb)->len);
>> >
>> > -	if (len > 0)
>>
>> Why remove this check?
>
>I think it's redundant: len is a u32, so we're basically just checking
>to see if it's non-zero. All the callers have already checked for this
>but, even if they didn't, skb_put(skb, 0) is harmless afaict.

Yep, I see, but now I don't remember why we have it. Could it be more 
expensive to call `skb_put(skb, 0)` instead of just keeping the check for 
control packets with no payload?
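
(For reference, skb_put() in net/core/skbuff.c is roughly:

	void *skb_put(struct sk_buff *skb, unsigned int len)
	{
		void *tmp = skb_tail_pointer(skb);

		SKB_LINEAR_ASSERT(skb);
		skb->tail += len;
		skb->len  += len;
		if (unlikely(skb->tail > skb->end))
			skb_over_panic(skb, len, __builtin_return_address(0));
		return tmp;
	}

so for len == 0 the cost should just be the out-of-line call plus a couple 
of additions, but it would be good to double-check I'm not missing 
something.)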

Thanks,
Stefano

>
>Will
>
>--->8
>
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index 3799c0aeeec5..a6cd72a32f63 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -349,11 +349,7 @@ vhost_vsock_alloc_skb(struct vhost_virtqueue *vq,
> 		return NULL;
>
> 	/* len contains both payload and hdr */
>-	if (len > SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
>-		skb = virtio_vsock_alloc_skb_with_frags(len, GFP_KERNEL);
>-	else
>-		skb = virtio_vsock_alloc_skb(len, GFP_KERNEL);
>-
>+	skb = virtio_vsock_alloc_skb(len, GFP_KERNEL);
> 	if (!skb)
> 		return NULL;
>
>diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>index 0e265921be03..ed5eab46e3dc 100644
>--- a/include/linux/virtio_vsock.h
>+++ b/include/linux/virtio_vsock.h
>@@ -79,16 +79,19 @@ __virtio_vsock_alloc_skb_with_frags(unsigned int header_len,
> }
>
> static inline struct sk_buff *
>-virtio_vsock_alloc_skb_with_frags(unsigned int size, gfp_t mask)
>+virtio_vsock_alloc_linear_skb(unsigned int size, gfp_t mask)
> {
>-	size -= VIRTIO_VSOCK_SKB_HEADROOM;
>-	return __virtio_vsock_alloc_skb_with_frags(VIRTIO_VSOCK_SKB_HEADROOM,
>-						   size, mask);
>+	return __virtio_vsock_alloc_skb_with_frags(size, 0, mask);
> }
>
> static inline struct sk_buff *virtio_vsock_alloc_skb(unsigned int size, gfp_t mask)
> {
>-	return __virtio_vsock_alloc_skb_with_frags(size, 0, mask);
>+	if (size <= SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
>+		return virtio_vsock_alloc_linear_skb(size, mask);
>+
>+	size -= VIRTIO_VSOCK_SKB_HEADROOM;
>+	return __virtio_vsock_alloc_skb_with_frags(VIRTIO_VSOCK_SKB_HEADROOM,
>+						   size, mask);
> }
>
> static inline void
>diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
>index 4ae714397ca3..8c9ca0cb0d4e 100644
>--- a/net/vmw_vsock/virtio_transport.c
>+++ b/net/vmw_vsock/virtio_transport.c
>@@ -321,7 +321,7 @@ static void virtio_vsock_rx_fill(struct virtio_vsock *vsock)
> 	vq = vsock->vqs[VSOCK_VQ_RX];
>
> 	do {
>-		skb = virtio_vsock_alloc_skb(total_len, GFP_KERNEL);
>+		skb = virtio_vsock_alloc_linear_skb(total_len, GFP_KERNEL);
> 		if (!skb)
> 			break;
>
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index 424eb69e84f9..f74677c3511e 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -262,11 +262,7 @@ static struct sk_buff *virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *
> 	if (!zcopy)
> 		skb_len += payload_len;
>
>-	if (skb_len > SKB_WITH_OVERHEAD(PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
>-		skb = virtio_vsock_alloc_skb_with_frags(skb_len, GFP_KERNEL);
>-	else
>-		skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
>-
>+	skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
> 	if (!skb)
> 		return NULL;
>
>

