lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220928151135.pvrlsylg6j3hzh74@sgarzare-redhat>
Date:   Wed, 28 Sep 2022 17:11:35 +0200
From:   Stefano Garzarella <sgarzare@...hat.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
Cc:     Junichi Uekawa <uekawa@...omium.org>,
        Stefan Hajnoczi <stefanha@...hat.com>,
        Jason Wang <jasowang@...hat.com>,
        Eric Dumazet <edumazet@...gle.com>, davem@...emloft.net,
        netdev@...r.kernel.org, Jakub Kicinski <kuba@...nel.org>,
        virtualization@...ts.linux-foundation.org, kvm@...r.kernel.org,
        Paolo Abeni <pabeni@...hat.com>, linux-kernel@...r.kernel.org,
        Bobby Eshleman <bobby.eshleman@...il.com>
Subject: Re: [PATCH] vhost/vsock: Use kvmalloc/kvfree for larger packets.

On Wed, Sep 28, 2022 at 05:31:58AM -0400, Michael S. Tsirkin wrote:
>On Wed, Sep 28, 2022 at 10:28:23AM +0200, Stefano Garzarella wrote:
>> On Wed, Sep 28, 2022 at 03:45:38PM +0900, Junichi Uekawa wrote:
>> > When copying a large file over sftp over vsock, data size is usually 32kB,
>> > and kmalloc seems to fail to try to allocate 32 32kB regions.
>> >
>> > Call Trace:
>> >  [<ffffffffb6a0df64>] dump_stack+0x97/0xdb
>> >  [<ffffffffb68d6aed>] warn_alloc_failed+0x10f/0x138
>> >  [<ffffffffb68d868a>] ? __alloc_pages_direct_compact+0x38/0xc8
>> >  [<ffffffffb664619f>] __alloc_pages_nodemask+0x84c/0x90d
>> >  [<ffffffffb6646e56>] alloc_kmem_pages+0x17/0x19
>> >  [<ffffffffb6653a26>] kmalloc_order_trace+0x2b/0xdb
>> >  [<ffffffffb66682f3>] __kmalloc+0x177/0x1f7
>> >  [<ffffffffb66e0d94>] ? copy_from_iter+0x8d/0x31d
>> >  [<ffffffffc0689ab7>] vhost_vsock_handle_tx_kick+0x1fa/0x301 [vhost_vsock]
>> >  [<ffffffffc06828d9>] vhost_worker+0xf7/0x157 [vhost]
>> >  [<ffffffffb683ddce>] kthread+0xfd/0x105
>> >  [<ffffffffc06827e2>] ? vhost_dev_set_owner+0x22e/0x22e [vhost]
>> >  [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
>> >  [<ffffffffb6eb332e>] ret_from_fork+0x4e/0x80
>> >  [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3
>> >
>> > Work around by doing kvmalloc instead.
>> >
>> > Signed-off-by: Junichi Uekawa <uekawa@...omium.org>
>
>My worry here is that this in more of a work around.
>It would be better to not allocate memory so aggressively:
>if we are so short on memory we should probably process
>packets one at a time. Is that very hard to implement?

Currently the "virtio_vsock_pkt" is allocated in the "handle_kick" 
callback of TX virtqueue. Then the packet is multiplexed on the right 
socket queue, then the user space can de-queue it whenever they want.

So maybe we can stop processing the virtqueue if we are short on memory, 
but when can we restart the TX virtqueue processing?

I think as long as the guest used only 4K buffers we had no problem, but 
now that it can create larger buffers the host may not be able to 
allocate it contiguously. Since there is no need to have them contiguous 
here, I think this patch is okay.

However, if we switch to sk_buff (as Bobby is already doing), maybe we 
don't have this problem because I think there is some kind of 
pre-allocated pool.

>
>
>
>> > ---
>> >
>> > drivers/vhost/vsock.c                   | 2 +-
>> > net/vmw_vsock/virtio_transport_common.c | 2 +-
>> > 2 files changed, 2 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>> > index 368330417bde..5703775af129 100644
>> > --- a/drivers/vhost/vsock.c
>> > +++ b/drivers/vhost/vsock.c
>> > @@ -393,7 +393,7 @@ vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
>> > 		return NULL;
>> > 	}
>> >
>> > -	pkt->buf = kmalloc(pkt->len, GFP_KERNEL);
>> > +	pkt->buf = kvmalloc(pkt->len, GFP_KERNEL);
>> > 	if (!pkt->buf) {
>> > 		kfree(pkt);
>> > 		return NULL;
>> > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> > index ec2c2afbf0d0..3a12aee33e92 100644
>> > --- a/net/vmw_vsock/virtio_transport_common.c
>> > +++ b/net/vmw_vsock/virtio_transport_common.c
>> > @@ -1342,7 +1342,7 @@ EXPORT_SYMBOL_GPL(virtio_transport_recv_pkt);
>> >
>> > void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt)
>> > {
>> > -	kfree(pkt->buf);
>> > +	kvfree(pkt->buf);
>>
>> virtio_transport_free_pkt() is used also in virtio_transport.c and
>> vsock_loopback.c where pkt->buf is allocated with kmalloc(), but IIUC
>> kvfree() can be used with that memory, so this should be fine.
>>
>> > 	kfree(pkt);
>> > }
>> > EXPORT_SYMBOL_GPL(virtio_transport_free_pkt);
>> > --
>> > 2.37.3.998.g577e59143f-goog
>> >
>>
>> This issue should go away with the Bobby's work about introducing sk_buff
>> [1], but we can queue this for now.
>>
>> I'm not sure if we should do the same also in the virtio-vsock driver
>> (virtio_transport.c). Here in vhost-vsock the buf allocated is only used in
>> the host, while in the virtio-vsock driver the buffer is exposed to the
>> device emulated in the host, so it should be physically contiguous (if not,
>> maybe we need to adjust virtio_vsock_rx_fill()).
>
>More importantly it needs to support DMA API which IIUC kvmalloc
>memory does not.
>

Right, good point!

Thanks,
Stefano

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ