Date:   Sat, 20 Nov 2021 20:45:09 +0900
From:   Dominique Martinet <asmadeus@...ewreck.org>
To:     Nikolay Kichukov <nikolay@...um.net>
Cc:     Christian Schoenebeck <linux_oss@...debyte.com>,
        v9fs-developer@...ts.sourceforge.net, netdev@...r.kernel.org,
        Eric Van Hensbergen <ericvh@...il.com>,
        Latchesar Ionkov <lucho@...kov.net>,
        Greg Kurz <groug@...d.org>, Vivek Goyal <vgoyal@...hat.com>
Subject: Re: [PATCH v3 6/7] 9p/trans_virtio: support larger msize values

(Thanks for restarting this thread, this had totally slipped out of my
mind...)

Nikolay Kichukov wrote on Sat, Nov 20, 2021 at 12:20:35PM +0100:
> When the client mounts the share via virtio, requested msize is:
> 10485760 or 104857600
> 
> however the mount succeeds with:
> msize=507904 in the end as per the /proc filesystem. This is less than
> the previous maximum value.

(Not sure about this; I'll test these patches tomorrow. But since
something failed, I'm not surprised you ended up with less than you
could have here: what do you get with a more reasonable value like 1M
first?)

> In addition to the above, when the kernel on the guest boots and loads
> 9pfs support, the attached memory allocation failure trace is generated.
> 
> Is anyone else seeing similar and was anybody able to get msize set to
> 10MB via virtio protocol with these patches?

I don't think the kernel would ever allow this with the given code, as
the "common" part of 9p is not smart enough to use scatter-gather and
tries to do a single big allocation of (almost) the full msize:

---
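/* from net/9p/client.c, p9_client_create(): the cache object size is
 * the full msize, so every 9p buffer is one contiguous slab object */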
        clnt->fcall_cache =
                kmem_cache_create_usercopy("9p-fcall-cache", clnt->msize,
                                           0, 0, P9_HDRSZ + 4,
                                           clnt->msize - (P9_HDRSZ + 4),
                                           NULL);

...
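/* p9_fcall_init() then hands each request its buffer from that cache: */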
        fc->sdata = kmem_cache_alloc(c->fcall_cache, GFP_NOFS);
---
So in practice, you will be capped at 2MB as that is the biggest the
slab will be able to hand over in a single chunk.

It'll also make allocation failures quite likely as soon as the system
has had some uptime (depending on your workload; check /proc/buddyinfo
to see whether your machines normally have 2MB chunks available), so I
would really not recommend running with buffers bigger than e.g. 512k
on real workloads. It looks great in benchmarks, especially as the
cache is its own slab, so as long as you're doing a lot of requests it
will dish out buffers fast, but it'll likely be unreliable over time.
(Oh, and we allocate two of these buffers per request...)
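
For illustration, /proc/buddyinfo shows, per zone, how many free blocks
of each order the page allocator has (orders 0 through 10, i.e. 4KB
through 4MB chunks with 4KB pages); made-up numbers for a box that has
been up a while:

---
Node 0, zone   Normal   4981   2413   1054    322     98     21      4      1      0      0      0
---

Here the last columns (orders 8, 9, 10) are already empty, so a 2MB
contiguous allocation would have to wait for compaction or just fail.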


The next step to support large buffers really would be splitting that
buffer, to:
 1/ not allocate huge buffers for small metadata ops, e.g. an open call
certainly doesn't need to allocate 10MB of memory
 2/ support splitting the buffer into some scatter-gather array (rough
sketch below)
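
To give a rough idea of what 2/ could look like (purely illustrative
sketch, names made up, this is not what net/9p does today):

---
/* Replace the single contiguous sdata buffer with a chain of smaller
 * segments that a transport can map into its own scatter-gather list
 * (e.g. virtio sg entries), so no high-order allocation is needed. */
struct p9_fcall_seg {
	void	*data;	/* one reasonably-sized chunk, e.g. 256KB */
	size_t	 len;
};

struct p9_fcall_sg {
	size_t			capacity;	/* sum of segment lengths */
	unsigned int		nr_segs;
	struct p9_fcall_seg	segs[];		/* flexible array of chunks */
};
---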

Ideally we'd only allocate on an as-needed basis: most of the protocol
calls bound how much data is supposed to come back, and we know how much
we want to send (it's a format string actually, but we can bound it from
above quite easily). So one would need to adjust all protocol calls to
pass this info to p9_client_rpc/p9_client_zc_rpc so that it only
allocates buffers as required, if necessary in multiple reasonably-sized
segments (I'd love 2MB hugepage-backed folios...), and have all
transports use these buffers.
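
To make that concrete (hypothetical signature; today p9_client_rpc only
takes the client, the message type and a format string):

---
/* Hypothetical: callers pass upper bounds for both directions so the
 * client core can allocate just enough, in segments if needed,
 * instead of two full-msize buffers per request. */
struct p9_req_t *p9_client_rpc_sized(struct p9_client *c, int8_t type,
				     size_t tx_max, size_t rx_max,
				     const char *fmt, ...);

/* e.g. open: the Ropen reply is a qid plus iounit, nowhere near msize */
req = p9_client_rpc_sized(clnt, P9_TOPEN, 64, 64, "dd", fid->fid, mode);
---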

I have a rough idea of how to do all this but honestly less than zero
time for it, so I'm happy to give advice or review any patch, but it's
going to be a lot of work that stands in the way of really big IOs.



> [    1.527981] 9p: Installing v9fs 9p2000 file system support
> [    1.528173] ------------[ cut here ]------------
> [    1.528174] WARNING: CPU: 1 PID: 791 at mm/page_alloc.c:5356 __alloc_pages+0x1ed/0x290


This warning is exactly what I was saying about the allocation cap:
you've requested an allocation bigger than the max __alloc_pages is
willing to provide (MAX_ORDER, 11, so 2MB as I was saying).
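
To spell the math out (assuming 4KB pages):

---
/* A 10MB buffer needs get_order(10485760) == 12, i.e. a 16MB
 * physically contiguous chunk; __alloc_pages() WARNs and returns NULL
 * for any order >= MAX_ORDER (11), so the request can never succeed. */
---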

-- 
Dominique
