Message-ID: <aSVw0M8f3vTXdQxH@codewreck.org>
Date: Tue, 25 Nov 2025 18:03:12 +0900
From: Dominique Martinet <asmadeus@...ewreck.org>
To: Matthew Wilcox <willy@...radead.org>
Cc: Chris Arges <carges@...udflare.com>,
David Howells <dhowells@...hat.com>, ericvh@...nel.org,
lucho@...kov.net, linux_oss@...debyte.com, v9fs@...ts.linux.dev,
linux-kernel@...r.kernel.org, kernel-team@...udflare.com
Subject: Re: kernel BUG when mounting large block xfs backed by 9p (folio ref
count bug)
Matthew Wilcox wrote on Mon, Nov 24, 2025 at 11:55:59PM +0000:
> > > [ 31.395976][ T62] page_type: f8(unknown)
>
> PGTY_large_kmalloc = 0xf8,
>
> So somebody called kmalloc(2 * 1024 * 1024). Not sure if that's helpful
> in tracking this down?
This is a "zero-copy rpc", so the pages come from wherever the iov_iter
we were passed came from, and we don't really check...
In particular, that zero-copy code in net/9p/trans_virtio.c hasn't
changed much since Al Viro rewrote the 9p code to use iov_iter in 2015
(commit 4f3b35c157e4 ("net/9p: switch the guts of
p9_client_{read,write}() to iov_iter")), and I'm not sure anyone
ever checked whether it is anywhere close to folio-friendly...
So I guess it turned out not to be:
> > > [ 31.398075][ T62] ? kvm_sched_clock_read+0x11/0x20
> > > [ 31.398131][ T62] ? sched_clock+0x10/0x30
> > > [ 31.398179][ T62] ? sched_clock_cpu+0xf/0x1d0
> > > [ 31.398234][ T62] iov_iter_get_pages_alloc2+0x20/0x50
> > > [ 31.398277][ T62] p9_get_mapped_pages.part.0.constprop.0+0x6f/0x280 [9pnet_virtio]
>
> Oh, hang on. You're passing a kmalloc'ed page to
> iov_iter_get_pages_alloc(). That's not allowed ...
Thanks for finding this, I wouldn't have noticed.
> see https://lore.kernel.org/all/20250310142750.1209192-1-willy@infradead.org/
I'm sorry but I'm not sure I see what I should do from this -- your
patch looks to me like it should now work with this?
Oh, it's not merged?... I don't see where the discussion stalled
either...
For context, in this case virtio needs the pages to be pinned because
the host will write directly into them, and the API we're using is
virtqueue_add_sgs() (drivers/virtio/virtio_ring.c), which expects a
scatterlist, which I guess must be made of pages (I can't say I'm very
familiar with this particular API either, but the word `folio` doesn't
show up in drivers/virtio).
Since we don't know where the iov comes from, we can't have any
expectations about it, but we can check things and try to act
appropriately (or error out and/or somehow fall back to non-zc if there's
a reason we can't do it).
What would one need to go from an iov_iter to something this could use?
Out of curiosity I looked at other "big" virtqueue users (e.g. vhost
scsi must be shuffling similar data around), but I don't quite see how
the buffers are passed; I'd need to spend more time than I can afford
immediately...
Thanks (and sorry for pulling the whole arm when you give a hand),
--
Dominique