Message-ID: <20251126222505.1638a66d@pumpkin>
Date: Wed, 26 Nov 2025 22:25:05 +0000
From: david laight <david.laight@...box.com>
To: Al Viro <viro@...iv.linux.org.uk>
Cc: "Russell King (Oracle)" <linux@...linux.org.uk>, Xie Yuanbin
<xieyuanbin1@...wei.com>, brauner@...nel.org, jack@...e.cz,
will@...nel.org, nico@...xnic.net, akpm@...ux-foundation.org, hch@....de,
jack@...e.com, wozizhi@...weicloud.com, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-mm@...ck.org, lilinjie8@...wei.com, liaohua4@...wei.com,
wangkefeng.wang@...wei.com, pangliyuan1@...wei.com
Subject: Re: [RFC PATCH] vfs: Fix might sleep in load_unaligned_zeropad()
with rcu read lock held
On Wed, 26 Nov 2025 20:02:21 +0000
Al Viro <viro@...iv.linux.org.uk> wrote:
> On Wed, Nov 26, 2025 at 07:51:54PM +0000, Russell King (Oracle) wrote:
>
> > I don't understand how that helps. Wasn't the report that the filename
> > crosses a page boundary in userspace, but the following page is
> > inaccessible which causes a fault to be taken (as it always would do).
> > Thus, wouldn't "addr" be a userspace address (that the kernel is
> > accessing) and thus be below TASK_SIZE ?
> >
> > I'm also confused - if we can't take a fault and handle it while
> > reading the filename from userspace, how are pages that have been
> > swapped out or evicted from the page cache read back in from storage
> > which invariably results in sleeping - which we can't do here because
> > of the RCU context (not that I've ever understood RCU, which is why
> > I've always referred those bugs to Paul.)
>
> No, the filename is already copied in kernel space *and* it's long enough
> to end right next to the end of page. There's NUL before the end of page,
> at that, with '/' a couple of bytes prior. We attempt to save on memory
> accesses, doing word-by-word fetches, starting from the beginning of
> component. We *will* detect NUL and ignore all subsequent bytes; the
> problem is that the last 3 bytes of page might be '/', 'x' and '\0'.
> We call load_unaligned_zeropad() on page + PAGE_SIZE - 2. And get
> a fetch that spans the end of page.
>
> We don't care what's in the next page, if there is one mapped there
> to start with. If there's nothing mapped, we want zeroes read from
> it, but all we really care about is having the bytes within *our*
> page read correctly - and no oops happening, obviously.
>
> That fault is an extremely cold case on a fairly hot path. We don't
> want to mess with disabling pagefaults, etc. - not for the sake
> of that.
>
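To make the case described above concrete, here is a minimal userspace
sketch of the semantics wanted from the word fetch: bytes inside the
page are returned as-is, and bytes past the end of the page read as
zero. It is only a model. The function name load_word_zeropad_sim() and
its parameters are hypothetical, and the real load_unaligned_zeropad()
gets this result through a fault fixup rather than a bounds check.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace model, not the kernel implementation.
 * Fetch one word starting at page[off]; any byte that would lie at or
 * beyond page_size is treated as zero instead of being read.
 */
static unsigned long load_word_zeropad_sim(const unsigned char *page,
					   size_t page_size, size_t off)
{
	unsigned char buf[sizeof(unsigned long)] = { 0 };
	size_t avail = off < page_size ? page_size - off : 0;
	unsigned long word;

	/* Never copy more than one word's worth of in-page bytes. */
	if (avail > sizeof(buf))
		avail = sizeof(buf);

	memcpy(buf, page + off, avail);	/* in-page bytes, in memory order */
	memcpy(&word, buf, sizeof(word));	/* remaining bytes stay zero */
	return word;
}
```

With a page ending in '/', 'x', '\0', a fetch at page_size - 2 yields a
word whose first byte in memory order is 'x' and whose remaining bytes
are all zero, so the NUL is still found within the bytes we own.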
Can you fix it with a flag on the exception table entry that means
'don't try to fault in a page'?
I think the logic would be the same as 'disabling pagefaults', just
checking a different flag.
After all the fault itself happens in both cases.
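Roughly what I have in mind, as a simplified userspace model and not a
patch. Everything here is hypothetical: the flag name
EX_FLAG_NO_FAULTIN, struct ex_entry_sim and classify_fault() are made
up for illustration, and the real exception table layout and fault
handlers are per-architecture and look different.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: "don't try to fault in a page for this access". */
#define EX_FLAG_NO_FAULTIN 0x1u

/* Simplified stand-in for an exception table entry. */
struct ex_entry_sim {
	unsigned long insn;	/* address of the instruction that may fault */
	unsigned long fixup;	/* where to resume if it does */
	unsigned int flags;
};

enum fault_action {
	FAULT_TRY_HANDLE_MM,	/* go through the normal (possibly sleeping) path */
	FAULT_USE_FIXUP,	/* jump straight to the fixup */
	FAULT_OOPS,		/* nothing can handle it */
};

static const struct ex_entry_sim *
find_entry(const struct ex_entry_sim *tbl, size_t n, unsigned long pc)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].insn == pc)
			return &tbl[i];
	return NULL;
}

/* Decide what to do with a kernel-mode fault at 'pc' in this model. */
static enum fault_action
classify_fault(const struct ex_entry_sim *tbl, size_t n, unsigned long pc,
	       bool pagefaults_disabled)
{
	const struct ex_entry_sim *e = find_entry(tbl, n, pc);

	/* The new check: the entry itself says not to fault anything in. */
	if (e && (e->flags & EX_FLAG_NO_FAULTIN))
		return FAULT_USE_FIXUP;

	/* The existing-style check, keyed on a global/per-task state. */
	if (pagefaults_disabled)
		return e ? FAULT_USE_FIXUP : FAULT_OOPS;

	return FAULT_TRY_HANDLE_MM;
}
```

The point is that both checks end up in the same place; the flag just
moves the decision from the caller to the table entry, so the hot path
pays nothing extra.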
David