Message-ID: <CAHYQsXQK4nKu+fcni71__=V241RN=QxUHrvNQMQtPMzeL_z=BA@mail.gmail.com>
Date: Tue, 20 Jan 2026 01:05:14 -0600
From: Yuhao Jiang <danisjiang@...il.com>
To: Jens Axboe <axboe@...nel.dk>
Cc: Pavel Begunkov <asml.silence@...il.com>, io-uring@...r.kernel.org,
linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH v2] io_uring/rsrc: fix RLIMIT_MEMLOCK bypass by removing
cross-buffer accounting
Hi Jens,
On Mon, Jan 19, 2026 at 5:40 PM Jens Axboe <axboe@...nel.dk> wrote:
>
> On 1/19/26 4:34 PM, Yuhao Jiang wrote:
> > On Mon, Jan 19, 2026 at 11:03 AM Jens Axboe <axboe@...nel.dk> wrote:
> >>
> >> On 1/19/26 12:10 AM, Yuhao Jiang wrote:
> >>> The trade-off is that memory accounting may be overestimated when
> >>> multiple buffers share compound pages, but this is safe and prevents
> >>> the security issue.
> >>
> >> I'd be worried that this would break existing setups. We obviously need
> >> to get the unmap accounting correct, but in terms of practicality, any
> >> user of registered buffers will have had to bump distro limits manually
> >> anyway, and in that case it's usually just set very high. Otherwise
> >> there's very little you can do with it.
> >>
> >> How about something else entirely - just track the accounted pages on
> >> the side. If we ref those, then we can ensure that if a huge page is
> >> accounted, it's only unaccounted when all existing "users" of it have
> >> gone away. That means if you drop parts of it, it'll remain accounted.
> >>
> >> Something totally untested like the below... Yes it's not a trivial
> >> amount of code, but it is actually fairly trivial code.
> >
> > Thanks, this approach makes sense. I'll send a v3 based on this.
>
> Great, thanks! I think the key is tracking this on the side, and then
> a ref to tell when it's safe to unaccount it. The rest is just
> implementation details.
>
> --
> Jens Axboe
>
I've been implementing the xarray-based ref tracking approach for v3.
While working on it, I found a problem with buffer cloning.

Say ctx1 has two buffers sharing a huge page, so ctx1->hpage_acct[page] = 2
and the page is accounted once. After cloning to ctx2, both contexts have a
refcount of 2. On cleanup, each context's count hits zero and each
unaccounts the page, so we double-unaccount and user->locked_vm goes
negative.

The per-context xarray can't coordinate across clones - each context
tracks its own refcount independently. I think we either need a global
xarray (shared across all contexts), or just go back to v2. What do
you think?
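
To make the clone scenario concrete, here is a minimal userspace model of
it. This is a sketch, not kernel code: struct user_acct, struct ctx,
struct shared_acct, and the ctx_*/shared_* helpers are hypothetical names,
a plain long stands in for the atomic user->locked_vm, a single counter
stands in for the xarray entry, and a 2 MiB huge page (512 base pages) is
assumed. The second half shows how one shared counter might avoid the
problem, assuming a clone takes its own refs on it.

```c
#include <assert.h>

#define HPAGE_PAGES 512L	/* assumed: 2 MiB huge page / 4 KiB base pages */

/* Stand-in for user->locked_vm (atomic in the kernel, plain long here). */
struct user_acct { long locked_vm; };

/* Hypothetical per-context tracker, models ctx->hpage_acct[page]. */
struct ctx {
	struct user_acct *user;
	long hpage_refs;
};

/* Register one buffer on the huge page: account only on the first ref. */
static void ctx_get(struct ctx *c)
{
	if (c->hpage_refs++ == 0)
		c->user->locked_vm += HPAGE_PAGES;
}

/* Drop one buffer: unaccount when this context's last ref goes away. */
static void ctx_put(struct ctx *c)
{
	assert(c->hpage_refs > 0);
	if (--c->hpage_refs == 0)
		c->user->locked_vm -= HPAGE_PAGES;
}

/* Clone as described above: the refcount is copied, nothing is accounted. */
static void ctx_clone(struct ctx *dst, const struct ctx *src)
{
	dst->user = src->user;
	dst->hpage_refs = src->hpage_refs;
}

/* Per-context tracking: returns the final locked_vm after cleanup. */
static long run_per_ctx_scenario(void)
{
	struct user_acct user = { 0 };
	struct ctx ctx1 = { &user, 0 }, ctx2;

	ctx_get(&ctx1);
	ctx_get(&ctx1);			/* refs = 2, locked_vm = 512 */
	ctx_clone(&ctx2, &ctx1);	/* ctx2 refs = 2, locked_vm still 512 */
	ctx_put(&ctx1);
	ctx_put(&ctx1);			/* ctx1 hits zero, locked_vm = 0 */
	ctx_put(&ctx2);
	ctx_put(&ctx2);			/* ctx2 hits zero, locked_vm = -512 */
	return user.locked_vm;
}

/* Hypothetical shared tracker: one refcount for the page, all contexts. */
struct shared_acct {
	struct user_acct *user;
	long hpage_refs;
};

static void shared_get(struct shared_acct *s)
{
	if (s->hpage_refs++ == 0)
		s->user->locked_vm += HPAGE_PAGES;
}

static void shared_put(struct shared_acct *s)
{
	assert(s->hpage_refs > 0);
	if (--s->hpage_refs == 0)
		s->user->locked_vm -= HPAGE_PAGES;
}

/* Shared tracking: the clone takes its own two refs on the same counter. */
static long run_shared_scenario(void)
{
	struct user_acct user = { 0 };
	struct shared_acct acct = { &user, 0 };
	int i;

	for (i = 0; i < 4; i++)		/* 2 refs from ctx1, 2 from the clone */
		shared_get(&acct);
	for (i = 0; i < 4; i++)		/* both contexts clean up */
		shared_put(&acct);
	return user.locked_vm;		/* unaccounted exactly once */
}
```

In this model run_per_ctx_scenario() ends at -512 pages, which is the
double-unaccount, while run_shared_scenario() ends at 0.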
--
Yuhao Jiang