Message-ID: <0adb508f-480d-4bfc-b861-3cf42e87bee1@gmail.com>
Date: Wed, 14 Jan 2026 14:10:42 +0000
From: Pavel Begunkov <asml.silence@...il.com>
To: Yuhao Jiang <danisjiang@...il.com>, Jens Axboe <axboe@...nel.dk>
Cc: io-uring@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] io_uring/rsrc: fix RLIMIT_MEMLOCK bypass via compound
page accounting
On 1/13/26 19:44, Pavel Begunkov wrote:
> On 1/9/26 03:02, Yuhao Jiang wrote:
>> Hi Jens, Pavel, and all,
>>
>> Just a gentle follow-up on this patch below.
>> Please let me know if there are any concerns or if changes are needed.
>
> I'm pretty sure this will break with buffer sharing / cloning. I'd
> be tempted to remove all this cross-buffer accounting logic
> and overestimate instead; the current accounting is not sane.
> Otherwise, it'll likely need some proxy object shared between
> buffers or some other overcomplicated solution.

Another way would be to double account cloned buffers and then
have your patch, which combines overaccounting with the ugliness
of full buffer table walks.

>> On Wed, Dec 17, 2025 at 9:00 PM Yuhao Jiang <danisjiang@...il.com> wrote:
>>>
>>> When multiple registered buffers share the same compound page, only the
>>> first buffer accounts for the memory via io_buffer_account_pin(). The
>>> subsequent buffers skip accounting since headpage_already_acct() returns
>>> true.
>>>
>>> When the first buffer is unregistered, the accounting is decremented,
>>> but the compound page remains pinned by the remaining buffers. This
>>> creates a state where pinned memory is not properly accounted against
>>> RLIMIT_MEMLOCK.
>>>
>>> On systems with HugeTLB pages pre-allocated, an unprivileged user can
>>> exploit this to pin memory beyond RLIMIT_MEMLOCK by cycling buffer
>>> registrations. The bypass amount is proportional to the number of
>>> available huge pages, potentially allowing gigabytes of memory to be
>>> pinned while the kernel accounting shows near-zero.
>>>
>>> Fix this by recalculating the actual pages to unaccount when unmapping
>>> a buffer. For regular pages, always unaccount. For compound pages, only
>>> unaccount if no other registered buffer references the same compound
>>> page. This ensures the accounting persists until the last buffer
>>> referencing the compound page is released.
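
For illustration only, the accounting described above can be sketched as a
userspace toy model. This is not the kernel code: every toy_* name is
hypothetical, each buffer is assumed to cover exactly one compound page, and
the shared-head check is only loosely modelled on what the description says
headpage_already_acct() does. Buffer cloning is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_BUFS 8

/* Hypothetical stand-in for a registered buffer backed by one compound page. */
struct toy_buf {
	bool in_use;
	int  head_page;       /* id of the compound page's head page */
	long compound_pages;  /* size of the compound page, in base pages */
	long charged;         /* pages this buffer charged when registered */
};

struct toy_ctx {
	struct toy_buf bufs[TOY_MAX_BUFS];
	long accounted;       /* pages currently charged against the limit */
};

/* Full table walk: does any buffer other than 'skip' reference this head page? */
static bool toy_head_shared(const struct toy_ctx *ctx, int head, int skip)
{
	for (int i = 0; i < TOY_MAX_BUFS; i++) {
		const struct toy_buf *b = &ctx->bufs[i];

		if (i != skip && b->in_use && b->head_page == head)
			return true;
	}
	return false;
}

/* Registration: only the first buffer on a compound page is charged. */
static void toy_register(struct toy_ctx *ctx, int slot, int head, long npages)
{
	struct toy_buf *b;

	assert(slot >= 0 && slot < TOY_MAX_BUFS);
	b = &ctx->bufs[slot];
	b->charged = toy_head_shared(ctx, head, slot) ? 0 : npages;
	b->head_page = head;
	b->compound_pages = npages;
	b->in_use = true;
	ctx->accounted += b->charged;
}

/* Behaviour described as the bug: uncharge whatever this buffer charged. */
static void toy_unregister_old(struct toy_ctx *ctx, int slot)
{
	struct toy_buf *b = &ctx->bufs[slot];

	ctx->accounted -= b->charged;
	b->in_use = false;
}

/* Proposed behaviour: uncharge only when the last referencing buffer goes. */
static void toy_unregister_fixed(struct toy_ctx *ctx, int slot)
{
	struct toy_buf *b = &ctx->bufs[slot];

	if (!toy_head_shared(ctx, b->head_page, slot))
		ctx->accounted -= b->compound_pages;
	b->in_use = false;
}
```

Registering two buffers on the same 512-page compound page (2 MiB with 4 KiB
base pages) and then calling toy_unregister_old() on the first leaves
accounted at 0 while the second buffer still pins the page. With
toy_unregister_fixed() the 512 pages stay charged until the second buffer is
released as well. The scan in toy_head_shared() is the kind of per-unregister
table walk the fix relies on.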
>>>
>>> Reported-by: Yuhao Jiang <danisjiang@...il.com>
>>> Fixes: 57bebf807e2a ("io_uring/rsrc: optimise registered huge pages")
>
> That's not the right commit, the accounting is ancient, should
> get blamed somewhere around first commits that added registered
> buffers.

Turns out it came just a bit later:

commit de2939388be564836b06f0f06b3787bdedaed822
Author: Jens Axboe <axboe@...nel.dk>
Date:   Thu Sep 17 16:19:16 2020 -0600

    io_uring: improve registered buffer accounting for huge pages

--
Pavel Begunkov