Message-ID: <CAHYQsXR996msVqgqMRznharf1v1Yrwpo7029Oen3cTHZgYEn3A@mail.gmail.com>
Date: Wed, 14 Jan 2026 14:59:54 -0600
From: Yuhao Jiang <danisjiang@...il.com>
To: Pavel Begunkov <asml.silence@...il.com>
Cc: Jens Axboe <axboe@...nel.dk>, io-uring@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] io_uring/rsrc: fix RLIMIT_MEMLOCK bypass via compound
 page accounting

On Wed, Jan 14, 2026 at 8:10 AM Pavel Begunkov <asml.silence@...il.com> wrote:
>
> On 1/13/26 19:44, Pavel Begunkov wrote:
> > On 1/9/26 03:02, Yuhao Jiang wrote:
> >> Hi Jens, Pavel, and all,
> >>
> >> Just a gentle follow-up on this patch below.
> >> Please let me know if there are any concerns or if changes are needed.
> >
> > I'm pretty sure this will break with buffer sharing / cloning. I'd
> > be tempted to remove all this cross buffer accounting logic
> > and overestimate it, the current accounting is not sane.
> > Otherwise, it'll likely need some proxy object shared b/w
> > buffers or some other overly complicated solution.
>
> Another way would be to double account cloned buffers and then
> have your patch, which combines overaccounting with the ugliness
> of full buffer table walks.
>
> >> On Wed, Dec 17, 2025 at 9:00 PM Yuhao Jiang <danisjiang@...il.com> wrote:
> >>>
> >>> When multiple registered buffers share the same compound page, only the
> >>> first buffer accounts for the memory via io_buffer_account_pin(). The
> >>> subsequent buffers skip accounting since headpage_already_acct() returns
> >>> true.
> >>>
> >>> When the first buffer is unregistered, the accounting is decremented,
> >>> but the compound page remains pinned by the remaining buffers. This
> >>> creates a state where pinned memory is not properly accounted against
> >>> RLIMIT_MEMLOCK.
> >>>
> >>> On systems with HugeTLB pages pre-allocated, an unprivileged user can
> >>> exploit this to pin memory beyond RLIMIT_MEMLOCK by cycling buffer
> >>> registrations. The bypass amount is proportional to the number of
> >>> available huge pages, potentially allowing gigabytes of memory to be
> >>> pinned while the kernel accounting shows near-zero.
> >>>
> >>> Fix this by recalculating the actual pages to unaccount when unmapping
> >>> a buffer. For regular pages, always unaccount. For compound pages, only
> >>> unaccount if no other registered buffer references the same compound
> >>> page. This ensures the accounting persists until the last buffer
> >>> referencing the compound page is released.
> >>>
> >>> Reported-by: Yuhao Jiang <danisjiang@...il.com>
> >>> Fixes: 57bebf807e2a ("io_uring/rsrc: optimise registered huge pages")
> >
> > That's not the right commit, the accounting is ancient, should
> > get blamed somewhere around first commits that added registered
> > buffers.
>
> Turns out it came just a bit later:
>
> commit de2939388be564836b06f0f06b3787bdedaed822
> Author: Jens Axboe <axboe@...nel.dk>
> Date:   Thu Sep 17 16:19:16 2020 -0600
>
>      io_uring: improve registered buffer accounting for huge pages
>
> --
> Pavel Begunkov
>

Thanks for the review. I see the issues with buffer sharing/cloning and
the accounting concerns you pointed out. I'll rework this accordingly
and send a v2, and also fix the Fixes tag.
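
For reference, the imbalance described in the original patch can be
reproduced with a small user-space toy model. This is only a sketch under
stated assumptions: every name in it (toy_ctx, toy_buf, toy_register,
toy_unregister, toy_pinned) is hypothetical and none of it is kernel code.
It treats each buffer as covering exactly one compound page and ignores
buffer sharing/cloning entirely, so it does not address the concerns above.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_BUFS 8

/* Hypothetical stand-in for a registered buffer; not a kernel structure. */
struct toy_buf {
	bool in_use;
	int head;       /* id of the compound page's head page */
	long nr_pages;  /* size of that compound page, in base pages */
	long charged;   /* pages this buffer charged at registration */
};

struct toy_ctx {
	struct toy_buf bufs[TOY_MAX_BUFS];
	long accounted; /* pages currently charged against the limit */
};

/* True if a registered buffer other than 'skip' references 'head'. */
bool toy_head_in_use(const struct toy_ctx *ctx, int head, int skip)
{
	for (int i = 0; i < TOY_MAX_BUFS; i++)
		if (i != skip && ctx->bufs[i].in_use && ctx->bufs[i].head == head)
			return true;
	return false;
}

/* Only the first buffer touching a compound page charges for it. */
void toy_register(struct toy_ctx *ctx, int slot, int head, long nr_pages)
{
	struct toy_buf *b = &ctx->bufs[slot];

	assert(!b->in_use);
	b->charged = toy_head_in_use(ctx, head, slot) ? 0 : nr_pages;
	b->head = head;
	b->nr_pages = nr_pages;
	b->in_use = true;
	ctx->accounted += b->charged;
}

/*
 * fixed == false: uncharge whatever this buffer charged (the imbalance).
 * fixed == true:  uncharge only when no other buffer references the page.
 */
void toy_unregister(struct toy_ctx *ctx, int slot, bool fixed)
{
	struct toy_buf *b = &ctx->bufs[slot];

	assert(b->in_use);
	if (!fixed)
		ctx->accounted -= b->charged;
	else if (!toy_head_in_use(ctx, b->head, slot))
		ctx->accounted -= b->nr_pages;
	b->in_use = false;
}

/* Pages actually still pinned: each referenced compound page counted once. */
long toy_pinned(const struct toy_ctx *ctx)
{
	long pinned = 0;

	for (int i = 0; i < TOY_MAX_BUFS; i++) {
		const struct toy_buf *b = &ctx->bufs[i];
		bool seen = false;

		if (!b->in_use)
			continue;
		for (int j = 0; j < i; j++)
			if (ctx->bufs[j].in_use && ctx->bufs[j].head == b->head)
				seen = true;
		if (!seen)
			pinned += b->nr_pages;
	}
	return pinned;
}
```

Registering two buffers on the same 512-page compound page and then
unregistering the first with fixed == false leaves accounted at 0 while
toy_pinned() still reports 512. With fixed == true the charge stays until
the second buffer is released, at the cost of a full table walk on every
unregister.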

Best regards,
Yuhao Jiang
