Message-ID: <d2fc2ff2-98d9-49f8-af95-968100174d55@gmail.com>
Date: Thu, 22 Jan 2026 11:43:28 +0000
From: Pavel Begunkov <asml.silence@...il.com>
To: Jens Axboe <axboe@...nel.dk>, Yuhao Jiang <danisjiang@...il.com>
Cc: io-uring@...r.kernel.org, linux-kernel@...r.kernel.org,
 stable@...r.kernel.org
Subject: Re: [PATCH v2] io_uring/rsrc: fix RLIMIT_MEMLOCK bypass by removing
 cross-buffer accounting

On 1/21/26 14:58, Jens Axboe wrote:
> On 1/20/26 2:45 PM, Pavel Begunkov wrote:
>> On 1/20/26 17:03, Jens Axboe wrote:
>>> On 1/20/26 5:05 AM, Pavel Begunkov wrote:
>>>> On 1/20/26 07:05, Yuhao Jiang wrote:
>> ...
>>>>>
>>>>> I've been implementing the xarray-based ref tracking approach for v3.
>>>>> While working on it, I discovered an issue with buffer cloning.
>>>>>
>>>>> If ctx1 has two buffers sharing a huge page, ctx1->hpage_acct[page] = 2.
>>>>> Clone to ctx2, now both have a refcount of 2. On cleanup both hit zero
>>>>> and unaccount, so we double-unaccount and user->locked_vm goes negative.
>>>>>
>>>>> The per-context xarray can't coordinate across clones - each context
>>>>> tracks its own refcount independently. I think we either need a global
>>>>> xarray (shared across all contexts), or just go back to v2. What do
>>>>> you think?
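>>>>>
>>>>> As an illustration of the double-unaccount above, here is a minimal
>>>>> userspace sketch; the names hpage_acct and locked_vm mirror the
>>>>> discussion, not the real io_uring structures, and HPAGE_PAGES is a
>>>>> 2MB huge page counted in 4K units:
>>>>>
>>>>> #include <stdio.h>
>>>>>
>>>>> struct user { long locked_vm; };  /* models user->locked_vm */
>>>>> struct ctx  { int hpage_acct; };  /* per-context refcount, one page */
>>>>>
>>>>> #define HPAGE_PAGES 512
>>>>>
>>>>> static void account_buffer(struct user *u, struct ctx *c)
>>>>> {
>>>>> 	/* first buffer touching the page charges it to locked_vm */
>>>>> 	if (c->hpage_acct++ == 0)
>>>>> 		u->locked_vm += HPAGE_PAGES;
>>>>> }
>>>>>
>>>>> static void unaccount_ctx(struct user *u, struct ctx *c)
>>>>> {
>>>>> 	/* refcount drops to zero -> uncharge */
>>>>> 	if (c->hpage_acct) {
>>>>> 		c->hpage_acct = 0;
>>>>> 		u->locked_vm -= HPAGE_PAGES;
>>>>> 	}
>>>>> }
>>>>>
>>>>> int main(void)
>>>>> {
>>>>> 	struct user u = { 0 };
>>>>> 	struct ctx ctx1 = { 0 }, ctx2;
>>>>>
>>>>> 	account_buffer(&u, &ctx1);  /* buffer A, page charged once */
>>>>> 	account_buffer(&u, &ctx1);  /* buffer B shares it, no charge */
>>>>>
>>>>> 	ctx2 = ctx1;                /* clone copies hpage_acct = 2 */
>>>>>
>>>>> 	unaccount_ctx(&u, &ctx1);   /* ctx1 teardown uncharges */
>>>>> 	unaccount_ctx(&u, &ctx2);   /* ctx2 teardown uncharges again */
>>>>>
>>>>> 	printf("locked_vm = %ld\n", u.locked_vm);  /* prints -512 */
>>>>> 	return 0;
>>>>> }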
>>>>
>>>> Jens' diff is functionally equivalent to your v1 and has exactly
>>>> the same problems. Global tracking won't work well.
>>>
>>> Why not? My thinking was that we just use xa_lock() for this, with
>>> a global xarray. It's not like register+unregister is a high-frequency
>>> operation. And if it is, then we've got much bigger problems than the
>>> single lock, as the runtime complexity isn't ideal.
>>
>> 1. There could be quite a lot of entries even for a single ring
>> with a realistic amount of memory. If lots of threads start up
>> at the same time taking it in a loop, it could become a choking
>> point on large systems. It should be even more spectacular on
>> some NUMA setups.
> 
> I already briefly touched on that earlier; it's for sure not going
> to be of any practical concern.

A modest 16 GB can give 1M entries. Assuming 50-100ns per entry for
the xarray bookkeeping, that's 50-100ms. It's all serialised, so
multiply by the number of CPUs/threads, e.g. 10-100, and that's
0.5-10s. Account for sky-high spinlock contention and it jumps again,
and there can be more memory / CPUs / NUMA nodes. I'm not saying it's
worse than the current O(n^2) behaviour, for which I have a test
program that borderline hangs the system.
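
For reference, the serialisation effect is easy to model in userspace:
T threads each take one global lock per entry, standing in for a global
xarray under xa_lock(). The entry count is the 1M estimate above;
everything else is illustrative, not measured kernel behaviour.

#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define ENTRIES  (1 << 20)		/* ~1M per-page entries per ring */
#define NTHREADS 8			/* concurrent registering threads */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long refcnt[ENTRIES];		/* stands in for per-page refcounts */

static void *register_buffers(void *arg)
{
	(void)arg;
	for (long i = 0; i < ENTRIES; i++) {
		pthread_mutex_lock(&lock);	/* every entry serialises here */
		refcnt[i]++;
		pthread_mutex_unlock(&lock);
	}
	return NULL;
}

int main(void)
{
	pthread_t th[NTHREADS];
	struct timespec a, b;

	clock_gettime(CLOCK_MONOTONIC, &a);
	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&th[i], NULL, register_buffers, NULL);
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(th[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &b);

	double ms = (b.tv_sec - a.tv_sec) * 1e3 +
		    (b.tv_nsec - a.tv_nsec) / 1e6;
	printf("%d threads x %d entries: %.1f ms total\n",
	       NTHREADS, ENTRIES, ms);
	return 0;
}

Build with `cc -O2 -pthread`; the total time grows roughly linearly
with NTHREADS, which is the multiplication in the estimate above.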

Look, I don't care which it'd be, whether it stutters or blows up the
kernel; I only took a quick look since you pinged me asking "why not".
If you don't want to consider my reasoning, then as the maintainer you
can merge whatever you like, and it'll be easier for me since I won't
be wasting more time.

-- 
Pavel Begunkov

