linux-kernel - Re: [PATCH] posix-timers: cond_resched() during exit

Message-ID: <xm26y0y22870.fsf@google.com>
Date: Tue, 18 Feb 2025 14:34:43 -0800
From: Benjamin Segall <bsegall@...gle.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Anna-Maria Behnsen <anna-maria@...utronix.de>,  Frederic Weisbecker
 <frederic@...nel.org>,  linux-kernel@...r.kernel.org,  Eric Dumazet
 <edumazet@...gle.com>,  Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] posix-timers: cond_resched() during exit_itimers()

Thomas Gleixner <tglx@...utronix.de> writes:

> On Fri, Feb 14 2025 at 14:12, Benjamin Segall wrote:
>> exit_itimers() loops through every timer in the process to delete it.
>> This requires taking the system-wide hash_lock for each of these timers,
>> and contends with other processes trying to create or delete timers.
>> When a process creates hundreds of thousands of timers, and then exits
>> while other processes contend with it, this can trigger softlockups on
>> CONFIG_PREEMPT=n.
>>
>> Ideally this will some day be better solved by eliminating the global
>> hashtable, but until that point mitigate the issue by doing
>> cond_resched in that loop.
>
> It won't help for a PREEMPT_NONE kernel because the loop will be equally
> long as before. Only the hash lock contention will be smaller, but that
> does not mean that mopping up 100k timers won't be able to take ages.

Yeah, it could just run into a new lock or other bottleneck, though it's
not immediately obvious to me what it would be (hash_lock isn't sharing
~any of the time in perf tracing, the obvious other locks like hrtimer
are sharded, etc). Just sharding the lock a bunch (leaving the actual
hashtable with the same cacheline sharing even) boosts the speed of my
synthetic contention test freeing 100k timers from 6s to 380ms (with
uncontended exit at 17ms), so I think it's realistic that avoiding
the shared lock/table might well do the job.
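
For illustration only, a rough userspace sketch of what "sharding the lock" means here. This is not the test patch: NR_BUCKETS, NR_SHARDS, struct timer_node, bucket_of(), lock_of() and the other helpers are hypothetical names, and a C11 atomic_flag spinlock stands in for the kernel spinlock. The table is still one shared array; only the lock is split, so operations on buckets that map to different shards no longer serialize on a single lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NR_BUCKETS 512u	/* assumed size of the single shared table */
#define NR_SHARDS   64u	/* hypothetical number of lock shards */

struct timer_node {
	struct timer_node *next;
	uint32_t id;
};

static struct timer_node *buckets[NR_BUCKETS];	/* still one shared table */
static atomic_flag shard_lock[NR_SHARDS];	/* one spinlock per shard */

static void table_init(void)
{
	for (size_t i = 0; i < NR_SHARDS; i++)
		atomic_flag_clear(&shard_lock[i]);
}

static uint32_t bucket_of(uint32_t id)
{
	return (id * 2654435761u) % NR_BUCKETS;	/* multiplicative hash */
}

/* Every bucket maps to exactly one shard, so a chain has one owning lock. */
static atomic_flag *lock_of(uint32_t bucket)
{
	return &shard_lock[bucket % NR_SHARDS];
}

static void shard_acquire(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until the current holder releases */
}

static void shard_release(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

static void timer_insert(struct timer_node *t)
{
	assert(t != NULL);
	uint32_t b = bucket_of(t->id);
	atomic_flag *l = lock_of(b);

	shard_acquire(l);
	t->next = buckets[b];
	buckets[b] = t;
	shard_release(l);
}

/* Unlink and return the node with this id, or NULL if it is not present. */
static struct timer_node *timer_remove(uint32_t id)
{
	uint32_t b = bucket_of(id);
	atomic_flag *l = lock_of(b);
	struct timer_node *found = NULL;

	shard_acquire(l);
	for (struct timer_node **pp = &buckets[b]; *pp; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			found = *pp;
			*pp = found->next;
			found->next = NULL;
			break;
		}
	}
	shard_release(l);
	return found;
}
```

Call table_init() once before use. Two threads then only contend when their timer ids land in buckets that share a shard, while the bucket array itself keeps whatever cacheline sharing it had before.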

Of course nothing is stopping an even buggier application from
just creating more timers (and at that point starting to notice the
fixed hashtable size during timer_create)...
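
To put a rough number on the fixed-size point, a back-of-the-envelope helper. It assumes timers spread evenly over a table of 2^hash_bits buckets; avg_chain_len() is a hypothetical name, and the 9 bits (512 buckets) used in the note below is an assumption about the table size, not something stated in this thread.

```c
#include <assert.h>

/*
 * Average number of entries per bucket when n_timers are spread evenly
 * over a fixed table of 2^hash_bits buckets.
 */
static double avg_chain_len(unsigned long n_timers, unsigned int hash_bits)
{
	assert(hash_bits < 32);	/* keep the shift well defined */
	return (double)n_timers / (double)(1ul << hash_bits);
}
```

Under those assumptions, avg_chain_len(100000, 9) is roughly 195 entries per chain, and any operation that walks a chain gets slower linearly as the timer count grows.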

>
> We really need to get this PREEMPT_LAZY thing going and kill all of this
> cond_resched() nonsense.
>
> Thanks,
>
>         tglx