Message-ID: <87o6z0jrx6.ffs@tglx>
Date: Mon, 17 Feb 2025 20:24:21 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: Eric Dumazet <edumazet@...gle.com>, Anna-Maria Behnsen
<anna-maria@...utronix.de>, Frederic Weisbecker <frederic@...nel.org>
Cc: linux-kernel <linux-kernel@...r.kernel.org>, Benjamin Segall
<bsegall@...gle.com>, Eric Dumazet <eric.dumazet@...il.com>, Eric Dumazet
<edumazet@...gle.com>, Andrey Vagin <avagin@...nvz.org>, Pavel Tikhomirov
<ptikhomirov@...tuozzo.com>, Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH 2/2] posix-timers: Use RCU in posix_timer_add()
On Fri, Feb 14 2025 at 13:59, Eric Dumazet wrote:
> If many posix timers are hashed in posix_timers_hashtable,
> hash_lock can be held for long durations.
>
> This can be really bad in some cases as Thomas
> explained in https://lore.kernel.org/all/87ednpyyeo.ffs@tglx/
I really hate the horrible ABI which we can't get rid of w/o breaking
CRIU.
The global hash really needs to go away and be replaced by a per-signal
xarray. That can be done, but due to CRIU there is no way to make it
non-sparse by reusing the holes created by deleted timers.
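For illustration only (not a worked-out patch; the struct members and
the function name below are made up), a per-signal allocator could look
roughly like this. xa_alloc_cyclic() avoids handing out a freed ID again
right away, but a CRIU restore which has to recreate a timer with an
arbitrary old ID, e.g. via xa_insert(), still leaves the xarray sparse:

#include <linux/xarray.h>
#include <linux/posix-timers.h>
#include <linux/sched/signal.h>

/*
 * Sketch only: assumes a hypothetical 'struct xarray posix_timer_ids'
 * (initialized with XA_FLAGS_ALLOC) and 'u32 next_timer_id' member in
 * struct signal_struct.
 */
static int posix_timer_add_xa(struct k_itimer *timer)
{
	struct signal_struct *sig = current->signal;
	u32 id;
	int ret;

	/*
	 * Cyclic allocation: IDs increase monotonically and only wrap
	 * at INT_MAX, so a freed slot is not reused immediately.
	 */
	ret = xa_alloc_cyclic(&sig->posix_timer_ids, &id, timer,
			      XA_LIMIT(0, INT_MAX), &sig->next_timer_id,
			      GFP_KERNEL);
	if (ret < 0)
		return ret;

	timer->it_id = (timer_t)id;
	return (int)id;
}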
The sad truth is that the kernel has absolutely zero clue that this
happens in a CRIU restore operation context, unless I'm missing
something.
If it were able to detect that, then we could work around it somehow.
But without that there is not much we can do aside from breaking the
ABI.
Though in the above thread the CRIU people already signaled that they
are willing to work out a migration scheme. I just forgot to revisit
this. Let me stare at it some more.
> We can perform all searches under RCU, then acquire the lock only
> when there is a good chance of needing it, and after the CPU caches
> have been populated.
>
> I also added a cond_resched() in the possibly long loop.
https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#changelog
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> ---
> kernel/time/posix-timers.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
> index 204a351a2fd3..dd2f9016d3dc 100644
> --- a/kernel/time/posix-timers.c
> +++ b/kernel/time/posix-timers.c
> @@ -112,7 +112,19 @@ static int posix_timer_add(struct k_itimer *timer)
>  
>  		head = &posix_timers_hashtable[hash(sig, id)];
>  
> +		rcu_read_lock();
> +		if (__posix_timers_find(head, sig, id)) {
> +			rcu_read_unlock();
> +			cond_resched();
> +			continue;
> +		}
> +		rcu_read_unlock();
>  		spin_lock(&hash_lock);
> +		/*
> +		 * We must perform the lookup under hash_lock protection
> +		 * because another thread could have used the same id.
Hmm, that won't help and is broken already today as timer->id is set at
the call site after releasing hash_lock.
> +		 * This is very unlikely, but possible.
Only if the process is able to install INT_MAX - 1 timers and the stupid
search wraps around (INT_MAX loops) on the other thread and ends up at
the same number again. But yes, theoretically it's possible. :)
So the timer ID must be set _before_ adding it to the hash list, but
that wants to be a separate patch.
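Roughly like this inside the locked section, i.e. publish the ID while
hash_lock is still held and before the timer becomes visible to RCU
lookups (sketch only, written against my recollection of the
surrounding code, not a tested patch):

		spin_lock(&hash_lock);
		if (!__posix_timers_find(head, sig, id)) {
			/*
			 * Set the ID before the timer is added to the
			 * hash, so a concurrent lookup can never observe
			 * a published timer whose it_id is not written
			 * yet.
			 */
			timer->it_id = (timer_t)id;
			hlist_add_head_rcu(&timer->t_hash, head);
			spin_unlock(&hash_lock);
			return id;
		}
		spin_unlock(&hash_lock);

with the corresponding assignment at the call site (do_timer_create(),
if I remember correctly) removed.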
> +		 */
>  		if (!__posix_timers_find(head, sig, id)) {
>  			hlist_add_head_rcu(&timer->t_hash, head);
>  			spin_unlock(&hash_lock);
Thanks,
tglx