Message-ID: <CANn89iKkeRV-TqQnCM-XeCkyiWCPFXNGLUAT4oge3i-Dpxu1aA@mail.gmail.com>
Date: Mon, 24 Feb 2025 10:58:57 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Anna-Maria Behnsen <anna-maria@...utronix.de>, Frederic Weisbecker <frederic@...nel.org>,
linux-kernel <linux-kernel@...r.kernel.org>, Benjamin Segall <bsegall@...gle.com>,
Eric Dumazet <eric.dumazet@...il.com>, Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH V2 4/4] posix-timers: Use RCU in posix_timer_add()
On Mon, Feb 24, 2025 at 10:33 AM Thomas Gleixner <tglx@...utronix.de> wrote:
>
> On Wed, Feb 19 2025 at 12:55, Eric Dumazet wrote:
> > --- a/kernel/time/posix-timers.c
> > +++ b/kernel/time/posix-timers.c
> > @@ -125,7 +125,19 @@ static int posix_timer_add(struct k_itimer *timer)
> >
> > head = &posix_timers_hashtable[hash(sig, id)];
> >
> > + rcu_read_lock();
> > + if (posix_timers_find(head, sig, id)) {
> > + rcu_read_unlock();
> > + cond_resched();
> > + continue;
> > + }
> > + rcu_read_unlock();
> > spin_lock(&hash_lock);
> > + /*
> > + * We must perform the lookup under hash_lock protection
> > + * because another thread could have used the same id.
> > + * This is very unlikely, but possible.
> > + */
> > if (!posix_timers_find(head, sig, id)) {
> > timer->it_id = (timer_t)id;
> > timer->it_signal = (struct signal_struct *)((unsigned long)sig | 1UL);
>
> So I played with that because I wanted understand what's going on.
>
> Actually this change is just voodoo programming. Why?
>
> The only case where this lockless lookup would help is when the timer
> ID wrapped around and the lookup has to validate a large range of
> already occupied IDs, but even then the lookup is dropping hash_lock
> after each ID search. I seriously doubt that this is the case at hand
> because according to Ben this happens when a process is started or
> restored, which means there is no timer ID wrap around and therefore
> no timer ID collision in the process itself.
>
> In fact the extra lookup is counter productive in most cases:
>
> With a single process, which creates 50000 timers in a loop, this
> results in:
>
> [1] mainline +lookup
> create 98 ms 219 ms
>
> and it gets worse with 100000 timers:
>
> [2] mainline +lookup
> create 313 ms 650 ms
>
> Why? Because with 100k timers the hash buckets contain ~200 timers each,
> which means in the worst case 200 timers have to be walked and
> checked. Doing this twice obviously at least doubles the time.
>
> Now there is a case where the change improves the situation, but for the
> very wrong reasons:
>
> A process forks 63 times and all forks wait on a barrier before each
> instance creates 20000 timers. The processes are pinned on separate CPUs
> to achieve maximum concurrency. The numbers are the average times per
> process:
>
> [3] mainline +lookup
> create 157253 ms 40008 ms
>
> But that improvement has absolutely nothing to do with the timer ID
> collision. The extra lookup (and I verified with tracing) creates just
> enough interleaving so that all CPUs can make progress on the hash lock
> instead of being stuck in a cache line bouncing shitshow. The very same
> can be achieved by:
>
> ndelay(total_number_of_timers / $CONSTANT);
>
> So, no. This is not a solution. The proper solution is to replace the
> hash by a scaled hash with per hash bucket locks, like we have in the
> futex code already. I've implemented this and the result proves the
> point with the three test cases:
>
> [1] mainline +lookup scaled hash
> create 98 ms 219 ms 20 ms
>
> [2] mainline +lookup scaled hash
> create 313 ms 650 ms 48 ms
>
> [3] mainline +lookup scaled hash
> create 157253 ms 40008 ms 83 ms
>
> This is really only a problem of the hash collisions and the resulting
> list walk length, which is easy to prove by extending the tests above.
>
> After creating the timer and synchronizing again (for the fork case),
> the test invokes timer_getoverrun(2) for all created timers in a loop.
>
> [1] mainline +lookup scaled hash
> create 98 ms 219 ms 20 ms
> getoverrun 62 ms 62 ms 10 ms
>
> [2] mainline +lookup scaled hash
> create 313 ms 650 ms 48 ms
> getoverrun 261 ms 260 ms 20 ms
>
> [3] mainline +lookup scaled hash
> create 157253 ms 40008 ms 83 ms
> getoverrun 2611 ms 2614 ms 40 ms
>
> I picked timer_getoverrun(2) because that's just doing the lookup and
> reading from the k_itimer struct after locking it. So the syscall has no
> contention on the timer lock nor on anything else. According to perf the
> vast majority of time is spent in the list walk to find the timer and of
> course the cache foot print becomes worse the larger the bucket lists
> become.
>
> Let me write proper changelogs and a cover letter and post that stuff.
>
Great, we are looking forward to your patches.