Message-ID: <87frw2axv0.ffs@tglx>
Date: Thu, 04 Apr 2024 00:24:19 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: John Stultz <jstultz@...gle.com>
Cc: Oleg Nesterov <oleg@...hat.com>, Marco Elver <elver@...gle.com>, Peter
Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...nel.org>, "Eric W.
Biederman" <ebiederm@...ssion.com>, linux-kernel@...r.kernel.org,
linux-kselftest@...r.kernel.org, Dmitry Vyukov <dvyukov@...gle.com>,
kasan-dev@...glegroups.com, Edward Liaw <edliaw@...gle.com>, Carlos Llamas
<cmllamas@...gle.com>, Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the
current thread
On Wed, Apr 03 2024 at 12:35, John Stultz wrote:
> On Wed, Apr 3, 2024 at 12:10 PM Thomas Gleixner <tglx@...utronix.de> wrote:
>>
>> On Wed, Apr 03 2024 at 11:16, John Stultz wrote:
>> > On Wed, Apr 3, 2024 at 9:32 AM Thomas Gleixner <tglx@...utronix.de> wrote:
>> > Thanks for this, Thomas!
>> >
>> > Just FYI: testing with 6.1, the test no longer hangs, but I don't see
>> > the SKIP behavior. It just fails:
>> > not ok 6 check signal distribution
>> > # Totals: pass:5 fail:1 xfail:0 xpass:0 skip:0 error:0
>> >
>> > I've not had time yet to dig into what's going on, but let me know if
>> > you need any further details.
>>
>> That's weird. I ran it on my laptop with 6.1.y ...
>>
>> What kind of machine is that?
>
> I was running it in a VM.
>
> Interestingly with 64cpus it sometimes will do the skip behavior, but
> with 4 cpus it seems to always fail.
Duh, yes. The problem is that any thread might grab the signal as it is
process wide.
What was I thinking? Not much obviously.
The distribution mechanism is only targeting the wakeup at signal
queuing time and therefore avoids the wakeup of idle tasks. But it does
not guarantee that the signal is evenly distributed to the threads on
actual signal delivery.
Even with the change to stop the worker threads once they got a signal,
it's not guaranteed that the last worker will actually get one within
the timeout, simply because the main thread can win the race to collect
the signal every time. I just managed to make the patched test fail in
one out of 100 runs.
IOW, we cannot test this reliably at all with the current approach.
I'll think about it tomorrow again with brain awake.
Thanks,
tglx