lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87o7xvcw98.fsf@email.froward.int.ebiederm.org>
Date:   Mon, 11 Jul 2022 13:57:55 -0500
From:   "Eric W. Biederman" <ebiederm@...ssion.com>
To:     "Jason A. Donenfeld" <Jason@...c4.com>
Cc:     LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3] signal: break out of wait loops on kthread_stop()

"Jason A. Donenfeld" <Jason@...c4.com> writes:

> Hi Eric,
>
> On Mon, Jul 4, 2022 at 2:22 PM Jason A. Donenfeld <Jason@...c4.com> wrote:
>>
>> Hey Eric,
>>
>> On Tue, Jun 28, 2022 at 06:14:41PM +0200, Jason A. Donenfeld wrote:
>> > I was recently surprised to learn that msleep_interruptible(),
>> > wait_for_completion_interruptible_timeout(), and related functions
>> > simply hung when I called kthread_stop() on kthreads using them. The
>> > solution to fixing the case with msleep_interruptible() was more simply
>> > to move to schedule_timeout_interruptible(). Why?
>> >
>> > The reason is that msleep_interruptible(), and many functions just like
>> > it, has a loop like this:
>> >
>> >         while (timeout && !signal_pending(current))
>> >                 timeout = schedule_timeout_interruptible(timeout);
>> >
>> > The call to kthread_stop() woke up the thread, so schedule_timeout_
>> > interruptible() returned early, but because signal_pending() returned
>> > true, it went back into another timeout, which was never woken up.
>> >
>> > This wait loop pattern is common to various pieces of code, and I
>> > suspect that the subtle misuse in a kthread that caused a deadlock in
>> > the code I looked at last week is also found elsewhere.
>> >
>> > So this commit causes signal_pending() to return true when
>> > kthread_stop() is called, by setting TIF_NOTIFY_SIGNAL.
>> >
>> > The same also applies to the similar kthread_park() functionality.
>> >
>> > Cc: Ingo Molnar <mingo@...hat.com>
>> > Cc: Peter Zijlstra <peterz@...radead.org>
>> > Cc: Eric W. Biederman <ebiederm@...ssion.com>
>> > Cc: Toke Høiland-Jørgensen <toke@...hat.com>
>> > Cc: Kalle Valo <kvalo@...nel.org>
>> > Cc: Johannes Berg <johannes@...solutions.net>
>> > Signed-off-by: Jason A. Donenfeld <Jason@...c4.com>
>> > ---
>> >  kernel/kthread.c | 4 ++++
>> >  1 file changed, 4 insertions(+)
>> >
>> > diff --git a/kernel/kthread.c b/kernel/kthread.c
>> > index 3c677918d8f2..63d5a1f4cb93 100644
>> > --- a/kernel/kthread.c
>> > +++ b/kernel/kthread.c
>> > @@ -661,12 +661,14 @@ int kthread_park(struct task_struct *k)
>> >
>> >       set_bit(KTHREAD_SHOULD_PARK, &kthread->flags);
>> >       if (k != current) {
>> > +             test_and_set_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
>> >               wake_up_process(k);
>> >               /*
>> >                * Wait for __kthread_parkme() to complete(), this means we
>> >                * _will_ have TASK_PARKED and are about to call schedule().
>> >                */
>> >               wait_for_completion(&kthread->parked);
>> > +             clear_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
>> >               /*
>> >                * Now wait for that schedule() to complete and the task to
>> >                * get scheduled out.
>> > @@ -704,8 +706,10 @@ int kthread_stop(struct task_struct *k)
>> >       kthread = to_kthread(k);
>> >       set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
>> >       kthread_unpark(k);
>> > +     test_and_set_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
>> >       wake_up_process(k);
>> >       wait_for_completion(&kthread->exited);
>> > +     clear_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
>> >       ret = kthread->result;
>> >       put_task_struct(k);
>> >
>> > --
>> > 2.35.1
>> >
>>
>> Is this more to the tune of what you had in mind in your message [1]?
>>
>> Jason
>>
>> [1] https://lore.kernel.org/lkml/877d51udc7.fsf@email.froward.int.ebiederm.org/
>
> Paging again...

Overall it looks reasonable.

For kthread_stop clearing TIF_NOTIFY_SIGNAL seems pointless as the
process has exited.

I wonder if we should clear TIF_NOTIFY_SIGNAL in kthread_parkme.

Ordinarily TIF_NOTIFY_SIGNAL gets cleared when the target process
calls get_signal.  Which doesn't happen for kernel threads.
So I am not certain what the best pattern is, but my sense is that
keeping things as close to how TIF_NOTIFY_SIGNAL is processed
in the rest of the kernel seems the least error prone pattern.

So I am thinking the code should do something like:

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 544fd4097406..c80e8d44e405 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -266,6 +266,7 @@ static void __kthread_parkme(struct kthread *self)
 		 * changin from TASK_PARKED and us failing the
 		 * wait_task_inactive() in kthread_park().
 		 */
+		clear_notify_signal();
 		set_special_state(TASK_PARKED);
 		if (!test_bit(KTHREAD_SHOULD_PARK, &self->flags))
 			break;


To ensure that TIF_NOTIFY_SIGNAL is not set.

I am trying to think about how things interact if two threads call
kthread_park.  If the target thread is not responsible for clearing
TIF_NOTIFY_SIGNAL it feels like two threads could get into a race.  A
race where one thread sees the target thread has parked itself right
after another thread calls kthread_park and sets TIF_NOTIFY_SIGNAL.

Given the nature of kthread_park that scenario is probably completely
ridiculous, but it is all I can think of at the moment to demonstrate
my concerns.

In practice kthread_park is pretty specialized.  Do you have any
evidence that we need to break out of interruptible loops to make it
more reliable.  Perhaps it is best just to handle kthread_stop, and
leave kthread_park for when it actually matters.  Which would ensure
there are examples that people care about to help sort through exactly
how the code should behave in the kthread_park case.

Which I guess my long way of saying I think you can just change
kthread_stop to say:

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 544fd4097406..52e9b3432496 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -704,6 +704,7 @@ int kthread_stop(struct task_struct *k)
 	kthread = to_kthread(k);
 	set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
 	kthread_unpark(k);
+	set_tsk_thread_flag(k, TIF_NOTIFY_SIGNAL);
 	wake_up_process(k);
 	wait_for_completion(&kthread->exited);
 	ret = kthread->result;


Your patch is correct that most of set_notify_signal is redundant with
wake_up_process(k).  Further if you aren't going to use the return value
like set_notify_signal does there is no need to test if the flag gets
set.

Eric

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ