lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 25 Jun 2022 10:01:35 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Tejun Heo <tj@...nel.org>
Cc:     Petr Mladek <pmladek@...e.com>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Michal Hocko <mhocko@...e.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Oleg Nesterov <oleg@...hat.com>,
        "Eric W. Biederman" <ebiederm@...ssion.com>
Subject: Re: re. Spurious wakeup on a newly created kthread

On Fri, Jun 24, 2022 at 10:00 PM Tejun Heo <tj@...nel.org> wrote:
>
> So, Petr debugged a NULL deref in workqueue code to a spurious wakeup
> on a newly created kthread.

What? No. That patch can't be right for several reasons.

What we call "spurious wakeups" exist, but they are about wakeups that
happen from being on a _previous_ wait-queue, and having already been
removed from it.

They aren't "really" spurious, they are just asynchronous enough (and
thus unexpected) that you basically should never have a "sleep on
wait-queue" without then looping and re-testing the condition.

There is no _truly_ spurious wakeup. You were always woken up for a
reason, it's just that there are more reasons than the entirely
obvious ones.

For example, the reason that quoted patch cannot be right is that this
code pattern:

  while (wait_for_completion_interruptible(&worker->ready_to_start))
    ;

is not valid kernel code. EVER. There is absolutely no way that can be correct.

Either that code can take a signal, or it cannot. If it can take a
signal, it had better react to said signal. If it cannot, it must not
use an interruptble sleep - since now that loop turned into a
kernel-side busy-loop.

So NAK on this kind of crazy "I don't know what happened, so I'll just
add *more* bugs to the code" voodoo programming.

And no, we don't "fix" that by then adding a timeout.

Stop this "add random code" thing.

If you cannot be woken up before X happens, then you should:

 - don't go to sleep before X happens

 - don't add yourself to any wait-queues before X happens

 - don't expose your process to others before X happens

The solution is *not* to add some completion with random "wait for
this before waking".

I think the problem here is much more fundamental: you expect a new
thread to not wake up until you've told it to.

We do have that infrastructure in the kernel: when  you create a new
thread, you can do various setup, and the thread won't actually run
until you do "wake_up_new_task()" on it.

However, that's not how kernel_thread() (or kernel_clone()) works.
Those will call wake_up_new_task(p) for you, and as such a new kernel
thread will immediately start running.

So I think the expectations here are entirely wrong.  I think
create_worker() is fundamentally buggy, in how it does that

        /* start the newly created worker */
        ..
        wake_up_process(worker->task);

because that wake_up_process() is already much too late. The process
got woken up already, because it was created by create_kthread() ->
kernel_thread() -> kernel_clone, which does that wake_up_new_task()
and it starts running.

If you want to do thread setup *bnefore* the kernel is running, it
needs to be done before that wake_up_new_task().

That's very possible. Look at what create_io_thread() does, for
example: it never calls wake_up_new_process() at all, and leaves that
to the caller, which has done the extra setup.

Or the kernel_clone_args could have a "init" function that gets called
before doing the wake_up_new_task() is done. Or a number of other
solutions.

But no, we're not randomly adding some new completion because people
were confused and thought they were waking things up when it was
already awake from before.

                Linus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ