linux-kernel - Re: [PATCH v1] kthread/smpboot: Serialize kthread parking against wakeup


Message-ID: <20180501113132.GF12217@hirez.programming.kicks-ass.net>
Date:   Tue, 1 May 2018 13:31:32 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     "Kohli, Gaurav" <gkohli@...eaurora.org>
Cc:     tglx@...utronix.de, mpe@...erman.id.au, mingo@...nel.org,
        bigeasy@...utronix.de, linux-kernel@...r.kernel.org,
        linux-arm-msm@...r.kernel.org,
        Neeraj Upadhyay <neeraju@...eaurora.org>,
        Will Deacon <will.deacon@....com>,
        Oleg Nesterov <oleg@...hat.com>
Subject: Re: [PATCH v1] kthread/smpboot: Serialize kthread parking against
 wakeup

On Tue, May 01, 2018 at 04:10:53PM +0530, Kohli, Gaurav wrote:
> Yes with loop, it will reset TASK_PARKED but that is not happening in the
> dumps we have seen.

But was that with or without the fixed wait-loop? I don't care about
stuff you might have seen with the current code, that is clearly broken.

> > takedown_cpu() can proceed beyond smpboot_park_threads() and kill the
> > CPU before any of the threads are parked -- per having the complete()
> > before hitting schedule().
> > 
> > And, afaict, that is harmless. When we go offline, sched_cpu_dying() ->
> > migrate_tasks() will migrate any still runnable threads off the cpu.
> > But because at this point the thread must be in the PARKED wait-loop, it
> > will hit schedule() and go to sleep eventually.
> > 
> > Also note that kthread_unpark() does __kthread_bind() to rebind the
> > threads.
> > 
> > Aaaah... I think I've spotted a problem there. We clear SHOULD_PARK
> > before we rebind, so if the thread lost the first PARKED store,
> > does the completion, gets migrated, cycles through the loop and now
> > observes !SHOULD_PARK and bails the wait-loop, then __kthread_bind()
> > will forever wait.
> > 
> 
> So during next unpark
> __kthread_unpark -> __kthread_bind -> wait_task_inactive (this got failed,
> as current state is running so failed on below call:

Aah, yes, I seem to have mis-remembered how wait_task_inactive() works.
And it is indeed still a problem.

Let me ponder what the best solution is, it's a bit of a mess.
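
For readers following the thread, the ordering problem discussed above can be
sketched as a small, deterministic userspace model. This is only an
illustration under stated assumptions, not kernel code and not a fix proposed
in this message: every name below (model_kthread, model_park_loop_pass,
model_wait_inactive, model_unpark) is hypothetical. The flag stands in for
SHOULD_PARK, the state field for the task state, and model_wait_inactive()
mimics only the "state must match" check that makes wait_task_inactive() fail
when the task is running.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum model_state { MODEL_RUNNING, MODEL_PARKED };

struct model_kthread {
	bool should_park;        /* stands in for the SHOULD_PARK flag */
	enum model_state state;  /* stands in for the task state */
	bool on_cpu;             /* true until the thread schedules out */
};

/*
 * One pass through the parked wait-loop after a wakeup.
 * Returns true if the thread went back to sleep in the PARKED state,
 * false if it saw !should_park and left the loop as RUNNING.
 */
static bool model_park_loop_pass(struct model_kthread *t)
{
	t->on_cpu = true;                /* thread was woken and is running */
	t->state = MODEL_PARKED;         /* set state before checking the flag */
	if (!t->should_park) {
		t->state = MODEL_RUNNING;    /* bail out of the wait-loop */
		return false;
	}
	t->on_cpu = false;               /* models schedule(): thread sleeps */
	return true;
}

/* Models only the state-match check: succeed if parked and off the CPU. */
static bool model_wait_inactive(const struct model_kthread *t,
				enum model_state match)
{
	return t->state == match && !t->on_cpu;
}

/*
 * Models an unpark with one interleaved pass of the thread through its
 * wait-loop. If clear_first is true, the flag is cleared before the
 * rebind step (the ordering questioned in the thread); otherwise the
 * rebind step runs while the flag is still set.
 * Returns true if the rebind step found the thread parked.
 */
static bool model_unpark(struct model_kthread *t, bool clear_first)
{
	bool bound;

	if (clear_first)
		t->should_park = false;

	(void)model_park_loop_pass(t);   /* thread cycles through the loop */
	bound = model_wait_inactive(t, MODEL_PARKED);

	if (!clear_first)
		t->should_park = false;

	return bound;
}
```

In this model, calling model_unpark() with clear_first set to true on a parked
thread leaves the thread RUNNING and the rebind step fails, which matches the
failure described in the quoted text. With clear_first set to false, the
thread goes back to sleep as PARKED and the rebind step succeeds. Whether
reordering alone is sufficient in the real code is not something this sketch
can show.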