Date:	Fri, 14 Feb 2014 10:52:55 +0000
From:	Catalin Marinas <catalin.marinas@....com>
To:	Kirill Tkhai <tkhai@...dex.ru>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Kirill Tkhai <ktkhai@...allels.com>,
	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
	Martin Schwidefsky <schwidefsky@...ibm.com>
Subject: Re: [PATCH] sched/core: Create new task with twice disabled
 preemption

On Thu, Feb 13, 2014 at 09:32:22PM +0400, Kirill Tkhai wrote:
> On 13.02.2014 20:00, Peter Zijlstra wrote:
> > On Thu, Feb 13, 2014 at 07:51:56PM +0400, Kirill Tkhai wrote:
> >> For archs without __ARCH_WANT_UNLOCKED_CTXSW set this means
> >> that all newly created tasks execute finish_arch_post_lock_switch()
> >> and post_schedule() with preemption enabled.
> > 
> > That's IA64 and MIPS; do they have a 'good' reason to use this?
> 
> It seems my description misleads reader, I'm sorry if so.
> 
> I mean all architectures *except* IA64 and MIPS. All, which
> has no __ARCH_WANT_UNLOCKED_CTXSW defined.
> 
> IA64 and MIPS already have preempt_enable() in schedule_tail():
> 
> #ifdef __ARCH_WANT_UNLOCKED_CTXSW
>         /* In this case, finish_task_switch does not reenable preemption */
>         preempt_enable();
> #endif
> 
> Their initial preemption is not decremented in finish_lock_switch().
> 
> So, we speak about x86, ARM64 etc.
> 
> Look at ARM64's finish_arch_post_lock_switch(). It looks like a task
> must not be preempted between switch_mm() and this function.
> But in the case of a new task this is possible.

We had a thread about this at the end of last year:

https://lkml.org/lkml/2013/11/15/82

There is indeed a problem on arm64, something like this (and I think
s390 also needs a fix):

1. switch_mm() via check_and_switch_context() defers the actual mm
   switch by setting TIF_SWITCH_MM
2. The context switch is considered 'done' by the kernel before
   finish_arch_post_lock_switch() runs, and therefore we can be
   preempted to a new thread before finish_arch_post_lock_switch()
3. The new thread has the same mm as the preempted thread, so the mm
   switch in finish_arch_post_lock_switch() is missed, because
   TIF_SWITCH_MM is per thread rather than per mm
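
To make the sequence above concrete, here is a small user-space model of
it. This is only a sketch and not kernel code: the sim_* types and
functions are hypothetical names invented for illustration, and the model
always defers the mm switch, whereas the real code defers it only under
certain conditions.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: sim_* names are hypothetical, not kernel interfaces. */
struct sim_mm { int id; };

struct sim_thread {
	struct sim_mm *mm;
	bool tif_switch_mm;		/* per-thread "mm switch pending" flag */
};

struct sim_cpu {
	struct sim_mm *active_mm;	/* what the scheduler believes is current */
	struct sim_mm *hw_mm;		/* what the hardware is really using */
};

/* Step 1: the mm switch is only recorded on the incoming thread. */
static void sim_switch_mm(struct sim_cpu *cpu, struct sim_thread *next)
{
	if (cpu->active_mm != next->mm) {
		next->tif_switch_mm = true;
		cpu->active_mm = next->mm;
	}
}

/* Stands in for finish_arch_post_lock_switch(): do the deferred switch. */
static void sim_finish_switch(struct sim_cpu *cpu, struct sim_thread *cur)
{
	if (cur->tif_switch_mm) {
		cpu->hw_mm = cur->mm;
		cur->tif_switch_mm = false;
	}
}

/* Returns true if the hardware mm matches the running thread's mm. */
bool sim_run(bool preempt_before_finish)
{
	struct sim_mm a = { 1 }, b = { 2 };
	struct sim_thread t1 = { &b, false }, t2 = { &b, false };
	struct sim_cpu cpu = { &a, &a };
	struct sim_thread *cur = &t1;

	sim_switch_mm(&cpu, &t1);		/* step 1: flag set on t1 */
	assert(t1.tif_switch_mm);

	if (preempt_before_finish) {
		/* Step 2: preempted before the finish hook ran for t1. */
		sim_switch_mm(&cpu, &t2);	/* same mm, so t2 is not flagged */
		cur = &t2;
	}

	/* Step 3: the finish hook only looks at the current thread's flag. */
	sim_finish_switch(&cpu, cur);
	return cpu.hw_mm == cur->mm;
}
```

In this model sim_run(false) ends with the hardware on the right mm, while
sim_run(true) leaves the hardware on the old mm even though the scheduler
bookkeeping says the switch happened.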

> This is the problem I tried to solve. I don't know arm64, and I can't
> say how serious it is.

Have you managed to reproduce this? I'm not saying it doesn't exist, but I
want to make sure that any patch actually fixes it.

So we have several possible solutions, with one of the first two suitable
for stable:

1. Propagate the TIF_SWITCH_MM to the next thread (suggested by Martin)
2. Get rid of TIF_SWITCH_MM and use mm_cpumask for tracking (I already
   have the patch, it just needs a lot more testing)
3. Re-write the ASID allocation algorithm to no longer require IPIs and
   therefore drop finish_arch_post_lock_switch() (this can be done, but
   it is pretty intrusive for stable)
4. Replace finish_arch_post_lock_switch() with finish_mm_switch() as per
   Martin's patch. I think this would guarantee that it is always
   called, so we can move the mm switching from switch_mm() to
   finish_mm_switch() and no longer need flags to mark deferred mm
   switching
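
As a rough illustration of option 1, here is a user-space sketch of one
possible way to propagate the pending flag to the next thread. It is an
assumption about how such propagation could look, not Martin's actual
suggestion or any kernel patch, and all prop_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: prop_* names are hypothetical, not kernel interfaces. */
struct prop_mm { int id; };

struct prop_thread {
	struct prop_mm *mm;
	bool tif_switch_mm;		/* per-thread "mm switch pending" flag */
};

struct prop_cpu {
	struct prop_mm *active_mm;	/* what the scheduler believes is current */
	struct prop_mm *hw_mm;		/* what the hardware is really using */
};

/* Deferred mm switch that hands a still-pending flag on to the next thread. */
static void prop_switch_mm(struct prop_cpu *cpu, struct prop_thread *prev,
			   struct prop_thread *next)
{
	if (cpu->active_mm != next->mm) {
		next->tif_switch_mm = true;
		cpu->active_mm = next->mm;
	} else if (prev && prev->tif_switch_mm) {
		/* Same mm, but prev never completed its switch: propagate. */
		next->tif_switch_mm = true;
	}
	if (prev)
		prev->tif_switch_mm = false;
}

/* Stands in for finish_arch_post_lock_switch(): do the deferred switch. */
static void prop_finish_switch(struct prop_cpu *cpu, struct prop_thread *cur)
{
	if (cur->tif_switch_mm) {
		cpu->hw_mm = cur->mm;
		cur->tif_switch_mm = false;
	}
}

/* Returns true if the hardware mm matches the running thread's mm. */
bool prop_run(bool preempt_before_finish)
{
	struct prop_mm a = { 1 }, b = { 2 };
	struct prop_thread t1 = { &b, false }, t2 = { &b, false };
	struct prop_cpu cpu = { &a, &a };
	struct prop_thread *cur = &t1;

	prop_switch_mm(&cpu, NULL, &t1);	/* t1 is flagged */

	if (preempt_before_finish) {
		/* Preempted before the finish hook: the flag moves to t2. */
		prop_switch_mm(&cpu, &t1, &t2);
		cur = &t2;
	}

	prop_finish_switch(&cpu, cur);
	return cpu.hw_mm == cur->mm;
}
```

In this model the hardware ends up on the right mm whether or not the
preemption happens, because the pending switch follows whichever thread
actually reaches the finish hook.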

For arm64, we'll most likely go with 2 for stable and move to 3 shortly
after, which removes the need for any other deferred mm switching.

-- 
Catalin
