linux-kernel - Re: TIF_NOHZ can escape nonhz mask? (Was: [PATCH v3 6/8] x86: Split syscall_trace


Message-ID: <20140731003034.GA32078@localhost.localdomain>
Date:	Thu, 31 Jul 2014 02:30:37 +0200
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	Oleg Nesterov <oleg@...hat.com>
Cc:	Andy Lutomirski <luto@...capital.net>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	linux-kernel@...r.kernel.org, Kees Cook <keescook@...omium.org>,
	Will Drewry <wad@...omium.org>, x86@...nel.org,
	linux-arm-kernel@...ts.infradead.org, linux-mips@...ux-mips.org,
	linux-arch@...r.kernel.org, linux-security-module@...r.kernel.org,
	Alexei Starovoitov <ast@...mgrid.com>, hpa@...or.com
Subject: Re: TIF_NOHZ can escape nonhz mask? (Was: [PATCH v3 6/8] x86: Split
 syscall_trace_enter into two phases)

On Wed, Jul 30, 2014 at 07:46:30PM +0200, Oleg Nesterov wrote:
> On 07/30, Frederic Weisbecker wrote:
> >
> > On Tue, Jul 29, 2014 at 07:54:14PM +0200, Oleg Nesterov wrote:
> >
> > >
> > > Looks like, we can kill context_tracking_task_switch() and simply change the
> > > "__init" callers of context_tracking_cpu_set() to do set_thread_flag(TIF_NOHZ) ?
> > > Then this flag will be propagated by copy_process().
> >
> > Right, that would be much better. Good catch! context tracking is enabled from
> > tick_nohz_init(). This is the init 0 task so the flag should be propagated from there.
> 
> actually init 1 task, but this doesn't matter.

Are you sure? It does matter, because that would invalidate everything I understood
about init/main.c :) I was convinced that the very first kernel init task is PID 0, which
then forks in rest_init() to launch the userspace init with PID 1. Then init/0 becomes the
idle task of the boot CPU.

> 
> > I still think we need a for_each_process_thread() set as well though because some
> > kernel threads may well have been created at this stage already.
> 
> Yes... Or we can add set_thread_flag(TIF_NOHZ) into ____call_usermodehelper().

Couldn't there be tasks other than usermodehelper stuff at this stage? Like workqueues
or random kernel threads?
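
To illustrate the concern, here is a minimal userspace sketch, not kernel code. It
assumes only what is said above: a child inherits its parent's thread flags at fork
time. Every name in it (model_task, model_fork(), MODEL_TIF_NOHZ, ...) is hypothetical
and exists only for this model. A task forked before the flag is set on init/0 never
gets it, which is why a pass over all existing tasks would still be needed.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_TIF_NOHZ  (1u << 0)  /* hypothetical stand-in for TIF_NOHZ */
#define MODEL_MAX_TASKS 8

struct model_task {
	int pid;
	unsigned int flags;
};

static struct model_task model_tasks[MODEL_MAX_TASKS];
static size_t model_nr_tasks;

/* Start over with only the boot task (PID 0) and no flags set. */
static struct model_task *model_boot(void)
{
	model_tasks[0].pid = 0;
	model_tasks[0].flags = 0;
	model_nr_tasks = 1;
	return &model_tasks[0];
}

/* Plays the role of copy_process() here: the child inherits the parent's flags. */
static struct model_task *model_fork(const struct model_task *parent)
{
	struct model_task *child;

	assert(model_nr_tasks < MODEL_MAX_TASKS);
	child = &model_tasks[model_nr_tasks];
	child->pid = (int)model_nr_tasks;
	child->flags = parent->flags;
	model_nr_tasks++;
	return child;
}

/* Count the tasks that do not carry the flag. */
static size_t model_count_missing(void)
{
	size_t i, missing = 0;

	for (i = 0; i < model_nr_tasks; i++)
		if (!(model_tasks[i].flags & MODEL_TIF_NOHZ))
			missing++;
	return missing;
}

/* Stand-in for a for_each_process_thread() pass that sets the flag everywhere. */
static void model_set_flag_on_all(void)
{
	size_t i;

	for (i = 0; i < model_nr_tasks; i++)
		model_tasks[i].flags |= MODEL_TIF_NOHZ;
}
```

In this model, forking one task from init/0, then setting the flag on init/0, then
forking a second task leaves exactly the first child without the flag until
model_set_flag_on_all() runs.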

> 
> > > Or I am totally confused? (quite possible).
> > >
> > > > So here is a scenario where this is a problem: a task runs on CPU 0, passes the context
> > > > tracking call before returning from a syscall to userspace, and gets an interrupt. The
> > > > interrupt preempts the task and it moves to CPU 1. So it returns from preempt_schedule_irq()
> > > > after which it is going to resume to userspace.
> > > >
> > > > In this scenario, if context tracking is only enabled on CPU 1, we have no way to know that
> > > > the task is resuming to userspace, because we passed through the context tracking probe
> > > > already and it was ignored on CPU 0.
> > >
> > > Thanks. But I still can't understand... So if we only track CPU 1, then in this
> > > case context_tracking.state == IN_USER on CPU 0, but it can be IN_USER or IN_KERNEL
> > > on CPU 1.
> >
> > I'm not sure I understand your question.
> 
> Probably because it was stupid. Seriously, I still have no idea what this code
> actually does.
> 
> > Context tracking is either enabled everywhere or
> > nowhere.
> >
> > I need to say though that there is a per CPU context tracking state named context_tracking.active.
> > It's confusing because it suggests that context tracking is active per CPU. Actually it's tracked
> > everywhere when globally enabled, but active determines if we call the RCU and vtime callbacks or
> > not.
> >
> > So only nohz full CPUs have context_tracking.active set because only these need to call the RCU
> > and vtime callbacks. Other CPUs still do the context tracking but they won't call rcu and vtime
> > functions.
> 
> I meant that in the scenario you described above the "global" TIF_NOHZ doesn't
> really make a difference, afaics.
> 
> Lets assume that context tracking is only enabled on CPU 1. To simplify,
> assume that we have a single usermode task T which sleeps in kernel mode.
> 
> So context_tracking[0].state == context_tracking[1].state == IN_KERNEL.
> 
> T wakes up on CPU_0, returns to user space, calls user_enter(). This sets
> context_tracking[0].state = IN_USER but otherwise does nothing else, this
> CPU is not tracked and .active is false.
> 
> Right after local_irq_restore() this task can migrate to CPU_1 and finish
> its ret-to-usermode path. But since it had already passed user_enter() we
> do not change context_tracking[1].state and do not play with rcu/vtime.
> (unless this task hits SCHEDULE_USER in asm).
> 
> The same for user_exit() of course.

So indeed, if context tracking is enabled on CPU 1 and not on CPU 0, we risk
a situation where CPU 1 has the wrong context tracking state.

But a global TIF_NOHZ should enforce context tracking everywhere. It also means
less context switch overhead.
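
For reference, the migration scenario quoted above can be written down as a small
userspace model. This is only a sketch under the assumptions stated in the scenario:
user_enter() updates the state of the CPU it runs on, and calls the RCU/vtime hooks
only when .active is set on that CPU. All names (model_ct, model_user_enter(), ...)
are made up for the model and are not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 2

enum model_ctx_state { MODEL_IN_KERNEL, MODEL_IN_USER };

struct model_context_tracking {
	bool active;                /* would this CPU call the RCU/vtime hooks? */
	enum model_ctx_state state;
	int callbacks;              /* how many times the hooks would have run */
};

static struct model_context_tracking model_ct[MODEL_NR_CPUS];

/* Every CPU starts in kernel mode; only .active differs between them. */
static void model_reset(bool active0, bool active1)
{
	model_ct[0] = (struct model_context_tracking){ active0, MODEL_IN_KERNEL, 0 };
	model_ct[1] = (struct model_context_tracking){ active1, MODEL_IN_KERNEL, 0 };
}

/* The probe only touches the state of the CPU it runs on. */
static void model_user_enter(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (model_ct[cpu].state != MODEL_IN_USER) {
		if (model_ct[cpu].active)
			model_ct[cpu].callbacks++;
		model_ct[cpu].state = MODEL_IN_USER;
	}
}

/*
 * The task passes the probe on enter_cpu, then migrates and resumes
 * userspace on resume_cpu. Returns true if resume_cpu ends up IN_USER.
 */
static bool model_return_to_user(int enter_cpu, int resume_cpu)
{
	assert(resume_cpu >= 0 && resume_cpu < MODEL_NR_CPUS);
	model_user_enter(enter_cpu);
	return model_ct[resume_cpu].state == MODEL_IN_USER;
}
```

With model_reset(false, true), model_return_to_user(0, 1) returns false: CPU 1 still
believes it is in the kernel and its hooks never ran, while the same call with
enter_cpu == 1 leaves CPU 1 consistent.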

> 
> Oleg.
> 