linux-kernel - Re: [PATCH v4 2/5] x86, traps: Track entry into and exit from IST context

Message-ID: <20141121222017.GE9198@lerouge>
Date:	Fri, 21 Nov 2014 23:20:20 +0100
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:	Andy Lutomirski <luto@...capital.net>,
	Borislav Petkov <bp@...en8.de>, X86 ML <x86@...nel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Tony Luck <tony.luck@...el.com>,
	Andi Kleen <andi@...stfloor.org>,
	Josh Triplett <josh@...htriplett.org>
Subject: Re: [PATCH v4 2/5] x86, traps: Track entry into and exit from IST
 context

On Fri, Nov 21, 2014 at 02:07:04PM -0800, Paul E. McKenney wrote:
> On Fri, Nov 21, 2014 at 01:32:50PM -0800, Andy Lutomirski wrote:
> > On Fri, Nov 21, 2014 at 1:26 PM, Andy Lutomirski <luto@...capital.net> wrote:
> > > We currently pretend that IST context is like standard exception
> > > context, but this is incorrect.  IST entries from userspace are like
> > > standard exceptions except that they use per-cpu stacks, so they are
> > > atomic.  IST entries from kernel space are like NMIs from RCU's
> > > perspective -- they are not quiescent states even if they
> > > interrupted the kernel during a quiescent state.
> > >
> > > Add and use ist_enter and ist_exit to track IST context.  Even
> > > though x86_32 has no IST stacks, we track these interrupts the same
> > > way.
> > 
> > I should add:
> > 
> > I have no idea why RCU read-side critical sections are safe inside
> > __do_page_fault today.  It's guarded by exception_enter(), but that
> > doesn't do anything if context tracking is off, and context tracking
> > is usually off. What am I missing here?
> 
> Ah!  There are three cases:
> 
> 1.	Context tracking is off on a non-idle CPU.  In this case, RCU is
> 	still paying attention to CPUs running in both userspace and in
> 	the kernel.  So if a page fault happens, RCU will be set up to
> 	notice any RCU read-side critical sections.
> 
> 2.	Context tracking is on on a non-idle CPU.  In this case, RCU
> 	might well be ignoring userspace execution: NO_HZ_FULL and
> 	all that.  However, as you pointed out, in this case the
> 	context-tracking code lets RCU know that we have entered the
> 	kernel, which means that RCU will again be paying attention to
> 	RCU read-side critical sections.
> 
> 3.	The CPU is idle.  In this case, RCU is ignoring the CPU, so
> 	if we take a page fault when context tracking is off, life
> 	will be hard.  But the kernel is not supposed to take page
> 	faults in the idle loop, so this is not a problem.
> 
> Make sense?

To zoom out the picture for Andy: context tracking goes unused in 99%
of all workloads. It's only used for NO_HZ_FULL. RCU needs the tick
to poll for RCU usage. But when we run in userspace, RCU isn't used,
so it doesn't need the tick and we can stop it. Context tracking is
there to tell RCU about CPUs crossing user/kernel boundaries.
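
As a rough illustration of Paul's three cases, here is a toy userspace
model in C. It is only a sketch, not kernel code: struct cpu_model,
rcu_watching_model() and kernel_entry_model() are hypothetical names
made up for this example, and the real logic is far more involved.

```c
#include <stdbool.h>

/* Toy model, not kernel code: every name here is hypothetical. */
enum where { IN_USER, IN_KERNEL };

struct cpu_model {
	bool idle;             /* CPU is in the idle loop */
	bool context_tracking; /* e.g. a NO_HZ_FULL CPU */
	enum where where;      /* maintained by the user/kernel hooks */
};

/* Would RCU notice a read-side critical section on this CPU right now? */
static bool rcu_watching_model(const struct cpu_model *cpu)
{
	if (cpu->idle)
		return false; /* case 3: RCU ignores idle CPUs */
	if (!cpu->context_tracking)
		return true;  /* case 1: RCU pays attention in user and kernel */
	/* case 2: userspace may be ignored, the hooks report kernel entry */
	return cpu->where == IN_KERNEL;
}

/* Stand-in for what a hook like exception_enter() does when tracking is on. */
static void kernel_entry_model(struct cpu_model *cpu)
{
	if (cpu->context_tracking)
		cpu->where = IN_KERNEL;
}
```

With tracking off the hook does nothing, which is harmless in this
model because case 1 already has RCU watching the CPU.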

Also these hooks account the cputime spent in userspace and kernelspace
in the absence of a tick.
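
A minimal sketch of that tickless accounting idea, again as a toy C
model with hypothetical names (struct cputime_model and
cross_boundary_model() are not kernel symbols): instead of sampling on
each tick, the time elapsed since the previous crossing is charged to
whichever side is being left.

```c
#include <stdint.h>

/* Toy model, not kernel code: every name here is hypothetical. */
enum ctx_state { CTX_USER, CTX_KERNEL };

struct cputime_model {
	enum ctx_state state; /* side the CPU is currently running on */
	uint64_t last_ns;     /* timestamp of the previous crossing */
	uint64_t utime_ns;    /* time charged to userspace so far */
	uint64_t stime_ns;    /* time charged to the kernel so far */
};

/* Called at each user/kernel crossing, with the current time in ns. */
static void cross_boundary_model(struct cputime_model *acct,
				 enum ctx_state next, uint64_t now_ns)
{
	uint64_t delta = now_ns - acct->last_ns;

	/* Charge the elapsed time to the side we are leaving. */
	if (acct->state == CTX_USER)
		acct->utime_ns += delta;
	else
		acct->stime_ns += delta;

	acct->state = next;
	acct->last_ns = now_ns;
}
```

For example, starting in userspace at time 0, entering the kernel at
700 ns and returning to userspace at 1000 ns charges 700 ns to
userspace and 300 ns to the kernel.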