linux-kernel - Re: [PATCH v3 3/3] sched, x86: Check that we're on the right stack in schedule and __might

Date:	Sun, 28 Feb 2016 21:27:51 -0800
From:	Andy Lutomirski <luto@...capital.net>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Andi Kleen <andi@...stfloor.org>, Borislav Petkov <bp@...en8.de>,
	"the arch/x86 maintainers" <x86@...nel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Tony Luck <tony.luck@...el.com>
Subject: Re: [PATCH v3 3/3] sched, x86: Check that we're on the right stack in
 schedule and __might_sleep

On Wed, Nov 19, 2014 at 11:44 AM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
> On Wed, Nov 19, 2014 at 11:29 AM, Andi Kleen <andi@...stfloor.org> wrote:
>>
>> The exception handlers which use the IST stacks don't necessarily
>> set irq count. Maybe they should.
>
> Hmm. I think they should. Since they clearly must not schedule, as
> they use a percpu stack.
>
> Which exceptions use IST?
>
> [ grep grep ]
>
> Looks like stack, doublefault, nmi, debug and mce. And yes, I really
> think they should all raise the irq count if they don't already.
> Rather than add random arch-specific "let's check that we're on the
> right stack" code to the might-sleep stuff, just use the one we have.
>

Resurrecting an old thread:

The outcome of this discussion was that ist_enter now raises
HARDIRQ_COUNT.  I think this is causing a problem.  If a user program
enables TF, it generates a bunch of debug exceptions.  The handlers
raise the IRQ count and do stuff, and apparently some of that stuff
can raise a softirq.  (I have no idea where the softirq is being
raised.)  The softirq code notices that we're in_interrupt and doesn't
wake ksoftirqd because it thinks we're about to exit the interrupt and
process the softirq.  But the IST exit path never processes pending
softirqs, so the softirq stays pending, which causes occasional
warnings and confuses things (and me!).
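
The sequence above can be modeled in userspace. What follows is a minimal
sketch, not kernel code: the names mirror the kernel's (preempt_count,
in_interrupt, raise_softirq, irq_exit, ist_enter), but the bodies are
simplified assumptions, and the three scenario functions are hypothetical
helpers added only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the preempt count layout (assumed for illustration). */
#define SOFTIRQ_MASK   (0xffu << 8)
#define HARDIRQ_SHIFT  16
#define HARDIRQ_OFFSET (1u << HARDIRQ_SHIFT)
#define HARDIRQ_MASK   (0xfu << HARDIRQ_SHIFT)

static unsigned int preempt_count;   /* per-CPU in the real kernel */
static unsigned int softirq_pending; /* bitmask of raised softirqs */
static int ksoftirqd_wakeups;        /* times ksoftirqd was woken */
static int softirqs_run;             /* times pending softirqs were processed */

static bool in_interrupt(void)
{
    return (preempt_count & (HARDIRQ_MASK | SOFTIRQ_MASK)) != 0;
}

/* Mark a softirq pending; wake ksoftirqd only when nobody else will run it. */
static void raise_softirq(unsigned int nr)
{
    softirq_pending |= 1u << nr;
    if (!in_interrupt())
        ksoftirqd_wakeups++;
}

static void irq_enter(void) { preempt_count += HARDIRQ_OFFSET; }

/* A normal interrupt exit processes pending softirqs. */
static void irq_exit(void)
{
    assert(preempt_count & HARDIRQ_MASK);
    preempt_count -= HARDIRQ_OFFSET;
    if (!in_interrupt() && softirq_pending) {
        softirqs_run++;
        softirq_pending = 0;
    }
}

/* The IST path raises the same count but never processes softirqs on exit. */
static void ist_enter(void) { preempt_count += HARDIRQ_OFFSET; }

static void ist_exit(void)
{
    assert(preempt_count & HARDIRQ_MASK);
    preempt_count -= HARDIRQ_OFFSET;
}

static void model_reset(void)
{
    preempt_count = 0;
    softirq_pending = 0;
    ksoftirqd_wakeups = 0;
    softirqs_run = 0;
}

/* Returns 1 if a softirq raised inside an IST handler is left stranded. */
int ist_handler_strands_softirq(void)
{
    model_reset();
    ist_enter();
    raise_softirq(3);
    ist_exit();
    return softirq_pending != 0 && ksoftirqd_wakeups == 0 && softirqs_run == 0;
}

/* Returns 1 if a softirq raised inside a normal hardirq runs on irq_exit. */
int hardirq_runs_softirq_on_exit(void)
{
    model_reset();
    irq_enter();
    raise_softirq(3);
    irq_exit();
    return softirq_pending == 0 && softirqs_run == 1;
}

/* Returns 1 if, without the raised count, raise_softirq wakes ksoftirqd. */
int no_hardirq_count_wakes_ksoftirqd(void)
{
    model_reset();
    raise_softirq(3);
    return ksoftirqd_wakeups == 1;
}
```

In this model the first scenario leaves the softirq pending with nobody
responsible for running it, and the third shows why dropping the
HARDIRQ_COUNT increment would let raise_softirq wake ksoftirqd instead.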

So how do we fix it?  If we stop raising HARDIRQ_COUNT (and apply
$SUBJECT?), then raise_softirq will wake ksoftirqd and life is good.
But this seems a bit silly, since, if we entered the ist exception
handler from a context with irqs on and softirqs enabled, we *could*
plausibly handle the softirq right away -- we're on an essentially
empty stack.  (Of course, it's a *small* stack, since it could be the
IST stack.)

Or we could just let ksoftirqd do its thing and stop raising
HARDIRQ_COUNT.  We could add a new preempt count field just for IST
(yuck).  We could try to hijack a different preempt count field
(NMI?).  But I kind of like the idea of just reinstating the original
patch of explicitly checking that we're on a safe stack in schedule
and __might_sleep, since that is the actual condition we care about.
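
As a rough illustration of what "on a safe stack" could mean, here is a
sketch of a range check against the task's own stack. Everything in it is an
assumption for illustration: struct task_model, on_task_stack and
may_sleep_here are hypothetical names, the 16 KiB THREAD_SIZE is assumed,
and this is not the code from $SUBJECT.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed size of a task's kernel stack; the real value is per-arch. */
#define THREAD_SIZE (16u * 1024u)

/* Hypothetical stand-in for the bit of task state the check needs. */
struct task_model {
    uintptr_t stack_base; /* lowest address of the task's kernel stack */
};

/* True if sp lies within [stack_base, stack_base + THREAD_SIZE). */
bool on_task_stack(const struct task_model *t, uintptr_t sp)
{
    return sp >= t->stack_base && sp - t->stack_base < THREAD_SIZE;
}

/*
 * Sleeping is only safe on the task's own stack: a per-CPU IST or IRQ
 * stack would be reused by the next exception on this CPU.
 */
bool may_sleep_here(const struct task_model *t, uintptr_t sp)
{
    return on_task_stack(t, sp);
}
```

A check of this shape tests the stack directly rather than inferring it
from the preempt count, which is the condition the paragraph above says we
actually care about.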

--Andy