linux-kernel - Re: [PATCH v3 3/3] sched, x86: Check that we're on the right stack in schedule and __might

Date:	Mon, 23 May 2016 18:23:31 -0700
From:	Andy Lutomirski <luto@...capital.net>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Andi Kleen <andi@...stfloor.org>, Borislav Petkov <bp@...en8.de>,
	"the arch/x86 maintainers" <x86@...nel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Tony Luck <tony.luck@...el.com>
Subject: Re: [PATCH v3 3/3] sched, x86: Check that we're on the right stack in
 schedule and __might_sleep

On Sun, Feb 28, 2016 at 9:27 PM, Andy Lutomirski <luto@...capital.net> wrote:
> On Wed, Nov 19, 2014 at 11:44 AM, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
>> On Wed, Nov 19, 2014 at 11:29 AM, Andi Kleen <andi@...stfloor.org> wrote:
>>>
>>> The exception handlers which use the IST stacks don't necessarily
>>> set irq count. Maybe they should.
>>
>> Hmm. I think they should. Since they clearly must not schedule, as
>> they use a percpu stack.
>>
>> Which exceptions use IST?
>>
>> [ grep grep ]
>>
>> Looks like stack, doublefault, nmi, debug and mce. And yes, I really
>> think they should all raise the irq count if they don't already.
>> Rather than add random arch-specific "let's check that we're on the
>> right stack" code to the might-sleep stuff, just use the one we have.
>>
>
> Resurrecting an old thread:
>
> The outcome of this discussion was that ist_enter now raises
> HARDIRQ_COUNT.  I think this is causing a problem.  If a user program
> enables TF, it generates a bunch of debug exceptions.  The handlers
> raise the IRQ count and do stuff, and apparently some of that stuff
> can raise a softirq.  (I have no idea where the softirq is being
> raised.)  The softirq code notices that we're in_interrupt and doesn't
> wake ksoftirqd because it thinks we're about to exit the interrupt and
> process the softirq.  But we don't, which causes occasional warnings
> and confuses things (and me!).
>
> So how do we fix it?  If we stop raising HARDIRQ_COUNT (and apply
> $SUBJECT?), then raise_softirq will wake ksoftirqd and life is good.
> But this seems a bit silly, since, if we entered the ist exception
> handler from a context with irqs on and softirqs enabled, we *could*
> plausibly handle the softirq right away -- we're on an essentially
> empty stack.  (Of course, it's a *small* stack, since it could be the
> IST stack.)
>
> Or we could just let ksoftirqd do its thing and stop raising
> HARDIRQ_COUNT.  We could add a new preempt count field just for IST
> (yuck).  We could try to hijack a different preempt count field
> (NMI?).  But I kind of like the idea of just reinstating the original
> patch of explicitly checking that we're on a safe stack in schedule
> and __might_sleep, since that is the actual condition we care about.

Ping?  I can still trigger this fairly easily on 4.6.

--Andy
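
The failure mode described in the quoted message can be sketched as a small user-space model. This is not kernel code: every model_* name is hypothetical, the bit layout is an assumption based on include/linux/preempt.h from around 4.6, and softirq number 3 is arbitrary. It shows only the ordering problem: a softirq raised while HARDIRQ_COUNT is elevated skips the ksoftirqd wakeup, and the IST exit path then drops the count without processing what is pending.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model only. The bit layout is assumed from
 * include/linux/preempt.h around 4.6; nothing here is kernel code. */
#define SOFTIRQ_SHIFT  8
#define HARDIRQ_SHIFT  16
#define SOFTIRQ_MASK   (0xffu << SOFTIRQ_SHIFT)
#define HARDIRQ_MASK   (0x0fu << HARDIRQ_SHIFT)
#define HARDIRQ_OFFSET (1u << HARDIRQ_SHIFT)

static unsigned int model_preempt_count;  /* stands in for preempt_count() */
static unsigned int model_pending;        /* pending softirq bitmask */
static bool model_ksoftirqd_woken;        /* was ksoftirqd asked to run? */

/* True while any hardirq or softirq count bits are set. */
static bool model_in_interrupt(void)
{
	return (model_preempt_count & (HARDIRQ_MASK | SOFTIRQ_MASK)) != 0;
}

/* Mark a softirq pending. Wake ksoftirqd only when not in interrupt
 * context, on the assumption that the interrupt exit path will run it. */
static void model_raise_softirq(unsigned int nr)
{
	model_pending |= 1u << nr;
	if (!model_in_interrupt())
		model_ksoftirqd_woken = true;
}

/* Models ist_enter raising HARDIRQ_COUNT. */
static void model_ist_enter(void)
{
	model_preempt_count += HARDIRQ_OFFSET;
}

/* Models the IST exit path: drops the count, processes no softirqs. */
static void model_ist_exit(void)
{
	assert(model_preempt_count >= HARDIRQ_OFFSET);
	model_preempt_count -= HARDIRQ_OFFSET;
}

/* Runs one modelled debug exception from a clean state. Returns true
 * if a softirq is left pending with nobody scheduled to handle it. */
static bool model_debug_exception(bool raise_hardirq_count)
{
	model_preempt_count = 0;
	model_pending = 0;
	model_ksoftirqd_woken = false;

	if (raise_hardirq_count)
		model_ist_enter();
	model_raise_softirq(3);  /* arbitrary softirq number */
	if (raise_hardirq_count)
		model_ist_exit();

	return model_pending != 0 && !model_ksoftirqd_woken;
}
```

In this model, model_debug_exception(true) returns true (the softirq is stranded), while model_debug_exception(false) returns false because the wakeup happens. That matches the "stop raising HARDIRQ_COUNT" option in the message; it says nothing about whether the stack check from $SUBJECT is then sufficient.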