[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200722094853.GM10769@hirez.programming.kicks-ass.net>
Date: Wed, 22 Jul 2020 11:48:53 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Ira Weiny <ira.weiny@...el.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Andy Lutomirski <luto@...nel.org>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
Dan Williams <dan.j.williams@...el.com>,
Vishal Verma <vishal.l.verma@...el.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Fenghua Yu <fenghua.yu@...el.com>, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-nvdimm@...ts.01.org,
linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
linux-kselftest@...r.kernel.org
Subject: Re: [PATCH RFC V2 17/17] x86/entry: Preserve PKRS MSR across
exceptions
On Tue, Jul 21, 2020 at 10:27:09PM -0700, Ira Weiny wrote:
> I've been really digging into this today and I'm very concerned that I'm
> completely missing something WRT idtentry_enter() and idtentry_exit().
>
> I've instrumented idt_{save,restore}_pkrs(), and __dev_access_{en,dis}able()
> with trace_printk()'s.
>
> With this debug code, I have found an instance where it seems like
> idtentry_enter() is called without a corresponding idtentry_exit(). This has
> left the thread ref counter at 0 which results in very bad things happening
> when __dev_access_disable() is called and the ref count goes negative.
>
> Effectively this seems to be happening:
>
> ...
> // ref == 0
> dev_access_enable() // ref += 1 ==> disable protection
> // exception (which one I don't know)
> idtentry_enter()
> // ref = 0
> _handler() // or whatever code...
> // *_exit() not called [at least there is no trace_printk() output]...
> // Regardless of trace output, the ref is left at 0
> dev_access_disable() // ref -= 1 ==> -1 ==> does not enable protection
> (Bad stuff is bound to happen now...)
> ...
>
> The ref count ends up completely messed up after this sequence... and the PKRS
> register gets out of sync as well. This is starting to make some sense of how
> I was getting what seemed like random crashes before.
>
> Unfortunately I don't understand the idt entry/exit code well enough to see
> clearly what is going on. Is there any reason idtentry_exit() is not called
> after idtentry_enter()? Perhaps some NMI/MCE or 'not normal' exception code
> path I'm not seeing? In my searches I see a corresponding exit for every
> enter. But MCE is pretty hard to follow.
>
> Also is there any chance that the process could be getting scheduled and that
> is causing an issue?
Ooh, I think I see the problem. We also use idtentry_enter() for #PF,
and #PF can schedule, exactly as you observed. Argh!
This then means you need to go frob something in
arch/x86/include/asm/idtentry.h, which is somewhat unfortunate.
Thomas, would it make sense to add a type parameter to
idtentry_{enter,exit}() ?
Powered by blists - more mailing lists