Message-ID: <CAMe9rOo9ivMP4b4qYGWvtEzKT-h68iPecjqp3wZzQCpVqE9FsA@mail.gmail.com>
Date: Sat, 18 Jul 2020 16:04:46 -0700
From: "H.J. Lu" <hjl.tools@...il.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: Yu-cheng Yu <yu-cheng.yu@...el.com>,
Andy Lutomirski <luto@...nel.org>,
LKML <linux-kernel@...r.kernel.org>, X86 ML <x86@...nel.org>,
Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Ingo Molnar <mingo@...hat.com>,
"Ravi V. Shankar" <ravi.v.shankar@...el.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Tony Luck <tony.luck@...el.com>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Weijiang Yang <weijiang.yang@...el.com>
Subject: Re: Random shadow stack pointer corruption
On Sat, Jul 18, 2020 at 3:41 PM Dave Hansen <dave.hansen@...el.com> wrote:
>
> On 7/18/20 11:24 AM, Yu-cheng Yu wrote:
> > On Sat, 2020-07-18 at 11:00 -0700, Andy Lutomirski wrote:
> >> On Sat, Jul 18, 2020 at 10:58 AM Yu-cheng Yu <yu-cheng.yu@...el.com> wrote:
> >>> Hi,
> >>>
> >>> My shadow stack tests start to have random shadow stack pointer corruption after
> >>> v5.7 (excluding). The symptom looks like some locking issue or the kernel is
> >>> confused about which CPU a task is on. In later tip/master, this can be
> >>> triggered by creating two tasks and each does continuous
> >>> pthread_create()/pthread_join(). If the kernel has max_cpus=1, the issue goes
> >>> away. I also checked XSAVES/XRSTORS, but this does not seem to be an issue
> >>> coming from there.
> >>
> >> What do you mean "shadow stack pointer corruption"? Is SSP itself
> >> corrupt while running in the kernel? Is one of the MSRs getting
> >> corrupted? Is the memory to which the shadow stack points getting
> >> corrupted? Is the CPU rejecting an attempt to change SSP?
> >
> > What I see is, a new thread after ret_from_fork() and iret back to ring-3,
> > its shadow stack pointer (MSR_IA32_PL3_SSP) is corrupted.
>
> Does corrupt mean random? Or is it a valid stack address, just not for
> _this_ thread? Or NULL? Or is it a kernel address? Have you tried
> tracing *ALL* the WRMSR's and XRSTOR's that write to the MSR?

Another data point: when the memory corruption happened, no core dump
was produced at all. We verified that core dumps were enabled, and we
did get core dumps for other programs.
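
A minimal sketch of how "core dumps are enabled" could be checked from
inside a test process on Linux. This is an illustration, not necessarily
how the check above was done. The function names are hypothetical; the
only interfaces assumed are getrlimit(2) with RLIMIT_CORE and the
/proc/sys/kernel/core_pattern file.

```c
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/* Hypothetical helper: a soft RLIMIT_CORE of 0 suppresses core files. */
int core_limit_allows_dump(rlim_t soft_limit)
{
    return soft_limit != 0;
}

/*
 * Returns 1 if this process's soft RLIMIT_CORE permits a core file,
 * 0 if it does not, and -1 if getrlimit() fails.
 */
int core_dumps_allowed_by_rlimit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    return core_limit_allows_dump(rl.rlim_cur);
}

/*
 * Copies the kernel's core_pattern setting into buf, without the
 * trailing newline. Returns 0 on success, -1 on failure.
 */
int read_core_pattern(char *buf, size_t len)
{
    FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

If core_dumps_allowed_by_rlimit() returns 1 in the failing test and
core_pattern looks sane, a missing core file points away from a plain
configuration problem, which matches the observation that other
programs did dump core.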
--
H.J.