[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEXW_YRFZ3zDc0gJRHjJPRuNaBtnmUc+9RxSAHH48jkFw_b34g@mail.gmail.com>
Date: Tue, 25 Apr 2023 06:59:29 -0400
From: Joel Fernandes <joel@...lfernandes.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Boqun Feng <boqun.feng@...il.com>,
Segher Boessenkool <segher@...nel.crashing.org>,
Michael Ellerman <mpe@...erman.id.au>,
Zhouyi Zhou <zhouzhouyi@...il.com>,
linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>,
rcu <rcu@...r.kernel.org>,
linux-kernel <linux-kernel@...r.kernel.org>, lance@...osl.org,
"Paul E. McKenney" <paulmck@...nel.org>
Subject: Re: BUG : PowerPC RCU: torture test failed with __stack_chk_fail
On Tue, Apr 25, 2023 at 6:13 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Mon, Apr 24, 2023 at 02:55:11PM -0400, Joel Fernandes wrote:
> > This is amazing debugging Boqun, like a boss! One comment below:
> >
> > > > > Or something simple I haven't thought of? :)
> > > >
> > > > At what points can r13 change? Only when some particular functions are
> > > > called?
> > > >
> > >
> > > r13 is the local paca:
> > >
> > > register struct paca_struct *local_paca asm("r13");
> > >
> > > , which is a pointer to percpu data.
> > >
> > > So if a task schedule from one CPU to anotehr CPU, the value gets
> > > changed.
> >
> > It appears the whole issue, per your analysis, is that the stack
> > checking code in gcc should not cache or alias r13, and must read its
> > most up-to-date value during stack checking, as its value may have
> > changed during a migration to a new CPU.
> >
> > Did I get that right?
> >
> > IMO, even without a reproducer, gcc on PPC should just not do that,
> > that feels terribly broken for the kernel. I wonder what clang does,
> > I'll go poke around with compilerexplorer after lunch.
> >
> > Adding +Peter Zijlstra as well to join the party as I have a feeling
> > he'll be interested. ;-)
>
> I'm a little confused; the way I understand the whole stack protector
> thing to work is that we push a canary on the stack at call and on
> return check it is still valid. Since in general tasks randomly migrate,
> the per-cpu validation canary should be the same on all CPUs.
>
> Additionally, the 'new' __srcu_read_{,un}lock_nmisafe() functions use
> raw_cpu_ptr() to get 'a' percpu sdp, preferably that of the local cpu,
> but no guarantees.
>
> Both cases use r13 (paca) in a racy manner, and in both cases it should
> be safe.
AFAICS, the canary is randomly chosen both in the kernel [1]. This
also appears to be the case in glibc. That makes sense because you
don't want the canary to be something that the attacker can easily
predict and store on the stack to bypass buffer overflow attacks:
[1] kernel :
/*
* Initialize the stackprotector canary value.
*
* NOTE: this must only be called from functions that never return,
* and it must always be inlined.
*/
static __always_inline void boot_init_stack_canary(void)
{
unsigned long canary = get_random_canary();
current->stack_canary = canary;
#ifdef CONFIG_PPC64
get_paca()->canary = canary;
#endif
}
thanks,
- Joel
Powered by blists - more mailing lists