Message-ID: <CAEXW_YRtPycnjT5kc-YkYo2Tj3+Jt1Cog7OPxg4bG1mbXDp+RA@mail.gmail.com>
Date:   Tue, 25 Apr 2023 20:42:08 -0400
From:   Joel Fernandes <joel@...lfernandes.org>
To:     Christophe Leroy <christophe.leroy@...roup.eu>
Cc:     Zhouyi Zhou <zhouzhouyi@...il.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Boqun Feng <boqun.feng@...il.com>,
        Segher Boessenkool <segher@...nel.crashing.org>,
        Michael Ellerman <mpe@...erman.id.au>,
        linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>,
        rcu <rcu@...r.kernel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        "lance@...osl.org" <lance@...osl.org>,
        "Paul E. McKenney" <paulmck@...nel.org>
Subject: Re: BUG : PowerPC RCU: torture test failed with __stack_chk_fail

On Tue, Apr 25, 2023 at 9:40 AM Christophe Leroy
<christophe.leroy@...roup.eu> wrote:
>
>
>
> On 25/04/2023 at 13:06, Joel Fernandes wrote:
> > On Tue, Apr 25, 2023 at 6:58 AM Zhouyi Zhou <zhouzhouyi@...il.com> wrote:
> >>
> >> hi
> >>
> >> On Tue, Apr 25, 2023 at 6:13 PM Peter Zijlstra <peterz@...radead.org> wrote:
> >>>
> >>> On Mon, Apr 24, 2023 at 02:55:11PM -0400, Joel Fernandes wrote:
> >>>> This is amazing debugging Boqun, like a boss! One comment below:
> >>>>
> >>>>>>> Or something simple I haven't thought of? :)
> >>>>>>
> >>>>>> At what points can r13 change?  Only when some particular functions are
> >>>>>> called?
> >>>>>>
> >>>>>
> >>>>> r13 is the local paca:
> >>>>>
> >>>>>          register struct paca_struct *local_paca asm("r13");
> >>>>>
> >>>>> , which is a pointer to percpu data.
> >>>>>
> > >>>>> So if a task is scheduled from one CPU to another CPU, the value gets
> > >>>>> changed.
> >>>>
> >>>> It appears the whole issue, per your analysis, is that the stack
> >>>> checking code in gcc should not cache or alias r13, and must read its
> >>>> most up-to-date value during stack checking, as its value may have
> >>>> changed during a migration to a new CPU.
> >>>>
> >>>> Did I get that right?
> >>>>
> >>>> IMO, even without a reproducer, gcc on PPC should just not do that,
> >>>> that feels terribly broken for the kernel. I wonder what clang does,
> > >>>> I'll go poke around with Compiler Explorer after lunch.
> >>>>
> >>>> Adding +Peter Zijlstra as well to join the party as I have a feeling
> >>>> he'll be interested. ;-)
> >>>
> >>> I'm a little confused; the way I understand the whole stack protector
> >>> thing to work is that we push a canary on the stack at call and on
> >>> return check it is still valid. Since in general tasks randomly migrate,
> >>> the per-cpu validation canary should be the same on all CPUs.
> >>>
> >>> Additionally, the 'new' __srcu_read_{,un}lock_nmisafe() functions use
> >>> raw_cpu_ptr() to get 'a' percpu sdp, preferably that of the local cpu,
> >>> but no guarantees.
> >>>
> >>> Both cases use r13 (paca) in a racy manner, and in both cases it should
> >>> be safe.
> > >> New test results today: both gcc built from git (git clone
> > >> git://gcc.gnu.org/git/gcc.git) and Ubuntu 22.04's gcc-12.1.0
> > >> are immune to the above issue. The assembly code can be seen at
> > >> http://140.211.169.189/0425/srcu_gp_start_if_needed-gcc-12.txt
> > >>
> > >> whereas both the native gcc on a PPC VM (gcc version 9.4.0) and the gcc
> > >> cross compiler on my x86 laptop (gcc version 10.4.0) reproduce the bug.
> >
> > Do you know what fixes the issue? I would not declare victory yet. My
> > feeling is that something changed in timing or compiler codegen that
> > hides the issue. So the issue is still there, and it is just a matter
> > of time before someone else reports it.
> >
> > Out of curiosity for PPC folks, why can't 64-bit PPC use a per-task
> > canary? Michael, is this an optimization? Adding Christophe as well
> > since it came in a few years ago via the following commit:
>
> It uses a per-task canary. But unlike PPC32, PPC64 doesn't have a fixed
> register pointing to 'current' at all times, so the canary is copied into
> a per-CPU struct during _switch().
>
> If GCC keeps an old value of the per-CPU struct pointer, it then gets
> the canary from the wrong CPU's struct, and so from a different task.

Thanks a lot Christophe, that makes sense. Segher, are you convinced
that it is a compiler issue, or is there still some doubt? Could you
modify gcc's stack checker to not optimize away r13 reads, or is that
already the case in newer gcc?

thanks,

 - Joel

>
> Christophe
>
> >
> > commit 06ec27aea9fc84d9c6d879eb64b5bcf28a8a1eb7
> > Author: Christophe Leroy <christophe.leroy@....fr>
> > Date:   Thu Sep 27 07:05:55 2018 +0000
> >
> >      powerpc/64: add stack protector support
> >
> >      On PPC64, as register r13 points to the paca_struct at all times,
> >      this patch adds a copy of the canary there, which is copied at
> >      task_switch.
> >      That new canary is then used by using the following GCC options:
> >      -mstack-protector-guard=tls
> >      -mstack-protector-guard-reg=r13
> >      -mstack-protector-guard-offset=offsetof(struct paca_struct, canary))
> >
> >      Signed-off-by: Christophe Leroy <christophe.leroy@....fr>
> >      Signed-off-by: Michael Ellerman <mpe@...erman.id.au>
> >
> >   - Joel
