linux-kernel - Re: POWER9 crash due to STRICT_KERNEL_RWX (WAS: Re: Linux-next POWER9 NULL pointer NIP...)

Message-ID: <87blnqib81.fsf@mpe.ellerman.id.au>
Date:   Fri, 17 Apr 2020 21:49:02 +1000
From:   Michael Ellerman <mpe@...erman.id.au>
To:     "Naveen N. Rao" <naveen.n.rao@...ux.ibm.com>,
        Qian Cai <cai@....pw>, Russell Currey <ruscur@...sell.cc>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>,
        Nicholas Piggin <npiggin@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>
Subject: Re: POWER9 crash due to STRICT_KERNEL_RWX (WAS: Re: Linux-next POWER9 NULL pointer NIP...)

"Naveen N. Rao" <naveen.n.rao@...ux.ibm.com> writes:
> Hi Qian,
>
> Qian Cai wrote:
>> OK, reverted the commit,
>> 
>> c55d7b5e6426 (“powerpc: Remove STRICT_KERNEL_RWX incompatibility with RELOCATABLE”)
>> 
>> or set STRICT_KERNEL_RWX=n fixed the crash below and also mentioned in this thread,
>> 
>> https://lore.kernel.org/lkml/15AC5B0E-A221-4B8C-9039-FA96B8EF7C88@lca.pw/
>
> Do you see any errors logged in dmesg when you see the crash?  
> STRICT_KERNEL_RWX changes how patch_instruction() works, so it would be 
> interesting to see if there are any ftrace-related errors thrown before 
> the crash.

I've been able to reproduce with STRICT_KERNEL_RWX=y and concurrently
running:

# while true; do echo function > /sys/kernel/debug/tracing/current_tracer ; echo nop > /sys/kernel/debug/tracing/current_tracer ; done

and:

# while true; do find /lib/modules/$(uname -r) -name '*.ko' -printf "%f\n" | sed -e "s/\.ko//" | xargs -i modprobe -va {}; lsmod | awk '{print $1}' | xargs -i modprobe -vr {}; done

i.e. stressing module loading/unloading and ftrace at the same time.

It's not 100% reliable, but it usually reproduces within 10-20 minutes.
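
For convenience, the two loops above could be wrapped in a single script so
they start and stop together. This is only a rough sketch: the function names
(ftrace_loop, module_loop, strip_ko) and the "run" argument are made up for
illustration, and the commands themselves are just the two one-liners above.
It needs root, and it will load and unload every module it can find, so it
only makes sense on a disposable test machine.

```shell
#!/bin/sh
# Sketch only: runs the two reproducer loops concurrently.
# Function names and the "run" argument are hypothetical, not from the report.

TRACER=/sys/kernel/debug/tracing/current_tracer

# Strip the ".ko" suffix from module file names read on stdin.
strip_ko() {
    sed -e 's/\.ko//'
}

# Switch between the function tracer and nop forever.
ftrace_loop() {
    while true; do
        echo function > "$TRACER"
        echo nop > "$TRACER"
    done
}

# Load every module built for the running kernel, then try to unload them all.
module_loop() {
    while true; do
        find "/lib/modules/$(uname -r)" -name '*.ko' -printf "%f\n" |
            strip_ko | xargs -i modprobe -va {}
        lsmod | awk '{print $1}' | xargs -i modprobe -vr {}
    done
}

# Only start stressing the system when explicitly asked to.
if [ "${1:-}" = "run" ]; then
    ftrace_loop &
    p1=$!
    module_loop &
    p2=$!
    # Stop both background loops when the script is interrupted or exits.
    trap 'kill "$p1" "$p2" 2>/dev/null' INT TERM EXIT
    wait
fi
```

Saved under any name, it would be started as root with "run" as its only
argument and stopped with Ctrl-C; without that argument it does nothing.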

It looks like sometimes our __patch_instruction() fails, and then that
somehow leads to things getting further messed up. Presumably we have
some bad error handling somewhere.

cheers