linux-kernel - Re: POWER9 crash due to STRICT_KERNEL_RWX (WAS: Re: Linux-next POWER9 NULL pointer NIP...)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87eesmibdt.fsf@mpe.ellerman.id.au>
Date:   Fri, 17 Apr 2020 21:45:34 +1000
From:   Michael Ellerman <mpe@...erman.id.au>
To:     Steven Rostedt <rostedt@...dmis.org>, Qian Cai <cai@....pw>
Cc:     Russell Currey <ruscur@...sell.cc>,
        Nicholas Piggin <npiggin@...il.com>,
        linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: POWER9 crash due to STRICT_KERNEL_RWX (WAS: Re: Linux-next POWER9 NULL pointer NIP...)

Steven Rostedt <rostedt@...dmis.org> writes:
> On Thu, 16 Apr 2020 21:19:10 -0400
> Qian Cai <cai@....pw> wrote:
>
>> OK, reverted the commit,
>> 
>> c55d7b5e6426 (“powerpc: Remove STRICT_KERNEL_RWX incompatibility with RELOCATABLE”)
>> 
>> or set STRICT_KERNEL_RWX=n fixed the crash below and also mentioned in this thread,
>
> This may be a symptom and not a cure.

I think it is a cure.

But we still have a bug, which is that when STRICT_KERNEL_RWX is enabled
we have some sort of corruption going on.

Enabling STRICT_KERNEL_RWX changes our implementation of
patch_instruction() which is used by ftrace, so I suspect this is a
powerpc bug.

>> [  148.110969][T13115] LTP: starting chown04_16
>> [  148.255048][T13380] kernel tried to execute exec-protected page (c0000000016804ac) - exploit attempt? (uid: 0)
>> [  148.255099][T13380] BUG: Unable to handle kernel instruction fetch
>> [  148.255122][T13380] Faulting instruction address: 0xc0000000016804ac
>> [  148.255136][T13380] Oops: Kernel access of bad area, sig: 11 [#1]
>> [  148.255157][T13380] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 DEBUG_PAGEALLOC NUMA PowerNV
>> [  148.255171][T13380] Modules linked in: loop kvm_hv kvm xfs sd_mod bnx2x mdio ahci tg3 libahci libphy libata firmware_class dm_mirror dm_region_hash dm_log dm_mod
>> [  148.255213][T13380] CPU: 45 PID: 13380 Comm: chown04_16 Tainted: G        W         5.6.0+ #7
>> [  148.255236][T13380] NIP:  c0000000016804ac LR: c00800000fa60408 CTR: c0000000016804ac
>> [  148.255250][T13380] REGS: c0000010a6fafa00 TRAP: 0400   Tainted: G        W          (5.6.0+)
>> [  148.255281][T13380] MSR:  9000000010009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 84000248  XER: 20040000
>> [  148.255310][T13380] CFAR: c00800000fa66534 IRQMASK: 0 
>> [  148.255310][T13380] GPR00: c000000000973268 c0000010a6fafc90 c000000001648200 0000000000000000 
>> [  148.255310][T13380] GPR04: c000000d8a22dc00 c0000010a6fafd30 00000000b5e98331 ffffffff00012c9f 
>> [  148.255310][T13380] GPR08: c000000d8a22dc00 0000000000000000 0000000000000000 c00000000163c520 
>> [  148.255310][T13380] GPR12: c0000000016804ac c000001ffffdad80 0000000000000000 0000000000000000 
>> [  148.255310][T13380] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
>> [  148.255310][T13380] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
>> [  148.255310][T13380] GPR24: 00007fff8f5e2e48 0000000000000000 c00800000fa6a488 c0000010a6fafd30 
>> [  148.255310][T13380] GPR28: 0000000000000000 000000007fffffff c00800000fa60400 c000000efd0c6780 
>> [  148.255494][T13380] NIP [c0000000016804ac] sysctl_net_busy_read+0x0/0x4
>
> The instruction pointer is on sysctl_net_busy_read? Isn't that data and
> not code?

Yes.

But we're corrupting the text, or data, somewhere, so we can jump
anywhere.

I have another trace where vhost_init() appears to call into
proc_dointvec() before crashing. vhost_init() is an empty function.

cheers