[<prev] [next>] [day] [month] [year] [list]
Message-ID: <155be8eb-0255-342f-bac8-46efb868d97c@huawei.com>
Date: Tue, 5 Jul 2022 21:07:38 +0800
From: Chen Zhongjin <chenzhongjin@...wei.com>
To: <daniel.thompson@...aro.org>
CC: <arnd@...db.de>, <arnd@...nel.org>, <linus.walleij@...aro.org>,
<linux-arch@...r.kernel.org>,
<linux-arm-kernel@...ts.infradead.org>,
<linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
<linux@...linux.org.uk>, <viro@...iv.linux.org.uk>
Subject: Re: [PATCH v5 08/10] ARM: uaccess: add __{get,put}_kernel_nofault
Hi,
It seems that the problem has not been solved so far.
I found that "echo t > /proc/sysrq-trigger" causes the same fault
because "print_worker_info()" also calls "copy_from_kernel_nofault()",
but "worker->current_pwq" can be zero when copying.
Stack trace:
[ 15.303013] 8<--- cut here ---
[ 15.303315] Unhandled fault: page domain fault (0x01b) at 0x00000004
[ 15.303538] [00000004] *pgd=6338f831, *pte=00000000, *ppte=00000000
[ 15.304367] Internal error: : 1b [#1] SMP ARM
[ 15.304721] Modules linked in:
[ 15.305107] CPU: 0 PID: 89 Comm: sh Not tainted 5.19.0-rc5-dirty #332
[ 15.305373] Hardware name: ARM-Versatile Express
[ 15.305529] PC is at copy_from_kernel_nofault+0xf0/0x174
[ 15.305712] LR is at copy_from_kernel_nofault+0x30/0x174
[ 15.305873] pc : [<c0448ea4>] lr : [<c0448de4>] psr: 20000013
[ 15.306078] sp : eac4dde8 ip : 0000bff4 fp : eac4de74
[ 15.306233] r10: 00000007 r9 : 00000000 r8 : c1a09700
[ 15.306397] r7 : c1a04cc8 r6 : 00000004 r5 : eac4de18 r4 : 00000004
[ 15.306586] r3 : 00000000 r2 : c2440000 r1 : 00000004 r0 : 00000001
[ 15.306831] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM
Segment none
[ 15.307120] Control: 10c5387d Table: 633f006a DAC: 00000051
...
[ 15.318121] copy_from_kernel_nofault from print_worker_info+0xd0/0x15c
[ 15.318343] print_worker_info from sched_show_task+0x134/0x180
[ 15.318534] sched_show_task from show_state_filter+0x74/0xa8
[ 15.318714] show_state_filter from sysrq_handle_showstate+0xc/0x14
[ 15.318902] sysrq_handle_showstate from __handle_sysrq+0x88/0x138
[ 15.319173] __handle_sysrq from write_sysrq_trigger+0x4c/0x5c
[ 15.319356] write_sysrq_trigger from proc_reg_write+0xa8/0xd0
[ 15.319541] proc_reg_write from vfs_write+0xb4/0x388
[ 15.319708] vfs_write from ksys_write+0x58/0xd0
[ 15.319851] ksys_write from ret_fast_syscall+0x0/0x54
> On Thu, Jan 13, 2022 at 12:14:50PM +0100, Arnd Bergmann wrote:
> > On Thu, Jan 13, 2022 at 10:47 AM Daniel Thompson
> > <daniel.thompson@...aro.org> wrote:
> > > On Wed, Jan 12, 2022 at 06:08:17PM +0000, Russell King (Oracle)
> wrote:
> > >
> > > > The kernel attempted to access an address that is in the userspace
> > > > domain (NULL pointer) and took an exception.
> > > >
> > > > I suppose we should handle a domain fault more gracefully - what
> are
> > > > the required semantics if the kernel attempts a userspace access
> > > > using one of the _nofault() accessors?
> > >
> > > I think the best answer might well be that, if the arch provides
> > > implementations of hooks such as copy_from_kernel_nofault_allowed()
> > > then the kernel should never attempt a userspace access using the
> > > _nofault() accessors. That means they can do whatever they like!
> > >
> > > In other words something like the patch below looks like a promising
> > > approach.
> >
> > Right, it seems this is the same as on x86.
>
> Hmnn...
>
> Looking a bit deeper into copy_from_kernel_nofault() there is an odd
> asymmetry between copy_to_kernel_nofault(). Basically there is
> copy_from_kernel_nofault_allowed() but no corresponding
> copy_to_kernel_nofault_allowed() which means we cannot defend memory
> pokes using a helper function.
>
> I checked the behaviour of copy_to_kernel_nofault() on arm, arm64, mips,
> powerpc, riscv, x86 kernels (which is pretty much everything where I
> know how to fire up qemu). All except arm gracefully handle an
> attempt to write to userspace (well, NULL actually) with
> copy_to_kernel_nofault() so I think there still a few more changes
> to fully fix this.
>
> Looks like we would need a slightly more assertive change, either adding
> a copy_to_kernel_nofault_allowed() or modifying the arm dabt handlers to
> avoid faults on userspace access.
>
> Any views on which is better?
>
I've tested the copy_from_kernel_nofault_allowed() and agree that it's a
enough simple and effective solution. There is only one little gap
compared to other arch that it returns -ERANGE while actually it should
be a -EFAULT (refer to other arches).
Anyway if we want to modify the FSR handlers I guess it's also easy
because not we do nothing special for Domain Fault now.
>
> Daniel.
>
> >
> > > From f66a63b504ff582f261a506c54ceab8c0e77a98c Mon Sep 17 00:00:00
> 2001
> > > From: Daniel Thompson <daniel.thompson@...aro.org>
> > > Date: Thu, 13 Jan 2022 09:34:45 +0000
> > > Subject: [PATCH] arm: mm: Implement
> copy_from_kernel_nofault_allowed()
> > >
> > > Currently copy_from_kernel_nofault() can actually fault (due to
> software
> > > PAN) if we attempt userspace access. In any case, the documented
> > > behaviour for this function is to return -ERANGE if we attempt an
> access
> > > outside of kernel space.
> > >
> > > Implementing copy_from_kernel_nofault_allowed() solves both these
> > > problems.
> > >
> > > Signed-off-by: Daniel Thompson <daniel.thompson@...aro.org>
> >
> > Reviewed-by: Arnd Bergmann <arnd@...db.de>
Tested-by: Chen Zhongjin <chenzhongjin@...wei.com>
Best,
Chen
Powered by blists - more mailing lists