Message-ID: <2bf9a1c1950941aabc383fd196e5768a@BJMBX01.spreadtrum.com>
Date:   Fri, 8 Nov 2019 02:16:18 +0000
From:   黄吕强 (Lvqiang Huang) 
        <lvqiang.huang@...soc.com>
To:     Russell King - ARM Linux admin <linux@...linux.org.uk>
CC:     "ebiederm@...ssion.com" <ebiederm@...ssion.com>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "anshuman.khandual@....com" <anshuman.khandual@....com>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "f.fainelli@...il.com" <f.fainelli@...il.com>,
        "will@...nel.org" <will@...nel.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "'26332949@...com'" <26332949@...com>,
        楚恩来 (Enlai Chu) <enlai.chu@...soc.com>
Subject: RE: [PATCH] ARM: check __ex_table in do_bad()

Sorry for not describing this clearly. Let me add some more information.

The kernel log for the scenario:
[20461.271374] sysrq: SysRq : Show Blocked State
[20461.271405]   task                PC stack   pid father
[20461.271436] mbox-send-threa D c08cfad8     0    38      2 0x00000000
/* some log lines from the backtrace dumps of other TASK_UNINTERRUPTIBLE tasks are omitted here */
[20461.273387] fsck.exfat      D c08cfad8     0  6221   2276 0x00000000
[20461.273408] Backtrace:
[20461.273430] [<c08cf5d0>] (__schedule) from [<c08cff84>] (schedule+0x90/0xa8)
[20461.273442]  r10:ce009ef0 r9:ce009df4 r8:c0d0790c r7:00000082 r6:7fffffff r5:00000000
[20461.273477]  r4:ce008000
[20461.273497] [<c08cfef4>] (schedule) from [<c08d2b90>] (schedule_timeout+0x2c/0x26c)
[20461.273509]  r4:7fffffff r3:dc8ba693
[20461.273561] Unhandled fault: page domain fault (0x01b) at 0x32848c02
[20461.273576] pgd = d1854000
[20461.273587] [32848c02] *pgd=bb21e835
[20461.273607] Internal error: : 1b [#1] PREEMPT SMP ARM
[20461.278903] CPU: 2 PID: 5917 Comm: watchdog Tainted: G        W  O    4.4.147+ #1
[20461.278929] task: e9beecc0 task.stack: e30a4000
[20461.278949] PC is at for_each_frame+0x18/0x88
[20461.278965] LR is at vprintk_emit+0x470/0x4ec

Task A: the task that finally crashed (PID: 5917, Comm: watchdog), running on CPU 2 and dumping the backtraces of all UN (TASK_UNINTERRUPTIBLE) tasks.
Task B: fsck.exfat (PID 6221), which went from TASK_UNINTERRUPTIBLE to TASK_RUNNING while Task A was dumping its backtrace.

The first two frames dumped for Task B are fine:
[20461.273430] [<c08cf5d0>] (__schedule) from [<c08cff84>] (schedule+0x90/0xa8)
[20461.273497] [<c08cfef4>] (schedule) from [<c08d2b90>] (schedule_timeout+0x2c/0x26c)

Then task A crashed:
[20461.273561] Unhandled fault: page domain fault (0x01b) at 0x32848c02

From the RAM dump taken after the kernel crash, we can see that Task B had been scheduled to run on CPU 0.
crash_arm> ps 6221
   PID    PPID  CPU   TASK    ST  %MEM     VSZ    RSS  COMM
>  6221   2276   0  cde04880  RU   0.4   17784  13596  fsck.exfat

Its backtrace had therefore changed, which caused the crash of Task A.
crash_arm> bt 6221
PID: 6221   TASK: cde04880  CPU: 0   COMMAND: "fsck.exfat"
 #0 [<c0117a5c>] (__kunmap_atomic) from [<c0413ae8>]
 #1 [<c0413894>] (copy_page_to_iter) from [<c01f4788>]
 #2 [<c01f439c>] (generic_file_read_iter) from [<c02725e8>]
 #3 [<c027257c>] (blkdev_read_iter) from [<c023b5b0>]
 #4 [<c023b4f8>] (__vfs_read) from [<c023bd04>]
 #5 [<c023bc78>] (vfs_read) from [<c023c7e0>]
 #6 [<c023c76c>] (sys_pread64) from [<c01079a0>]

This is the race condition: backtracing another task is not safe. We can't assume the task won't be scheduled to run during the backtrace dump, and its stack frames may change completely once it executes again.
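
To illustrate the kind of guard a frame-pointer walker would need, here is a minimal user-space sketch in C. It is not the kernel's code: struct frame_rec, in_range() and walk_frames() are hypothetical names, and the record layout (saved fp plus saved pc) is a simplification of the real APCS frame. Each record is checked against the stack bounds, and each saved pc against a text range, before it is used. Such checks only stop the walker from dereferencing wild pointers; they do not remove the race itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified frame record: not the real APCS layout. */
struct frame_rec {
	uintptr_t fp;	/* address of the caller's record, 0 ends the chain */
	uintptr_t pc;	/* saved return address */
};

/* Return nonzero if [addr, addr + size) lies entirely inside [lo, hi). */
static int in_range(uintptr_t addr, size_t size, uintptr_t lo, uintptr_t hi)
{
	return addr >= lo && addr <= hi && size <= hi - addr;
}

/*
 * Walk at most max_frames records starting at fp and return how many
 * were accepted. The walk stops as soon as a record falls outside the
 * stack, a saved pc falls outside the text range, or the chain stops
 * moving towards higher addresses (the stack grows down).
 */
int walk_frames(uintptr_t fp, uintptr_t stack_lo, uintptr_t stack_hi,
		uintptr_t text_lo, uintptr_t text_hi, int max_frames)
{
	int n = 0;

	assert(stack_lo <= stack_hi && text_lo <= text_hi);
	while (fp != 0 && n < max_frames) {
		const struct frame_rec *rec;

		if (fp % _Alignof(struct frame_rec) != 0 ||
		    !in_range(fp, sizeof(*rec), stack_lo, stack_hi))
			break;	/* record is not on this stack: stop */
		rec = (const struct frame_rec *)fp;
		if (!in_range(rec->pc, 1, text_lo, text_hi))
			break;	/* saved pc is not a text address: stop */
		n++;
		if (rec->fp <= fp)
			break;	/* chain must move towards the stack base */
		fp = rec->fp;
	}
	return n;
}
```

With a garbage saved pc such as 0x32848c02 in the second record, the walk in this sketch stops after the first frame instead of following the bad value.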

An __ex_table entry should be added in for_each_frame for this scenario. But with CONFIG_CPU_SW_DOMAIN_PAN=y, a page domain fault may be raised and go to do_bad() instead of do_page_fault().

The patch may not be an optimal solution; I just want to point out the problem. Is there any concern if we check __ex_table in do_bad()?
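
For readers unfamiliar with the mechanism, the idea of consulting an exception table from a fault handler can be sketched in user space as follows. This is an illustration only: struct ex_entry, find_fixup() and handle_bad_fault() are hypothetical names rather than the kernel's API, and the table is assumed to be sorted by instruction address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one exception table entry. */
struct ex_entry {
	uintptr_t insn;		/* address of an instruction allowed to fault */
	uintptr_t fixup;	/* address to resume at if it does fault */
};

/* Binary search of a table sorted by insn; NULL if pc has no entry. */
const struct ex_entry *find_fixup(const struct ex_entry *table, size_t n,
				  uintptr_t pc)
{
	size_t lo = 0, hi = n;

	assert(table != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (table[mid].insn == pc)
			return &table[mid];
		if (table[mid].insn < pc)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}

/*
 * Model of a fault handler that checks the table before giving up.
 * Returns 0 and redirects *pc to the fixup when the faulting
 * instruction has an entry, or 1 (unhandled) when it does not.
 */
int handle_bad_fault(const struct ex_entry *table, size_t n, uintptr_t *pc)
{
	const struct ex_entry *e = find_fixup(table, n, *pc);

	if (e == NULL)
		return 1;
	*pc = e->fixup;
	return 0;
}
```

In this model, a fault at an instruction that has a table entry resumes at its fixup address, while a fault anywhere else is still reported as unhandled.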

Our project has now enabled CONFIG_ARM_UNWIND=y. With it, the unwinder fails to get an unwind_idx when it reads a wrong sv_pc, so the unwind aborts without a kernel crash.

-----Original Message-----
From: 黄吕强 (Lvqiang Huang) 
Sent: Friday, November 08, 2019 1:23 AM
To: Russell King - ARM Linux admin
Cc: ebiederm@...ssion.com; dave.hansen@...ux.intel.com; anshuman.khandual@....com; akpm@...ux-foundation.org; f.fainelli@...il.com; will@...nel.org; tglx@...utronix.de; linux-arm-kernel@...ts.infradead.org; linux-kernel@...r.kernel.org
Subject: Re: [PATCH] ARM: check __ex_table in do_bad()


> On Nov 7, 2019, at 17:24, Russell King - ARM Linux admin 
> <linux@...linux.org.uk> wrote:
> 
>> On Thu, Nov 07, 2019 at 03:45:13PM +0800, Lvqiang wrote:
>> 
>> We got many crashes at for_each_frame+0x18 in arch/arm/lib/backtrace.S
>>    1003: ldr r2, [sv_pc, #-4]
>> 
>> The backtrace is
>>    dump_backtrace
>>    show_stack
>>    sched_show_task
>>    show_state_filter
>>    sysrq_handle_showstate_blocked
>>    __handle_sysrq
>>    write_sysrq_trigger
>>    proc_reg_write
>>    __vfs_write
>>    vfs_write
>>    sys_write
>> 
>> Related Kernel config
>>    CONFIG_CPU_SW_DOMAIN_PAN=y
>>    # CONFIG_ARM_UNWIND is not set
>>    CONFIG_FRAME_POINTER=y
>> 
>> The task A was dumping the stack of an UN task B. However, the task B
> 
> What is "an UN task B"?

UN means TASK_UNINTERRUPTIBLE. 
(Sorry for the typo in the last reply)

>> was scheduled to run on another CPU, which caused its stack contents to change.
>> Then, task A may hit a page domain fault and die().
>>    [520.661314] Unhandled fault: page domain fault (0x01b) at 0x32848c02
> 
> So, the backtrace code is trying to access userspace.  It isn't 
> supposed to be accessing userspace - there are no guarantees that 
> userspace will be using frame pointers.  That is the bug.
> 

There is a race condition when trying to get the backtrace of another task, whose frames may change completely while it executes.


