lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 3 Dec 2019 23:20:57 +0000
From:   Valentin Schneider <valentin.schneider@....com>
To:     John Stultz <john.stultz@...aro.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Quentin Perret <qperret@...gle.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Patrick Bellasi <Patrick.Bellasi@....com>,
        Ingo Molnar <mingo@...nel.org>,
        lkml <linux-kernel@...r.kernel.org>
Subject: Re: Null pointer crash at find_idlest_group on db845c w/ linus/master

Hi John,

On 03/12/2019 19:15, John Stultz wrote:
> With today's linus/master on db845c running android, I'm able to
> fairly easily reproduce the following crash. I've not had a chance to
> bisect it yet, but I'm suspecting its connected to Vincent's recent
> rework.
> 

Thanks for reporting this.

> If you have any suggestions, or need me to test anything, please let me know.
> 
> thanks
> -john
> 
> [  136.157069] Unable to handle kernel NULL pointer dereference at
> virtual address 0000000000000010
> [  136.165937] Mem abort info:
> [  136.168765]   ESR = 0x96000005
> [  136.171862]   EC = 0x25: DABT (current EL), IL = 32 bits
> [  136.177229]   SET = 0, FnV = 0
> [  136.180320]   EA = 0, S1PTW = 0
> [  136.183502] Data abort info:
> [  136.186426]   ISV = 0, ISS = 0x00000005
> [  136.190302]   CM = 0, WnR = 0
> [  136.193316] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000175f7e000
> [  136.199814] [0000000000000010] pgd=0000000000000000, pud=0000000000000000
> [  136.206666] Internal error: Oops: 96000005 [#1] PREEMPT SMP
> [  136.212295] Modules linked in:
> [  136.215397] CPU: 7 PID: 50 Comm: kauditd Tainted: G        W
>  5.4.0-mainline-11225-g9c5add21ff63 #1252
> [  136.225390] Hardware name: Thundercomm Dragonboard 845c (DT)
> [  136.231111] pstate: 60c00085 (nZCv daIf +PAN +UAO)
> [  136.235971] pc : find_idlest_group.isra.95+0x368/0x690
> [  136.241159] lr : find_idlest_group.isra.95+0x210/0x690
> [  136.246347] sp : ffffffc01036b890
> [  136.249708] x29: ffffffc01036b890 x28: ffffffe7d7450480
> [  136.255077] x27: ffffff80f81f0000 x26: 0000000000000000
> [  136.260445] x25: 0000000000000000 x24: ffffffc01036b998
> [  136.265810] x23: ffffff80f8e40a00 x22: ffffffe7d7719e30
> [  136.271175] x21: ffffff80f8f16520 x20: ffffff80f8f16180
> [  136.276539] x19: ffffffe7d771a610 x18: ffffffe7d71d1ef0
> [  136.281908] x17: 0000000000000000 x16: 0000000000000000
> [  136.287274] x15: 0000000000000001 x14: 6f3a753d74786574
> [  136.292644] x13: 6e6f637420383637 x12: 632c323135633a30
> [  136.298007] x11: 0000000000000070 x10: 0000000000000002
> [  136.303371] x9 : 0000000000000000 x8 : 0000000000000075
> [  136.308737] x7 : ffffff80f8f16000 x6 : ffffffe7d7450000
> [  136.314099] x5 : 0000000000000004 x4 : 0000000000000000
> [  136.319465] x3 : 000000000000025c x2 : ffffff80f8f16780
> [  136.324836] x1 : 0000000000000000 x0 : 0000000000000002
> [  136.330198] Call trace:
> [  136.332680]  find_idlest_group.isra.95+0x368/0x690
> [  136.337528]  select_task_rq_fair+0x4e4/0xd88
> [  136.341848]  try_to_wake_up+0x21c/0x7f8
> [  136.345735]  default_wake_function+0x34/0x48
> [  136.350053]  pollwake+0x98/0xc8
> [  136.353233]  __wake_up_common+0x90/0x158
> [  136.357202]  __wake_up_common_lock+0x88/0xd0
> [  136.361519]  __wake_up_sync_key+0x40/0x50
> [  136.365584]  sock_def_readable+0x44/0x78
> [  136.369560]  __netlink_sendskb+0x44/0x58
> [  136.373525]  netlink_unicast+0x220/0x258
> [  136.377496]  kauditd_send_queue+0xa4/0x158
> [  136.381643]  kauditd_thread+0xf4/0x248
> [  136.385438]  kthread+0x130/0x138
> [  136.388706]  ret_from_fork+0x10/0x1c
> [  136.392332] Code: 54001340 7100081f 540016e1 a9478ba1 (f9400821)
> [  136.398493] ---[ end trace 47d254973801b2c4 ]---
> [  136.403162] Kernel panic - not syncing: Fatal exception
> [  136.408445] SMP: stopping secondary CPUs
> [  136.412440] Kernel Offset: 0x27c6200000 from 0xffffffc010000000
> [  136.418417] PHYS_OFFSET: 0xffffffe140000000
> [  136.422655] CPU features: 0x00002,2a80a218
> 

I fed this to decode_stacktrace.sh using your branch, I get:

[  136.157069] Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000010
[  136.165937] Mem abort info:
[  136.168765]   ESR = 0x96000005
[  136.171862]   EC = 0x25: DABT (current EL), IL = 32 bits
[  136.177229]   SET = 0, FnV = 0
[  136.180320]   EA = 0, S1PTW = 0
[  136.183502] Data abort info:
[  136.186426]   ISV = 0, ISS = 0x00000005
[  136.190302]   CM = 0, WnR = 0
[  136.193316] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000175f7e000
[  136.199814] [0000000000000010] pgd=0000000000000000, pud=0000000000000000
[  136.206666] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[  136.212295] Modules linked in:
[  136.215397] CPU: 7 PID: 50 Comm: kauditd Tainted: G        W
5.4.0-mainline-11225-g9c5add21ff63 #1252
[  136.225390] Hardware name: Thundercomm Dragonboard 845c (DT)
[  136.231111] pstate: 60c00085 (nZCv daIf +PAN +UAO)
[  136.235971] pc : find_idlest_group.isra.95+0x368/0x690
[  136.241159] lr : find_idlest_group.isra.95+0x210/0x690
[  136.246347] sp : ffffffc01036b890
[  136.249708] x29: ffffffc01036b890 x28: ffffffe7d7450480
[  136.255077] x27: ffffff80f81f0000 x26: 0000000000000000
[  136.260445] x25: 0000000000000000 x24: ffffffc01036b998
[  136.265810] x23: ffffff80f8e40a00 x22: ffffffe7d7719e30
[  136.271175] x21: ffffff80f8f16520 x20: ffffff80f8f16180
[  136.276539] x19: ffffffe7d771a610 x18: ffffffe7d71d1ef0
[  136.281908] x17: 0000000000000000 x16: 0000000000000000
[  136.287274] x15: 0000000000000001 x14: 6f3a753d74786574
[  136.292644] x13: 6e6f637420383637 x12: 632c323135633a30
[  136.298007] x11: 0000000000000070 x10: 0000000000000002
[  136.303371] x9 : 0000000000000000 x8 : 0000000000000075
[  136.308737] x7 : ffffff80f8f16000 x6 : ffffffe7d7450000
[  136.314099] x5 : 0000000000000004 x4 : 0000000000000000
[  136.319465] x3 : 000000000000025c x2 : ffffff80f8f16780
[  136.324836] x1 : 0000000000000000 x0 : 0000000000000002
[  136.330198] Call trace:
[  136.332680] find_idlest_group.isra.95+0x368/0x690
[  136.337528] select_task_rq_fair (kernel/sched/fair.c:5663 kernel/sched/fair.c:6387)
[  136.341848] try_to_wake_up (kernel/sched/core.c:2397 kernel/sched/core.c:2644)
[  136.345735] default_wake_function (kernel/sched/core.c:4350)
[  136.350053] pollwake (fs/select.c:218)
[  136.353233] __wake_up_common (kernel/sched/wait.c:93)
[  136.357202] __wake_up_common_lock (./include/linux/spinlock.h:393 (discriminator 1) kernel/sched/wait.c:125 (discriminator 1))
[  136.361519] __wake_up_sync_key (kernel/sched/wait.c:191)
[  136.365584] sock_def_readable (./include/asm-generic/bitops/non-atomic.h:106 ./include/net/sock.h:837 ./include/net/sock.h:2208 net/core/sock.c:2798)
[  136.369560] __netlink_sendskb (net/netlink/af_netlink.c:1251)
[  136.373525] netlink_unicast (./include/linux/refcount.h:255 ./include/linux/refcount.h:281 ./include/net/sock.h:1728 net/netlink/af_netlink.c:1257 net/netlink/af_netlink.c:1343)
[  136.377496] kauditd_send_queue (kernel/audit.c:734)
[  136.381643] kauditd_thread (kernel/audit.c:858 (discriminator 4))
[  136.385438] kthread (kernel/kthread.c:684)
[  136.388706] ret_from_fork (arch/arm64/kernel/entry.S:909)
[ 136.392332] Code: 54001340 7100081f 540016e1 a9478ba1 (f9400821)
All code
========
   0:	54001340	b.eq	0x268
   4:	7100081f	cmp	w0, #0x2
   8:	540016e1	b.ne	0x2e4
   c:	a9478ba1	ldp	x1, x2, [x29,#120]
  10:*	f9400821	ldr	x1, [x1,#16]		<-- trapping instruction

Code starting with the faulting instruction
===========================================
   0:	f9400821	ldr	x1, [x1,#16]
[  136.398493] ---[ end trace 47d254973801b2c4 ]---
[  136.403162] Kernel panic - not syncing: Fatal exception
[  136.408445] SMP: stopping secondary CPUs
[  136.412440] Kernel Offset: 0x27c6200000 from 0xffffffc010000000
[  136.418417] PHYS_OFFSET: 0xffffffe140000000

The ISRA makes this a bit annoying to figure out the faulty line.

On a sidenote, I never really looked at what fipa-sra does, so I scrounged
this from GCC:

"""
IPA-SRA is an interprocedural pass that removes unused function return
values (turning functions returning a value which is never used into void
functions), removes unused function parameters.  It can also replace an
aggregate parameter by a set of other parameters representing part of the
original, turning those passed by reference into new ones which pass the
value directly.
"""

I suspect this is GCC getting rid of the 'sd_flag' parameter of
find_idlest_group() which is now unused. 


Back to the thing, the interesting bit is 

   0:	54001340	b.eq	0x268
   4:	7100081f	cmp	w0, #0x2
   8:	540016e1	b.ne	0x2e4
   c:	a9478ba1	ldp	x1, x2, [x29,#120]
  10:*	f9400821	ldr	x1, [x1,#16]		<-- trapping instruction

The misfit task cases seem to match the pattern of the last two lines. Here's
one:

        case group_misfit_task:                                                                                                                                                                             
                /* Select group with the highest max capacity */                                                                                                                                            
                if (local->sgc->max_capacity >= idlest->sgc->max_capacity)                                                                                                                                  
    1754:       a94783a2        ldp     x2, x0, [x29,#120]                                                                                                                                                  
    1758:       f9400801        ldr     x1, [x0,#16]  


Looking at the code, I think I got it. In find_idlest_group() we do
initialize 'idlest_sgs' (just like busiest_stat in LB) but 'idlest' is just
NULL. The latter is dereferenced in update_pick_idlest() just for the misfit
case, which goes boom. And I reviewed the damn thing... Bleh.

Fixup looks easy enough, lemme write one up.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ