linux-kernel - LTP: cfs_bandwidth01: Unable to handle kernel NULL pointer dereference

Message-Id: <20230914151839.3635-1-wang.yong12@zte.com.cn>
Date:   Thu, 14 Sep 2023 23:18:39 +0800
From:   Yong Wang <yongw.pur@...il.com>
To:     chrubis@...e.cz, naresh.kamboju@...aro.org
Cc:     alex.bennee@...aro.org, anders.roxell@...aro.org, arnd@...db.de,
        linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
        ltp@...ts.linux.it, mdoucha@...e.cz, peterz@...radead.org,
        vincent.guittot@...aro.org, wegao@...e.com, wang.yong12@....com.cn,
        yang.yang29@....com.cn, ran.xiaokai@....com.cn
Subject: LTP: cfs_bandwidth01: Unable to handle kernel NULL pointer dereference

Hello!
>Following kernel crash noticed on Linux stable-rc 6.5.3-rc1 on qemu-arm64 while
>running LTP sched test cases.
>
>This is not always reproducible.
I have also encountered this problem on Linux 5.10 in an arm64 environment.
The kernel log is as follows:
[ 2893.003795] ================================================================== 
[ 2893.003822] BUG: KASAN: null-ptr-deref in pick_next_task_fair+0x130/0x4e0 
[ 2893.003880] Read of size 8 at addr 0000000000000080 by task ksoftirqd/0/12 
[ 2893.003901]  
[ 2893.003914] CPU: 0 PID: 12 Comm: ksoftirqd/0 Tainted: P           O      5.10.59-rt52#1 
[ 2893.003959] Call trace: 
[ 2893.003968]  dump_backtrace+0x0/0x2e8 
[ 2893.004009]  show_stack+0x18/0x28 
[ 2893.004032]  dump_stack+0x104/0x174 
[ 2893.004067]  kasan_report+0x1d0/0x258 
[ 2893.004098]  __asan_load8+0x94/0xd0 
[ 2893.004126]  pick_next_task_fair+0x130/0x4e0 
[ 2893.004164]  __schedule+0x220/0xbd0 
[ 2893.004192]  schedule+0xec/0x1a0 
[ 2893.004216]  smpboot_thread_fn+0x124/0x548 
[ 2893.004246]  kthread+0x24c/0x278 
[ 2893.004277]  ret_from_fork+0x10/0x34 
[ 2893.004306] ================================================================== 
[ 2893.004325] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000080 
[ 2893.152267] Mem abort info: 
[ 2893.152639]   ESR = 0x96000004 
[ 2893.153045]   EC = 0x25: DABT (current EL), IL = 32 bits 
[ 2893.153739]   SET = 0, FnV = 0 
[ 2893.154143]   EA = 0, S1PTW = 0 
[ 2893.154560] Data abort info: 
[ 2893.154940]   ISV = 0, ISS = 0x00000004 
[ 2893.155443]   CM = 0, WnR = 0 
[ 2893.155838] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000188edb000 

The crash occurs at the following lines in pick_next_task_fair():
  se = pick_next_entity(cfs_rq, curr);
  cfs_rq = group_cfs_rq(se); // se is NULL!

pick_next_entity() returns NULL, so the NULL pointer dereference occurs when group_cfs_rq() then accesses a member of se.
However, it is not clear under what circumstances pick_next_entity() returns NULL.

In addition, in my environment the problem reproduces frequently with the following commands:
  stress-ng -c 8 --cpu-load 100 --sched fifo --sched-prio 1 --cpu-method pi -t 900 &
  runltp -s cfs_bandwidth01

I hope this helps to resolve the problem.
Thanks.