Message-ID: <CABXGCsNMFZHXvtN7uAmjUPewfAcgD038rMhkruQCD8kqS3cXaA@mail.gmail.com>
Date:   Sat, 21 Oct 2023 17:26:10 +0500
From:   Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>
To:     Peter Zijlstra <peterz@...radead.org>,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>
Subject: BUG: KASAN: slab-use-after-free in reweight_entity+0x8d4

Hi,
I caught a slab-use-after-free during the 6.6-rc6 release cycle, and it
does not look like it will be easy to reproduce.
I tried rebooting and repeating all my actions, but the
slab-use-after-free did not happen again.

What we know:

1) backtrace:

[  336.215839] BUG: KASAN: slab-use-after-free in reweight_entity+0x8d4/0xa50
[  336.215846] Read of size 8 at addr ffff88812d1a80b0 by task fossilize_repla/10552

[  336.215850] CPU: 23 PID: 10552 Comm: fossilize_repla Tainted: G    W  OEL    -------  ---  6.6.0-0.rc6.20231020gitce55c22ec8b2.51.fc40.x86_64+debug #1
[  336.215853] Hardware name: Micro-Star International Co., Ltd. MS-7D73/MPG B650I EDGE WIFI (MS-7D73), BIOS 1.61 10/18/2023
[  336.215854] Call Trace:
[  336.215856]  <TASK>
[  336.215858]  dump_stack_lvl+0x76/0xd0
[  336.215862]  print_report+0xcf/0x670
[  336.215866]  ? reweight_entity+0x8d4/0xa50
[  336.215868]  kasan_report+0xa6/0xe0
[  336.215871]  ? reweight_entity+0x8d4/0xa50
[  336.215874]  reweight_entity+0x8d4/0xa50
[  336.215877]  task_tick_fair+0x101/0xcf0
[  336.215880]  ? __pfx_task_tick_fair+0x10/0x10
[  336.215882]  ? update_rq_clock+0x2c9/0xc70
[  336.215885]  scheduler_tick+0x27c/0xaf0
[  336.215888]  update_process_times+0x157/0x1d0
[  336.215891]  ? __pfx_update_process_times+0x10/0x10
[  336.215894]  tick_sched_handle+0x67/0x130
[  336.215897]  tick_sched_timer+0xb1/0xe0
[  336.215900]  ? __pfx_tick_sched_timer+0x10/0x10
[  336.215901]  __hrtimer_run_queues+0x4d5/0x910
[  336.215905]  ? __pfx___hrtimer_run_queues+0x10/0x10
[  336.215907]  ? ktime_get_update_offsets_now+0x90/0x250
[  336.215910]  hrtimer_interrupt+0x2f5/0x810
[  336.215913]  __sysvec_apic_timer_interrupt+0xfc/0x3b0
[  336.215916]  sysvec_apic_timer_interrupt+0x42/0xc0
[  336.215919]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[  336.215922] RIP: 0033:0x7fb90dad8915
[  336.215939] Code: 00 00 48 8d 73 78 ba 01 00 00 00 48 89 df 4c 89 55 c8 e8 2e 5e d9 ff 48 89 de 4c 89 ef e8 93 94 d9 ff 4c 8b 7d a0 4c 8b 55 c8 <49> 8b 47 10 48 85 c0 0f 84 26 fa ff ff 48 8b 78 08 48 85 ff 0f 84
[  336.215941] RSP: 002b:00007fb90cde2480 EFLAGS: 00000246
[  336.215943] RAX: 00007fb8ec077c50 RBX: 00007fb900906708 RCX: 00007fb9008fd3a8
[  336.215945] RDX: 00007fb8eda0c6f0 RSI: 0000000000000c91 RDI: 0000000000000c92
[  336.215946] RBP: 00007fb90cde24f0 R08: 0000000000000000 R09: 0000000000000000
[  336.215947] R10: 00007fb90cde2520 R11: 00007fb90cde25c4 R12: 00007fb90003def0
[  336.215949] R13: 00007fb90368ea50 R14: 00007fb8ed5fed20 R15: 00007fb9037b5620
[  336.215952]  </TASK>

[  336.215954] Allocated by task 2:
[  336.215955]  kasan_save_stack+0x33/0x60
[  336.215958]  kasan_set_track+0x25/0x30
[  336.215959]  __kasan_slab_alloc+0x6e/0x70
[  336.215961]  kmem_cache_alloc_node+0x18d/0x420
[  336.215962]  copy_process+0x3be/0x6910
[  336.215964]  kernel_clone+0xc8/0x710
[  336.215965]  kernel_thread+0xb4/0xf0
[  336.215967]  kthreadd+0x9c7/0xe00
[  336.215969]  ret_from_fork+0x31/0x70
[  336.215971]  ret_from_fork_asm+0x1b/0x30

[  336.215974] Freed by task 150:
[  336.215975]  kasan_save_stack+0x33/0x60
[  336.215977]  kasan_set_track+0x25/0x30
[  336.215978]  kasan_save_free_info+0x2b/0x50
[  336.215980]  __kasan_slab_free+0x10b/0x1a0
[  336.215982]  slab_free_freelist_hook+0x12b/0x1e0
[  336.215984]  kmem_cache_free+0x174/0x480
[  336.215986]  delayed_put_task_struct+0x162/0x1c0
[  336.215988]  rcu_do_batch+0x445/0x1700
[  336.215990]  rcu_core+0x880/0xdb0
[  336.215992]  __do_softirq+0x218/0x8bb

[  336.215995] Last potentially related work creation:
[  336.215996]  kasan_save_stack+0x33/0x60
[  336.215997]  __kasan_record_aux_stack+0x94/0xa0
[  336.215999]  __call_rcu_common.constprop.0+0xf8/0x1af0
[  336.216001]  __schedule+0x10b4/0x5e90
[  336.216003]  schedule_idle+0x60/0x90
[  336.216005]  do_idle+0x294/0x450
[  336.216007]  cpu_startup_entry+0x55/0x60
[  336.216008]  start_secondary+0x215/0x290
[  336.216010]  secondary_startup_64_no_verify+0x17d/0x18b

[  336.216013] The buggy address belongs to the object at ffff88812d1a8000
                which belongs to the cache task_struct of size 15640
[  336.216015] The buggy address is located 176 bytes inside of
                freed 15640-byte region [ffff88812d1a8000, ffff88812d1abd18)

[  336.216017] The buggy address belongs to the physical page:
[  336.216018] page:00000000b186a037 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x12d1a8
[  336.216021] head:00000000b186a037 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[  336.216022] memcg:ffff88814b078841
[  336.216023] flags: 0x17ffffc0000840(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
[  336.216026] page_type: 0xffffffff()
[  336.216028] raw: 0017ffffc0000840 ffff888100052080 dead000000000122 0000000000000000
[  336.216030] raw: 0000000000000000 0000000000020002 00000001ffffffff ffff88814b078841
[  336.216031] page dumped because: kasan: bad access detected

[  336.216033] Memory state around the buggy address:
[  336.216034]  ffff88812d1a7f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[  336.216035]  ffff88812d1a8000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  336.216036] >ffff88812d1a8080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  336.216037]                                      ^
[  336.216038]  ffff88812d1a8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  336.216040]  ffff88812d1a8180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  336.216041] ==================================================================
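
The freed object is a task_struct, and the faulting read is 176 (0xb0) bytes
into it. If it helps, the member at that offset should be identifiable with
pahole against the same debug vmlinux (untested command, path as in the
faddr2line call below):

❯ pahole -C task_struct /lib/debug/lib/modules/6.6.0-0.rc6.20231020gitce55c22ec8b2.51.fc40.x86_64+debug/vmlinux

and then looking for the member whose offset comment is 176.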

2) The corresponding line in the code:

❯ /usr/src/kernels/6.6.0-0.rc6.20231020gitce55c22ec8b2.51.fc40.x86_64+debug/scripts/faddr2line \
    /lib/debug/lib/modules/6.6.0-0.rc6.20231020gitce55c22ec8b2.51.fc40.x86_64+debug/vmlinux \
    reweight_entity+0x8d4
reweight_entity+0x8d4/0xa50:
__update_min_deadline at kernel/sched/fair.c:805
(inlined by) min_deadline_update at kernel/sched/fair.c:819
(inlined by) min_deadline_cb_propagate at kernel/sched/fair.c:825
(inlined by) reweight_entity at kernel/sched/fair.c:3660
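
For anyone reading along without the tree handy, those inlined helpers look
roughly like this (quoted from memory of the 6.6-rc sources, so double-check
against the exact snapshot). The size-8 read at fair.c:805 would then be the
rse->min_deadline load, i.e. a read through the run_node of a neighbouring
sched_entity whose containing task_struct has already been freed:

static inline void __update_min_deadline(struct sched_entity *se, struct rb_node *node)
{
        if (node) {
                struct sched_entity *rse = __node_2_se(node);

                if (deadline_gt(min_deadline, se, rse))
                        se->min_deadline = rse->min_deadline;
        }
}

/*
 * se->min_deadline = min(se->deadline, left->min_deadline, right->min_deadline)
 */
static inline bool min_deadline_update(struct sched_entity *se, bool exit)
{
        u64 old_min_deadline = se->min_deadline;
        struct rb_node *node = &se->run_node;

        se->min_deadline = se->deadline;
        __update_min_deadline(se, node->rb_right);
        __update_min_deadline(se, node->rb_left);

        return se->min_deadline == old_min_deadline;
}

RB_DECLARE_CALLBACKS(static, min_deadline_cb, struct sched_entity,
                     run_node, min_deadline, min_deadline_update);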

❯ cat -n /usr/src/debug/kernel-6.6-rc6-182-gce55c22ec8b2/linux-6.6.0-0.rc6.20231020gitce55c22ec8b2.51.fc40.x86_64/kernel/sched/fair.c \
    | sed -n '3650,3670 p'
  3650                   */
  3651                  se->vlag = div_s64(se->vlag * old_weight, weight);
  3652          } else {
  3653                  s64 deadline = se->deadline - se->vruntime;
  3654                  /*
  3655                   * When the weight changes, the virtual time slope changes and
  3656                   * we should adjust the relative virtual deadline accordingly.
  3657                   */
  3658                  deadline = div_s64(deadline * old_weight, weight);
  3659                  se->deadline = se->vruntime + deadline;
  3660                  min_deadline_cb_propagate(&se->run_node, NULL);
  3661          }
  3662
  3663  #ifdef CONFIG_SMP
  3664          do {
  3665                  u32 divider = get_pelt_divider(&se->avg);
  3666
  3667                  se->avg.load_avg = div_u64(se_weight(se) * se->avg.load_sum, divider);
  3668          } while (0);
  3669  #endif
  3670


❯ git blame kernel/sched/fair.c -L 3650,3670
86bfbb7ce4f67 (Peter Zijlstra  2023-05-31 13:58:42 +0200 3650)           */
86bfbb7ce4f67 (Peter Zijlstra  2023-05-31 13:58:42 +0200 3651)          se->vlag = div_s64(se->vlag * old_weight, weight);
147f3efaa2418 (Peter Zijlstra  2023-05-31 13:58:44 +0200 3652)  } else {
147f3efaa2418 (Peter Zijlstra  2023-05-31 13:58:44 +0200 3653)          s64 deadline = se->deadline - se->vruntime;
147f3efaa2418 (Peter Zijlstra  2023-05-31 13:58:44 +0200 3654)          /*
147f3efaa2418 (Peter Zijlstra  2023-05-31 13:58:44 +0200 3655)           * When the weight changes, the virtual time slope changes and
147f3efaa2418 (Peter Zijlstra  2023-05-31 13:58:44 +0200 3656)           * we should adjust the relative virtual deadline accordingly.
147f3efaa2418 (Peter Zijlstra  2023-05-31 13:58:44 +0200 3657)           */
147f3efaa2418 (Peter Zijlstra  2023-05-31 13:58:44 +0200 3658)          deadline = div_s64(deadline * old_weight, weight);
147f3efaa2418 (Peter Zijlstra  2023-05-31 13:58:44 +0200 3659)          se->deadline = se->vruntime + deadline;
8dafa9d0eb1a1 (Peter Zijlstra  2023-10-06 21:24:45 +0200 3660)          min_deadline_cb_propagate(&se->run_node, NULL);
86bfbb7ce4f67 (Peter Zijlstra  2023-05-31 13:58:42 +0200 3661)  }
86bfbb7ce4f67 (Peter Zijlstra  2023-05-31 13:58:42 +0200 3662)
9059393e4ec1c (Vincent Guittot 2017-05-17 11:50:45 +0200 3663) #ifdef CONFIG_SMP
1ea6c46a23f12 (Peter Zijlstra  2017-05-06 15:59:54 +0200 3664)  do {
87e867b4269f2 (Vincent Guittot 2020-06-12 17:47:03 +0200 3665)          u32 divider = get_pelt_divider(&se->avg);
1ea6c46a23f12 (Peter Zijlstra  2017-05-06 15:59:54 +0200 3666)
1ea6c46a23f12 (Peter Zijlstra  2017-05-06 15:59:54 +0200 3667)          se->avg.load_avg = div_u64(se_weight(se) * se->avg.load_sum, divider);
1ea6c46a23f12 (Peter Zijlstra  2017-05-06 15:59:54 +0200 3668)  } while (0);
9059393e4ec1c (Vincent Guittot 2017-05-17 11:50:45 +0200 3669) #endif
9059393e4ec1c (Vincent Guittot 2017-05-17 11:50:45 +0200 3670)

3) The most recent commit that changed this line:

❯ git show 8dafa9d0eb1a1
commit 8dafa9d0eb1a1550a0f4d462db9354161bc51e0c
Author: Peter Zijlstra <peterz@...radead.org>
Date:   Fri Oct 6 21:24:45 2023 +0200

    sched/eevdf: Fix min_deadline heap integrity

    Marek and Biju reported instances of:

      "EEVDF scheduling fail, picking leftmost"

    which Mike correlated with cgroup scheduling and the min_deadline heap
    getting corrupted; some trace output confirms:

    > And yeah, min_deadline is hosed somehow:
    >
    >    validate_cfs_rq: --- /
    >    __print_se: ffff88845cf48080 w: 1024 ve: -58857638 lag: 870381 vd: -55861854 vmd: -66302085 E (11372/tr)
    >    __print_se:   ffff88810d165800 w: 25 ve: -80323686 lag: 22336429 vd: -41496434 vmd: -66302085 E (-1//autogroup-31)
    >    __print_se:   ffff888108379000 w: 25 ve: 0 lag: -57987257 vd: 114632828 vmd: 114632828 N (-1//autogroup-33)
    >    validate_cfs_rq: min_deadline: -55861854 avg_vruntime: -62278313462 / 1074 = -57987256

    Turns out that reweight_entity(), which tries really hard to be fast,
    does not do the normal dequeue+update+enqueue pattern but *does* scale
    the deadline.

    However, it then fails to propagate the updated deadline value up the
    heap.

    Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy")
    Reported-by: Marek Szyprowski <m.szyprowski@...sung.com>
    Reported-by: Biju Das <biju.das.jz@...renesas.com>
    Reported-by: Mike Galbraith <efault@....de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
    Tested-by: Marek Szyprowski <m.szyprowski@...sung.com>
    Tested-by: Biju Das <biju.das.jz@...renesas.com>
    Tested-by: Mike Galbraith <efault@....de>
    Link: https://lkml.kernel.org/r/20231006192445.GE743@noisy.programming.kicks-ass.net

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ef7490c4b8b4..a4b904a010c6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3613,6 +3613,7 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
 		 */
 		deadline = div_s64(deadline * old_weight, weight);
 		se->deadline = se->vruntime + deadline;
+		min_deadline_cb_propagate(&se->run_node, NULL);
 	}
 
 #ifdef CONFIG_SMP

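To spell out what the commit message means by the "normal
dequeue+update+enqueue pattern" (a conceptual sketch only, not the actual
kernel code):

        /* slow way: take the entity off the rbtree before touching its keys */
        __dequeue_entity(cfs_rq, se);
        se->deadline = ...;             /* safe to change while off the tree */
        __enqueue_entity(cfs_rq, se);   /* re-insert; min_deadline is rebuilt */

        /*
         * reweight_entity() instead scales se->deadline in place, so the
         * min_deadline values cached in the ancestor nodes have to be
         * refreshed by hand -- that is the min_deadline_cb_propagate()
         * call the commit adds.
         */
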
I have no idea what else I could do to help fix this issue.

I also attach the full kernel log below.
My hardware: https://linux-hardware.org/?probe=dd5735f315
Thank you in advance.

-- 
Best Regards,
Mike Gavrilov.

Download attachment "dmesg-slab-use-after-free in reweight_entity.zip" of type "application/zip" (48394 bytes)
