linux-kernel - [BUG] racy access to p->mm in membarrier_global

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAG48ez2G8ctF8dHS42TF37pThfr3y0RNOOYTmxvACm4u8Yu3cw@mail.gmail.com>
Date:   Fri, 25 Jan 2019 18:26:47 +0100
From:   Jann Horn <jannh@...gle.com>
To:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        "Paul E. McKenney" <paulmck@...ux.ibm.com>
Cc:     kernel list <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>
Subject: [BUG] racy access to p->mm in membarrier_global_expedited()

membarrier_global_expedited() runs the following code (introduced in
commit c5f58bd58f43), protected only by an RCU read-side critical
section and the cpu_hotplug_lock:

        p = task_rcu_dereference(&cpu_rq(cpu)->curr);
        if (p && p->mm && (atomic_read(&p->mm->membarrier_state) &
                           MEMBARRIER_STATE_GLOBAL_EXPEDITED)) {
                if (!fallback)
                        __cpumask_set_cpu(cpu, tmpmask);
                else
                        smp_call_function_single(cpu, ipi_mb, NULL, 1);
        }

p->mm is not protected by either lock. This means that in theory, the
following races could occur:

1. If the compiler emitted two separate reads of ->mm, the second read
of p->mm could return a NULL pointer and crash.
2. If the mm is deallocated directly before the atomic_read() occurs,
the atomic_read() could access a freed pointer (I think?).

Neither of these are particularly likely - looking at the assembly of
a normal build, the first race doesn't exist because the compiler
optimizes the second read away, and the second race isn't going to
cause anything particularly interesting. Still, this should probably
be fixed...

As far as I can tell, you'll have to either take the task_lock()
around the "p->mm && (atomic_read(&p->mm->membarrier_state)" or add
RCU to the lifetime of mm_struct. I'm not entirely sure what the
better fix is... probably task_lock() makes more sense?


To test the bug, I patched an extra delay into the code:

====================
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 3cd8a3a795d2..69cc52039576 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -14,6 +14,7 @@
  * GNU General Public License for more details.
  */
 #include "sched.h"
+#include <linux/delay.h>

 /*
  * Bitmask made from a "or" of all commands within enum membarrier_cmd,
@@ -81,7 +82,7 @@ static int membarrier_global_expedited(void)

                rcu_read_lock();
                p = task_rcu_dereference(&cpu_rq(cpu)->curr);
-               if (p && p->mm && (atomic_read(&p->mm->membarrier_state) &
+               if (p && p->mm && (mdelay(100), 1) &&
(atomic_read(&p->mm->membarrier_state) &
                                   MEMBARRIER_STATE_GLOBAL_EXPEDITED)) {
                        if (!fallback)
                                __cpumask_set_cpu(cpu, tmpmask);
====================

On a kernel with that patch applied, I ran this test code:

====================
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <stdio.h>
#include <linux/membarrier.h>
#include <err.h>

int main(void) {
  while (1) {
    printf("executing global expedited barrier...\n");
    int res = syscall(__NR_membarrier, MEMBARRIER_CMD_GLOBAL_EXPEDITED, 0);
    if (res) err(1, "barrier");
  }
}
====================

That resulted in this splat:

[  212.697681] ==================================================================
[  212.700582] BUG: KASAN: null-ptr-deref in
membarrier_global_expedited+0x15f/0x220
[  212.703346] Read of size 4 at addr 0000000000000378 by task barrier/1177

[  212.706384] CPU: 1 PID: 1177 Comm: barrier Not tainted 5.0.0-rc3+ #246
[  212.708925] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.10.2-1 04/01/2014
[  212.712263] Call Trace:
[  212.713177]  dump_stack+0x71/0xab
[  212.714375]  ? membarrier_global_expedited+0x15f/0x220
[  212.716236]  ? membarrier_global_expedited+0x15f/0x220
[  212.718099]  kasan_report+0x176/0x192
[  212.719445]  ? finish_task_switch+0x340/0x3d0
[  212.721057]  ? membarrier_global_expedited+0x15f/0x220
[  212.722921]  membarrier_global_expedited+0x15f/0x220
[  212.724696]  ? ipi_mb+0x10/0x10
[  212.725816]  ? vfs_write+0x120/0x230
[  212.727113]  ? __ia32_sys_read+0x50/0x50
[  212.728596]  __x64_sys_membarrier+0x85/0xf0
[  212.730056]  do_syscall_64+0x73/0x160
[  212.731428]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  212.733236] RIP: 0033:0x7fbe8747e229
[  212.734540] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3f 4c 2b 00 f7 d8 64 89
01 48
[  212.741109] RSP: 002b:00007fffcb62a7c8 EFLAGS: 00000202 ORIG_RAX:
0000000000000144
[  212.743831] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbe8747e229
[  212.746335] RDX: 00007fbe87475730 RSI: 0000000000000000 RDI: 0000000000000002
[  212.748855] RBP: 00007fffcb62a7e0 R08: 00007fffcb62a8c0 R09: 00007fffcb62a8c0
[  212.751374] R10: 00007fbe8793c700 R11: 0000000000000202 R12: 0000563ee2ac9610
[  212.753842] R13: 00007fffcb62a8c0 R14: 0000000000000000 R15: 0000000000000000
[  212.756305] ==================================================================