linux-kernel - Re: [BUG] racy access to p->mm in membarrier_global

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <20190128162719.GT4240@linux.ibm.com>
Date:   Mon, 28 Jan 2019 08:27:19 -0800
From:   "Paul E. McKenney" <paulmck@...ux.ibm.com>
To:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc:     Jann Horn <jannh@...gle.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [BUG] racy access to p->mm in membarrier_global_expedited()

On Mon, Jan 28, 2019 at 11:10:00AM -0500, Mathieu Desnoyers wrote:
> ----- On Jan 28, 2019, at 9:15 AM, paulmck paulmck@...ux.ibm.com wrote:
> 
> > On Fri, Jan 25, 2019 at 06:26:47PM +0100, Jann Horn wrote:
> >> membarrier_global_expedited() runs the following code (introduced in
> >> commit c5f58bd58f43), protected only by an RCU read-side critical
> >> section and the cpu_hotplug_lock:
> >> 
> >>         p = task_rcu_dereference(&cpu_rq(cpu)->curr);
> >>         if (p && p->mm && (atomic_read(&p->mm->membarrier_state) &
> >>                            MEMBARRIER_STATE_GLOBAL_EXPEDITED)) {
> >>                 if (!fallback)
> >>                         __cpumask_set_cpu(cpu, tmpmask);
> >>                 else
> >>                         smp_call_function_single(cpu, ipi_mb, NULL, 1);
> >>         }
> >> 
> >> p->mm is not protected by either lock. This means that in theory, the
> >> following races could occur:
> >> 
> >> 1. If the compiler emitted two separate reads of ->mm, the second read
> >> of p->mm could return a NULL pointer and crash.
> >> 2. If the mm is deallocated directly before the atomic_read() occurs,
> >> the atomic_read() could access a freed pointer (I think?).
> >> 
> >> Neither of these are particularly likely - looking at the assembly of
> >> a normal build, the first race doesn't exist because the compiler
> >> optimizes the second read away, and the second race isn't going to
> >> cause anything particularly interesting. Still, this should probably
> >> be fixed...
> >> 
> >> As far as I can tell, you'll have to either take the task_lock()
> >> around the "p->mm && (atomic_read(&p->mm->membarrier_state)" or add
> >> RCU to the lifetime of mm_struct. I'm not entirely sure what the
> >> better fix is... probably task_lock() makes more sense?
> > 
> > Ouch!!!
> > 
> > Acquiring task_lock() would work, but would be a global lock.
> > This could be addressed to some extent by batching concurrent
> > membarrier_global_expedited() invocations, so that one call to
> > membarrier_global_expedited() does the job for the set of concurrent
> > calls.  The usual approach would use a counter, a pair of wait queues,
> > and a kthread.
> 
> We could start by grabbing the task_lock() as an initial fix, and
> then address any performance-related complains with your approach
> if need be.

Makes sense to me, always good to start simply.

> > I must defer to the mm guys on adding RCU to the lifetime of mm_struct.
> 
> Likewise.
> 
> > Another approach would be to put the MEMBARRIER_STATE_GLOBAL_EXPEDITED
> > in the task structure.
> 
> Then the tricky part becomes how to make sure the per-task-struct
> state is consistent across all tasks pointing to the same mm_struct
> (including processes created with clone CLONE_VM flag).

If a multi-threaded process can change its mm_struct, agreed.  I was
under the impression that such a change can only happen while the task
is single-threaded, but I wouldn't trust my impression all that much.

> > Yet another approach would be to acquire the
> > runqueue lock, thus preventing the task from switching away -- except
> > that it might be in the middle of exit(), so never mind.
> 
> And I suspect that grabbing the runqueue lock may cause more contention
> that grabbing the task_lock().

Quite possibly.

> I'll send a patch implementing the task_lock() approach as RFC.

Sounds good to me!

							Thanx, Paul

> Thanks,
> 
> Mathieu
> 
> > 
> > Other approaches?
> > 
> >							Thanx, Paul
> > 
> >> To test the bug, I patched an extra delay into the code:
> >> 
> >> ====================
> >> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> >> index 3cd8a3a795d2..69cc52039576 100644
> >> --- a/kernel/sched/membarrier.c
> >> +++ b/kernel/sched/membarrier.c
> >> @@ -14,6 +14,7 @@
> >>   * GNU General Public License for more details.
> >>   */
> >>  #include "sched.h"
> >> +#include <linux/delay.h>
> >> 
> >>  /*
> >>   * Bitmask made from a "or" of all commands within enum membarrier_cmd,
> >> @@ -81,7 +82,7 @@ static int membarrier_global_expedited(void)
> >> 
> >>                 rcu_read_lock();
> >>                 p = task_rcu_dereference(&cpu_rq(cpu)->curr);
> >> -               if (p && p->mm && (atomic_read(&p->mm->membarrier_state) &
> >> +               if (p && p->mm && (mdelay(100), 1) &&
> >> (atomic_read(&p->mm->membarrier_state) &
> >>                                    MEMBARRIER_STATE_GLOBAL_EXPEDITED)) {
> >>                         if (!fallback)
> >>                                 __cpumask_set_cpu(cpu, tmpmask);
> >> ====================
> >> 
> >> On a kernel with that patch applied, I ran this test code:
> >> 
> >> ====================
> >> #define _GNU_SOURCE
> >> #include <unistd.h>
> >> #include <sys/syscall.h>
> >> #include <stdio.h>
> >> #include <linux/membarrier.h>
> >> #include <err.h>
> >> 
> >> int main(void) {
> >>   while (1) {
> >>     printf("executing global expedited barrier...\n");
> >>     int res = syscall(__NR_membarrier, MEMBARRIER_CMD_GLOBAL_EXPEDITED, 0);
> >>     if (res) err(1, "barrier");
> >>   }
> >> }
> >> ====================
> >> 
> >> That resulted in this splat:
> >> 
> >> [  212.697681]
> >> ==================================================================
> >> [  212.700582] BUG: KASAN: null-ptr-deref in
> >> membarrier_global_expedited+0x15f/0x220
> >> [  212.703346] Read of size 4 at addr 0000000000000378 by task barrier/1177
> >> 
> >> [  212.706384] CPU: 1 PID: 1177 Comm: barrier Not tainted 5.0.0-rc3+ #246
> >> [  212.708925] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> >> BIOS 1.10.2-1 04/01/2014
> >> [  212.712263] Call Trace:
> >> [  212.713177]  dump_stack+0x71/0xab
> >> [  212.714375]  ? membarrier_global_expedited+0x15f/0x220
> >> [  212.716236]  ? membarrier_global_expedited+0x15f/0x220
> >> [  212.718099]  kasan_report+0x176/0x192
> >> [  212.719445]  ? finish_task_switch+0x340/0x3d0
> >> [  212.721057]  ? membarrier_global_expedited+0x15f/0x220
> >> [  212.722921]  membarrier_global_expedited+0x15f/0x220
> >> [  212.724696]  ? ipi_mb+0x10/0x10
> >> [  212.725816]  ? vfs_write+0x120/0x230
> >> [  212.727113]  ? __ia32_sys_read+0x50/0x50
> >> [  212.728596]  __x64_sys_membarrier+0x85/0xf0
> >> [  212.730056]  do_syscall_64+0x73/0x160
> >> [  212.731428]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >> [  212.733236] RIP: 0033:0x7fbe8747e229
> >> [  212.734540] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
> >> 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
> >> 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3f 4c 2b 00 f7 d8 64 89
> >> 01 48
> >> [  212.741109] RSP: 002b:00007fffcb62a7c8 EFLAGS: 00000202 ORIG_RAX:
> >> 0000000000000144
> >> [  212.743831] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbe8747e229
> >> [  212.746335] RDX: 00007fbe87475730 RSI: 0000000000000000 RDI: 0000000000000002
> >> [  212.748855] RBP: 00007fffcb62a7e0 R08: 00007fffcb62a8c0 R09: 00007fffcb62a8c0
> >> [  212.751374] R10: 00007fbe8793c700 R11: 0000000000000202 R12: 0000563ee2ac9610
> >> [  212.753842] R13: 00007fffcb62a8c0 R14: 0000000000000000 R15: 0000000000000000
> >> [  212.756305]
> >> ==================================================================
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com
>