Message-ID: <20090221174703.GA6860@linux.vnet.ibm.com>
Date: Sat, 21 Feb 2009 09:47:03 -0800
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Vegard Nossum <vegard.nossum@...il.com>
Cc: Ingo Molnar <mingo@...e.hu>, stable@...nel.org,
Andrew Morton <akpm@...ux-foundation.org>,
Nick Piggin <npiggin@...e.de>,
Pekka Enberg <penberg@...helsinki.fi>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: fix lazy vmap purging (use-after-free error)
On Sat, Feb 21, 2009 at 10:30:58AM +0100, Vegard Nossum wrote:
> 2009/2/21 Paul E. McKenney <paulmck@...ux.vnet.ibm.com>:
> > This is 4.1 milliseconds, so it is quite plausible. Is the code -really-
> > disabling preemption for 4.1 milliseconds?
> >
> >> [ 449.677407] __free_vmap_area(c7806ac0)
> >> [ 449.680113] rcu_free_va(c7806a80)
> >
> > 5.4 milliseconds...
> >
> >> [ 449.682821] __free_vmap_area(c7806b00)
> >> [ 449.684264] rcu_free_va(c7806ac0)
> >
> > 6.9 milliseconds...
> >
> >> [ 449.686525] __free_vmap_area(c7806b40)
> >> [ 449.688205] rcu_free_va(c7806b00)
> >
> > 5.4 milliseconds...
> >
> >> ...and goes on for a long time, until something triggers this:
> >>
> >> [ 449.902253] rcu_free_va(c7839d00)
> >> [ 449.903247] WARNING: kmemcheck: Caught 32-bit read from freed
> >> memory (c7839d20)
> >>
> >> ...and finally:
> >>
> >> [ 457.580253] __purge_vmap_area_lazy() end
> >> [ 457.581201] rcu_free_va(c78974c0)
> >
> > And I don't see the corresponding __free_vmap_area() for either of the
> > above rcu_free_va() calls. Would you be willing to forward the
> > timestamp for the __free_vmap_area() for c7839d20?
>
> I'm sorry. The numbers in parentheses are the struct vmap_area
> pointers, not the actual parameters being passed to the function. So
> when we have this:
>
> [ 449.696775] __free_vmap_area(c7806c00)
> [ 449.697274] rcu_free_va(c7806bc0)
> [ 449.699543] __free_vmap_area(c7806c40)
> [ 449.701104] rcu_free_va(c7806c00)
> [ 449.703353] __free_vmap_area(c7806c80)
> [ 449.704247] rcu_free_va(c7806c40)
>
> rcu_free_va(c7806c00) corresponds to __free_vmap_area(c7806c00).
OK, so 449.701104-449.696775=0.004329, or about four milliseconds,
correct?
> >> So this is also what I meant by "immediately": The RCU callbacks are
> >> getting called inside the loop, and they're almost always paired with
> >> the list removal, or lagging one object behind.
> >>
> >> My guess is that this code posts "too many callbacks", which would
> >> "force the grace period" according to __call_rcu() in
> >> kernel/rcutree.c. What do you think about this?
> >
> > If the code really suppresses preemption across the whole loop, then
> > any attempt to force the grace period should fail. Is it possible that
> > preemption is momentarily enabled somewhere within the loop? Or that
> > we are seeing multiple passes through the loop rather than one big long
> > pass through the loop?
>
> Multiple passes? No. We have a print-out at the beginning and at the
> end, and there's nothing else happening in-between. It doesn't leave
> the function __purge_vmap_area_lazy. I don't see preemption being
> enabled anywhere in __free_vmap_area (or its calls).
>
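
(For reference, the "force the grace period" path you mentioned is the
check at the tail end of __call_rcu() in kernel/rcutree.c. Paraphrasing
from memory rather than quoting the exact source, it looks roughly like
this:

	/* Simplified sketch of the tail of __call_rcu() -- not the exact source. */

	/* Enqueue the new callback on this CPU's list. */
	*rdp->nxttail[RCU_NEXT_TAIL] = head;
	rdp->nxttail[RCU_NEXT_TAIL] = &head->next;

	/* If too many callbacks are queued, try to push the grace period along. */
	if (unlikely(++rdp->qlen > qhimark)) {
		rdp->blimit = LONG_MAX;		/* lift the per-pass callback limit */
		force_quiescent_state(rsp, 0);	/* prod CPUs that haven't reported */
	}

But force_quiescent_state() can only prod CPUs into reporting quiescent
states they have actually passed through; it cannot invent one for a CPU
spinning with preemption disabled, so by itself it should not explain
callbacks being invoked inside __purge_vmap_area_lazy().)
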
> I single-stepped __free_vmap_area, and it will get interrupted. I got
> (among other things) this chain of calls:
>
> do_IRQ -> handle_irq -> handle_level_irq -> handle_IRQ_event ->
> timer_interrupt -> ... -> run_local_timers -> raise_softirq (nr=1)
>
> And at one point it also calls into RCU machinery:
>
> update_process_times (user_tick=0) at kernel/timer.c:1033
> 1033 if (rcu_pending(cpu))
> (gdb)
> rcu_pending (cpu=0) at kernel/rcutree.c:1288
> 1288 return __rcu_pending(&rcu_state, &per_cpu(rcu_data, cpu)) ||
> ...
> update_process_times (user_tick=0) at kernel/timer.c:1034
> 1034 rcu_check_callbacks(cpu, user_tick);
> (gdb)
> rcu_check_callbacks (cpu=0, user=0) at kernel/rcutree.c:949
> 949 {
> ...
> rcu_check_callbacks (cpu=0, user=-1049147360) at kernel/rcutree.c:967
> 967 rcu_qsctr_inc(cpu);
???? Are the argument values trustworthy? If so, I don't see how
the variable user transitioned from zero to non-zero.
The value user!=0 tells RCU that we were interrupted from a user process,
but this immediately follows user==0. If we really were interrupted
from kernel code (including from an irq handler), we should have user==0.
The user!=0 causes RCU to conclude that we are in a quiescent state.
RCU is then within its rights to process callbacks, which would result
in the behavior you saw.
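
For reference, the decision in rcu_check_callbacks() is roughly the
following (a paraphrased sketch, not the exact kernel/rcutree.c text):

	void rcu_check_callbacks(int cpu, int user)
	{
		if (user ||
		    (idle_cpu(cpu) && !in_softirq() &&
		     hardirq_count() <= (1 << HARDIRQ_SHIFT))) {
			/* Interrupt taken from user mode or from the
			 * idle loop: this CPU is in a quiescent state. */
			rcu_qsctr_inc(cpu);
			rcu_bh_qsctr_inc(cpu);
		} else if (!in_softirq()) {
			/* Interrupt taken from kernel code outside of a
			 * softirq: a quiescent state for rcu_bh only. */
			rcu_bh_qsctr_inc(cpu);
		}
		raise_softirq(RCU_SOFTIRQ);
	}

Given that your trace reaches rcu_qsctr_inc(), either user really was
non-zero at that point or the idle_cpu() leg of that test fired.
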
> (gdb)
> rcu_qsctr_inc () at include/linux/rcutree.h:253
> 253 struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
> (gdb)
> 255 rdp->passed_quiesc_completed = rdp->completed;
> (gdb)
> 254 rdp->passed_quiesc = 1;
> (gdb)
> 255 rdp->passed_quiesc_completed = rdp->completed;
> (gdb)
> rcu_check_callbacks (cpu=0, user=<value optimized out>) at kernel/rcutree.c:979
> 979 rcu_bh_qsctr_inc(cpu);
> (gdb)
> rcu_bh_qsctr_inc () at include/linux/rcutree.h:259
> 259 struct rcu_data *rdp = &per_cpu(rcu_bh_data, cpu);
> (gdb)
> 261 rdp->passed_quiesc_completed = rdp->completed;
> (gdb)
> 260 rdp->passed_quiesc = 1;
> (gdb)
> 261 rdp->passed_quiesc_completed = rdp->completed;
> (gdb)
> rcu_check_callbacks (cpu=0, user=<value optimized out>) at kernel/rcutree.c:981
> 981 raise_softirq(RCU_SOFTIRQ);
> (gdb)
> raise_softirq (nr=8) at kernel/softirq.c:313
> 313 {
>
> We also get into __rcu_check_callbacks after a while, with this stacktrace:
>
> #0 do_IRQ (regs=0xc170ddc0) at arch/x86/kernel/irq.c:201
> #1 <signal handler called>
> #2 0xc1073b7a in __rcu_process_callbacks (rsp=0xc1677060, rdp=0xc1941320)
> at kernel/rcutree.c:1127
> #3 0xc1073dbf in rcu_process_callbacks (unused=<value optimized out>)
> at kernel/rcutree.c:1162
> #4 0xc103741f in __do_softirq () at kernel/softirq.c:198
> #5 0xc10374fd in do_softirq () at kernel/softirq.c:244
> #6 0xc1037655 in irq_exit () at kernel/softirq.c:281
> #7 0xc100529f in do_IRQ (regs=0xc170de98) at arch/x86/kernel/irq.c:223
> #8 <signal handler called>
> #9 __free_vmap_area (va=0xc7803a40) at mm/vmalloc.c:411
> #10 0xc1096dd9 in __purge_vmap_area_lazy (start=0xc170df10, end=0xc170df0c,
> sync=<value optimized out>, force_flush=0) at mm/vmalloc.c:542
> #11 0xc1096f1e in try_purge_vmap_area_lazy () at mm/vmalloc.c:556
> #12 free_unmap_vmap_area_noflush (va=<value optimized out>) at mm/vmalloc.c:578
> #13 0xc1096f4a in free_unmap_vmap_area () at mm/vmalloc.c:587
> #14 remove_vm_area (addr=<value optimized out>) at mm/vmalloc.c:1168
> #15 0xc1097005 in __vunmap (addr=0xc7803a40, deallocate_pages=0)
> at mm/vmalloc.c:1194
> #16 0xc10970be in vunmap (addr=0xc7803aa0) at mm/vmalloc.c:1253
> #17 0xc1008ba5 in text_poke (addr=0xc127ba4f, opcode=0xc170df8f, len=1)
> at arch/x86/kernel/alternative.c:523
> #18 0xc1008ca9 in alternatives_smp_unlock (start=<value optimized out>,
> end=0xc1713360, text=0xc1000000 "�\206\021\002", text_end=0xc150250f "")
> at arch/x86/kernel/alternative.c:252
> #19 0xc171ef47 in alternative_instructions ()
> at arch/x86/kernel/alternative.c:438
> #20 0xc171f991 in check_bugs () at arch/x86/kernel/cpu/bugs.c:168
> #21 0xc17148c5 in start_kernel () at init/main.c:687
> #22 0xc171409e in i386_start_kernel () at arch/x86/kernel/head32.c:43
> #23 0x00000000 in ?? ()
>
> (On a side note, it strikes me that __do_softirq() does
> local_irq_enable(), which means that __rcu_process_callbacks() can be
> interrupted and the interrupt handler can call rcu_process_callbacks()
> again...)
>
> Does this ring any bells or make any sense at all? What else can I do
> to help understand what's going on?
I will look into what might have caused rcu_check_callbacks()'s
"user" argument to suddenly become non-zero.
> This, by the way, happens regardless of whether kmemcheck is used or
> not. The only thing I do to trigger this particular behaviour is to
> run an SMP kernel on a UP machine, since it will call
> alternative_instructions() like we see in the stack trace above.
OK, good to know, thank you!
Is this behavior recent, or does it also show up with earlier 2.6.29-rc
kernels? Does it happen with CONFIG_CLASSIC_RCU as well? (From the
trace above, I suspect it might, but if not, that would be valuable
information too.)
Thanx, Paul