linux-kernel - Re: call_rcu from trace

Message-ID: <557F7764.5060707@plumgrid.com>
Date:	Mon, 15 Jun 2015 18:09:56 -0700
From:	Alexei Starovoitov <ast@...mgrid.com>
To:	paulmck@...ux.vnet.ibm.com
CC:	Daniel Wagner <daniel.wagner@...-carit.de>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: call_rcu from trace_preempt

On 6/15/15 4:07 PM, Paul E. McKenney wrote:
>
> Oh...  One important thing is that both call_rcu() and kfree_rcu()
> use per-CPU variables, managing a per-CPU linked list.  This is why
> they disable interrupts.  If you do another call_rcu() in the middle
> of the first one in just the wrong place, you will have two entities
> concurrently manipulating the same linked list, which will not go well.

yes. I'm trying to find that 'wrong place'.
The trace.patch does a kmalloc/kfree_rcu for every preempt_enable.
So any spin_unlock called by the first call_rcu will trigger
a 2nd, recursive call to call_rcu.
But as far as I could understand the rcu code, that looks ok everywhere:
call_rcu
   debug_rcu_head_[un]queue
     debug_object_activate
       spin_unlock

and debug_rcu_head* seem to be called from safe places
where interrupts are enabled.

> Maybe mark call_rcu() and the things it calls as notrace?  Or you
> could maintain a separate per-CPU linked list that gathered up the
> stuff to be kfree()ed after a grace period, and some time later
> feed them to kfree_rcu()?

yeah, I can think of this or 10 other ways to fix it within
the kprobe+bpf area, but I think something like call_rcu_notrace()
may be a better solution.
Or maybe a single generic 'fix' for call_rcu will be enough, if
it doesn't affect the other users.
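
For illustration only, the staging idea can be modeled in userspace. This is a sketch under assumptions, not kernel code and not a proposed patch: the names node, cpu_queue, guarded_enqueue and the hook parameter are hypothetical, a single struct stands in for one CPU's state, and the hook simulates a tracer firing in the middle of an insertion. A nested insertion is parked on a staging list and linked in only after the outer insertion has finished.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
	struct node *next;
	int id;
};

/* Hypothetical stand-in for one CPU's callback state. */
struct cpu_queue {
	struct node *head;
	struct node **tail;	/* slot where the next node is linked */
	struct node *staged;	/* nodes deferred by nested calls (LIFO) */
	bool busy;		/* an insertion is in progress */
};

typedef void (*hook_fn)(struct cpu_queue *q);

static void cpu_queue_init(struct cpu_queue *q)
{
	assert(q != NULL);
	q->head = NULL;
	q->tail = &q->head;
	q->staged = NULL;
	q->busy = false;
}

/* Two-step tail insertion; the hook runs between the steps. */
static void link_tail(struct cpu_queue *q, struct node *n, hook_fn hook)
{
	n->next = NULL;
	*q->tail = n;
	if (hook)
		hook(q);	/* simulated tracer re-entry */
	q->tail = &n->next;
}

static void guarded_enqueue(struct cpu_queue *q, struct node *n, hook_fn hook)
{
	if (q->busy) {
		/* Nested call: do not touch head/tail, just stage. */
		n->next = q->staged;
		q->staged = n;
		return;
	}
	q->busy = true;
	link_tail(q, n, hook);
	/* Safe point: feed the staged nodes to the main list. */
	while (q->staged) {
		struct node *s = q->staged;

		q->staged = s->next;
		link_tail(q, s, NULL);
	}
	q->busy = false;
}

static struct node n1 = { .id = 1 }, n2 = { .id = 2 }, n3 = { .id = 3 };

static void nested_insert(struct cpu_queue *q)
{
	guarded_enqueue(q, &n2, NULL);
}

/* Queue three nodes, one of them from inside another insertion. */
static size_t guarded_reachable(void)
{
	struct cpu_queue q;
	size_t count = 0;

	cpu_queue_init(&q);
	guarded_enqueue(&q, &n1, nested_insert);
	guarded_enqueue(&q, &n3, NULL);
	for (struct node *p = q.head; p; p = p->next)
		count++;
	return count;
}
```

In this model guarded_reachable() finds all three nodes on the main list. A real per-CPU version would also have to deal with interrupts, preemption and CPU migration, which the sketch ignores.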

> The usual consequence of racing a pair of callback insertions on the
> same CPU would be that one of them gets leaked, and possible all
> subsequent callbacks.  So the lockup is no surprise.  And there are a
> lot of other assumptions in nearby code paths about only one execution
> at a time from a given CPU.

yes, I don't think calling a 2nd call_rcu from preempt_enable violates
these assumptions. local_irq does its job. No extra stuff is called when
interrupts are disabled.
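
To make the failure mode described in the quoted text concrete, here is a userspace sketch of a tail-pointer callback list. It is not the kernel's implementation: the names cb, cb_list, enqueue and the hook parameter are made up for illustration, and the hook simulates a second insertion landing between the two steps of the first one.

```c
#include <assert.h>
#include <stddef.h>

struct cb {
	struct cb *next;
	int id;
};

struct cb_list {
	struct cb *head;
	struct cb **tail;	/* slot where the next callback is linked */
};

typedef void (*hook_fn)(struct cb_list *l);

static void cb_list_init(struct cb_list *l)
{
	assert(l != NULL);
	l->head = NULL;
	l->tail = &l->head;
}

/* Unprotected two-step insertion; the hook runs between the steps. */
static void enqueue(struct cb_list *l, struct cb *c, hook_fn hook)
{
	c->next = NULL;
	*l->tail = c;		/* step 1: link the new callback */
	if (hook)
		hook(l);	/* simulated re-entry in the wrong place */
	l->tail = &c->next;	/* step 2: advance the tail */
}

static struct cb cb_a = { .id = 1 }, cb_b = { .id = 2 }, cb_c = { .id = 3 };

static void nested_enqueue(struct cb_list *l)
{
	enqueue(l, &cb_b, NULL);
}

/* Queue three callbacks, one nested, and count what is still reachable. */
static size_t racy_reachable(void)
{
	struct cb_list l;
	size_t count = 0;

	cb_list_init(&l);
	enqueue(&l, &cb_a, nested_enqueue);	/* cb_b overwrites cb_a's slot */
	enqueue(&l, &cb_c, NULL);		/* lands behind the leaked cb_a */
	for (struct cb *p = l.head; p; p = p->next)
		count++;
	return count;
}
```

In this model racy_reachable() returns 1 although three callbacks were queued: the nested insertion overwrites the first entry, and the later one is appended behind the leaked entry, so it is lost as well.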

>> Any advise on where to look is greatly appreciated.
>
> What I don't understand is exactly what you are trying to do.  Have more
> complex tracers that dynamically allocate memory?  If so, having a per-CPU
> list that stages memory to be freed so that it can be passed to call_rcu()
> in a safe environment might make sense.  Of course, that list would need
> to be managed carefully!

yes. We tried to compute the time the kernel spends between
preempt_disable->preempt_enable and plot a histogram of latencies.
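
As a rough userspace sketch of the bookkeeping behind such a histogram (the real tracer keeps its counts in a bpf map; power-of-two buckets and the names lat_hist, log2_bucket and hist_record are assumptions made here for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 32

/* Hypothetical histogram: count[i] holds samples in [2^i, 2^(i+1)) ns. */
struct lat_hist {
	uint64_t count[NBUCKETS];
};

/* floor(log2(delta)), with 0 mapped to bucket 0 and a cap at the last bucket. */
static unsigned int log2_bucket(uint64_t delta_ns)
{
	unsigned int b = 0;

	while (delta_ns > 1 && b < NBUCKETS - 1) {
		delta_ns >>= 1;
		b++;
	}
	return b;
}

/* Record one preempt_disable -> preempt_enable interval. */
static void hist_record(struct lat_hist *h, uint64_t start_ns, uint64_t end_ns)
{
	assert(h != NULL);
	if (end_ns < start_ns)
		return;		/* ignore clock skew in this sketch */
	h->count[log2_bucket(end_ns - start_ns)]++;
}

/* Record three sample intervals and return the count in bucket 9. */
static uint64_t hist_demo(void)
{
	struct lat_hist h = { { 0 } };

	hist_record(&h, 100, 1100);	/* 1000 ns -> bucket 9 */
	hist_record(&h, 0, 600);	/*  600 ns -> bucket 9 */
	hist_record(&h, 50, 53);	/*    3 ns -> bucket 1 */
	return h.count[9];
}
```

The per-sample update is what needs a map element to exist, which is where the allocation and the later kfree_rcu come from in the real call stack.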

> Or am I missing the point of the code below?

this trace.patch is a reproducer of the call_rcu crashes; it does:
preempt_enable
   trace_preempt_on
     kfree_call_rcu

The real call stack is:
preempt_enable
   trace_preempt_on
     kprobe_int3_handler
       trace_call_bpf
         bpf_map_update_elem
           htab_map_update_elem
             kfree_call_rcu
