[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1375366669.1152.23.camel@gandalf.local.home>
Date: Thu, 01 Aug 2013 10:17:49 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Oleg Nesterov <oleg@...hat.com>
Cc: linux-kernel@...r.kernel.org,
Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
"zhangwei(Jovi)" <jovi.zhangwei@...wei.com>,
Jiri Olsa <jolsa@...hat.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Arnaldo Carvalho de Melo <acme@...stprotocols.net>,
Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
Frederic Weisbecker <fweisbec@...il.com>,
Ingo Molnar <mingo@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe
event files are open
On Thu, 2013-08-01 at 15:34 +0200, Oleg Nesterov wrote:
> On 07/31, Steven Rostedt wrote:
> >
> > Subject: [PATCH] tracing/kprobes: Fail to unregister if probe event files are
> > in use
> >
> > When a probe is being removed, it cleans up the event files that correspond
> > to the probe. But there is a race between writing to one of these files
> > and deleting the probe. This is especially true for the "enable" file.
> >
> > CPU 0 CPU 1
> > ----- -----
> >
> > fd = open("enable",O_WRONLY);
> >
> > probes_open()
> > release_all_trace_probes()
> > unregister_trace_probe()
> > if (trace_probe_is_enabled(tp))
> > return -EBUSY
> >
> > write(fd, "1", 1)
> > __ftrace_set_clr_event()
> > call->class->reg()
> > (kprobe_register)
> > enable_trace_probe(tp)
> >
> > __unregister_trace_probe(tp);
> > list_del(&tp->list)
> > unregister_probe_event(tp) <-- fails!
> > free_trace_probe(tp)
>
> Yes. But again, this doesn't explain why unregister_probe_event()->
> __trace_remove_event_call() can't simply proceed and
> do ftrace_event_enable_disable() + remove_event_from_tracers().
The problem is with the soft disable. We would have to look at what
functions have been attached to soft enable this event and remove them.
This can become quite complex. When the function is hit, it will try to
access the event call file. There's no inode associated to that, so the
i_private wont work. The answer would be to actually remove the
functions that are referencing it. See event_enable_probe() and
event_enable_count_probe(). The problem is that these are called from
the function tracer which means it can not take *any* locks.
This will become exponentially more complex when Tom Zanussi's patches
make it in that have events enabling other events and also using the
soft enable too.
This means that if we fail to disable, we must not destroy the event.
We may be able to start adding ref counts, such that it doesn't get
freed till it goes to zero, but every user will see it "not existing".
But that will take a bit of work to do and not for 3.11.
>
> IOW, if we do not apply the previous "trace_remove_event_call() should
> fail if call/file is in use" patch, then everything is fine:
>
>
> > write(fd, "0", 1)
>
> this will fail with ENODEV.
Currently it does not, because the failure in probe_remove_event_call()
due to the event being busy wont remove the event (event_remove() is
never called). Thus the event is still alive and the write will still
have access to it.
>
> Let's consider another race:
>
> CPU 0 CPU 1
> ----- -----
>
> probes_open()
> trace_probe_is_enabled() == F;
>
> sys_perf_event_open(attr.config == id)
>
>
> ...
> trace_remove_event_call()
>
> Whatever we do, we should abort. We could even do TRACE_REG_PERF_UNREGISTER
> (although the current code doesn't do this), but we have no way to cleanup
> the perf event's which have ->>tp_event = call.
>
> trace_remove_event_call() was already changed to return the error.
Right, and that's what this patch is about. The bug is that
trace_remove_event_call() returns an error which makes
unregister_probe_event() fail silently, which means that there's still a
reference to the tp and the write (which does happen) references it.
>
> And. Since it can fail, this obviously means that it should be checked,
> we can't blindly do free_trace_probe().
Right! That's what this patch does :-)
What you showed is the same race with perf as I have tested with ftrace.
>
> IOW, the changelog could be very simple, I think. Either
> trace_remove_event_call() should always succeed or we should check the
> possible failure.
I can update the change log to remove some of the functions that are
being called to be less confusing.
>
> But I won't argue with this changelog. The only important thing is that
> we all seem to agree that we do have the races here which can be fixed
> by this and the previous change.
>
> And just in case. I believe that the patch is fine.
OK, I'm currently running this through my tests. I'm hoping to get an
Acked-by from Srikar for uprobes. Time is running out for 3.11.
>
> Just one off-topic note,
>
> > @@ -632,7 +635,9 @@ static int release_all_trace_probes(void)
> > /* TODO: Use batch unregistration */
> > while (!list_empty(&probe_list)) {
> > tp = list_entry(probe_list.next, struct trace_probe, list);
> > - unregister_trace_probe(tp);
> > + ret = unregister_trace_probe(tp);
> > + if (ret)
> > + goto end;
> > free_trace_probe(tp);
> > }
>
> This obviously breaks all-or-nothing semantics (I mean, this breaks
> the intent, the current code is buggy).
>
I agree, this isn't really nice, but for now we have to deal with it.
> I think we can't avoid this, and I hope this is fine. But then perhaps
> we should simply remove the "list_for_each_entry" check above?
I think this can be pushed for 3.12 to fix.
Thanks,
-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists