linux-kernel - Re: [PATCH] tracing: Track event ref in tracefs enable/disable

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20221013163928.GA80@W11-BEAU-MD.localdomain>
Date:   Thu, 13 Oct 2022 09:39:28 -0700
From:   Beau Belgrave <beaub@...ux.microsoft.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     mhiramat@...nel.org, linux-trace-devel@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] tracing: Track event ref in tracefs enable/disable

On Thu, Oct 13, 2022 at 03:58:27AM -0400, Steven Rostedt wrote:
> On Wed, 12 Oct 2022 17:19:38 -0700
> Beau Belgrave <beaub@...ux.microsoft.com> wrote:
> 
> > > 
> > > The events are only called from the module code, and when the module is
> > > unloaded, they are no longer called. Why keep the module from unloading
> > > when enabled?  
> > 
> > Won't the modules remove the event calls? At the very least the event
> > call structure in memory goes away during module unload. If it gets
> > reused odd things will happen, right? IE: trace_module_remove_events().
> 
> Yes it gets removed along with the module. But everything about the
> static event is part of the module.
> 

I see, trace_module_remove_events() does remove the trace call for
module cases only.

> > 
> > Maybe I have a bad assumption:
> > I thought the point of trace_event_try_get_ref()/put_ref() was to tell
> > the system the call cannot go away. However, if ftrace enable doesn't
> > use these the lifetime is ambigious in this case. If this was
> > intentional, how are event call lifetimes described if not within the
> > ref?
> > 
> 
> The purpose of trace_event_try_get_ref and friends is for the case of
> dynamic events (eprobes and synthetic events) that can attach to any
> event (including module events), and the module removal does not remove
> the dynamic portion that was attached to them. And in this case, the
> removal would have dire results.
> 

Got it, uprobes and user_events are also dynamic events. Here are things
I found and am thinking about and questioning:

Uprobe specifically calls trace_event_dyn_busy() during
unregister_trace_uprobe(). This will return false in this case, since
there is no reference. Fortunately trace_uprobe_is_busy() used by
dyn_events checks the event flags to see if something really hooked
there.

User_events also tracks usage via the event.class.reg() REG/UNREG
callback. It's dyn_events is_busy() op also checks internal flags.

So normally these are not issues for either above system.

However, once we have a tracing_namespace with put/get ref semantics for
lifetime, those corner case checks won't keep us from removing the
calls.

It appears if perf attaches to any event it will take a reference. This
would imply the module won't be allowed to go away while perf is
running. I'm not sure why we wouldn't want consistency, perhaps it might
break some scenarios I'm unaware of around module unload.

> > In my namespace patches I hit this case when user_events try to go away
> > during namespace teardown. Since there is no reference to the event
> > being used I removed the call. However, it was clearly being used within
> > tracefs at that point. When I cat "enable" in this case instead of "0"
> > or "1" I get "?". I suppose worse things could happen when the memory of
> > the call gets reused?
> >
> 
> Could you have a callback telling tracefs that it is going away (like
> the module notifier does)?
> 

I could, would it be limited to just tracefs? I don't believe perf or
eBPF currently hooks into those notifiers?

If we had a consistent get/put ref model I don't think we would need
the notifiers (unless we cannot trust all legacy code doing a get/put
ref). It appears in my testing that perf is always doing get/put refs
in these cases to ensure the call cannot go away underneath it.

Thanks,
-Beau