[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230517172243.GA152@W11-BEAU-MD.localdomain>
Date: Wed, 17 May 2023 10:22:43 -0700
From: Beau Belgrave <beaub@...ux.microsoft.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Steven Rostedt <rostedt@...dmis.org>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Masami Hiramatsu <mhiramat@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
linux-trace-kernel@...r.kernel.org,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>, bpf <bpf@...r.kernel.org>,
David Vernet <void@...ifault.com>, dthaler@...rosoft.com,
brauner@...nel.org, hch@...radead.org
Subject: Re: [PATCH] tracing/user_events: Run BPF program if attached
On Tue, May 16, 2023 at 08:03:09PM -0700, Linus Torvalds wrote:
> On Tue, May 16, 2023 at 7:29 PM Steven Rostedt <rostedt@...dmis.org> wrote:
> >
> > So this code path is very much in user context (called directly by a
> > write system call). The issue that Alexei had was that it's also in an
> > rcu_read_lock() section.
> >
> > I wonder if this all goes away if we switch to SRCU?
>
> Yes, SRCU context would work.
>
> That said, how critical is this code? Because honestly, the *sanest*
> thing to do is to just hold the lock that actually protects the list,
> not try to walk the list in any RCU mode.
>
> And as far as I can tell, that's the 'event_mutex', which is already held.
>
> RCU walking of a list is only meaningful when the walk doesn't need
> the lock that guarantees the list integrity.
>
> But *modification* of a RCU-protected list still requires locking, and
> from a very cursory look, it really looks like 'event_mutex' is
> already the lock that protects the list.
>
> So the whole use of RCU during the list walking there in
> user_event_enabler_update() _seems_ pointless. You hold event_mutex -
> user_event_enabler_write() that is called in the loop already has a
> lockdep assert to that effect.
>
> So what is it that could even race and change the list that is the
> cause of that rcu-ness?
>
Processes that fork() with previous user_events need to be duplicated.
The fork() paths do not acquire the event_mutex. In the middle of a fork
an event could become enabled/disabled, which would call this part of
the code, at that time the list is actively being appended to when we
try to update the bits.
> Other code in that file happily just does
>
> mutex_lock(&event_mutex);
>
> list_for_each_entry_safe(enabler, next, &mm->enablers, link)
>
> with no RCU anywhere. Why does user_event_enabler_update() not do that?
>
This is due to the fork() case above without taking the event_mutex. I
really tried to not cause fork() to stall if a process uses user_events.
This required using RCU, maybe there is a simpler approach:
One approach I can think of is that during fork() we don't add the newly
created mm to the global list until we copy all the enablers. The COW
pages should reflect the bits if a timing window occurs there, since I
believe it's impossible for the newly forked() mm to cause a COW during
that time. Then I can drop this RCU on the enablers.
> Oh, and even those other loops are a bit strange. Why do they use the
> "_safe" variant, even when they just traverse the list without
> changing it? Look at user_event_enabler_exists(), for example.
>
The other places in the code that do this either will remove the event
depending on the situation during the for_each, or they only hold the
register lock and don't hold the event_mutex. So the disabler could get
removed out from under it.
IE: user_events_ioctl_reg() -> current_user_event_enabler_exists()
This is a place where we could just simply change to grab the
event_mutex, it's pretty isolated and we take the lock anyway further
down the path.
> I must really be missing something. That code is confusing. Or I am
> very confused.
>
> Linus
Thanks,
-Beau
Powered by blists - more mailing lists