Message-ID: <0EB34157-8BCA-47FC-B78F-AA8FE45A1707@fb.com>
Date: Fri, 15 Jul 2022 19:49:00 +0000
From: Song Liu <songliubraving@...com>
To: Steven Rostedt <rostedt@...dmis.org>
CC: Song Liu <song@...nel.org>, Networking <netdev@...r.kernel.org>,
bpf <bpf@...r.kernel.org>, lkml <linux-kernel@...r.kernel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>,
Kernel Team <Kernel-team@...com>,
"jolsa@...nel.org" <jolsa@...nel.org>,
"mhiramat@...nel.org" <mhiramat@...nel.org>
Subject: Re: [PATCH v2 bpf-next 3/5] ftrace: introduce FTRACE_OPS_FL_SHARE_IPMODIFY

> On Jul 15, 2022, at 12:12 PM, Steven Rostedt <rostedt@...dmis.org> wrote:
>
> On Fri, 15 Jul 2022 17:42:55 +0000
> Song Liu <songliubraving@...com> wrote:
>
>
>> A quick update and ask for feedback/clarification.
>>
>> Based on my understanding, you recommended calling ops_func() from
>> __ftrace_hash_update_ipmodify() and in ops_func() the direct trampoline
>> may make changes to the trampoline. Did I get this right?
>>
>>
>> I am going towards this direction, but hit some issue. Specifically, in
>> __ftrace_hash_update_ipmodify(), ftrace_lock is already locked, so the
>> direct trampoline cannot easily make changes with
>> modify_ftrace_direct_multi(), which locks both direct_mutex and
>> ftrace_lock.
>>
>> One solution would be have no-lock version of all the functions called
>> by modify_ftrace_direct_multi(), but that's a lot of functions and the
>> code will be pretty ugly. The alternative would be the logic in v2:
>> __ftrace_hash_update_ipmodify() returns -EAGAIN, and we make changes to
>> the direct trampoline in other places:
>>
>> 1) if DIRECT ops attached first, the trampoline is updated in
>> prepare_direct_functions_for_ipmodify(), see 3/5 of v2;
>>
>> 2) if IPMODIFY ops attached first, the trampoline is updated in
>> bpf_trampoline_update(), see "goto again" path in 5/5 of v2.
>>
>> Overall, I think this way is still cleaner. What do you think about this?
>
> What about if we release the lock when doing the callback?

We can probably unlock ftrace_lock here, but that may break the locking
order with direct_mutex (see below).

>
> Then we just need to make sure things are the same after reacquiring the
> lock, and if they are different, we release the lock again and do the
> callback with the new update. Wash, rinse, repeat, until the state is the
> same before and after the callback with locks acquired?

Personally, I would like to avoid wash-rinse-repeat here.

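For reference, the drop-the-lock-and-recheck pattern described above could look roughly like the sketch below. This is a single-threaded toy model, not kernel code: toy_acquire(), toy_release(), core_lock, state_gen, user_callback() and update_with_callback() are all made-up names, and the "race" is simulated inside the callback itself.

```c
#include <assert.h>

/* Single-threaded toy lock: the asserts catch double-acquire and bad release. */
struct toy_lock { int held; };
static void toy_acquire(struct toy_lock *l) { assert(!l->held); l->held = 1; }
static void toy_release(struct toy_lock *l) { assert(l->held); l->held = 0; }

static struct toy_lock core_lock;	/* stands in for ftrace_lock */
static unsigned long state_gen;		/* bumped whenever the protected state changes */
static int callback_runs;

/* The callback takes core_lock itself, so the caller must drop it first. */
static void user_callback(void)
{
	toy_acquire(&core_lock);
	if (callback_runs < 2)		/* pretend someone raced with us twice */
		state_gen++;
	callback_runs++;
	toy_release(&core_lock);
}

/* Enter and leave with core_lock held; returns the number of rounds needed. */
static int update_with_callback(void)
{
	unsigned long seen;
	int rounds = 0;

	assert(core_lock.held);
	do {
		seen = state_gen;		/* snapshot under the lock */
		toy_release(&core_lock);
		user_callback();
		toy_acquire(&core_lock);
		rounds++;
	} while (seen != state_gen);		/* state moved: go around again */

	return rounds;
}
```

With the simulated races above, the loop needs three rounds before the state is the same before and after the callback. Nothing in the pattern itself bounds the number of rounds.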
>
> This is a common way to handle callbacks that need to do something that
> takes the lock held before doing a callback.
>
> The reason I say this, is because the more we can keep the accounting
> inside of ftrace the better.
>
> Wouldn't this need to be done anyway if BPF was first and live kernel
> patching needed the update? An -EAGAIN would not suffice.

prepare_direct_functions_for_ipmodify() handles the BPF-first,
livepatch-later case. Its benefit is that it takes direct_mutex before
ftrace_lock, and keeps holding it if necessary. This is enough to make
sure we don't need the wash-rinse-repeat.

OTOH, if we wait until __ftrace_hash_update_ipmodify(), we already hold
ftrace_lock, but not direct_mutex. To make changes to the BPF trampoline,
we have to unlock ftrace_lock and then lock direct_mutex to avoid a
deadlock. However, this means we will need the wash-rinse-repeat.
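
To illustrate the ordering problem, here is a small toy model. It is not kernel code: the rank-based checker, held_mask, update_early() and update_late() are invented for this sketch, and the only assumption taken from the discussion is the lock order direct_mutex before ftrace_lock.

```c
#include <assert.h>

/* Lower rank must be taken first: direct_mutex, then ftrace_lock. */
enum { RANK_DIRECT = 1, RANK_FTRACE = 2 };
static int held_mask;			/* one bit per held rank */

static int toy_acquire(int rank)
{
	if (held_mask >= (1 << rank))
		return -1;		/* re-acquire, or order inversion */
	held_mask |= 1 << rank;
	return 0;
}

static void toy_release(int rank)
{
	assert(held_mask & (1 << rank));
	held_mask &= ~(1 << rank);
}

/* Early path: no lock held yet, so the correct order is trivial. */
static int update_early(void)
{
	int windows = 0;		/* unlocked windows needing a re-check */

	if (toy_acquire(RANK_DIRECT) || toy_acquire(RANK_FTRACE))
		return -1;
	/* ... the trampoline update would go here ... */
	toy_release(RANK_FTRACE);
	toy_release(RANK_DIRECT);
	return windows;
}

/* Late path: entered with ftrace_lock held, returns with both held. */
static int update_late(void)
{
	int windows = 0;

	assert(held_mask & (1 << RANK_FTRACE));
	if (toy_acquire(RANK_DIRECT) == 0)
		return -1;		/* the checker is expected to refuse this */
	toy_release(RANK_FTRACE);	/* state may change from here ... */
	windows++;
	if (toy_acquire(RANK_DIRECT) || toy_acquire(RANK_FTRACE))
		return -1;
	/* ... to here, so the caller has to re-validate and maybe repeat. */
	return windows;
}
```

In this model update_early() reports no unlocked window, while update_late() reports one, which is where the wash-rinse-repeat would come from.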

For the livepatch-first, BPF-later case, we can probably handle this in
__ftrace_hash_update_ipmodify(), since we hold both direct_mutex and
ftrace_lock. We can unlock ftrace_lock and update the BPF trampoline.
This is safe against changes to direct ops, because we are still holding
direct_mutex. But is it safe against another IPMODIFY ops? I am not
sure yet... Also, this is pretty weird, because we would be updating a
direct trampoline before we finish registering it for the first time.
IOW, we would be calling modify_ftrace_direct_multi_nolock() for the same
trampoline before register_ftrace_direct_multi() returns.

The approach in v2 propagates the -EAGAIN to the BPF side, so these are
two independent calls of register_ftrace_direct_multi(). This does
require some protocol between the ftrace core and its user, but I still
think this is a cleaner approach.
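
As a rough sketch of that protocol (a toy model only, not the code in v2: struct toy_tramp, toy_register_direct() and toy_trampoline_update() are hypothetical names, and the share_ipmodify field just stands for "the trampoline was rebuilt so it can share the function with an IPMODIFY ops"):

```c
#include <assert.h>
#include <errno.h>

struct toy_tramp {
	int share_ipmodify;	/* trampoline prepared to coexist with IPMODIFY */
	int registered;
};

/* Core side: refuse with -EAGAIN until the trampoline has been prepared. */
static int toy_register_direct(struct toy_tramp *tr, int ipmodify_present)
{
	assert(!tr->registered);	/* never register the same one twice */
	if (ipmodify_present && !tr->share_ipmodify)
		return -EAGAIN;
	tr->registered = 1;
	return 0;
}

/* User side: on -EAGAIN, rebuild the trampoline and register again. */
static int toy_trampoline_update(struct toy_tramp *tr, int ipmodify_present)
{
	int err, attempts = 0;

again:
	attempts++;
	err = toy_register_direct(tr, ipmodify_present);
	if (err == -EAGAIN && !tr->share_ipmodify) {
		tr->share_ipmodify = 1;	/* regenerate in sharing mode */
		goto again;		/* a second, independent call */
	}
	return err ? err : attempts;	/* attempts used on success */
}
```

The point of the model is that the core never modifies a trampoline that is still being registered. It only reports -EAGAIN, and the user decides how to rebuild and retry.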

Does this make sense?

Thanks,
Song