[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <BDED3B27-B42F-44AD-904E-010752462A67@fb.com>
Date: Thu, 14 Jul 2022 04:37:43 +0000
From: Song Liu <songliubraving@...com>
To: Steven Rostedt <rostedt@...dmis.org>
CC: Song Liu <song@...nel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"ast@...nel.org" <ast@...nel.org>,
"daniel@...earbox.net" <daniel@...earbox.net>,
"andrii@...nel.org" <andrii@...nel.org>,
Kernel Team <Kernel-team@...com>,
"jolsa@...nel.org" <jolsa@...nel.org>,
"mhiramat@...nel.org" <mhiramat@...nel.org>
Subject: Re: [PATCH v2 bpf-next 1/5] ftrace: allow customized flags for
ftrace_direct_multi ftrace_ops
> On Jul 13, 2022, at 7:55 PM, Steven Rostedt <rostedt@...dmis.org> wrote:
>
> On Thu, 14 Jul 2022 01:42:59 +0000
> Song Liu <songliubraving@...com> wrote:
>
>>> As I replied to patch 3, here's my thoughts.
>>>
>>> DIRECT is treated as though it changes the IP. If you register it to a
>>> function that has an IPMODIFY already set to it, it will call the
>>> ops->ops_func() asking if the ops can use SHARED_IPMODIFY (which means
>>> a DIRECT can be shared with IPMODIFY). If it can, then it returns true,
>>> and the SHARED_IPMODIFY is set *by ftrace*. The user of the ftrace APIs
>>> should not have to manage this. It should be managed by the ftrace
>>> infrastructure.
>>
>> Hmm... I don't think this gonna work.
>>
>> First, two IPMODIFY ftrace ops cannot work together on the same kernel
>> function. So there won't be a ops with both IPMODIFY and SHARE_IPMODIFY.
>
> I'm not saying that.
>
> I'm saying that ftrace does not have any clue (nor cares) about what a
> DIRECT ops does. It might modify the IP or it might not. It doesn't know.
>
> But ftrace has control over the callbacks it does control.
>
> A DIRECT ops knows if it can work with another ops that has IPMODIFY set.
> If the DIRECT ops does not work with IPMODIFY (perhaps it wants to modify
> the IP), then it will tell ftrace that it can't and ftrace will not allow
> it.
>
> That is, ftrace doesn't care if the DIRECT ops modifies the IP or not. It
> can only ask (through the ops->ops_func()) if the direct trampoline can
> honor the IP that is modified. If it can, then it reports back that it can,
> and then ftrace will set that ops to SHARED_MODIFY, and the direct function
> can switch the ops->func() to one that uses SHARED_MODIFY.
>
>>
>> non-direct ops without IPMODIFY can already share with IPMODIFY ops.
>
> It can? ftrace sets IPMODIFY for all DIRECT callers to prevent that. Except
> for this patch that removes that restriction (which I believe is broken).
I mean "non-direct" ftrace ops, not direct ftrace ops.
>
>> Only direct ops need SHARE_IPMODIFY flag, and it means "I can share with
>> another ops with IPMODIFY". So there will be different flavors of
>> direct ops:
>
> I agree that only DIRECT ops can have SHARED_IPMODIFY set. That's what I'm
> saying. But I'm saying it gets set by ftrace.
>
>>
>> 1. w/ IPMODIFY, w/o SHARE_IPMODIFY;
>> 2. w/o IPMODIFY, w/o SHARE_IPMODIFY;
>> 3. w/o IPMODIFY, w/ SHARE_IPMODIFY.
>>
>> #1 can never work on the same function with another IPMODIFY ops, and
>> we don't plan to make it work. #2 does not work with another IPMODIFY
>> ops. And #3 works with another IPMODIFY ops.
>
> Lets look at this differently. What I'm proposing is that registering a
> direct ops does not need to tell ftrace if it modifies the IP or not.
> That's because ftrace doesn't care. Once ftrace calls the direct trampoline
> it loses all control. With the ftrace ops callbacks, it is the one
> responsible for setting up the modified IP. That's not the case with the
> direct trampolines.
>
> I'm saying that ftrace doesn't care what the DIRECT ops does. But it will
> not, by default, allow an IPMODIFY to happen when a DIRECT ops is on the
> same function, or vice versa.
>
> What I'm suggesting is when a IPMODIFY tries to attach to a function that
> also has a DIRECT ops, or a DIRECT tries to attach to a function that
> already has an IPMODIFY ops on it, that ftrace calls the direct
> ops->ops_func() asking if it is safe to use with an IPMODIFY function.
>
> If the direct ops modifies the IP itself, it will return a "no", and ftrace
> will reject the attachment. If the direct ops can, it returns a "yes" and
> then ftrace will set the SHARED_IPMODIFY flag to that ops and continue.
>
> The fact that the ops->ops_func was called will let the caller (bpf) know
> that it needs to use the stack to return to the function, or to call it if
> it is also tracing the return.
>
> If the IPMODIFY ops is removed, then ftrace will call the ops->ops_func()
> telling it it no longer has the IPMODIFY set, and will clear the
> SHARED_IPMODIFY flag, and then the owner of the direct ops can go back to
> doing whatever it did before (the calling the function directly, or
> whatever).
>
>>
>> The owner of the direct trampoline uses these flags to tell ftrace
>> infrastructure the property of this trampoline.
>
> I disagree. The owner shouldn't have to care about the flags. Let ftrace
> handle it. This will make things so much more simple for both BPF and
> ftrace.
>
>>
>> BPF trampolines with only fentry calls are #3 direct ops. BPF
>> trampolines with fexit or fmod_ret calls will be #2 trampoline by
>> default, but it is also possible to generate #3 trampoline for it.
>
> And ftrace doesn't care about this. But bpf needs to care if the IP is
> being modified or not. Which can be done by the ops->ops_func() that you
> added.
>
>>
>> BPF side will try to register #2 trampoline, If ftrace detects another
>> IPMODIFY ops on the same function, it will let BPF trampoline know
>> with -EAGAIN from register_ftrace_direct_multi(). Then, BPF side will
>> regenerate a #3 trampoline and register it again.
>
> This is too complex. You are missing the simple way.
>
>>
>> I know this somehow changes the policy with direct ops, but it is the
>> only way this can work, AFAICT.
>
> I disagree. There's a much better way that this can work.
>
>>
>> Does this make sense? Did I miss something?
>
>
> Let me start from the beginning.
I got your point now. We replace the flag on direct trampoline with a
callback check. So yes, this works.
>
> 1. Live kernel patching patches function foo.
>
> 2. lkp has an ops->flags | IPMODIFY set when it registers.
>
> 3. bpf registers a direct trampoline to function foo.
>
> 4. bpf has an ops->flags | DIRECT set when it registers
>
> 5. ftrace sees that the function already has an ops->flag = IPMODIFY on it,
> so it calls bpf ops->ops_func(SHARE_WITH_IPMODIFY)
>
> 6. bpf can and does the following
>
> a. if it's the simple #1 trampoline (just traces the start of a function)
> it doesn't need to do anything special returns "yes".
>
> b. if it's the #2 trampoline, it will change the trampoline to use the
> stack to find what to call, and returns "yes".
>
> 7. ftrace gets "yes" and sets the *ipmodify* ops with SHARED_IPMODIFY
> (there's a reason for setting this flag for the ipmodify ops and not the
> direct ops).
>
>
> 8. LKP is removed from the function foo.
>
> 9. ftrace sees the lkp IPMODIFY ops has SHARED_IPMODIFY on it, and knows
> that there's a direct call here too. It removes the IPMODIFY ops, and
> then calls the direct ops->ops_func(STOP_SHARE_WITH_IPMODIFY) to let the
> direct code know that it is no longer sharing with an IPMODIFY such that
> it can change to call the function directly and not use the stack.
I wonder whether we still need this flag. Alternatively, we can always
find direct calls on the function and calls ops_func(STOP_SHARE_WITH_IPMODIFY).
What do you think about this?
Thanks,
Song
Powered by blists - more mailing lists