netdev - Re: [RFC bpf-next] bpf: Use prog->active instead of bpf_prog_active for kprobe

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAADnVQJtJEzsr4a+brn4n65Pt1dP-3WTSHZe23_AetxWRTm0Tg@mail.gmail.com>
Date:   Thu, 9 Jun 2022 15:03:35 -0700
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc:     Jiri Olsa <jolsa@...nel.org>, Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>,
        Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...omium.org>
Subject: Re: [RFC bpf-next] bpf: Use prog->active instead of bpf_prog_active
 for kprobe_multi

On Thu, Jun 9, 2022 at 11:27 AM Andrii Nakryiko
<andrii.nakryiko@...il.com> wrote:
>
> On Tue, Jun 7, 2022 at 9:29 PM Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> >
> > On Tue, May 31, 2022 at 4:24 PM Andrii Nakryiko
> > <andrii.nakryiko@...il.com> wrote:
> > >
> > > On Wed, May 25, 2022 at 4:40 AM Jiri Olsa <jolsa@...nel.org> wrote:
> > > >
> > > > hi,
> > > > Alexei suggested to use prog->active instead global bpf_prog_active
> > > > for programs attached with kprobe multi [1].
> > > >
> > > > AFAICS this will bypass bpf_disable_instrumentation, which seems to be
> > > > ok for some places like hash map update, but I'm not sure about other
> > > > places, hence this is RFC post.
> > > >
> > > > I'm not sure how are kprobes different to trampolines in this regard,
> > > > because trampolines use prog->active and it's not a problem.
> > > >
> > > > thoughts?
> > > >
> > >
> > > Let's say we have two kernel functions A and B? B can be called from
> > > BPF program though some BPF helper, ok? Now let's say I have two BPF
> > > programs kprobeX and kretprobeX, both are attached to A and B. With
> > > using prog->active instead of per-cpu bpf_prog_active, what would be
> > > the behavior when A is called somewhere in the kernel.
> > >
> > > 1. A is called
> > > 2. kprobeX is activated for A, calls some helper which eventually calls B
> > >   3. kprobeX is attempted to be called for B, but is skipped due to prog->active
> > >   4. B runs
> > >   5. kretprobeX is activated for B, calls some helper which eventually calls B
> > >     6. kprobeX is ignored (prog->active > 0)
> > >     7. B runs
> > >     8. kretprobeX is ignored (prog->active > 0)
> > > 9. kretprobeX is activated for A, calls helper which calls B
> > >   10. kprobeX is activated for B
> > >     11. kprobeX is ignored (prog->active > 0)
> >
> > not correct. kprobeX actually runs.
> > but the end result is correct.
> >
>
> Right, it was a long sequence, but you got the idea :)
>
> > >     12. B runs
> > >     13. kretprobeX is ignored (prog->active > 0)
> > >   14. B runs
> > >   15. kretprobeX is ignored (prog->active > 0)
> > >
> > >
> > > If that's correct, we get:
> > >
> > > 1. kprobeX for A
> > > 2. kretprobeX for B
> > > 3. kretprobeX for A
> > > 4. kprobeX for B
> >
> > Here it's correct.
> >
> > > It's quite mind-boggling and annoying in practice. I'd very much
> > > prefer just kprobeX for A followed by kretprobeX for A. That's it.
> > >
> > > I'm trying to protect against this in retsnoop with custom per-cpu
> > > logic in each program, but I so much more prefer bpf_prog_active,
> > > which basically says "no nested kprobe calls while kprobe program is
> > > running", which makes a lot of sense in practice.
> >
> > It makes sense for retsnoop, but does not make sense in general.
> >
> > > Given kprobe already used global bpf_prog_active I'd say multi-kprobe
> > > should stick to bpf_prog_active as well.
> >
> > I strongly disagree.
> > Both multi kprobe and kprobe should move to per prog counter
> > plus some other protection
> > (we cannot just move to per-prog due to syscalls).
> > It's true that the above order is mind-boggling,
> > but it's much better than
> > missing kprobe invocation completely just because
> > another kprobe is running on the same cpu.
> > People complained numerous times about this kprobe behavior.
> > kprobeX attached to A
> > kprobeY attached to B.
> > If kprobeX calls B kprobeY is not going to be called.
> > Means that anything that bpf is using is lost.
> > spin locks, lists, rcu, etc.
> > Sleepable uprobes are coming.
> > iirc Delyan's patch correctly.
> > We will do migrate_disable and inc bpf_prog_active.
>
> This might be a different issue, I'm not sure why uprobes should be
> protected by the same global bpf_prog_active, you can't trigger uprobe
> from uprobe program. And especially for sleepable programs it makes no
> sense to use per-CPU protection (we have bpf_run_ctx for such
> protections, if needed).

you're forgetting about tracepoints and perf_events.
'perf record' will capture everything.
Whereas bpf powered 'perf record' will not see bpf progs
attached to [ku]probe, tp, events.

> > Now random kprobes on that cpu will be lost.
>
> It's not random. The rule is you can't kernel functions and
> tracepoints triggered from BPF kprobes/tracepoints. This prevents
> nasty reentrance problems and makes sense.

>From the user point of view it makes no sense whatsoever.
bperf is losing info.
Try explaining that to users.

> Isn't kernel tracing infra
> is protecting itself similarly, preventing reentrancy and recursion?

Yes and that's what it should be doing.
It's not the job of the kernel to implement the policy.
"run one bpf prog per cpu at a time" is a policy
and very much a broken one.

> > It's awful. We have to fix it.
>
> You can call it "a fix" if you'd like, but it's changing a very
> user-visible behavior and guarantees on which users relied for a
> while. So even if we switch to per-prog protection it will have to be
> some sort of opt-in (flag, new program type, whatever).

No opt-in allowed for fixes and it's a bug fix.
No one should rely on broken kernel behavior.
If retsnoop rely on that it's really retsnoop's fault.