netdev - Re: [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAADnVQ+55NKkEaAsjGh52=VsSgr9G-qvjBCPmaPrTxiN6eCZOw@mail.gmail.com>
Date: Thu, 8 Aug 2024 18:23:02 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc: Christian Brauner <brauner@...nel.org>, viro@...nel.org, bpf <bpf@...r.kernel.org>, 
	Linux-Fsdevel <linux-fsdevel@...r.kernel.org>, Amir Goldstein <amir73il@...il.com>, 
	"open list:CONTROL GROUP (CGROUP)" <cgroups@...r.kernel.org>, kvm@...r.kernel.org, 
	Network Development <netdev@...r.kernel.org>, Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a
 single ldimm64 insn into helper

On Thu, Aug 8, 2024 at 1:35 PM Andrii Nakryiko
<andrii.nakryiko@...il.com> wrote:
>
> On Thu, Aug 8, 2024 at 9:51 AM Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> >
> > On Wed, Aug 7, 2024 at 8:31 AM Andrii Nakryiko
> > <andrii.nakryiko@...il.com> wrote:
> > >
> > > On Wed, Aug 7, 2024 at 3:30 AM Christian Brauner <brauner@...nel.org> wrote:
> > > >
> > > > On Tue, Aug 06, 2024 at 03:32:20PM GMT, Andrii Nakryiko wrote:
> > > > > On Mon, Jul 29, 2024 at 10:20 PM <viro@...nel.org> wrote:
> > > > > >
> > > > > > From: Al Viro <viro@...iv.linux.org.uk>
> > > > > >
> > > > > > Equivalent transformation.  For one thing, it's easier to follow that way.
> > > > > > For another, that simplifies the control flow in the vicinity of struct fd
> > > > > > handling in there, which will allow a switch to CLASS(fd) and make the
> > > > > > thing much easier to verify wrt leaks.
> > > > > >
> > > > > > Signed-off-by: Al Viro <viro@...iv.linux.org.uk>
> > > > > > ---
> > > > > >  kernel/bpf/verifier.c | 342 +++++++++++++++++++++---------------------
> > > > > >  1 file changed, 172 insertions(+), 170 deletions(-)
> > > > > >
> > > > >
> > > > > This looks unnecessarily intrusive. I think it's best to extract the
> > > > > logic of fetching and adding bpf_map by fd into a helper and that way
> > > > > contain fdget + fdput logic nicely. Something like below, which I can
> > > > > send to bpf-next.
> > > > >
> > > > > commit b5eec08241cc0263e560551de91eda73ccc5987d
> > > > > Author: Andrii Nakryiko <andrii@...nel.org>
> > > > > Date:   Tue Aug 6 14:31:34 2024 -0700
> > > > >
> > > > >     bpf: factor out fetching bpf_map from FD and adding it to used_maps list
> > > > >
> > > > >     Factor out the logic to extract bpf_map instances from FD embedded in
> > > > >     bpf_insns, adding it to the list of used_maps (unless it's already
> > > > >     there, in which case we just reuse map's index). This simplifies the
> > > > >     logic in resolve_pseudo_ldimm64(), especially around `struct fd`
> > > > >     handling, as all that is now neatly contained in the helper and doesn't
> > > > >     leak into a dozen error handling paths.
> > > > >
> > > > >     Signed-off-by: Andrii Nakryiko <andrii@...nel.org>
> > > > >
> > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > > index df3be12096cf..14e4ef687a59 100644
> > > > > --- a/kernel/bpf/verifier.c
> > > > > +++ b/kernel/bpf/verifier.c
> > > > > @@ -18865,6 +18865,58 @@ static bool bpf_map_is_cgroup_storage(struct
> > > > > bpf_map *map)
> > > > >          map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE);
> > > > >  }
> > > > >
> > > > > +/* Add map behind fd to used maps list, if it's not already there, and return
> > > > > + * its index. Also set *reused to true if this map was already in the list of
> > > > > + * used maps.
> > > > > + * Returns <0 on error, or >= 0 index, on success.
> > > > > + */
> > > > > +static int add_used_map_from_fd(struct bpf_verifier_env *env, int fd,
> > > > > bool *reused)
> > > > > +{
> > > > > +    struct fd f = fdget(fd);
> > > >
> > > > Use CLASS(fd, f)(fd) and you can avoid all that fdput() stuff.
> > >
> > > That was the point of Al's next patch in the series, so I didn't want
> > > to do it in this one that just refactored the logic of adding maps.
> > > But I can fold that in and send it to bpf-next.
> >
> > +1.
> >
> > The bpf changes look ok and Andrii's approach is easier to grasp.
> > It's better to route bpf conversion to CLASS(fd,..) via bpf-next,
> > so it goes through bpf CI and our other testing.
> >
> > bpf patches don't seem to depend on newly added CLASS(fd_pos, ...
> > and fderr, so pretty much independent from other patches.
>
> Ok, so CLASS(fd, f) won't work just yet because of peculiar
> __bpf_map_get() contract: if it gets valid struct fd but it doesn't
> contain a valid struct bpf_map, then __bpf_map_get() does fdput()
> internally. In all other cases the caller has to do fdput() and
> returned struct bpf_map's refcount has to be bumped by the caller
> (__bpf_map_get() doesn't do that, I guess that's why it's
> double-underscored).
>
> I think the reason it was done was just a convenience to not have to
> get/put bpf_map for temporary uses (and instead rely on file's
> reference keeping bpf_map alive), plus we have bpf_map_inc() and
> bpf_map_inc_uref() variants, so in some cases we need to bump just
> refcount, and in some both user and normal refcounts.
>
> So can't use CLASS(fd, ...) without some more clean up.
>
> Alexei, how about changing __bpf_map_get(struct fd f) to
> __bpf_map_get_from_fd(int ufd), doing fdget/fdput internally, and
> always returning bpf_map with (normal) refcount bumped (if successful,
> of course). We can then split bpf_map_inc_with_uref() into just
> bpf_map_inc() and bpf_map_inc_uref(), and callers will be able to do
> extra uref-only increment, if necessary.
>
> I can do that as a pre-patch, there are about 15 callers, so not too
> much work to clean this up. Let me know.

Yeah. Let's kill __bpf_map_get(struct fd ..) altogether.
This logic was added in 2014.
fdget() had to be first and fdput() last to make sure
the map won't disappear while sys_bpf command is running.
All of the places can use bpf_map_get(), bpf_map_put() pair
and rely on map->refcnt, but...

- it's atomic64_inc(&map->refcnt); The cost is probably
in the noise compared to all the work that map sys_bpf commands do.

- It also opens new fuzzing opportunity to do some map operation
in one thread and close(map_fd) in the other, so map->usercnt can
drop to zero and map_release_uref() cleanup can start while
the other thread is still busy doing something like map_update_elem().
It can be mitigated by doing bpf_map_get_with_uref(), but two
atomic64_inc() is kinda too much.

So let's remove __bpf_map_get() and replace all users with bpf_map_get(),
but we may need to revisit that later.