Message-ID: <CAKH8qBtaeddXMj6NENgvsDOzKcrNWH9RD-jhcqDBxSF8YB0oUQ@mail.gmail.com>
Date: Wed, 27 Jan 2021 10:28:16 -0800
From: Stanislav Fomichev <sdf@...gle.com>
To: Andrey Ignatov <rdna@...com>
Cc: Netdev <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Martin KaFai Lau <kafai@...com>
Subject: Re: [PATCH bpf-next v4 1/2] bpf: allow rewriting to ports under ip_unprivileged_port_start
On Wed, Jan 27, 2021 at 10:24 AM Andrey Ignatov <rdna@...com> wrote:
>
> Stanislav Fomichev <sdf@...gle.com> [Tue, 2021-01-26 11:36 -0800]:
> > At the moment, BPF_CGROUP_INET{4,6}_BIND hooks can rewrite user_port
> > to the privileged ones (< ip_unprivileged_port_start), but it will
> > be rejected later on in the __inet_bind or __inet6_bind.
> >
> > Let's add another return value to indicate that CAP_NET_BIND_SERVICE
> > check should be ignored. Use the same idea as we currently use
> > in cgroup/egress where bit #1 indicates CN. Instead, for
> > cgroup/bind{4,6}, bit #1 indicates that CAP_NET_BIND_SERVICE should
> > be bypassed.
> >
> > v4:
> > - Add missing IPv6 support (Martin KaFai Lau)
> >
> > v3:
> > - Update description (Martin KaFai Lau)
> > - Fix capability restore in selftest (Martin KaFai Lau)
> >
> > v2:
> > - Switch to explicit return code (Martin KaFai Lau)
> >
> > Cc: Andrey Ignatov <rdna@...com>
> > Cc: Martin KaFai Lau <kafai@...com>
> > Signed-off-by: Stanislav Fomichev <sdf@...gle.com>
>
> Explicit return code looks much cleaner than both what v1 did and what I
> proposed earlier (compare port before/after).
>
> Just one nit from me but otherwise looks good.
>
> Acked-by: Andrey Ignatov <rdna@...com>
>
> ...
> > @@ -231,30 +232,48 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
> >
> > #define BPF_CGROUP_RUN_SA_PROG(sk, uaddr, type) \
> > ({ \
> > + u32 __unused_flags; \
> > int __ret = 0; \
> > if (cgroup_bpf_enabled(type)) \
> > __ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type, \
> > - NULL); \
> > + NULL, \
> > + &__unused_flags); \
> > __ret; \
> > })
> >
> > #define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, type, t_ctx) \
> > ({ \
> > + u32 __unused_flags; \
> > int __ret = 0; \
> > if (cgroup_bpf_enabled(type)) { \
> > lock_sock(sk); \
> > __ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type, \
> > - t_ctx); \
> > + t_ctx, \
> > + &__unused_flags); \
> > release_sock(sk); \
> > } \
> > __ret; \
> > })
> >
> > -#define BPF_CGROUP_RUN_PROG_INET4_BIND_LOCK(sk, uaddr) \
> > - BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_INET4_BIND, NULL)
> > -
> > -#define BPF_CGROUP_RUN_PROG_INET6_BIND_LOCK(sk, uaddr) \
> > - BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, BPF_CGROUP_INET6_BIND, NULL)
> > +/* BPF_CGROUP_INET4_BIND and BPF_CGROUP_INET6_BIND can return extra flags
> > + * via upper bits of return code. The only flag that is supported
> > + * (at bit position 0) is to indicate CAP_NET_BIND_SERVICE capability check
> > + * should be bypassed.
> > + */
> > +#define BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, type, flags) \
> > +({ \
> > + u32 __flags = 0; \
> > + int __ret = 0; \
> > + if (cgroup_bpf_enabled(type)) { \
> > + lock_sock(sk); \
> > + __ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, type, \
> > + NULL, &__flags); \
> > + release_sock(sk); \
> > + if (__flags & 1) \
> > + *flags |= BIND_NO_CAP_NET_BIND_SERVICE; \
>
> Nit: It took me some time to realize that there are two different
> "flags": one to pass to __cgroup_bpf_run_filter_sock_addr() and another
> to pass to __inet{,6}_bind/BPF_CGROUP_RUN_PROG_INET_BIND_LOCK that both carry
> "BIND_NO_CAP_NET_BIND_SERVICE" flag but do it differently:
> * hard-coded 0x1 in the former case;
> * and BIND_NO_CAP_NET_BIND_SERVICE == (1 << 3) in the latter.
>
> I'm not sure how to make it more readable: maybe name `flags` and
> `__flags` differently to highlight the difference (`bind_flags` and
> `__flags`?) and add a #define for the "1" here?
>
> In any case IMO it's not worth a respin and can be addressed by a
> follow-up if you agree.
Yeah, I agree. I didn't stress too much about it because we already
have ret and _ret in BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY
(and now BPF_PROG_RUN_ARRAY_FLAGS), but it does look confusing.
Let me respin with bind_flags; it shouldn't be too much work and
will help with readability in the future. Thanks for the review!
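
For illustration only, here is a rough standalone sketch of the naming
idea discussed above: give the hard-coded "1" a #define and keep the
program-side flags separate from the bind flags. This is not the kernel
code. The name BPF_RET_BIND_NO_CAP_NET_BIND_SERVICE and the helper
bind_flags_from_prog_flags() are hypothetical; only
BIND_NO_CAP_NET_BIND_SERVICE == (1 << 3) and the "& 1" test come from
the thread.

```c
#include <stdint.h>

/* From the thread: flag consumed by __inet_bind()/__inet6_bind(). */
#define BIND_NO_CAP_NET_BIND_SERVICE (1 << 3)

/* Hypothetical name for the hard-coded "1" that the macro tests in the
 * flags filled in by __cgroup_bpf_run_filter_sock_addr().
 */
#define BPF_RET_BIND_NO_CAP_NET_BIND_SERVICE (1 << 0)

/* Hypothetical helper: translate the program-side flags into bind flags.
 * Bits already set in bind_flags are preserved, matching the "|=" in the
 * macro from the patch.
 */
static uint32_t bind_flags_from_prog_flags(uint32_t prog_flags,
					   uint32_t bind_flags)
{
	if (prog_flags & BPF_RET_BIND_NO_CAP_NET_BIND_SERVICE)
		bind_flags |= BIND_NO_CAP_NET_BIND_SERVICE;
	return bind_flags;
}
```

With the two namespaces spelled out like this, a reader can see that the
same "skip the capability check" intent is encoded as 0x1 on the BPF side
and as (1 << 3) on the bind side.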