[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20200506180908.p4rze6ibtuogmf5j@kafai-mbp>
Date: Wed, 6 May 2020 11:09:08 -0700
From: Martin KaFai Lau <kafai@...com>
To: <sdf@...gle.com>
CC: <netdev@...r.kernel.org>, <bpf@...r.kernel.org>,
<davem@...emloft.net>, <ast@...nel.org>, <daniel@...earbox.net>,
Andrey Ignatov <rdna@...com>
Subject: Re: [PATCH bpf-next v2 4/5] bpf: allow any port in bpf_bind helper
On Wed, May 06, 2020 at 09:22:45AM -0700, sdf@...gle.com wrote:
> On 05/05, Martin KaFai Lau wrote:
> > On Tue, May 05, 2020 at 01:27:29PM -0700, Stanislav Fomichev wrote:
> > > We want to have a tighter control on what ports we bind to in
> > > the BPF_CGROUP_INET{4,6}_CONNECT hooks even if it means
> > > connect() becomes slightly more expensive. The expensive part
> > > comes from the fact that we now need to call inet_csk_get_port()
> > > that verifies that the port is not used and allocates an entry
> > > in the hash table for it.
> > >
> > > Since we can't rely on "snum || !bind_address_no_port" to prevent
> > > us from calling POST_BIND hook anymore, let's add another bind flag
> > > to indicate that the call site is BPF program.
> > >
> > > v2:
> > > * Update documentation (Andrey Ignatov)
> > > * Pass BIND_FORCE_ADDRESS_NO_PORT conditionally (Andrey Ignatov)
> > >
> > > Cc: Andrey Ignatov <rdna@...com>
> > > Signed-off-by: Stanislav Fomichev <sdf@...gle.com>
> > > ---
> > > include/net/inet_common.h | 2 +
> > > include/uapi/linux/bpf.h | 9 +-
> > > net/core/filter.c | 18 ++-
> > > net/ipv4/af_inet.c | 10 +-
> > > net/ipv6/af_inet6.c | 12 +-
> > > tools/include/uapi/linux/bpf.h | 9 +-
> > > .../bpf/prog_tests/connect_force_port.c | 104 ++++++++++++++++++
> > > .../selftests/bpf/progs/connect_force_port4.c | 28 +++++
> > > .../selftests/bpf/progs/connect_force_port6.c | 28 +++++
> > > 9 files changed, 192 insertions(+), 28 deletions(-)
> > > create mode 100644
> > tools/testing/selftests/bpf/prog_tests/connect_force_port.c
> > > create mode 100644
> > tools/testing/selftests/bpf/progs/connect_force_port4.c
> > > create mode 100644
> > tools/testing/selftests/bpf/progs/connect_force_port6.c
> > >
> > > diff --git a/include/net/inet_common.h b/include/net/inet_common.h
> > > index c38f4f7d660a..cb2818862919 100644
> > > --- a/include/net/inet_common.h
> > > +++ b/include/net/inet_common.h
> > > @@ -39,6 +39,8 @@ int inet_bind(struct socket *sock, struct sockaddr
> > *uaddr, int addr_len);
> > > #define BIND_FORCE_ADDRESS_NO_PORT (1 << 0)
> > > /* Grab and release socket lock. */
> > > #define BIND_WITH_LOCK (1 << 1)
> > > +/* Called from BPF program. */
> > > +#define BIND_FROM_BPF (1 << 2)
> > > int __inet_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len,
> > > u32 flags);
> > > int inet_getname(struct socket *sock, struct sockaddr *uaddr,
> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > index b3643e27e264..14b5518a3d5b 100644
> > > --- a/include/uapi/linux/bpf.h
> > > +++ b/include/uapi/linux/bpf.h
> > > @@ -1994,10 +1994,11 @@ union bpf_attr {
> > > *
> > > * This helper works for IPv4 and IPv6, TCP and UDP sockets. The
> > > * domain (*addr*\ **->sa_family**) must be **AF_INET** (or
> > > - * **AF_INET6**). Looking for a free port to bind to can be
> > > - * expensive, therefore binding to port is not permitted by the
> > > - * helper: *addr*\ **->sin_port** (or **sin6_port**, respectively)
> > > - * must be set to zero.
> > > + * **AF_INET6**). It's advised to pass zero port (**sin_port**
> > > + * or **sin6_port**) which triggers IP_BIND_ADDRESS_NO_PORT-like
> > > + * behavior and lets the kernel reuse the same source port
> > Reading "zero port" and "the same source port" together is confusing.
> Ack, let me try rephrase it a bit to make it more clear.
>
> > > + * as long as 4-tuple is unique. Passing non-zero port might
> > > + * lead to degraded performance.
> > Is the "degraded performance" also true for UDP?
> I suppose everything that is "allocating" port at bind time can lead
> to a faster port exhaustion, so UDP should be also affected.
> Although, looking at udp_v4_get_port, it looks less involved than
> its TCP counterpart.
I was mostly curious after taking a quick look in both TCP and UDP.
It does not affect the patch here. May be something for
optimization later.
>From skimming the codes,
I was thinking UDP should be more or less the same since
they have to go through get_port() eventually.
TCP will be degraded from inet_hash_connect() [which seems
to be using the ehash and 4 tuples] to get_port().
Powered by blists - more mailing lists