Message-ID: <22c30f70a632afb65b6cb2a7554e919673d48871.camel@redhat.com>
Date: Fri, 14 Jul 2023 09:56:06 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Stanislav Fomichev <sdf@...gle.com>, Alexei Starovoitov
<alexei.starovoitov@...il.com>
Cc: Geliang Tang <geliang.tang@...e.com>, Alexei Starovoitov
<ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, John Fastabend
<john.fastabend@...il.com>, Andrii Nakryiko <andrii@...nel.org>, Martin
KaFai Lau <martin.lau@...ux.dev>, Song Liu <song@...nel.org>, Yonghong Song
<yhs@...com>, KP Singh <kpsingh@...nel.org>, Hao Luo <haoluo@...gle.com>,
Jiri Olsa <jolsa@...nel.org>, bpf <bpf@...r.kernel.org>, MPTCP Upstream
<mptcp@...ts.linux.dev>, netdev@...r.kernel.org
Subject: Re: [RFC bpf-next 0/8] BPF 'force to MPTCP'
On Thu, 2023-07-13 at 16:09 -0700, Stanislav Fomichev wrote:
> On 07/13, Alexei Starovoitov wrote:
> > imo all 3 options including this 4th one are too hacky.
> > I understand ld_preload limitations and desire to have it per-cgroup,
> > but messing this much with user space feels a little bit too much.
> > What side effects will it cause?
>
> Maybe all that is really needed is some new per-netns sysctl to automatically
> upgrade from IPPROTO_TCP to IPPROTO_MPTCP? Or is it too broad of a
> brush?
I think it would actually be too broad, see below...
> I've also CC'd netdev for visibility...
>
> > Meaning is this enough to just change the proto?
> > Nothing in user space later on needs to be aware the protocol is so different?
>
> IIUC, if you use IPPROTO_MPTCP, you just get regular TCP until you start
> adding extra routes (via netlink). That's why their current
> unconditional IPPROTO_TCP->IPPROTO_MPTCP rewrite via ld_preload also somewhat
> works.
FTR, it is the other way around: when using IPPROTO_MPTCP you always get
the MPTCP protocol handshake, which downgrades gracefully to TCP if the
peer does not support it. Multiple paths can then be added/enabled by
different means, but that is another matter, and a quite orthogonal one.
The transition to TCP is currently not completely free: active
(client) MPTCP sockets that have fallen back to TCP keep some overhead
vs plain TCP ones.
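For readers less familiar with the API, here is a minimal sketch of what
"using IPPROTO_MPTCP" looks like from user space. It only covers the local
case where the kernel refuses to create the socket at all; the on-wire
fallback described above happens inside the kernel and needs no application
code. The helper name open_stream_socket is made up for illustration, and
the value 262 is the Linux IPPROTO_MPTCP number, used when the Python build
does not export the constant.

```python
import socket

# Linux defines IPPROTO_MPTCP as 262; Python exports it only on newer builds.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)


def open_stream_socket(family=socket.AF_INET):
    """Return an MPTCP socket if the kernel allows it, else a plain TCP one.

    Hypothetical helper: this handles only socket-creation failure, not the
    protocol-level fallback that happens during the handshake.
    """
    try:
        return socket.socket(family, socket.SOCK_STREAM, IPPROTO_MPTCP)
    except OSError:
        # For example: kernel built without MPTCP, or MPTCP disabled by sysctl.
        return socket.socket(family, socket.SOCK_STREAM, socket.IPPROTO_TCP)
```

The returned socket is used like any other stream socket (connect, send,
recv), which is why a blind protocol rewrite appears to "just work".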
Being able to control the IPPROTO_TCP->IPPROTO_MPTCP change on a
per-socket basis does offer some advantages, e.g. constraining the change
to the sockets that are likely to successfully complete the MPTCP
handshake.
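To make the per-socket idea concrete, a rough user-space analogue of such a
policy could look like the sketch below. This is not what the RFC series
implements; the function pick_protocol and the peer allowlist are
assumptions for illustration only, and the addresses come from the
documentation range.

```python
import socket

# Linux defines IPPROTO_MPTCP as 262; Python exports it only on newer builds.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)


def pick_protocol(peer, mptcp_peers):
    """Choose the protocol for one socket based on its destination.

    Hypothetical policy: only peers believed to speak MPTCP get an MPTCP
    socket, so other connections avoid the fallback overhead.
    """
    return IPPROTO_MPTCP if peer in mptcp_peers else socket.IPPROTO_TCP


# Illustrative allowlist of peers assumed to support MPTCP.
KNOWN_MPTCP_PEERS = {"192.0.2.10"}
proto = pick_protocol("192.0.2.10", KNOWN_MPTCP_PEERS)
```

A per-netns sysctl cannot express this kind of distinction, since it would
apply to every TCP socket in the namespace.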
> > I feel the consequences are too drastic to support such thing
> > through an official/stable hook.
> > We can consider an fmod_ret unstable hook somewhere in the kernel
> > that bpf prog can attach to and tweak the ret value and/or args,
> > but the production environment won't be using it.
> > It will be a temporary gap until user space is properly converted to mptcp.
>
> Asking every app to do s/IPPROTO_TCP/IPPROTO_MPTCP/ might be annoying
> though? (don't have a horse in this race, but have some v4->v6 migration
> vibes from this)
I can only make wild guesses, but I also expect such a "transition" to be
extremely long and/or incomplete.
Cheers,
Paolo