lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Tue, 23 Jan 2018 15:36:04 -0800
From:   Yuchung Cheng <ycheng@...gle.com>
To:     Lawrence Brakmo <brakmo@...com>
Cc:     Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Neal Cardwell <ncardwell@...gle.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "edumazet@...gle.com" <edumazet@...gle.com>,
        "soheil@...gle.com" <soheil@...gle.com>
Subject: Re: [PATCH net] bpf: always re-init the congestion control after
 switching to it

On Tue, Jan 23, 2018 at 3:30 PM, Lawrence Brakmo <brakmo@...com> wrote:
>
>
>
> On 1/23/18, 3:26 PM, "Alexei Starovoitov" <alexei.starovoitov@...il.com> wrote:
>
>     On Tue, Jan 23, 2018 at 08:19:54PM +0000, Lawrence Brakmo wrote:
>     > On 1/23/18, 11:50 AM, "Eric Dumazet" <eric.dumazet@...il.com> wrote:
>     >
>     >     On Tue, 2018-01-23 at 14:39 -0500, Neal Cardwell wrote:
>     >     > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo <brakmo@...com> wrote:
>     >     > > On 1/23/18, 9:30 AM, "Yuchung Cheng" <ycheng@...gle.com> wrote:
>     >     > >
>     >     > >     The original patch that changes TCP's congestion control via eBPF only
>     >     > >     re-initializes the new congestion control, if the BPF op is set to an
>     >     > >     (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will
>     >     > >
>     >     > > What do you mean by “(invalid) value”?
>     >     > >
>     >     > >     run the new congestion control from random states.
>     >     > >
>     >     > > This has always been possible with setsockopt, no?
>     >     > >
>     >     > >    This patch fixes
>     >     > >     the issue by always re-init the congestion control like other means
>     >     > >     such as setsockopt and sysctl changes.
>     >     > >
>     >     > > The current code re-inits the congestion control when calling
>     >     > > tcp_set_congestion_control except when it is called early on (i.e. op <=
>     >     > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize
>     >     > > since it will be initialized later by TCP when the connection is established.
>     >     > >
>     >     > > Otherwise, if we always call tcp_reinit_congestion_control we would call
>     >     > > tcp_cleanup_congestion_control when the congestion control has not been
>     >     > > initialized yet.
>     >     >
>     >     > On Tue, Jan 23, 2018 at 2:20 PM, Lawrence Brakmo <brakmo@...com> wrote:
>     >     > > On 1/23/18, 9:30 AM, "Yuchung Cheng" <ycheng@...gle.com> wrote:
>     >     > >
>     >     > >     The original patch that changes TCP's congestion control via eBPF only
>     >     > >     re-initializes the new congestion control, if the BPF op is set to an
>     >     > >     (invalid) value beyond BPF_SOCK_OPS_NEEDS_ECN. Consequently TCP will
>     >     > >
>     >     > > What do you mean by “(invalid) value”?
>     >     > >
>     >     > >     run the new congestion control from random states.
>     >     > >
>     >     > > This has always been possible with setsockopt, no?
>     >     > >
>     >     > >    This patch fixes
>     >     > >     the issue by always re-init the congestion control like other means
>     >     > >     such as setsockopt and sysctl changes.
>     >     > >
>     >     > > The current code re-inits the congestion control when calling
>     >     > > tcp_set_congestion_control except when it is called early on (i.e. op <=
>     >     > > BPF_SOCK_OPS_NEEDS_ECN). In that case there is no need to re-initialize
>     >     > > since it will be initialized later by TCP when the connection is established.
>     >     > >
>     >     > > Otherwise, if we always call tcp_reinit_congestion_control we would call
>     >     > > tcp_cleanup_congestion_control when the congestion control has not been
>     >     > > initialized yet.
>     >     >
>     >     > Interesting. So I wonder if the symptoms we were seeing were due to
>     >     > kernels that did not yet have this fix:
>     >     >
>     >     >   27204aaa9dc6 ("tcp: uniform the set up of sockets after successful
>     >     > connection):
>     >     >   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=27204aaa9dc67b833b77179fdac556288ec3a4bf
>     >     >
>     >     > Before that fix, there could be TFO passive connections that at SYN time called:
>     >     >   tcp_init_congestion_control(child);
>     >     > and then:
>     >     >   tcp_call_bpf(child, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB);
>     >     >
>     >     > So that if the CC was switched in the
>     >     > BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB handler then there would be no
>     >     > init for the new module?
>     >
>     >
>     >     Note that bpf_sock->op can be written by a malicious BPF filter.
>     >
>     >     So, a malicious filter can switch from Cubic to BBR without re-init,
>     >     and bad things can happen.
>     >
>     >     I do not believe we should trust BPF here.
>     >
>     > Very good point Eric. One solution would be to make bpf_sock->op not writeable by
>     > the BPF program.
>     >
>     > Neal, you are correct that would have been a problem. I leave it up to you guys whether
>     > making bpf_sock->op not writeable by BPF program is enough or if it is safer to always
>     > re-init (as long as there is no problem calling tcp_cleanup_congestion_control when it
>     > has not been initialized.
>
>     I think allowing write into 'op' and 'replylong' was a mistake.
>     Only 'reply' field is used by tcp_call_bpf().
>     No reason to let programs write into other fields.
>     I think we have to fix it now before programs start to rely
>     on this undefined behavior.
>
> write into ‘op’ is a mistake. Writing to replylong is a mistake until we have a calling op
> that uses the longer reply. I will do a patch to fix this once my outstanding patch is
> accepted since otherwise I would need to update my current patch.
That'd be great. Thanks Larry. IMO conditioning reinit TCP C.C. module on bpf op
is also worth addressing. Otherwise that's just inviting future bugs.
So I am still interested in addressing
that (unless you'd like to address that too)

>
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ