Message-ID: <CANn89i+7-wE4xr5D9DpH+N-xkL1SB8oVghCKgz+CT5eG1ODQhA@mail.gmail.com>
Date:   Wed, 3 Jun 2020 06:54:56 -0700
From:   Eric Dumazet <edumazet@...gle.com>
To:     Neal Cardwell <ncardwell@...gle.com>
Cc:     Jason Xing <kerneljasonxing@...il.com>,
        David Miller <davem@...emloft.net>,
        Alexey Kuznetsov <kuznet@....inr.ac.ru>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        netdev <netdev@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>, liweishi@...ishou.com,
        Shujin Li <lishujin@...ishou.com>
Subject: Re: [PATCH] tcp: fix TCP socks unreleased in BBR mode

On Wed, Jun 3, 2020 at 5:02 AM Neal Cardwell <ncardwell@...gle.com> wrote:
>
> On Wed, Jun 3, 2020 at 1:44 AM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > On Tue, Jun 2, 2020 at 10:05 PM Jason Xing <kerneljasonxing@...il.com> wrote:
> > >
> > > Hi Eric,
> > >
> > > I'm still trying to understand what you said earlier. Would the
> > > following be better:
> > > 1) discard the tcp_internal_pacing() function.
> > > 2) remove where the tcp_internal_pacing() is called in the
> > > __tcp_transmit_skb() function.
> > >
> > > If we do so, we could avoid the 'too late to give up pacing' problem.
> > > Meanwhile, should we introduce the tcp_wstamp_ns socket field as
> > > commit 864e5c090749 does?
> > >
> >
> > Please do not top-post on netdev mailing list.
> >
> >
> > I basically suggested double-checking which point in TCP could end up
> > calling tcp_internal_pacing()
> > while the timer was already armed.
> >
> > I guess this is mtu probing.
>
> Perhaps this could also happen from some of the retransmission code
> paths that don't use tcp_xmit_retransmit_queue()? Perhaps
> tcp_retransmit_timer() (RTO) and tcp_send_loss_probe() (TLP)? It seems
> they could indirectly cause a call to __tcp_transmit_skb() and thus
> tcp_internal_pacing() without first checking if the pacing timer was
> already armed?

I feared this (see the recent commits about very low pacing rates) :/

I am not sure we need to properly fix all these points for old
kernels, since the EDT model got rid of these problems.
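
For reference, after the EDT conversion (864e5c090749, the commit
Jason mentioned), the logic in tcp_update_skb_after_send() simply
advances a per-socket departure clock, so there is no hrtimer and no
sock_hold()/sock_put() pairing that could leak. Rough sketch from
memory, names approximate :

static void tcp_advance_pacing_clock(struct sock *sk,
				     const struct sk_buff *skb)
{
	struct tcp_sock *tp = tcp_sk(sk);
	u64 len_ns;
	u32 rate = sk->sk_pacing_rate;

	if (!rate || rate == ~0U)
		return;

	len_ns = (u64)skb->len * NSEC_PER_SEC;
	do_div(len_ns, rate);

	/* Advance the earliest departure time; skbs are stamped with
	 * it and released by sch_fq (or the internal pacing timer)
	 * when that time is reached, so no call site ever rearms an
	 * already-queued timer.
	 */
	tp->tcp_wstamp_ns = max(tp->tcp_wstamp_ns, tp->tcp_clock_cache);
	tp->tcp_wstamp_ns += len_ns;
}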

Maybe we can try to extend the timer.

Something like:


diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index cc4ba42052c21b206850594db6751810d8fc72b4..626b9f4f500f7e5270d8d59e6eb16dbfa3efbc7c 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -966,6 +966,8 @@ enum hrtimer_restart tcp_pace_kick(struct hrtimer *timer)

 static void tcp_internal_pacing(struct sock *sk, const struct sk_buff *skb)
 {
+       struct tcp_sock *tp = tcp_sk(sk);
+       ktime_t expire, now;
        u64 len_ns;
        u32 rate;

@@ -977,12 +979,29 @@ static void tcp_internal_pacing(struct sock *sk, const struct sk_buff *skb)

        len_ns = (u64)skb->len * NSEC_PER_SEC;
        do_div(len_ns, rate);
-       hrtimer_start(&tcp_sk(sk)->pacing_timer,
-                     ktime_add_ns(ktime_get(), len_ns),
+
+       now = ktime_get();
+       /* If hrtimer is already armed, then our caller has not
+        * used tcp_pacing_check().
+        */
+       if (unlikely(hrtimer_is_queued(&tp->pacing_timer))) {
+               expire = hrtimer_get_softexpires(&tp->pacing_timer);
+               if (ktime_after(expire, now))
+                       now = expire;
+               if (hrtimer_try_to_cancel(&tp->pacing_timer) == 1)
+                       __sock_put(sk);
+       }
+       hrtimer_start(&tp->pacing_timer, ktime_add_ns(now, len_ns),
                      HRTIMER_MODE_ABS_PINNED_SOFT);
        sock_hold(sk);
 }

+static bool tcp_pacing_check(const struct sock *sk)
+{
+       return tcp_needs_internal_pacing(sk) &&
+              hrtimer_is_queued(&tcp_sk(sk)->pacing_timer);
+}
+
 static void tcp_update_skb_after_send(struct tcp_sock *tp, struct sk_buff *skb)
 {
        skb->skb_mstamp = tp->tcp_mstamp;
@@ -2117,6 +2136,9 @@ static int tcp_mtu_probe(struct sock *sk)
        if (!tcp_can_coalesce_send_queue_head(sk, probe_size))
                return -1;

+       if (tcp_pacing_check(sk))
+               return -1;
+
        /* We're allowed to probe.  Build it now. */
        nskb = sk_stream_alloc_skb(sk, probe_size, GFP_ATOMIC, false);
        if (!nskb)
@@ -2190,11 +2212,6 @@ static int tcp_mtu_probe(struct sock *sk)
        return -1;
 }

-static bool tcp_pacing_check(const struct sock *sk)
-{
-       return tcp_needs_internal_pacing(sk) &&
-              hrtimer_is_queued(&tcp_sk(sk)->pacing_timer);
-}

 /* TCP Small Queues :
  * Control number of packets in qdisc/devices to two packets / or ~1 ms.

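Completely untested thought for the other paths Neal mentioned: the
TLP path could reuse the same guard, since the rearm_timer label in
tcp_send_loss_probe() already falls back to rescheduling the timer.
Something like:

@@ (near the top of tcp_send_loss_probe(), before we transmit)
+	/* If the pacing timer is already queued, defer the probe
+	 * instead of letting __tcp_transmit_skb() rearm the timer.
+	 */
+	if (tcp_pacing_check(sk))
+		goto rearm_timer;

With the hrtimer extension above this would only be an optimization,
since tcp_internal_pacing() now copes with a queued timer, but it
avoids building a probe we cannot send yet. tcp_retransmit_timer()
would need a similar look.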


>
> neal
