Date:   Tue, 1 Oct 2019 08:46:31 -0700
From:   John Ousterhout <ouster@...stanford.edu>
To:     netdev@...r.kernel.org
Subject: Fwd: BUG: sk_backlog.len can overestimate

(I accidentally dropped netdev on my earlier message... here is Eric's
response, which also didn't go to the group)

---------- Forwarded message ---------
From: Eric Dumazet <eric.dumazet@...il.com>
Date: Mon, Sep 30, 2019 at 6:53 PM
Subject: Re: BUG: sk_backlog.len can overestimate
To: John Ousterhout <ouster@...stanford.edu>

On 9/30/19 5:41 PM, John Ousterhout wrote:
> On Mon, Sep 30, 2019 at 5:14 PM Eric Dumazet <eric.dumazet@...il.com> wrote:
>>
>>
>>
>> On 9/30/19 4:58 PM, John Ousterhout wrote:
>>> As of 4.16.10, it appears to me that sk->sk_backlog.len does not
>>> provide an accurate estimate of the backlog length; this reduces the
>>> usefulness of the "limit" argument to sk_add_backlog().
>>>
>>> The problem is that, under heavy load, sk->sk_backlog.len can grow
>>> arbitrarily large even though the actual amount of data in the
>>> backlog is small. This happens because __release_sock() doesn't reset
>>> the backlog length until it has completely caught up. Under heavy
>>> load, new packets can arrive continuously into the backlog
>>> (incrementing sk_backlog.len) while other packets are being
>>> serviced. This can go on indefinitely, so sk_backlog.len never gets
>>> reset and can become arbitrarily large.
>>
>> Certainly not.
>>
>> It cannot grow arbitrarily large, unless perhaps a backport has gone wrong.
>
> Can you help me understand what would limit the growth of this value?
> Suppose that new packets are arriving as quickly as they are
> processed. Every time __release_sock calls sk_backlog_rcv, a new
> packet arrives during the call, which is added to the backlog,
> incrementing sk_backlog.len. However, sk_backlog.len doesn't get
> decreased when sk_backlog_rcv completes, since the backlog hasn't
> emptied (as you said, it's not "safe"). As a result, sk_backlog.len
> has increased, but the actual backlog length is unchanged (one packet
> was added, one was removed). Why can't this process repeat
> indefinitely, until eventually sk_backlog.len reaches whatever limit
> the transport specifies when it invokes sk_add_backlog? At this point
> packets will be dropped by the transport even though the backlog isn't
> actually very large.
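
For reference, the bookkeeping in question looks roughly like this,
paraphrased from 4.16-era include/net/sock.h (the pfmemalloc handling
is trimmed):

static inline bool sk_rcvqueues_full(const struct sock *sk, unsigned int limit)
{
        unsigned int qsize = sk->sk_backlog.len + atomic_read(&sk->sk_rmem_alloc);

        return qsize > limit;
}

static inline __must_check int sk_add_backlog(struct sock *sk,
                                              struct sk_buff *skb,
                                              unsigned int limit)
{
        if (sk_rcvqueues_full(sk, limit))
                return -ENOBUFS;        /* caller drops the packet */

        __sk_add_backlog(sk, skb);      /* link skb onto sk->sk_backlog */
        sk->sk_backlog.len += skb->truesize;  /* grows here; reset only in
                                               * __release_sock() */
        return 0;
}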

The process is bounded by the socket's sk_rcvbuf + sk_sndbuf:

bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
{
        u32 limit = sk->sk_rcvbuf + sk->sk_sndbuf;

        ...
        if (unlikely(sk_add_backlog(sk, skb, limit))) {
                ...
                __NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPBACKLOGDROP);
                ...
        }
        ...
}


Once the limit is reached, sk_backlog.len won't be touched until
__release_sock() has processed the whole queue.
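
Concretely, __release_sock() looks roughly like this (paraphrased from
net/core/sock.c of that era; prefetch and lockdep annotations trimmed).
Note that sk_backlog.len is zeroed only once the queue has been drained
completely:

void __release_sock(struct sock *sk)
{
        struct sk_buff *skb, *next;

        while ((skb = sk->sk_backlog.head) != NULL) {
                sk->sk_backlog.head = sk->sk_backlog.tail = NULL;

                spin_unlock_bh(&sk->sk_lock.slock);
                do {
                        next = skb->next;
                        skb->next = NULL;
                        sk_backlog_rcv(sk, skb);        /* process one packet */
                        cond_resched();
                        skb = next;
                } while (skb != NULL);
                spin_lock_bh(&sk->sk_lock.slock);

                /* While the lock was dropped, new packets may have been
                 * queued; if so, the outer loop runs again. */
        }

        /* Zeroing len only here, after a complete drain, guarantees
         * we cannot loop forever against a flooding producer. */
        sk->sk_backlog.len = 0;
}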


>
>>>
>>> Because of this, the "limit" argument to sk_add_backlog may not be
>>> useful, since it could result in packets being discarded even though
>>> the backlog is not very large.
>>>
>>
>>
>> You will have to study the git log/history for the details; the limit
>> _is_ useful, and we reset it in __release_sock() only when _safe_.
>>
>> Assuming you're talking about TCP, I suggest you use a more recent kernel.
>>
>> linux-5.0 got coalescing in the backlog queue, which helped quite a bit.
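
(For context on that last point: the 5.0 change, "tcp: implement
coalescing on backlog queue", tries to merge an incoming segment into
the tail skb of the backlog before charging a whole new skb, so only
the added bytes count against the limit. A simplified sketch of the
idea, with the TCP header/flags compatibility checks omitted:)

        struct sk_buff *tail = sk->sk_backlog.tail;
        bool fragstolen;
        int delta;

        if (tail &&
            TCP_SKB_CB(tail)->end_seq == TCP_SKB_CB(skb)->seq && /* contiguous */
            skb_try_coalesce(tail, skb, &fragstolen, &delta)) {
                TCP_SKB_CB(tail)->end_seq = TCP_SKB_CB(skb)->end_seq;
                kfree_skb_partial(skb, fragstolen);
                sk->sk_backlog.len += delta;    /* charge only the growth */
                return false;                   /* absorbed, nothing dropped */
        }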
