Message-ID: <47fef079-635d-483e-b530-943b2a55fc22@gmail.com>
Date: Tue, 1 Oct 2019 11:34:16 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: John Ousterhout <ouster@...stanford.edu>
Cc: netdev@...r.kernel.org
Subject: Re: BUG: sk_backlog.len can overestimate
On 10/1/19 10:25 AM, John Ousterhout wrote:
> On Tue, Oct 1, 2019 at 9:19 AM Eric Dumazet <eric.dumazet@...il.com> wrote:
>> ...
>> Sorry, I have no idea what problem you are seeing.
>
> OK, let me try again from the start. Consider two values:
> * sk->sk_backlog.len
> * The actual number of bytes in buffers in the current backlog list
>
> Now consider a series of propositions:
>
> 1. These two are not always the same. As packets get processed by
> calling sk_backlog_rcv, they are removed from the backlog list, so the
> actual amount of memory consumed by the backlog list drops. However,
> sk->sk_backlog.len doesn't change until the entire backlog is cleared,
> at which point it is reset to zero. So, there can be periods of time
> where sk->sk_backlog.len overstates the actual memory consumption of
> the backlog.
Yes, this is done on purpose (and documented in __release_sock()).
Otherwise you could have a livelock situation, with the user thread
trapped forever in the system call, never returning to user land.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8eae939f1400326b06d0c9afe53d2a484a326871
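
For context, __release_sock() roughly does this (simplified sketch;
prefetch/debug details elided):

void __release_sock(struct sock *sk)
{
        struct sk_buff *skb, *next;

        while ((skb = sk->sk_backlog.head) != NULL) {
                /* Detach the current backlog and process it with the
                 * socket spinlock released, so softirq can keep queueing.
                 */
                sk->sk_backlog.head = sk->sk_backlog.tail = NULL;
                spin_unlock_bh(&sk->sk_lock.slock);

                do {
                        next = skb->next;
                        skb_mark_not_on_list(skb);
                        sk_backlog_rcv(sk, skb);
                        cond_resched();
                        skb = next;
                } while (skb != NULL);

                spin_lock_bh(&sk->sk_lock.slock);
        }

        /* sk_backlog.len is zeroed only here, once the list is empty,
         * so a flood of incoming packets cannot keep us looping forever.
         */
        sk->sk_backlog.len = 0;
}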
>
> 2. The gap between sk->sk_backlog.len and actual backlog size can grow
> quite large. This happens if new packets arrive while sk_backlog_rcv
> is working. The socket is locked, so these new packets will be added
> to the backlog, which will increase sk->sk_backlog.len. Under high
> load, this could continue indefinitely: packets keep arriving, so the
> backlog never empties, so sk->sk_backlog.len never gets reset.
> However, packets are actually being processed from the backlog, so
> it's possible that the actual size of the backlog isn't changing, yet
> sk->sk_backlog.len continues to grow.
>
> 3. Eventually, the growth in sk->sk_backlog.len will be limited by the
> "limit" argument to sk_add_backlog. When this happens, packets will be
> dropped.
_Exactly_, this is WAI (working as intended).
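
The check in question is roughly this (simplified from include/net/sock.h):

static inline bool sk_rcvqueues_full(const struct sock *sk, unsigned int limit)
{
        unsigned int qsize = sk->sk_backlog.len + atomic_read(&sk->sk_rmem_alloc);

        return qsize > limit;
}

static inline int sk_add_backlog(struct sock *sk, struct sk_buff *skb,
                                 unsigned int limit)
{
        if (sk_rcvqueues_full(sk, limit))
                return -ENOBUFS;        /* caller drops the packet */

        __sk_add_backlog(sk, skb);
        sk->sk_backlog.len += skb->truesize;
        return 0;
}

Note that sk_add_backlog() charges skb->truesize, and nothing decrements
sk_backlog.len until __release_sock() zeroes it.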
>
> 4. Now suppose I pass a value of 1000000 as the limit to
> sk_add_backlog. It's possible that sk_add_backlog will reject my
> request even though the backlog only contains a total of 10000 bytes.
> The other 990000 bytes were present on the backlog at one time (though
> not necessarily all at the same time), but they have been processed
> and removed; __release_sock hasn't gotten around to updating
> sk->sk_backlog.len, because it hasn't been able to completely clear
> the backlog.
WAI
>
> 5. Bottom line: under high load, a socket can be forced to drop
> packets even though it never actually exceeded its memory budget. This
> isn't a case of a sender trying to fool us; we fooled ourselves,
> because of the delay in resetting sk->sk_backlog.len.
>
> Does this make sense?
Yes, just increase your socket limits with setsockopt(... SO_RCVBUF ...),
and accept that user threads may see bigger socket syscall latencies, obviously.
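
Something like this from user space (sketch only; raise_rcvbuf() is just an
illustrative helper, and 4 MB is just an example value):

#include <stdio.h>
#include <sys/socket.h>

/* Ask the kernel for a bigger receive buffer; the value is doubled
 * internally and capped by net.core.rmem_max, unless a privileged
 * process uses SO_RCVBUFFORCE instead.
 */
static int raise_rcvbuf(int fd, int bytes)
{
        if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0) {
                perror("setsockopt(SO_RCVBUF)");
                return -1;
        }
        return 0;
}

/* e.g. raise_rcvbuf(sock_fd, 4 * 1024 * 1024); */

Since protocols typically derive the limit they pass to sk_add_backlog()
from sk->sk_rcvbuf, a bigger SO_RCVBUF directly raises the point at which
backlog packets start being dropped.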
>
> By the way, I have actually observed this phenomenon in an
> implementation of the Homa transport protocol.
>
Maybe this transport protocol should size its socket limits correctly :)