Message-ID: <47fef079-635d-483e-b530-943b2a55fc22@gmail.com>
Date: Tue, 1 Oct 2019 11:34:16 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: John Ousterhout <ouster@...stanford.edu>
Cc: netdev@...r.kernel.org
Subject: Re: BUG: sk_backlog.len can overestimate
On 10/1/19 10:25 AM, John Ousterhout wrote:
> On Tue, Oct 1, 2019 at 9:19 AM Eric Dumazet <eric.dumazet@...il.com> wrote:
>> ...
>> Sorry, I have no idea what problem you are seeing.
>
> OK, let me try again from the start. Consider two values:
> * sk->sk_backlog.len
> * The actual number of bytes in buffers in the current backlog list
>
> Now consider a series of propositions:
>
> 1. These two are not always the same. As packets get processed by
> calling sk_backlog_rcv, they are removed from the backlog list, so the
> actual amount of memory consumed by the backlog list drops. However,
> sk->sk_backlog.len doesn't change until the entire backlog is cleared,
> at which point it is reset to zero. So, there can be periods of time
> where sk->sk_backlog.len overstates the actual memory consumption of
> the backlog.
Yes, this is done on purpose (and documented in __release_sock()).
Otherwise you could have a livelock situation, with the user thread
trapped forever in the system call, never returning to user land.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8eae939f1400326b06d0c9afe53d2a484a326871
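
For context, __release_sock() roughly does this (simplified sketch;
prefetch/debug details elided):

void __release_sock(struct sock *sk)
{
        struct sk_buff *skb, *next;

        while ((skb = sk->sk_backlog.head) != NULL) {
                /* Detach the current backlog and process it with the
                 * socket spinlock released, so softirq can keep queueing.
                 */
                sk->sk_backlog.head = sk->sk_backlog.tail = NULL;
                spin_unlock_bh(&sk->sk_lock.slock);

                do {
                        next = skb->next;
                        skb_mark_not_on_list(skb);
                        sk_backlog_rcv(sk, skb);
                        cond_resched();
                        skb = next;
                } while (skb != NULL);

                spin_lock_bh(&sk->sk_lock.slock);
        }

        /* sk_backlog.len is zeroed only here, once the list is empty,
         * so a flood of incoming packets cannot keep us looping forever.
         */
        sk->sk_backlog.len = 0;
}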
>
> 2. The gap between sk->sk_backlog.len and actual backlog size can grow
> quite large. This happens if new packets arrive while sk_backlog_rcv
> is working. The socket is locked, so these new packets will be added
> to the backlog, which will increase sk->sk_backlog.len. Under high
> load, this could continue indefinitely: packets keep arriving, so the
> backlog never empties, so sk->sk_backlog.len never gets reset.
> However, packets are actually being processed from the backlog, so
> it's possible that the actual size of the backlog isn't changing, yet
> sk->sk_backlog.len continues to grow.
>
> 3. Eventually, the growth in sk->sk_backlog.len will be limited by the
> "limit" argument to sk_add_backlog. When this happens, packets will be
> dropped.
_Exactly_, this is WAI (working as intended).
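
The check in question is roughly this (simplified from include/net/sock.h):

static inline bool sk_rcvqueues_full(const struct sock *sk, unsigned int limit)
{
        unsigned int qsize = sk->sk_backlog.len + atomic_read(&sk->sk_rmem_alloc);

        return qsize > limit;
}

static inline int sk_add_backlog(struct sock *sk, struct sk_buff *skb,
                                 unsigned int limit)
{
        if (sk_rcvqueues_full(sk, limit))
                return -ENOBUFS;        /* caller drops the packet */

        __sk_add_backlog(sk, skb);
        sk->sk_backlog.len += skb->truesize;
        return 0;
}

Note that sk_add_backlog() charges skb->truesize, and nothing decrements
sk_backlog.len until __release_sock() zeroes it.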
>
> 4. Now suppose I pass a value of 1000000 as the limit to
> sk_add_backlog. It's possible that sk_add_backlog will reject my
> request even though the backlog only contains a total of 10000 bytes.
> The other 990000 bytes were present on the backlog at one time (though
> not necessarily all at the same time), but they have been processed
> and removed; __release_sock hasn't gotten around to updating
> sk->sk_backlog.len, because it hasn't been able to completely clear
> the backlog.
WAI
>
> 5. Bottom line: under high load, a socket can be forced to drop
> packets even though it never actually exceeded its memory budget. This
> isn't a case of a sender trying to fool us; we fooled ourselves,
> because of the delay in resetting sk->sk_backlog.len.
>
> Does this make sense?
Yes, just increase your socket limits with setsockopt(... SO_RCVBUF ...),
and accept that user threads may see bigger socket syscall latencies, obviously.
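
Something like this from user space (sketch only; raise_rcvbuf() is just an
illustrative helper, and 4 MB is just an example value):

#include <stdio.h>
#include <sys/socket.h>

/* Ask the kernel for a bigger receive buffer; the value is doubled
 * internally and capped by net.core.rmem_max, unless a privileged
 * process uses SO_RCVBUFFORCE instead.
 */
static int raise_rcvbuf(int fd, int bytes)
{
        if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0) {
                perror("setsockopt(SO_RCVBUF)");
                return -1;
        }
        return 0;
}

/* e.g. raise_rcvbuf(sock_fd, 4 * 1024 * 1024); */

Since protocols typically derive the limit they pass to sk_add_backlog()
from sk->sk_rcvbuf, a bigger SO_RCVBUF directly raises the point at which
backlog packets start being dropped.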
>
> By the way, I have actually observed this phenomenon in an
> implementation of the Homa transport protocol.
>
Maybe this transport protocol should size its socket limits correctly :)