Open Source and information security mailing list archives
Message-ID: <CAGXJAmzHvKzKb1wzxtZK_KCu-pEQghznM4qmfzYmWeWR1CaJ7Q@mail.gmail.com>
Date:   Tue, 1 Oct 2019 10:25:21 -0700
From:   John Ousterhout <ouster@...stanford.edu>
To:     Eric Dumazet <eric.dumazet@...il.com>
Cc:     netdev@...r.kernel.org
Subject: Re: BUG: sk_backlog.len can overestimate

On Tue, Oct 1, 2019 at 9:19 AM Eric Dumazet <eric.dumazet@...il.com> wrote:
> ...
> Sorry, I have no idea what is the problem you see.

OK, let me try again from the start. Consider two values:
* sk->sk_backlog.len
* The actual number of bytes in buffers in the current backlog list

Now consider a series of propositions:

1. These two are not always the same. As packets get processed by
calling sk_backlog_rcv, they are removed from the backlog list, so the
actual amount of memory consumed by the backlog list drops. However,
sk->sk_backlog.len doesn't change until the entire backlog is cleared,
at which point it is reset to zero. So, there can be periods of time
where sk->sk_backlog.len overstates the actual memory consumption of
the backlog.

2. The gap between sk->sk_backlog.len and actual backlog size can grow
quite large. This happens if new packets arrive while sk_backlog_rcv
is working. The socket is locked, so these new packets will be added
to the backlog, which will increase sk->sk_backlog.len. Under high
load, this could continue indefinitely: packets keep arriving, so the
backlog never empties, so sk->sk_backlog.len never gets reset.
However, packets are actually being processed from the backlog, so
it's possible that the actual size of the backlog isn't changing, yet
sk->sk_backlog.len continues to grow.

3. Eventually, the growth in sk->sk_backlog.len will be limited by the
"limit" argument to sk_add_backlog. When this happens, packets will be
dropped.
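To make points 1-3 concrete, here is a toy model of the accounting.
This is my own sketch, not kernel code; the names echo sk_add_backlog,
sk_backlog_rcv and __release_sock, but the real functions do more:

```c
#include <stddef.h>

/* Toy model of the backlog accounting described above (a sketch;
 * names echo the kernel's, but this is not the actual kernel code). */
struct sock_model {
	size_t backlog_len;   /* sk->sk_backlog.len: grows per packet, reset only in bulk */
	size_t actual_bytes;  /* what the backlog list really holds right now */
};

/* Point 3: the limit test compares the (possibly stale) counter. */
static int model_add_backlog(struct sock_model *sk, size_t truesize,
			     size_t limit)
{
	if (sk->backlog_len + truesize > limit)
		return -1;                 /* packet dropped */
	sk->backlog_len += truesize;
	sk->actual_bytes += truesize;
	return 0;
}

/* Point 1: processing a packet shrinks only the real backlog. */
static void model_backlog_rcv(struct sock_model *sk, size_t truesize)
{
	sk->actual_bytes -= truesize;
	/* sk->backlog_len deliberately left unchanged */
}

/* The counter is reset only once the list is completely drained. */
static void model_release(struct sock_model *sk)
{
	if (sk->actual_bytes == 0)
		sk->backlog_len = 0;
}
```

The asymmetry is the whole point: the add path increments backlog_len,
but the per-packet receive path never decrements it, so the two values
can only be reconciled when the list fully empties.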

4. Now suppose I pass a value of 1000000 as the limit to
sk_add_backlog. It's possible that sk_add_backlog will reject my
request even though the backlog only contains a total of 10000 bytes.
The other 990000 bytes were present on the backlog at one time (though
not necessarily all at the same time), but they have been processed
and removed; __release_sock hasn't gotten around to updating
sk->sk_backlog.len, because it hasn't been able to completely clear
the backlog.
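The scenario in point 4 can be reproduced with a small simulation
(again a sketch with the numbers from above, not kernel code): one
10000-byte packet arrives per round while the previous one is
processed, so the real backlog stays at one packet while the counter
climbs to the limit:

```c
#include <stddef.h>

struct sim_result {
	int dropped;          /* did the limit check reject a packet? */
	size_t backlog_len;   /* sk->sk_backlog.len analogue at that moment */
	size_t actual;        /* bytes truly on the list at that moment */
};

/* Simulate: each round, one new packet is appended (if the limit
 * check passes) while sk_backlog_rcv finishes the previous one, so
 * the list never empties and the counter is never reset. */
static struct sim_result run_sim(size_t limit, size_t pkt, int rounds)
{
	struct sim_result r = {0, 0, 0};
	int i;

	for (i = 0; i < rounds; i++) {
		if (r.backlog_len + pkt > limit) {
			r.dropped = 1;     /* rejected on the stale counter */
			break;
		}
		r.backlog_len += pkt;
		r.actual += pkt;
		/* drain the previously queued packet, keeping ~one on the list */
		if (r.actual > pkt)
			r.actual -= pkt;
	}
	return r;
}
```

With run_sim(1000000, 10000, 200), the drop fires with backlog_len at
1000000 while actual is only 10000: the other 990000 bytes passed
through the backlog earlier and were processed, exactly the gap
described above.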

5. Bottom line: under high load, a socket can be forced to drop
packets even though it never actually exceeded its memory budget. This
isn't a case of a sender trying to fool us; we fooled ourselves,
because of the delay in resetting sk->sk_backlog.len.

Does this make sense?

By the way, I have actually observed this phenomenon in an
implementation of the Homa transport protocol.
