[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20191204004654.GA22999@linux.home>
Date: Wed, 4 Dec 2019 01:46:54 +0100
From: Guillaume Nault <gnault@...hat.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: David Miller <davem@...emloft.net>,
Jakub Kicinski <jakub.kicinski@...ronome.com>,
netdev <netdev@...r.kernel.org>, Arnd Bergmann <arnd@...db.de>
Subject: Re: [PATCH net] tcp: Avoid time_after32() underflow when handling
syncookies
On Mon, Dec 02, 2019 at 02:23:35PM -0800, Eric Dumazet wrote:
> On Mon, Dec 2, 2019 at 1:51 PM Guillaume Nault <gnault@...hat.com> wrote:
> >
> > On Thu, Nov 28, 2019 at 02:04:19PM -0800, Eric Dumazet wrote:
> > > On Thu, Nov 28, 2019 at 1:36 PM Guillaume Nault <gnault@...hat.com> wrote:
> > > >
> > > > In tcp_synq_overflow() and tcp_synq_no_recent_overflow(), the
> > > > time_after32() call might underflow and return the opposite of the
> > > > expected result.
> > > >
> > > > This happens after socket initialisation, when ->synq_overflow_ts and
> > > > ->rx_opt.ts_recent_stamp are still set to zero. In this case, they
> > > > can't be compared reliably to the current value of jiffies using
> > > > time_after32(), because jiffies may be too far apart (especially soon
> > > > after system startup, when it's close to 2^32).
> > > >
> > > > In such a situation, the erroneous time_after32() result prevents
> > > > tcp_synq_overflow() from updating ->synq_overflow_ts and
> > > > ->rx_opt.ts_recent_stamp, so the problem remains until jiffies wraps
> > > > and exceeds HZ.
> > > >
> > > > Practical consequences should be quite limited though, because the
> > > > time_after32() call of tcp_synq_no_recent_overflow() would also
> > > > underflow (unless jiffies wrapped since the first time_after32() call),
> > > > thus detecting a socket overflow and triggering the syncookie
> > > > verification anyway.
> > > >
> > > > Also, since commit 399040847084 ("bpf: add helper to check for a valid
> > > > SYN cookie") and commit 70d66244317e ("bpf: add bpf_tcp_gen_syncookie
> > > > helper"), tcp_synq_overflow() and tcp_synq_no_recent_overflow() can be
> > > > triggered from BPF programs. Even though such programs would normally
> > > > pair these two operations, so both underflows would compensate each
> > > > other as described above, we'd better avoid exposing the problem
> > > > outside of the kernel networking stack.
> > > >
> > > > Let's fix it by initialising ->rx_opt.ts_recent_stamp and
> > > > ->synq_overflow_ts to a value that can be safely compared to jiffies
> > > > using time_after32(). Use "jiffies - TCP_SYNCOOKIE_VALID - 1", to
> > > > indicate that we're not in a socket overflow phase.
> > > >
> > >
> > > A listener could be live for one year, and flip its ' I am under
> > > synflood' status every 24 days (assuming HZ=1000)
> > >
> > > You only made sure the first 24 days are ok, but the problem is still there.
> > >
> > > We need to refresh the values, maybe in tcp_synq_no_recent_overflow()
> > >
> > Yes, but can't we refresh it in tcp_synq_overflow() instead? We
> > basically always want to update the timestamp, unless it's already in
> > the [last_overflow, last_overflow + HZ] interval:
> >
> > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > index 36f195fb576a..1a3d76dafba8 100644
> > --- a/include/net/tcp.h
> > +++ b/include/net/tcp.h
> > @@ -494,14 +494,16 @@ static inline void tcp_synq_overflow(const struct sock *sk)
> > reuse = rcu_dereference(sk->sk_reuseport_cb);
> > if (likely(reuse)) {
> > last_overflow = READ_ONCE(reuse->synq_overflow_ts);
> > - if (time_after32(now, last_overflow + HZ))
> > + if (time_before32(now, last_overflow) ||
> > + time_after32(now, last_overflow + HZ))
> > WRITE_ONCE(reuse->synq_overflow_ts, now);
> > return;
> > }
> > }
> >
> > last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp;
> > - if (time_after32(now, last_overflow + HZ))
> > + if (time_before32(now, last_overflow) ||
> > + time_after32(now, last_overflow + HZ))
> > tcp_sk(sk)->rx_opt.ts_recent_stamp = now;
> > }
> >
> > This way, tcp_synq_no_recent_overflow() should always have a recent
> > timestamp to work on, unless tcp_synq_overflow() wasn't called. But I
> > can't see this case happening for a legitimate connection (unless I've
> > missed something of course).
> >
> > One could send an ACK without a SYN and get into this situation, but
> > then the timestamp value doesn't have too much importance since we have
> > to drop the connection anyway. So, even though an expired timestamp
> > could let the packet pass the tcp_synq_no_recent_overflow() test, the
> > syncookie validation would fail. So the packet is correctly rejected in
> > any case.
>
> But the bug never has been that we could accept an invalid cookie
> (this is unlikely because of the secret matching)
>
I didn't mean that it was. Maybe I wasn't clear, but this last
paragraph was there just to explain why I didn't address the stray
ACK scenario: such packets are correctly rejected anyway (yes we could
be more efficient at it, but that wasn't the original purpose of the
patch).
> The bug was that even if we have not sent recent SYNCOOKIES, we would
> still try to validate a cookie when a stray packet comes.
>
I realise that I misunderstood your original answer. I was looking at
this problem from a different angle: my original intend was to fix the
legitimate syncookie packet case.
Stale last_overflow timestamps can cause tcp_synq_overflow() to reject
valid packets in the following situation:
* tcp_synq_overflow() is called while jiffies is not within the
(last_overflow + HZ, last_overflow + HZ + 2^31] interval. So the
stale last_overflow timestamp isn't updated because
time_after32(jiffies, last_overflow + HZ) returns false.
* Then tcp_synq_no_recent_overflow() is called, jiffies might have
increased a bit, but can still be in the (last_overflow + TCP_SYNCOOKIE_VALID,
last_overflow + TCP_SYNCOOKIE_VALID + 2^31] interval. If so,
time_after32(jiffies, last_overflow + TCP_SYNCOOKIE_VALID) returns
true and the packet is rejected.
The case is unlikely, and whenever it happens, it can't last more than
two minutes. But the problem is now exposed to BPF programs and the fix
is small, so I think it's worth considering it. Admittedly my original
submission wasn't complete. Checking that jiffies is outside of the
[last_overflow, last_overflow + HZ] interval, as in my second proposal,
should fix the problem completely.
> In the end, the packet would be dropped anyway, but we could drop the
> packet much faster.
>
> Updating timestamps in tcp_synq_overflow() is not going to work if we
> never call it :)
>
> I believe tcp_synq_no_recent_overflow() is the right place to add the fix.
>
Are you talking about a flood of stray ACKs (just to be sure we're on
the same page here)? If so, I guess tcp_synq_no_recent_overflow() is
our only chance to update last_overflow. But do we really need it?
I think tcp_synq_no_recent_overflow() should first check if jiffies
isn't within the [last_overflow, last_overflow + TCP_SYNCOOKIE_VALID]
interval. Currently, we just test that it's not within
(last_overflow + TCP_SYNCOOKIE_VALID - 2^31, last_overflow + TCP_SYNCOOKIE_VALID].
That leaves a lot of room for stale timestamps to erroneously trigger
syncookie verification. By reducing the interval, we'd make sure that
only values that are undistinguishable from accurate fresh overflow
timestamps can trigger syncookie verification.
The only bad scenario that'd remain is if a stray ACK flood happens
when last_overflow is stale and jiffies is within the
(last_overflow, last_overflow + TCP_SYNCOOKIE_VALID) interval. If the
flood starts while we're in this state, then I fear that we can't do
anything but to wait for jiffies to exeed the interval's upper bound
(2 minutes max) and to rely on syncookies verification to reject stray
ACKs in the mean time. For cases where the flood starts before
jiffies enters that interval, then refreshing last_overflow in
tcp_synq_no_recent_overflow() would prevent jiffies from entering the
interval for the duration of the flood. last_overflow would have to be
set TCP_SYNCOOKIE_VALID seconds in the past. But since we have
no synchronisation with tcp_synq_overflow(), we'd have a race where
tcp_synq_overflow() would refresh last_overflow (entering
syncookie validation mode) and tcp_synq_no_recent_overflow() would
immediately move it backward (leaving syncookie validation mode). We'd
have to wait for the next SYN to finally reach a stable state with
syncookies validation re-enabled. This is a bit far fetched, and
requires a stray ACK arriving at the same moment a SYN flood is
detected. Still, I feel that moving last_overflow backward in
tcp_synq_no_recent_overflow() is a bit dangerous.
Powered by blists - more mailing lists