Message-ID: <CAL+tcoC3TuZPTwnHTDvXC+JPoJbgW2UywZ2=xv=E=utokb3pCQ@mail.gmail.com>
Date: Thu, 20 Feb 2025 11:04:15 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Wang Hai <wanghai38@...wei.com>
Cc: Eric Dumazet <edumazet@...gle.com>, ncardwell@...gle.com, kuniyu@...zon.com,
davem@...emloft.net, dsahern@...nel.org, kuba@...nel.org, pabeni@...hat.com,
horms@...nel.org, netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net] tcp: Fix error ts_recent time during three-way handshake
On Wed, Feb 19, 2025 at 9:11 PM Wang Hai <wanghai38@...wei.com> wrote:
>
>
>
> On 2025/2/19 11:31, Jason Xing wrote:
> > On Wed, Feb 19, 2025 at 10:16 AM Wang Hai <wanghai38@...wei.com> wrote:
> >>
> >>
> >>
> >> On 2025/2/18 21:35, Eric Dumazet wrote:
> >>> On Tue, Feb 18, 2025 at 12:00 PM Wang Hai <wanghai38@...wei.com> wrote:
> >>>>
> >>>> If two ACK packets from a connection enter tcp_check_req() at the same
> >>>> time on different CPUs, req->ts_recent may be updated with the more
> >>>> recent timestamp while the skb carrying the older timestamp creates the
> >>>> new sock, which causes the tcp_validate_incoming() check to fail.
> >>>>
> >>>> cpu1                                       cpu2
> >>>> tcp_check_req
> >>>>                                            tcp_check_req
> >>>>   req->ts_recent = tmp_opt.rcv_tsval = t1
> >>>>                                              req->ts_recent = tmp_opt.rcv_tsval = t2
> >>>>
> >>>>   newsk->ts_recent = req->ts_recent = t2 // t1 < t2
> >>>>   tcp_child_process
> >>>>     tcp_rcv_state_process
> >>>>       tcp_validate_incoming
> >>>>         tcp_paws_check
> >>>>           if ((s32)(rx_opt->ts_recent - rx_opt->rcv_tsval) <= paws_win) // failed
> >>>>
> >>>> In tcp_check_req(), restore ts_recent to this skb's own timestamp to
> >>>> fix this bug.
> >>>>
> >>>> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> >>>> Signed-off-by: Wang Hai <wanghai38@...wei.com>
> >>>> ---
> >>>> net/ipv4/tcp_minisocks.c | 4 ++++
> >>>> 1 file changed, 4 insertions(+)
> >>>>
> >>>> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> >>>> index b089b08e9617..0208455f9eb8 100644
> >>>> --- a/net/ipv4/tcp_minisocks.c
> >>>> +++ b/net/ipv4/tcp_minisocks.c
> >>>> @@ -878,6 +878,10 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
> >>>>                 sock_rps_save_rxhash(child, skb);
> >>>>                 tcp_synack_rtt_meas(child, req);
> >>>>                 *req_stolen = !own_req;
> >>>> +               if (own_req && tcp_sk(child)->rx_opt.tstamp_ok &&
> >>>> +                   unlikely(tcp_sk(child)->rx_opt.ts_recent != tmp_opt.rcv_tsval))
> >>>> +                       tcp_sk(child)->rx_opt.ts_recent = tmp_opt.rcv_tsval;
> >>>> +
> >>>>                 return inet_csk_complete_hashdance(sk, child, req, own_req);
> >>>
> >>> Have you seen the comment at line 818 ?
> >>>
> >>> /* TODO: We probably should defer ts_recent change once
> >>> * we take ownership of @req.
> >>> */
> >>>
> >>> Plan was clear and explained. Why implement something else (and buggy) ?
> >>>
> >> Hi Eric,
> >>
> >> Currently we have a real problem, so we want to solve it. This bug
> >> prevents the upper layer from being notified to call accept() after
> >> the three-way handshake completes successfully.
> >>
> >> The skb from cpu1 that fails tcp_paws_check() (which it could have
> >> passed) cannot move the socket into the TCP_ESTABLISHED state, so
> >> parent->sk_data_ready(parent) is not triggered. The skb from cpu2 can
> >> complete the three-way handshake, but it also has no way to call
> >> parent->sk_data_ready(parent) to notify the upper layer, so the upper
> >> layer never learns that it should call accept() to obtain the nsk.
> >>
> >> cpu1                                       cpu2
> >> tcp_check_req
> >>                                            tcp_check_req
> >>   req->ts_recent = tmp_opt.rcv_tsval = t1
> >>                                              req->ts_recent = tmp_opt.rcv_tsval = t2
> >>
> >>   newsk->ts_recent = req->ts_recent = t2 // t1 < t2
> >>   tcp_child_process
> >>     tcp_rcv_state_process
> >>       tcp_validate_incoming
> >>         tcp_paws_check // failed
> >>   parent->sk_data_ready(parent); // will not be called
> >>                                            tcp_v4_do_rcv
> >>                                              tcp_rcv_state_process // completes the three-way handshake
> >>                                              // missing parent->sk_data_ready(parent);
> >
> > IIUC, the ack received from cpu1 triggered calling
> > inet_csk_complete_hashdance(), so its state transitioned from
> > TCP_NEW_SYN_RECV to TCP_SYN_RECV, right? If so, is the reason
> > sk_data_ready() isn't called after the skb enters tcp_child_process()
> > that its state failed to transition to TCP_ESTABLISHED?
> >
> Yes, because it didn't switch to TCP_ESTABLISHED
> > Here is another question: how did the skb on the right side enter
> > tcp_v4_do_rcv() after entering tcp_check_req(), if the sk that the
> > skb belongs to is in the TCP_NEW_SYN_RECV state? Could you elaborate
> > more on this point?
> Since cpu1 successfully created the child sock, tcp_check_req() on
> cpu2 will return NULL with req_stolen set to true, so the packet
> subsequently takes the 'goto lookup' path and is re-processed. At
> that point, sk->sk_state is already TCP_SYN_RECV, and then
> tcp_v4_do_rcv() is called.
Now I can see what happened there. Perhaps it would be good to update
the commit message in the next iteration.

Another key piece of information I notice is that the second lookup
path loses the chance to call sk_data_ready() on the parent socket.
That is one of the main reasons your application cannot get notified.
Taking a rough look at tcp_rcv_state_process(), I think it's not easy
to acquire the parent socket there and then call sk_data_ready()
without modifying more code than the current solution does. It would
be a different solution in theory.
If your new approach (as in your previous reply) works, the following
commit will effectively be reverted/overwritten:
commit e0f9759f530bf789e984961dce79f525b151ecf3
Author: Eric Dumazet <edumazet@...gle.com>
Date:   Tue Feb 13 06:14:12 2018 -0800

    tcp: try to keep packet if SYN_RCV race is lost

    배석진 reported that in some situations, packets for a given 5-tuple
    end up being processed by different CPUS.

    This involves RPS, and fragmentation.

    배석진 is seeing packet drops when a SYN_RECV request socket is
    moved into ESTABLISH state. Other states are protected by socket lock.

    This is caused by a CPU losing the race, and simply not caring enough.

    Since this seems to occur frequently, we can do better and perform
    a second lookup.

    Note that all needed memory barriers are already in the existing code,
    thanks to the spin_lock()/spin_unlock() pair in inet_ehash_insert()
    and reqsk_put(). The second lookup must find the new socket,
    unless it has already been accepted and closed by another cpu.

    Note that the fragmentation could be avoided in the first place by
    use of a correct TCP MSS option in the SYN{ACK} packet, but this
    does not mean we can not be more robust.
Please repost a V2 when you think it's ready, and then the TCP
maintainers will handle it :)
Thanks,
Jason