lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <CANn89iJBOAg+KCZBvkUxdAfTS1jacBBcrW6M5AZQvr=UPFJ0dA@mail.gmail.com> (raw)
Date: Fri, 14 Jun 2024 15:56:37 +0800
From: luoxuanqiang <luoxuanqiang@...inos.cn>
To: edumazet@...gle.com
Cc: davem@...emloft.net,
	dsahern@...nel.org,
	fw@...len.de,
	kuba@...nel.org,
	linux-kernel@...r.kernel.org,
	luoxuanqiang@...inos.cn,
	netdev@...r.kernel.org,
	pabeni@...hat.com
Subject: Re: [PATCH v1 1/1] Fix race for duplicate reqsk on identical SYN

On Fri, Jun 14, 2024 at 8:01 AM luoxuanqiang <luoxuanqiang@...inos.cn> wrote:
>>
>> When bonding is configured in BOND_MODE_BROADCAST mode, if two identical SYN packets
>> are received at the same time and processed on different CPUs, it can potentially
>> create the same sk (sock) but two different reqsk (request_sock) in tcp_conn_request().
>>
>> These two different reqsk will respond with two SYNACK packets, and since the generation
>> of the seq (ISN) incorporates a timestamp, the final two SYNACK packets will have
>> different seq values.
>>
>> The consequence is that when the Client receives and replies with an ACK to the earlier
>> SYNACK packet, we will reset(RST) it.
>>
>> ====================================================================================
>>
>> This behavior is consistently reproducible in my local setup, which comprises:
>>
>>                   | NETA1 ------ NETB1 |
>> PC_A --- bond --- |                    | --- bond --- PC_B
>>                   | NETA2 ------ NETB2 |
>>
>> - PC_A is the Server and has two network cards, NETA1 and NETA2. I have bonded these two
>>   cards using BOND_MODE_BROADCAST mode and configured them to be handled by different CPU.
>>
>> - PC_B is the Client, also equipped with two network cards, NETB1 and NETB2, which are
>>   also bonded and configured in BOND_MODE_BROADCAST mode.
>>
>> If the client attempts a TCP connection to the server, it might encounter a failure.
>> Capturing packets from the server side reveals:
>>
>> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [S], seq 320236027,
>> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [S], seq 320236027,
>> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [S.], seq 2967855116,
>> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [S.], seq 2967855123, <==
>> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [.], ack 4294967290,
>> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [.], ack 4294967290,
>> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [R], seq 2967855117, <==
>> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [R], seq 2967855117,
>>
>> Two SYNACKs with different seq numbers are sent by localhost, resulting in an anomaly.
>>
>> ====================================================================================
>>
>> The attempted solution is as follows:
>> In the tcp_conn_request(), while inserting reqsk into the ehash table, it also checks
>> if an entry already exists. If found, it avoids reinsertion and releases it.
>>
>> Simultaneously, In the reqsk_queue_hash_req(), the start of the req->rsk_timer is
>> adjusted to be after successful insertion.
>>
>> Signed-off-by: luoxuanqiang <luoxuanqiang@...inos.cn>
>> ---
>>  include/net/inet_connection_sock.h |  2 +-
>>  net/dccp/ipv4.c                    |  2 +-
>>  net/dccp/ipv6.c                    |  2 +-
>>  net/ipv4/inet_connection_sock.c    | 16 ++++++++++++----
>>  net/ipv4/tcp_input.c               | 11 ++++++++++-
>>  5 files changed, 25 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
>> index 7d6b1254c92d..8773d161d184 100644
>> --- a/include/net/inet_connection_sock.h
>> +++ b/include/net/inet_connection_sock.h
>> @@ -264,7 +264,7 @@ struct sock *inet_csk_reqsk_queue_add(struct sock *sk,
>>                                       struct request_sock *req,
>>                                       struct sock *child);
>>  void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
>> -                                  unsigned long timeout);
>> +                                  unsigned long timeout, bool *found_dup_sk);
>>  struct sock *inet_csk_complete_hashdance(struct sock *sk, struct sock *child,
>>                                          struct request_sock *req,
>>                                          bool own_req);
>> diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
>> index ff41bd6f99c3..13aafdeb9205 100644
>> --- a/net/dccp/ipv4.c
>> +++ b/net/dccp/ipv4.c
>> @@ -657,7 +657,7 @@ int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
>>         if (dccp_v4_send_response(sk, req))
>>                 goto drop_and_free;
>>
>> -       inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT);
>> +       inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT, NULL);
>>         reqsk_put(req);
>>         return 0;
>>
>> diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
>> index 85f4b8fdbe5e..493cdb12ce2b 100644
>> --- a/net/dccp/ipv6.c
>> +++ b/net/dccp/ipv6.c
>> @@ -400,7 +400,7 @@ static int dccp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
>>         if (dccp_v6_send_response(sk, req))
>>                 goto drop_and_free;
>>
>> -       inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT);
>> +       inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT, NULL);
>>         reqsk_put(req);
>>         return 0;
>>
>> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
>> index d81f74ce0f02..d9394db98a5a 100644
>> --- a/net/ipv4/inet_connection_sock.c
>> +++ b/net/ipv4/inet_connection_sock.c
>> @@ -1123,12 +1123,17 @@ static void reqsk_timer_handler(struct timer_list *t)
>>  }
>>
>>  static void reqsk_queue_hash_req(struct request_sock *req,
>> -                                unsigned long timeout)
>> +                                unsigned long timeout, bool *found_dup_sk)
>>  {
>> +
>> +       inet_ehash_insert(req_to_sk(req), NULL, found_dup_sk);
>> +       if(found_dup_sk && *found_dup_sk)
>> +               return;
>> +
>> +       /* The timer needs to be setup after a successful insertion. */
>
>I am pretty sure we had a prior attempt to fix this issue, and the fix
>was problematic.
>
>You are moving the inet_ehash_insert() before the mod_timer(), this
>will add races.
Could you kindly explain what "races" refer to here? Thank you!

>
>Hint here is the use of TIMER_PINNED.
>
>CCing Florian, because he just removed TIMER_PINNED for TW, he might
>have the context
>to properly fix this issue.
>
>>         timer_setup(&req->rsk_timer, reqsk_timer_handler, TIMER_PINNED);
>>         mod_timer(&req->rsk_timer, jiffies + timeout);
>>
>> -       inet_ehash_insert(req_to_sk(req), NULL, NULL);
>>         /* before letting lookups find us, make sure all req fields
>>          * are committed to memory and refcnt initialized.
>>          */
>> @@ -1137,9 +1142,12 @@ static void reqsk_queue_hash_req(struct request_sock *req,
>>  }
>>
>>  void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
>> -                                  unsigned long timeout)
>> +                                  unsigned long timeout, bool *found_dup_sk)
>>  {
>> -       reqsk_queue_hash_req(req, timeout);
>> +       reqsk_queue_hash_req(req, timeout, found_dup_sk);
>> +       if(found_dup_sk && *found_dup_sk)
>> +               return;
>> +
>>         inet_csk_reqsk_queue_added(sk);
>>  }
>>  EXPORT_SYMBOL_GPL(inet_csk_reqsk_queue_hash_add);
>> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
>> index 9c04a9c8be9d..467f1b7bbd5a 100644
>> --- a/net/ipv4/tcp_input.c
>> +++ b/net/ipv4/tcp_input.c
>> @@ -7255,8 +7255,17 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
>>         } else {
>>                 tcp_rsk(req)->tfo_listener = false;
>>                 if (!want_cookie) {
>> +                       bool found_dup_sk = false;
>> +
>>                         req->timeout = tcp_timeout_init((struct sock *)req);
>> -                       inet_csk_reqsk_queue_hash_add(sk, req, req->timeout);
>> +                       inet_csk_reqsk_queue_hash_add(sk, req, req->timeout,
>> +                                                       &found_dup_sk);
>> +
>> +                       if(unlikely(found_dup_sk)){
>> +                               reqsk_free(req);
>> +                               return 0;
>> +                       }
>> +
>>                 }
>>                 af_ops->send_synack(sk, dst, &fl, req, &foc,
>>                                     !want_cookie ? TCP_SYNACK_NORMAL :
>> --
>> 2.25.1
>>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ