lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Fri, 14 Jun 2024 18:33:39 +0800
From: luoxuanqiang <luoxuanqiang@...inos.cn>
To: Eric Dumazet <edumazet@...gle.com>
Cc: davem@...emloft.net, dsahern@...nel.org, fw@...len.de, kuba@...nel.org,
 linux-kernel@...r.kernel.org, netdev@...r.kernel.org, pabeni@...hat.com
Subject: Re: [PATCH v1 1/1] Fix race for duplicate reqsk on identical SYN


在 2024/6/14 17:41, Eric Dumazet 写道:
> On Fri, Jun 14, 2024 at 10:59 AM luoxuanqiang <luoxuanqiang@...inos.cn> wrote:
>> On Fri, Jun 14, 2024 at 8:01 AM luoxuanqiang <luoxuanqiang@...inos.cn> wrote:
>>>> When bonding is configured in BOND_MODE_BROADCAST mode, if two identical SYN packets
>>>> are received at the same time and processed on different CPUs, it can potentially
>>>> create the same sk (sock) but two different reqsk (request_sock) in tcp_conn_request().
>>>>
>>>> These two different reqsk will respond with two SYNACK packets, and since the generation
>>>> of the seq (ISN) incorporates a timestamp, the final two SYNACK packets will have
>>>> different seq values.
>>>>
>>>> The consequence is that when the Client receives and replies with an ACK to the earlier
>>>> SYNACK packet, we will reset(RST) it.
>>>>
>>>> ====================================================================================
>>>>
>>>> This behavior is consistently reproducible in my local setup, which comprises:
>>>>
>>>>                    | NETA1 ------ NETB1 |
>>>> PC_A --- bond --- |                    | --- bond --- PC_B
>>>>                    | NETA2 ------ NETB2 |
>>>>
>>>> - PC_A is the Server and has two network cards, NETA1 and NETA2. I have bonded these two
>>>>    cards using BOND_MODE_BROADCAST mode and configured them to be handled by different CPU.
>>>>
>>>> - PC_B is the Client, also equipped with two network cards, NETB1 and NETB2, which are
>>>>    also bonded and configured in BOND_MODE_BROADCAST mode.
>>>>
>>>> If the client attempts a TCP connection to the server, it might encounter a failure.
>>>> Capturing packets from the server side reveals:
>>>>
>>>> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [S], seq 320236027,
>>>> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [S], seq 320236027,
>>>> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [S.], seq 2967855116,
>>>> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [S.], seq 2967855123, <==
>>>> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [.], ack 4294967290,
>>>> 10.10.10.10.45182 > localhost.localdomain.search-agent: Flags [.], ack 4294967290,
>>>> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [R], seq 2967855117, <==
>>>> localhost.localdomain.search-agent > 10.10.10.10.45182: Flags [R], seq 2967855117,
>>>>
>>>> Two SYNACKs with different seq numbers are sent by localhost, resulting in an anomaly.
>>>>
>>>> ====================================================================================
>>>>
>>>> The attempted solution is as follows:
>>>> In the tcp_conn_request(), while inserting reqsk into the ehash table, it also checks
>>>> if an entry already exists. If found, it avoids reinsertion and releases it.
>>>>
>>>> Simultaneously, In the reqsk_queue_hash_req(), the start of the req->rsk_timer is
>>>> adjusted to be after successful insertion.
>>>>
>>>> Signed-off-by: luoxuanqiang <luoxuanqiang@...inos.cn>
>>>> ---
>>>>   include/net/inet_connection_sock.h |  2 +-
>>>>   net/dccp/ipv4.c                    |  2 +-
>>>>   net/dccp/ipv6.c                    |  2 +-
>>>>   net/ipv4/inet_connection_sock.c    | 16 ++++++++++++----
>>>>   net/ipv4/tcp_input.c               | 11 ++++++++++-
>>>>   5 files changed, 25 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
>>>> index 7d6b1254c92d..8773d161d184 100644
>>>> --- a/include/net/inet_connection_sock.h
>>>> +++ b/include/net/inet_connection_sock.h
>>>> @@ -264,7 +264,7 @@ struct sock *inet_csk_reqsk_queue_add(struct sock *sk,
>>>>                                        struct request_sock *req,
>>>>                                        struct sock *child);
>>>>   void inet_csk_reqsk_queue_hash_add(struct sock *sk, struct request_sock *req,
>>>> -                                  unsigned long timeout);
>>>> +                                  unsigned long timeout, bool *found_dup_sk);
>>>>   struct sock *inet_csk_complete_hashdance(struct sock *sk, struct sock *child,
>>>>                                           struct request_sock *req,
>>>>                                           bool own_req);
>>>> diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
>>>> index ff41bd6f99c3..13aafdeb9205 100644
>>>> --- a/net/dccp/ipv4.c
>>>> +++ b/net/dccp/ipv4.c
>>>> @@ -657,7 +657,7 @@ int dccp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
>>>>          if (dccp_v4_send_response(sk, req))
>>>>                  goto drop_and_free;
>>>>
>>>> -       inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT);
>>>> +       inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT, NULL);
>>>>          reqsk_put(req);
>>>>          return 0;
>>>>
>>>> diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
>>>> index 85f4b8fdbe5e..493cdb12ce2b 100644
>>>> --- a/net/dccp/ipv6.c
>>>> +++ b/net/dccp/ipv6.c
>>>> @@ -400,7 +400,7 @@ static int dccp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
>>>>          if (dccp_v6_send_response(sk, req))
>>>>                  goto drop_and_free;
>>>>
>>>> -       inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT);
>>>> +       inet_csk_reqsk_queue_hash_add(sk, req, DCCP_TIMEOUT_INIT, NULL);
>>>>          reqsk_put(req);
>>>>          return 0;
>>>>
>>>> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
>>>> index d81f74ce0f02..d9394db98a5a 100644
>>>> --- a/net/ipv4/inet_connection_sock.c
>>>> +++ b/net/ipv4/inet_connection_sock.c
>>>> @@ -1123,12 +1123,17 @@ static void reqsk_timer_handler(struct timer_list *t)
>>>>   }
>>>>
>>>>   static void reqsk_queue_hash_req(struct request_sock *req,
>>>> -                                unsigned long timeout)
>>>> +                                unsigned long timeout, bool *found_dup_sk)
>>>>   {
>>>> +
>>>> +       inet_ehash_insert(req_to_sk(req), NULL, found_dup_sk);
>>>> +       if(found_dup_sk && *found_dup_sk)
>>>> +               return;
>>>> +
>>>> +       /* The timer needs to be setup after a successful insertion. */
>>> I am pretty sure we had a prior attempt to fix this issue, and the fix
>>> was problematic.
>>>
>>> You are moving the inet_ehash_insert() before the mod_timer(), this
>>> will add races.
>> Could you kindly explain what "races" refer to here? Thank you!
>
> Hmmm... maybe this is ok. Please respin your patch after fixing
> checkpatch issues, and add a 'net' tag
>
> ( See https://patchwork.kernel.org/project/netdevbpf/patch/20240614060012.158026-1-luoxuanqiang@kylinos.cn/
> for all warnings / fails)
>
> Please CC Kuniyuki Iwashima <kuniyu@...zon.com> because I will be OOO
> for about 4 days.

The formatting error has been resolved and the V2 version has been sent. Enjoy your holiday! Thank you!


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ