lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e3e34c63-d44b-4329-acef-a7adc7024b92@orange.com>
Date: Mon, 17 Jun 2024 16:44:51 +0200
From: alexandre.ferrieux@...nge.com
To: luoxuanqiang <luoxuanqiang@...inos.cn>, <edumazet@...gle.com>
CC: <davem@...emloft.net>, <dsahern@...nel.org>, <fw@...len.de>,
	<kuba@...nel.org>, <netdev@...r.kernel.org>, <pabeni@...hat.com>,
	<kuniyu@...zon.com>
Subject: Re: [PATCH net v2] Fix race for duplicate reqsk on identical SYN

On 17/06/2024 04:53, luoxuanqiang wrote:
> 
> 在 2024/6/17 07:45, alexandre.ferrieux@...nge.com 写道:
>> On 14/06/2024 12:26, luoxuanqiang wrote:
>>> When bonding is configured in BOND_MODE_BROADCAST mode, if two identical
>>> SYN packets are received at the same time and processed on different CPUs,
>>> it can potentially create the same sk (sock) but two different reqsk
>>> (request_sock) in tcp_conn_request().
>>>
>>> These two different reqsk will respond with two SYNACK packets, and since
>>> the generation of the seq (ISN) incorporates a timestamp, the final two
>>> SYNACK packets will have different seq values.
>>>
>>> The consequence is that when the Client receives and replies with an ACK
>>> to the earlier SYNACK packet, we will reset(RST) it.
>>>
>>> ========================================================================
>> This is close, but not identical, to a race we observed on a *single* CPU with
>> the TPROXY iptables target, in the following situation:
>>
>>  - two identical SYNs, sent one second apart from the same client socket,
>>    arrive back-to-back on the interface (due to network jitter)
>>
>>  - they happen to be handled in the same batch of packet from one softirq
>>    name_your_nic_poll()
>>
>>  - there, two loops run sequentially: one for netfilter (doing TPROXY), one
>>    for the network stack (doing TCP processing)
>>
>>  - the first generates two distinct contexts for the two SYNs
>>
>>  - the second respects these contexts and never gets a chance to merge them
>>
>> The result is exactly as you describe, but in this case there is no need for 
>> bonding,
>> and everything happens in one single CPU, which is pretty ironic for a race.
>> My uneducated feeling is that the two loops are the cause of a simulated
>> parallelism, yielding the race. If each packet of the batch was handled
>> "to completion" (full netfilter handling followed immediately by full network
>> stack ingestion), the problem would not exist.
> 
> Based on your explanation, I believe a
> similar issue can occur on a single CPU if two SYN packets are processed
>   closely enough. However, apart from using bond3 mode and having them
> processed on different CPUs to facilitate reproducibility, I haven't
> found a good way to replicate it.
> 
> Could you please provide a more practical example or detailed test
> steps to help me understand the reproduction scenario you mentioned?
> Thank you very much!

To reproduce in my case, I just need the two SYNs to arrive back-to-back in the 
ingress buffer and get in the same softirq run. To reach this goal easily, you 
can set the interrupt coalescence to a large value (like several milliseconds), 
and on the emitter side, send them in rapid sequence from userland. If that's 
not enough, you can just send one and duplicate it with TEE.

Then, if the packets are naturally aimed at the host (normal INPUT chain), I 
can't see the problem (as could be expected as 99.9999% of webservers do just 
this). Quite clearly, tcp_v4_rcv() does a good job in this case and is able to 
link the second packet to the context (reqsk?) of the first one.

But, if packets are "in transit", in a transparent proxying context, with the 
TPROXY target doing a redirection to the local listener, the race happens 
deterministically: I've even managed to squeeze 6 or 7 duplicate packets in the 
softirq run, and all of them get a different ISN !!!

In summary, the minimal setup is just:

    - ethtool -C $ITF rx-usecs 30000
    - a listener bound on port $PO
    - iptables -t mangle -A PREROUTING -i $ITF -p tcp -j TPROXY --on-port $PO 
--on-ip 0.0.0.0


And to get to the specifics, I have the impression that  in  ip_sublist_rcv(), 
the "two-pass" method of calling first NF_HOOK_LIST(), then 
ip_list_rcv_finish(), gets confused as the first pass (NF_HOOK_LIST calling 
TPROXY) does a "half-job" of attaching a context, making all of them different, 
while the second pass retrieves these contexts and doesn't try to "merge" them 
when needed:

   static void ip_sublist_rcv(...)
   {
     NF_HOOK_LIST(NFPROTO_IPV4, NF_INET_PRE_ROUTING, net, NULL,
                  head, dev, NULL, ip_rcv_finish);
     ip_list_rcv_finish(net, NULL, head);
   }

My naive impression is that reducing the "batch" size to 1 would do the job.
In other words, "run each packet to completion", netfilter *and* ip_*_rcv.
But I lack the vision of the big picture leading to the current choice.
Thanks in advance for shedding light on this :)

-Alex

____________________________________________________________________________________________________________
Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci.

This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
Thank you.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ