lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <1522228808.8705.42.camel@mbjswglx818>
Date:   Wed, 28 Mar 2018 17:20:08 +0800
From:   Anny Hu <Anny.Hu@...iatek.com>
To:     <netdev@...r.kernel.org>
Subject: Problems: network rx out-of-order issue

Dears,

Recently, we find the following patch will impact multi-core network throughput performance on kernel-4.9,
for it will cause rx packet out-of-order.
commit id: 4cd13c21b207e80ddb1144c576500098f2d5f882

[kernel version]: 
kernel-4.9

[repeat steps]
1. two our phones with load based on kernel-4.9, enable wifi and set up P2P connection; 
2. two reference phones with load based on kernel-4.4, enable wifi and set up P2P connection; 
3. do iperf UDP downlink test with the following commands;
Client: adb shell iperf -c 192.168.49.xx -u -i 1 -t 60 -b 15M
Server: adb shell iperf -s -u
4. out-of-order rate of our phones are obviously higher than reference phone:
DUT                                  Lost rate              out of order
Our phone                            0.011%                  164    
Reference phone                      0.0091%                 1

[Code flow]
Wi-Fi driver trigger rx_thread to receive rx packets and call netif_rx_ni() to push packets to tcp stack in thread context.

[netlog]:
No      Time                            Source ip       Dest ip         Protocol Len     Ip id                   Info
3811    2010-01-01 08:11:35.075522      192.168.43.1    192.168.43.90   UDP      144     0x25ee (9710)           60834 -> 5201 Len=100
3812    2010-01-01 08:11:35.075805      192.168.43.1    192.168.43.90   UDP      144     0x25f5 (9717)           60834 -> 5201 Len=100
3813    2010-01-01 08:11:35.075549      192.168.43.1    192.168.43.90   UDP      144     0x25ef (9711)           60834 -> 5201 Len=100
3814    2010-01-01 08:11:35.075944      192.168.43.1    192.168.43.90   UDP      144     0x25f6 (9718)           60834 -> 5201 Len=100
3815    2010-01-01 08:11:35.075567      192.168.43.1    192.168.43.90   UDP      144     0x25f0 (9712)           60834 -> 5201 Len=100 

out-of-order issue happens on:
1) ip_id=0x25f5 is received before ip_id=0x25ef, which are belong to the same downlink connection;
2) ip_id=0x25f6 is received before ip_id=0x25f0, which are belong to the same downlink connection.


[kernel log]:
We add debug log in rx kernel flow, such as netif_rx_ni() -> enqueue_to_backlog -> do_softirq() -> net_rx_action() -> process_backlog()

    kernel time
    |          cpu id which process running on
    |           |         process name         function name
    |           |         |                    |
[  598.837358] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25ee, pending=0x8a, backlog_state=1

// wifi rx_thread is running on cpu7 to receive rx packets of ip_id 0x25ee ~ 0x25f4, which are only enqueued.
[  598.837366] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25ee, input_qlen=1, proc_qlen=0, pending=0x8a, backlog_state=1  
 
// because ksoftirqd is in running queue, do_softirq() exit and net_rx_action isn't executed actually.   
[  598.837373] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1                          

[  598.837383] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25ef,input_qlen=2, proc_qlen=0, pending=0x8a, backlog_state=1 
[  598.837391] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0 pending=0x8a, ksoftirqd_running=1 
[  598.837400] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f0,input_qlen=3, proc_qlen=0, pending=0x8a, backlog_state=1 
[  598.837407] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1 
[  598.837417] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f1,input_qlen=4, proc_qlen=0, pending=0x8a, backlog_state=1 
[  598.837424] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1 
[  598.837434] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f2,input_qlen=5, proc_qlen=0, pending=0x8a, backlog_state=1 
[  598.837441] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1 
[  598.837450] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f3,input_qlen=6, proc_qlen=0, pending=0x8a, backlog_state=1 
[  598.837458] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1 

// after ip_id(0x25f4), there is no rx packet in this wifi interrupt, so rx_thread exit.
[  598.837470] (7)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f4,input_qlen=7, proc_qlen=0, pending=0x8a, backlog_state=1
[  598.837479] (7)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8a, ksoftirqd_running=1                                  

//  ksoftirqd is scheduled to run and call do_softirq(), so net_rx_action() is executed.    
[  598.837511] (7)[45:ksoftirqd/7]: [RX]net_rx_action: budget=300, time=4295042005  
//  net_rx_action() call process_backlog() to handle rx packet of ip_id=0x25ee                                                                                 
[  598.837646] (7)[45:ksoftirqd/7]: [RX]process_backlog: ip_id=0x25ee, delay=126308, work=1, input_qlen=0, proc_qlen=6  

// wifi interrupt happens and trigger rx_thread to receive the next rx packet, which is running on cpu1 this time. 
[  598.837660] (1)[3282:rx_thread]: [RX]enqueue_to_backlog: ip_id=0x25f5, input_qlen=1, proc_qlen=0, pending=0x8, backlog_state=1     
// because ksoftirqd is not running, net_rx softirq is executed immediately.
[  598.837666] (1)[3282:rx_thread]: [RX]do_softirq: preempt=0x1, in_interrupt()=0x0, pending=0x8, ksoftirqd_running =0                                  

// so ksoftirqd on cpu7 and rx_thread on cpu1 are running at the same time.
[  598.837679] (1)[3282:rx_thread]: [RX]net_rx_action: budget=300, time=4295042005  
  
//  ip_id=0x25ef is handled on cpu7 by net_rx softirq in ksoftirqd() thread context.                                                                                      
[  598.837762] (7)[45:ksoftirqd/7]: [RX]process_backlog: ip_id=0x25ef, delay=70385, work=2, input_qlen=0, proc_qlen=5  
 
//  ip_id=0x25f5 is handled on cpu1 by net_rx softirq in rx_thread context, out-of-order issue happens.                                                  
[  598.837768] (1)[3282:rx_thread]: [RX]process_backlog: ip_id=0x25f5, delay=82000, work=1, quota=64, input_qlen=0, proc_qlen=0 
             
//  ip_id=0x25f0 is handled on cpu7 by net_rx softirq in ksoftirqd() thread context.                              
[  598.837878] (7)[45:ksoftirqd/7]: [RX]process_backlog:ip_id=0x25f0, delay=47539, work=3, input_qlen=0, proc_qlen=4   

                               
Could you please help to check if this out-of-order issue is caused by kernel-4.9 patch above?

Thanks very much!


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ