Message-ID: <1401885281.3645.245.camel@edumazet-glaptop2.roam.corp.google.com>
Date:	Wed, 04 Jun 2014 05:34:41 -0700
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Suprasad Mutalik Desai <suprasad.desai@...il.com>
Cc:	netdev@...r.kernel.org, davem@...emloft.ne
Subject: Re: Fwd: Linux stack performance drop (TCP and UDP) in 3.10 kernel
 in routed scenario

On Wed, 2014-06-04 at 14:34 +0530, Suprasad Mutalik Desai wrote:
> Hi,
> 
> 
>     Currently I am working on the 3.10.12 kernel, and it seems the
> Linux stack performance (TCP and UDP) has degraded drastically
> compared to the 2.6 kernel.
> 
> Results :
> 
> Linux 2.6.32
> ---------------------
> TCP traffic using iperf
>     - Upstream : 140 Mbps
>     - Downstream : 148 Mbps
> 
> UDP traffic using iperf
>     - Upstream : 200 Mbps
>     - Downstream : 245 Mbps
> 
> Linux 3.10.12
> --------------------
> TCP traffic using iperf
>     - Upstream : 101 Mbps
>     - Downstream : 106 Mbps
> 
> UDP traffic using iperf
>     - Upstream : 140 Mbps
>     - Downstream : 170 Mbps
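
To put the figures above in relative terms, here is a minimal Python sketch that computes the percentage drop for each test. The throughput numbers are copied from the quoted results; the function and variable names are made up for this illustration.

```python
# Throughput figures (Mbps) copied from the iperf results quoted above,
# stored as (Linux 2.6.32, Linux 3.10.12).
RESULTS = {
    "TCP upstream": (140, 101),
    "TCP downstream": (148, 106),
    "UDP upstream": (200, 140),
    "UDP downstream": (245, 170),
}


def relative_drop(old_mbps, new_mbps):
    """Return the throughput loss from old to new as a percentage of old."""
    return 100.0 * (old_mbps - new_mbps) / old_mbps


def summarize(results):
    """Map each test name to its percentage drop, rounded to one decimal."""
    return {
        name: round(relative_drop(old, new), 1)
        for name, (old, new) in results.items()
    }


if __name__ == "__main__":
    for name, drop in summarize(RESULTS).items():
        print(f"{name}: {drop}% slower on 3.10.12 than on 2.6.32")
```

All four tests lose between roughly 28% and 31% of their throughput, so the regression is of similar size for TCP and UDP.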
> 
> Analysis:
> ---------------
> 1.   Profiling data on Linux 3.10.12 suggests that
>              -   fib_table_lookup and ip_route_input_noref are being
> called most of the time and are thus causing the degradation in
> performance.
> 
>     8.77    csum_partial 0x80009A20 1404

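The main problem here is the lack of checksum offload: csum_partial at
the top of the profile means checksums are being computed in software.
What kind of NIC is used?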

>     4.53    ipt_do_table 0x80365C34 1352
>     3.45    eth_xmit 0x870D0C88 5460
>     3.41    fib_table_lookup 0x8035240C 856    <----------
>     3.38    __netif_receive_skb_core 0x802B5C00 2276
>     3.07    dma_device_write 0x80013BD4 752
>     2.94    nf_iterate 0x802EA380 256
>     2.69    ip_route_input_noref 0x8030CE14 2520    <--------------
>     2.24    ip_forward 0x8031108C 1040
>     2.04    tcp_packet 0x802F45BC 3956
>     1.93    nf_conntrack_in 0x802EEAF4 2284
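
As a rough way to read the profile above, the sketch below sums the sample percentages by subsystem. The percentages are copied from the quoted profile; the grouping of symbols into subsystems is an assumption of this sketch, not something stated in the thread.

```python
# Percent-of-samples figures copied from the profile quoted above.
PROFILE = {
    "csum_partial": 8.77,
    "ipt_do_table": 4.53,
    "eth_xmit": 3.45,
    "fib_table_lookup": 3.41,
    "__netif_receive_skb_core": 3.38,
    "dma_device_write": 3.07,
    "nf_iterate": 2.94,
    "ip_route_input_noref": 2.69,
    "ip_forward": 2.24,
    "tcp_packet": 2.04,
    "nf_conntrack_in": 1.93,
}

# Assumed grouping of symbols by subsystem (illustrative only).
GROUPS = {
    "software checksum": ["csum_partial"],
    "route lookup": ["fib_table_lookup", "ip_route_input_noref"],
    "netfilter/conntrack": [
        "ipt_do_table", "nf_iterate", "tcp_packet", "nf_conntrack_in",
    ],
}


def group_totals(profile, groups):
    """Sum the profile percentages of the symbols in each group."""
    return {
        group: round(sum(profile[sym] for sym in symbols), 2)
        for group, symbols in groups.items()
    }


if __name__ == "__main__":
    for group, total in group_totals(PROFILE, GROUPS).items():
        print(f"{group}: {total}%")
```

Under this grouping, csum_partial alone accounts for more samples than the two route lookup functions combined, and the netfilter/conntrack symbols account for more than either.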
> 
> 2.    Based on the above observation, a search suggests that the
> routing cache code was removed in the Linux 3.6 kernel, so every
> packet has to go through ip_route_input_noref to find the destination.
> 
> 3.    Related to this, a patch from David Miller adds "ipv4: Early TCP
> socket demux" which caches the "dst per socket" and maintains
> tcp_hashinfo and uses early_demux(skb) (TCP --> tcp_v4_early_demux and
> UDP --> NULL i.e not defined) to get the "dst" of that skb and thus
> avoids ip_route_input_noref being called every time.
>           -  But this still doesn’t handle routing scenarios (LAN <-->  WAN).
> 
> 4.    A patch for UDP early demux was added in Linux 3.13, and
> certain bug fixes went into Linux 3.14.
> 
> 5.    As we are based on 3.10, we have no UDP early_demux support. This
> means we would have to backport the UDP early demux patch to the 3.10
> kernel.

Nope: this will be of no use on a router. It will even slow the router
down, because forwarded packets match no local socket, so the early
demux lookup is pure overhead.
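
A toy cost model may help show why early demux does not help forwarded traffic. This is a minimal Python sketch, not kernel code: it just counts table lookups per packet, assuming one socket hash lookup and one route lookup as the only costs. The `Packet` type and `lookups_per_packet` function are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    sport: int
    dst: str
    dport: int


def lookups_per_packet(packet, local_sockets, early_demux):
    """Count table lookups in a toy receive path (not the kernel's real code).

    local_sockets is the set of 4-tuples that have an established local
    socket on this machine.
    """
    key = (packet.src, packet.sport, packet.dst, packet.dport)
    lookups = 0
    if early_demux:
        lookups += 1          # socket hash lookup done before routing
        if key in local_sockets:
            return lookups    # hit: the dst cached on the socket is reused
    lookups += 1              # route lookup
    if not early_demux and key in local_sockets:
        lookups += 1          # local delivery: socket lookup after routing
    return lookups


if __name__ == "__main__":
    local = {("198.51.100.7", 40000, "192.0.2.1", 80)}
    to_host = Packet("198.51.100.7", 40000, "192.0.2.1", 80)
    forwarded = Packet("10.0.0.5", 50000, "203.0.113.9", 443)
    for name, pkt in (("local", to_host), ("forwarded", forwarded)):
        for flag in (False, True):
            n = lookups_per_packet(pkt, local, flag)
            print(f"{name} packet, early_demux={flag}: {n} lookup(s)")
```

In this model, early demux saves a lookup only for packets that terminate on a local socket. A forwarded packet always misses the socket table and still needs the route lookup, so it pays for one extra lookup.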

> 
> 
> Issue :
> -----------
> 
> 1.    The implementation of "Early TCP socket demux" doesn't address
> the routing scenario (LAN <---> WAN). This means TCP and UDP routing
> performance will be lower in the 3.10 kernel, and also in the 3.14
> kernel, as every packet has to go through a route lookup.
> 
> 
> Is there an alternative to get back the Linux stack performance of 2.6
> or 3.4 kernel where we have the route cache ?
> 
> I guess plain routing scenario was NOT thought through while removing
> the routing cache code.

It is the opposite. The route cache was easily targeted by DDoS attacks.

It was a nightmare.
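
To illustrate the kind of attack meant here, the following is a minimal Python simulation of a bounded cache keyed on (source, destination). It is only a stand-in: the class, the LRU eviction policy, the capacity, and the traffic generators are all assumptions made for this sketch and do not reproduce the kernel's old route cache.

```python
import random
from collections import OrderedDict


class ToyRouteCache:
    """Bounded LRU cache keyed on (src, dst); a stand-in, not kernel code."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, src, dst):
        key = (src, dst)
        if key in self.entries:
            self.entries.move_to_end(key)      # mark as recently used
            self.hits += 1
            return
        self.misses += 1                       # miss: full lookup, then insert
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate(packets, capacity=256):
    """Feed (src, dst) pairs through a fresh cache and return its hit rate."""
    cache = ToyRouteCache(capacity)
    for src, dst in packets:
        cache.lookup(src, dst)
    return cache.hit_rate()


def normal_traffic(n, flows=50, seed=1):
    """A small, stable set of sources talking to one destination."""
    rng = random.Random(seed)
    return [(f"10.0.0.{rng.randrange(flows)}", "192.0.2.1") for _ in range(n)]


def spoofed_flood(n, seed=2):
    """Every packet carries a random (spoofed) source address."""
    rng = random.Random(seed)
    return [
        (".".join(str(rng.randrange(1, 255)) for _ in range(4)), "192.0.2.1")
        for _ in range(n)
    ]


if __name__ == "__main__":
    print("normal traffic hit rate:", simulate(normal_traffic(10000)))
    print("spoofed flood hit rate:", simulate(spoofed_flood(10000)))
```

With a handful of stable flows the cache hits almost every time. With randomized sources nearly every packet is a miss that also inserts a new entry, so the cache adds work and memory pressure while providing no benefit.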



