Message-ID: <20080615234620.GC2835@solarflare.com>
Date:	Mon, 16 Jun 2008 00:46:22 +0100
From:	Ben Hutchings <bhutchings@...arflare.com>
To:	Denys Fedoryshchenko <denys@...p.net.lb>
Cc:	netdev@...r.kernel.org
Subject: Re: NAPI, rx_no_buffer_count, e1000, r8169 and other actors

Denys Fedoryshchenko wrote:
> Hi
> 
> Since I am using PC routers for my network, and I am reaching numbers that
> are significant (for me), I have started noticing minor problems.  Hence all
> this talk about networking performance in my case.
> 
> For example.
> Sun server, AMD based (two CPU -  AMD Opteron(tm) Processor 248).
> e1000 connected over PCI-X ([    4.919249] e1000: 0000:01:01.0: e1000_probe:
> (PCI-X:100MHz:64-bit) 00:14:4f:20:89:f4)
> 
> All traffic is processed over eth0 (5 VLANs); the 1-second average is around 110-200Mbps

Currently TX checksum offload does not work for VLAN devices, which may
be a serious performance hit if there is a lot of traffic routed between
VLANs.  This should change in 2.6.27 for some drivers, which I think will
include e1000.
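
For what it's worth, you can see which offloads the kernel thinks are in effect
with something like the following, assuming your ethtool is recent enough to
report it; "eth0.5" is purely an example VLAN interface name, and the VLAN code
may not expose ethtool ops at all on your kernel, in which case the second
command will simply fail:

    ethtool -k eth0      # offload settings on the physical device
    ethtool -k eth0.5    # offload settings on the VLAN device

I would expect tx-checksumming to show as "off" for the VLAN device on current
kernels even when it is "on" for eth0.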

> of traffic. The host is also running conntrack (max 1000000 entries; when
> packet loss happens, around 256k entries) and around 1300 routes (FIB_TRIE).
> What worries me is this: OK, I gain time by increasing the RX descriptors
> from 256 to 4096, but how much time do I gain? If it "cracks" at 100 Mbps RX,
> does extrapolating from the descriptor increase from 256 to 4096 (4 times)
> mean I cannot process more than 400Mbps RX?

Increasing the RX descriptor ring size should give the driver and stack
more time to catch up after handling some packets that take unusually
long.  It may also allow you to increase interrupt moderation, which
will reduce the per-packet cost.
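
The ring size itself can be checked and set through ethtool, something along
these lines (the driver has to support the -g/-G operations, and 4096 is just
whatever maximum the hardware reports):

    ethtool -g eth0            # show current and maximum ring sizes
    ethtool -G eth0 rx 4096    # grow the RX ring up to the reported maximum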

> The CPU is not so busy after all... maybe there is a way to change some
> parameter to force NAPI to poll the interface more often?

NAPI polling is not time-based, except indirectly through interrupt
moderation.
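
The closest thing to a tunable on that side is the softirq budget: each NET_RX
softirq run processes at most net.core.netdev_budget packets across all NAPI
devices (and at most net.core.dev_weight, usually 64, per device per poll)
before yielding the CPU.  As a sketch, assuming the defaults haven't been
changed:

    sysctl net.core.netdev_budget          # default is 300, if I remember right
    sysctl -w net.core.netdev_budget=600   # example: allow more work per softirq run

Raising it only helps if the softirq is actually running out of budget before
the RX ring overflows, though.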

> I tried nice, changing the realtime priority to FIFO, and changing the kernel
> to preemptible... no luck; nothing helped except increasing the descriptors.
> 
> Router-Dora ~ # mpstat -P ALL 1
> Linux 2.6.26-rc6-git2-build-0029 (Router-Dora)  06/15/08
> 
> 22:51:02     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal  
> %idle    intr/s
> 22:51:03     all    1.00    0.00    0.00    0.00    2.50   29.00    0.00  
> 67.50  12927.00
> 22:51:03       0    2.00    0.00    0.00    0.00    4.00   59.00    0.00  
> 35.00  11935.00
> 22:51:03       1    0.00    0.00    0.00    0.00    0.00    0.00    0.00 
> 100.00    993.00
> 22:51:03       2    0.00    0.00    0.00    0.00    0.00    0.00    0.00   
> 0.00      0.00
 
You might do better with a NIC that supports MSI-X.  This allows the use of
two RX queues with their own IRQs, each handled by a different processor.
As it is, one CPU is completely idle.  However, I don't know how well the
other work of routing scales to multiple processors.
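
Even with a single IRQ you can at least check where it is being delivered and
pin it explicitly; roughly (the IRQ number below is just a placeholder for
whatever /proc/interrupts shows for eth0, and irqbalance, if running, may move
it again):

    grep eth0 /proc/interrupts             # find the IRQ number for eth0
    echo 1 > /proc/irq/123/smp_affinity    # example: hex CPU mask, 1 = CPU 0

That won't spread the RX load the way MSI-X would, but it does keep the
softirq work off the CPU you want free for other things.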

[...]
> I have another host running: Core 2 Duo, e1000e + 3 x e100, also conntrack,
> the same kernel configuration and a similar amount of traffic, but higher
> load (ifb + plenty of shapers running) - almost no errors on default settings.
> Linux 2.6.26-rc6-git2-build-0029 (Kup)  06/16/08
> 
> 07:00:27     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
> 07:00:28     all    0.00    0.00    0.50    0.00    4.00   31.50    0.00   64.00  32835.00
> 07:00:29     all    0.00    0.00    0.50    0.00    2.50   29.00    0.00   68.00  33164.36
> 
> Third host: r8169 (PCI! This is important; it seems I am running out of PCI
> capacity),

Gigabit Ethernet on plain old PCI is not ideal.  If each card has a
separate route to the south bridge then you might be able to get a fair
fraction of a gigabit between them though.
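
(Rough arithmetic: 32-bit/33MHz PCI is about 33M transfers/s x 4 bytes =
133 MB/s, i.e. roughly 1 Gbit/s of theoretical bus bandwidth, shared by every
device on the bus and by both directions of traffic, with real-world
efficiency well below that.  So a few hundred Mbit/s of combined rx+tx through
a PCI GbE card is already a large fraction of what the bus can deliver.)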

> 400Mbit/s of combined rx+tx load; the e1000e interface also carries around
> 200Mbps. What worries me is the interrupt rate, which seems to be generated
> by the Realtek card... is there any way to bring it down?
[...]

ethtool -C lets you change interrupt moderation settings.  I don't know
anything about this driver's or NIC's capabilities, but this chip does seem to
be used in the cheapest GbE cards, so I wouldn't expect outstanding
performance.
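
Something like the following will at least tell you whether the driver exposes
any coalescing parameters (the interface name and value are just examples, and
r8169 may not support some or all of them):

    ethtool -c eth1                 # show current coalescing settings
    ethtool -C eth1 rx-usecs 100    # example: delay RX interrupts by up to 100us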

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
