Date:	Tue, 25 Mar 2014 22:47:29 +0000
From:	Ben Hutchings <ben@...adent.org.uk>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	Amir Vadai <amirv@...lanox.com>,
	"David S. Miller" <davem@...emloft.net>, linux-pm@...r.kernel.org,
	netdev@...r.kernel.org, Pavel Machek <pavel@....cz>,
	"Rafael J. Wysocki" <rjw@...ysocki.net>,
	Len Brown <len.brown@...el.com>, yuvali@...lanox.com,
	Or Gerlitz <ogerlitz@...lanox.com>,
	Yevgeny Petrilin <yevgenyp@...lanox.com>, idos@...lanox.com
Subject: Re: [RFC 0/2] pm,net: Introduce QoS requests per CPU

On Tue, 2014-03-25 at 08:14 -0700, Eric Dumazet wrote:
> On Tue, 2014-03-25 at 15:18 +0200, Amir Vadai wrote:
> 
> > The current pm_qos implementation has a problem. During a short pause in a high
> > bandwidth traffic, the kernel can lower the c-state to preserve energy.
> > When the pause ends, and the traffic resumes, the NIC hardware buffers may be
> > overflowed before the CPU starts to process the traffic due to the CPU wake-up
> > latency.
> 
> This is the point I never understood with mlx4
> 
> RX ring buffers should allow NIC to buffer quite a large amount of
> incoming frames. But apparently we miss frames, even in a single TCP
> flow. I really can't understand why, as the sender in my case does not have
> more than 90 packets in flight (cwnd is limited to 90)
[...]

The time taken for software to clean the RX ring is only half the story.

A DMA write requires every CPU's cache controller to invalidate affected
cache lines.  It may also require reading from cache, if the write
covers only part of a cache line.  So at least the cache controllers
need to be woken from sleep, and until then all DMA writes must be
buffered in some combination of CPUs, bridges and the network
controller's RX FIFOs.  If those buffers aren't deep enough to cover the
delay, packets will be dropped.  (Ethernet flow control may help with
this, if enabled.)

Back in 2007, colleagues at Solarflare measured DMA write delays of
about 10 us when CPUs had to be woken up, rising to 40-50 us for one
buggy Intel model.  This motivated a large increase to RX FIFO size in
the SFC4000B and subsequent controllers.
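
To give a feel for the sizes involved, here is a rough back-of-envelope
sketch of how much data arrives at line rate while DMA writes are
stalled.  The 10 Gb/s link speed and the helper name rx_buffer_bytes are
assumptions for illustration only; the delays are the figures quoted
above.  It ignores framing overhead and any buffering outside the NIC,
so treat the result as an upper bound, not as a sizing rule for any
particular controller.

```python
def rx_buffer_bytes(link_mbps: int, delay_us: int) -> int:
    """Bytes received at full line rate during a stall of delay_us.

    Mb/s multiplied by microseconds gives bits directly
    (10^6 bits/s * 10^-6 s), so integer arithmetic is enough.
    """
    bits = link_mbps * delay_us
    return bits // 8


if __name__ == "__main__":
    LINK_MBPS = 10_000  # assumed 10 Gb/s link, not stated in the thread
    for delay_us in (10, 40, 50):  # wake-up delays mentioned above
        needed = rx_buffer_bytes(LINK_MBPS, delay_us)
        print(f"{delay_us:>3} us stall -> {needed} bytes to buffer")
```

Under that assumption a 10 us stall needs about 12.5 kB of buffering and
a 50 us stall about 62.5 kB, which shows why a longer wake-up delay
translates directly into a larger RX FIFO requirement.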

Ben.

-- 
Ben Hutchings
Make three consecutive correct guesses and you will be considered an expert.
