Message-ID: <953B660C027164448AE903364AC447D2DBE89CEB@MTLDAG01.mtl.com>
Date:	Wed, 26 Mar 2014 07:12:28 +0000
From:	Yevgeny Petrilin <yevgenyp@...lanox.com>
To:	Ben Hutchings <ben@...adent.org.uk>,
	Eric Dumazet <eric.dumazet@...il.com>
CC:	Amir Vadai <amirv@...lanox.com>,
	"David S. Miller" <davem@...emloft.net>,
	"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	Pavel Machek <pavel@....cz>,
	"Rafael J. Wysocki" <rjw@...ysocki.net>,
	Len Brown <len.brown@...el.com>,
	Yuval Itkin <yuvali@...lanox.com>,
	Or Gerlitz <ogerlitz@...lanox.com>,
	Ido Shamay <idos@...lanox.com>
Subject: RE: [RFC 0/2] pm,net: Introduce QoS requests per CPU

> > > The current pm_qos implementation has a problem. During a short pause in
> > > high-bandwidth traffic, the kernel can move CPUs into a deeper C-state to
> > > save energy. When the pause ends and the traffic resumes, the NIC hardware
> > > buffers may overflow before the CPU wakes up and starts processing the
> > > traffic, because of the CPU wake-up latency.
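
[For context: the pm_qos mechanism discussed above is driven by per-class
latency requests. A minimal sketch of how a driver would pin CPU wake-up
latency with the existing (global, not per-CPU) API follows; the 50 us bound
and the function names nic_qos_start/nic_qos_stop are illustrative, not taken
from the patch set under discussion.]

```c
#include <linux/pm_qos.h>

static struct pm_qos_request nic_qos_req;

/* On interface open: ask the PM QoS framework to keep CPU wake-up
 * latency under 50 us, which prevents cpuidle from entering the
 * deepest C-states while traffic may arrive. */
static void nic_qos_start(void)
{
	pm_qos_add_request(&nic_qos_req, PM_QOS_CPU_DMA_LATENCY, 50);
}

/* On interface close: drop the constraint so idle CPUs can sleep
 * deeply again. */
static void nic_qos_stop(void)
{
	pm_qos_remove_request(&nic_qos_req);
}
```

Note that this request is system-wide, which is exactly what the RFC wants to
change: it keeps every CPU shallow, not just the ones servicing the NIC's RX
queues.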
> >
> > This is the point I never understood with mlx4.
> >
> > RX ring buffers should allow the NIC to buffer quite a large number of
> > incoming frames. But apparently we miss frames, even with a single TCP
> > flow. I really can't understand why, as the sender in my case does not
> > have more than 90 packets in flight (cwnd is limited to 90).
> [...]
> 
> The time taken for software to clean the RX ring is only half the story.
> 
> A DMA write requires every CPU's cache controller to invalidate affected
> cache lines.  It may also require reading from cache, if the write
> covers only part of a cache line.  So at least the cache controllers
> need to be woken from sleep, and until then all DMA writes must be
> buffered in some combination of CPUs, bridges and the network
> controller's RX FIFOs.  If those buffers aren't large enough to cover the
> delay, packets will be dropped.  (Ethernet flow control may help with
> this, if enabled.)
> 
> Back in 2007, colleagues at Solarflare measured DMA write delays of
> about 10 us when CPUs had to be woken up, rising to 40-50 us for one
> buggy Intel model.  This motivated a large increase to RX FIFO size in
> the SFC4000B and subsequent controllers.
> 
This is exactly the story here.
Not all NICs have hardware buffers that are big enough, and for those that
don't, changing C-states can cause packet drops under bursty traffic, even
when the number of packets in the burst is smaller than the ring size.
Indeed, flow control could prevent those drops, but in many cases it is not
enabled.

Yevgeny 
