netdev - RE: [RFC 0/2] pm,net: Introduce QoS requests per CPU

Date:	Wed, 26 Mar 2014 07:12:28 +0000
From:	Yevgeny Petrilin <yevgenyp@...lanox.com>
To:	Ben Hutchings <ben@...adent.org.uk>,
	Eric Dumazet <eric.dumazet@...il.com>
CC:	Amir Vadai <amirv@...lanox.com>,
	"David S. Miller" <davem@...emloft.net>,
	"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	Pavel Machek <pavel@....cz>,
	"Rafael J. Wysocki" <rjw@...ysocki.net>,
	Len Brown <len.brown@...el.com>,
	Yuval Itkin <yuvali@...lanox.com>,
	Or Gerlitz <ogerlitz@...lanox.com>,
	Ido Shamay <idos@...lanox.com>
Subject: RE: [RFC 0/2] pm,net: Introduce QoS requests per CPU

> > > The current pm_qos implementation has a problem. During a short pause in
> > > high-bandwidth traffic, the kernel can lower the c-state to preserve energy.
> > > When the pause ends, and the traffic resumes, the NIC hardware buffers may be
> > > overflowed before the CPU starts to process the traffic due to the CPU wake-up
> > > latency.
> >
> > This is the point I never understood with mlx4
> >
> > RX ring buffers should allow NIC to buffer quite a large amount of
> > incoming frames. But apparently we miss frames, even in a single TCP
> > flow. I really can't understand why, as the sender in my case does not have
> > more than 90 packets in flight (cwnd is limited to 90)
> [...]
> 
> The time taken for software to clean the RX ring is only half the story.
> 
> A DMA write requires every CPU's cache controller to invalidate affected
> cache lines.  It may also require reading from cache, if the write
> covers only part of a cache line.  So at least the cache controllers
> need to be woken from sleep, and until then all DMA writes must be
> buffered in some combination of CPUs, bridges and the network
> controller's RX FIFOs.  If those buffers aren't long enough for the
> delay, packets will be dropped.  (Ethernet flow control may help with
> this, if enabled.)
> 
> Back in 2007, colleagues at Solarflare measured DMA write delays of
> about 10 us when CPUs had to be woken up, rising to 40-50 us for one
> buggy Intel model.  This motivated a large increase to RX FIFO size in
> the SFC4000B and subsequent controllers.
> 
This is exactly the story here.
Not all NICs have HW buffers that are big enough. For those that don't,
a c-state change can cause packet drops under bursty traffic,
even if the number of packets in the burst is smaller than the ring size.
Indeed, flow control could prevent those drops, but in many cases it is not enabled.
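
To put rough numbers on this, here is a back-of-the-envelope sketch of how
many bytes arrive on the wire while DMA writes are stalled by a CPU wake-up.
The 10 us and 50 us delays are the figures quoted above; the 10 Gb/s link
speed and the 32 KiB FIFO size are assumptions picked only for illustration,
not properties of any particular NIC, and the function names are hypothetical.

```python
def bytes_in_flight(link_gbps, wake_us):
    """Bytes arriving on the link while DMA writes are stalled for wake_us."""
    bits_per_us = link_gbps * 1000  # 1 Gb/s is 1000 bits per microsecond
    return bits_per_us * wake_us / 8


def drops_expected(link_gbps, wake_us, fifo_bytes):
    """True if a line-rate burst overruns the on-NIC buffer during the stall.

    Ignores buffering in bridges and CPUs and assumes flow control is off.
    """
    return bytes_in_flight(link_gbps, wake_us) > fifo_bytes


ASSUMED_FIFO_BYTES = 32 * 1024  # hypothetical RX FIFO size

for wake_us in (10, 50):  # wake-up delays quoted earlier in the thread
    arrived = bytes_in_flight(10, wake_us)
    verdict = "drops" if drops_expected(10, wake_us, ASSUMED_FIFO_BYTES) else "ok"
    print(f"{wake_us} us stall at 10 Gb/s: {arrived:.0f} bytes, {verdict}")
```

Under these assumptions a 10 us stall fits in the FIFO (12500 bytes), while a
50 us stall does not (62500 bytes), regardless of how many free descriptors
the RX ring has.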

Yevgeny