Date:	Fri, 26 Sep 2014 13:16:00 +0000
From:	David Laight <David.Laight@...LAB.COM>
To:	David Laight <David.Laight@...LAB.COM>,
	'Eric Dumazet' <eric.dumazet@...il.com>,
	Tom Herbert <therbert@...gle.com>
CC:	Jesper Dangaard Brouer <brouer@...hat.com>,
	Linux Netdev List <netdev@...r.kernel.org>,
	"David S. Miller" <davem@...emloft.net>,
	"Alexander Duyck" <alexander.h.duyck@...el.com>,
	Toke Høiland-Jørgensen <toke@...e.dk>,
	Florian Westphal <fw@...len.de>,
	Jamal Hadi Salim <jhs@...atatu.com>,
	Dave Taht <dave.taht@...il.com>,
	John Fastabend <john.r.fastabend@...el.com>,
	"Daniel Borkmann" <dborkman@...hat.com>,
	Hannes Frederic Sowa <hannes@...essinduktion.org>
Subject: RE: [net-next PATCH 1/1 V4] qdisc: bulk dequeue support for qdiscs
 with TCQ_F_ONETXQUEUE

From: David Laight
> From: Eric Dumazet
> > On Wed, 2014-09-24 at 19:12 -0700, Eric Dumazet wrote:
> ...
> > It turned out the problem I noticed was caused by compiler trying to be
> > smart, but involving a bad MESI transaction.
> >
> >   0.05   mov    0xc0(%rax),%edi    // LOAD dql->num_queued
> >   0.48   mov    %edx,0xc8(%rax)    // STORE dql->last_obj_cnt = count
> >  58.23   add    %edx,%edi
> >   0.58   cmp    %edi,0xc4(%rax)
> >   0.76   mov    %edi,0xc0(%rax)    // STORE dql->num_queued += count
> >   0.72   js     bd8
> >
> >
> > I get an incredible 10 % gain by making sure cpu wont get the cache line
> > in Shared mode.
> 
> That is a stunning difference between requesting 'exclusive' access
> and upgrading 'shared' to exclusive.
> Stinks of a cpu bug?
> 
> Or is the reported stall a side effect of waiting for the earlier
> 'cache line read' to complete in order to issue the 'upgrade to exclusive'.
> In which case gcc's instruction scheduler probably needs to be taught
> to schedule writes before reads.

Thinking further.
gcc is probably moving memory reads before writes under the assumption that
the cpu might stall waiting for the read to complete but that the write
can be buffered by the hardware.

That assumption is true for simple cpus (like the Nios2), but on x86, with
its multiple instructions in flight (etc), the ordering may make little
difference if the memory is already in the cache.

OTOH it looks as though there is a big hit if you read and then write
a cache line that is not already present in the local cache.
(This may even depend on which instructions get executed in parallel;
minor changes to the code could easily change that.)

I wonder how easy it would be to modify gcc to remove (or even reverse)
that memory ordering 'optimisation'.
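
Short of changing gcc, the ordering could probably be forced at the source
level. A minimal sketch, assuming a gcc-compatible compiler: the struct and
function names below are hypothetical stand-ins rather than the real dql code,
and the name of the middle field is a guess from the 0xc4 offset in the
listing. The empty asm with a "memory" clobber only stops the compiler from
hoisting the load above the store; the cpu can still reorder, so whether this
avoids the Shared-state fetch would need measuring.

```c
#include <assert.h>

/* Hypothetical bound, standing in for whatever limit the real code checks. */
#define DQL_SKETCH_MAX_OBJECT (~0u / 16)

/* Compiler-only barrier (gcc/clang extended asm); emits no instruction. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/*
 * Hypothetical stand-in for the three fields seen in the disassembly.
 * Offsets 0xc0, 0xc4 and 0xc8 are adjacent, so they presumably share
 * one cache line.
 */
struct dql_sketch {
	unsigned int num_queued;	/* 0xc0 in the listing */
	unsigned int adj_limit;		/* 0xc4, field name assumed */
	unsigned int last_obj_cnt;	/* 0xc8 in the listing */
};

static inline void dql_queued_sketch(struct dql_sketch *dql,
				     unsigned int count)
{
	assert(count <= DQL_SKETCH_MAX_OBJECT);

	/* Plain store first, so the first access to the line is a write. */
	dql->last_obj_cnt = count;

	/* Keep the compiler from moving the load of num_queued above it. */
	compiler_barrier();

	/* Read-modify-write of num_queued, now after the store. */
	dql->num_queued += count;
}
```

Calling dql_queued_sketch(&d, 3) on a struct with num_queued == 5 leaves
num_queued == 8 and last_obj_cnt == 3; only the order in which the compiler
emits the memory accesses differs from the plain version.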

	David

