Date:	Fri, 26 Sep 2014 09:23:57 +0000
From:	David Laight <David.Laight@...LAB.COM>
To:	'Eric Dumazet' <eric.dumazet@...il.com>,
	Tom Herbert <therbert@...gle.com>
CC:	Jesper Dangaard Brouer <brouer@...hat.com>,
	Linux Netdev List <netdev@...r.kernel.org>,
	"David S. Miller" <davem@...emloft.net>,
	"Alexander Duyck" <alexander.h.duyck@...el.com>,
	Toke Høiland-Jørgensen <toke@...e.dk>,
	Florian Westphal <fw@...len.de>,
	Jamal Hadi Salim <jhs@...atatu.com>,
	Dave Taht <dave.taht@...il.com>,
	John Fastabend <john.r.fastabend@...el.com>,
	"Daniel Borkmann" <dborkman@...hat.com>,
	Hannes Frederic Sowa <hannes@...essinduktion.org>
Subject: RE: [net-next PATCH 1/1 V4] qdisc: bulk dequeue support for qdiscs
 with TCQ_F_ONETXQUEUE

From: Eric Dumazet
> On Wed, 2014-09-24 at 19:12 -0700, Eric Dumazet wrote:
...
> It turned out the problem I noticed was caused by the compiler trying to be
> smart, which resulted in a bad MESI transaction.
> 
>   0.05   mov    0xc0(%rax),%edi    // LOAD dql->num_queued
>   0.48   mov    %edx,0xc8(%rax)    // STORE dql->last_obj_cnt = count
>  58.23   add    %edx,%edi
>   0.58   cmp    %edi,0xc4(%rax)
>   0.76   mov    %edi,0xc0(%rax)    // STORE dql->num_queued += count
>   0.72   js     bd8
> 
> 
> I get an incredible 10% gain by making sure the cpu won't get the cache line
> in Shared mode.

That is a stunning difference between requesting 'exclusive' access
and upgrading 'shared' to exclusive.
Stinks of a cpu bug?

Or is the reported stall a side effect of waiting for the earlier
'cache line read' to complete before the 'upgrade to exclusive' can be issued?
If so, gcc's instruction scheduler probably needs to be taught
to schedule writes before reads.
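
Something like the below in dql_queued() would make the write-first ordering
explicit rather than relying on the compiler (untested sketch, just to
illustrate the idea - the barrier() is only there to stop the load of
num_queued being hoisted above the store):

	static inline void dql_queued(struct dql *dql, unsigned int count)
	{
		BUG_ON(count > DQL_MAX_OBJECT);

		/* Store first so the cache line is requested for ownership
		 * (Exclusive) instead of being read Shared and then upgraded. */
		dql->last_obj_cnt = count;

		/* Compiler barrier: keep the num_queued load below the store. */
		barrier();

		dql->num_queued += count;
	}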

> (I also tried a barrier() in netdev_tx_sent_queue() between the
> 
> 	dql_queued(&dev_queue->dql, bytes);
> -
> +	barrier();
>  	if (likely(dql_avail(&dev_queue->dql) >= 0))
> 
> But the following patch seems cleaner:
> 
> diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
> index 5621547d631b..978fbe332090 100644
> --- a/include/linux/dynamic_queue_limits.h
> +++ b/include/linux/dynamic_queue_limits.h
> @@ -80,7 +80,7 @@ static inline void dql_queued(struct dql *dql, unsigned int count)
>  /* Returns how many objects can be queued, < 0 indicates over limit. */
>  static inline int dql_avail(const struct dql *dql)
>  {
> -	return dql->adj_limit - dql->num_queued;
> +	return ACCESS_ONCE(dql->adj_limit) - ACCESS_ONCE(dql->num_queued);

Dunno, that could have an impact on other callers where the values
are already in registers - I suspect ACCESS_ONCE() forces a memory access each time.
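
(FWIW, if I remember include/linux/compiler.h correctly, ACCESS_ONCE() is
just the volatile cast:

	#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

so the compiler has to emit a fresh load each time, even when the value is
already live in a register.)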

	David
