Message-ID: <063D6719AE5E284EB5DD2968C1650D6D1749E380@AcuExch.aculab.com>
Date: Fri, 26 Sep 2014 09:23:57 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Eric Dumazet' <eric.dumazet@...il.com>, Tom Herbert <therbert@...gle.com>
CC: Jesper Dangaard Brouer <brouer@...hat.com>,
	Linux Netdev List <netdev@...r.kernel.org>,
	"David S. Miller" <davem@...emloft.net>,
	"Alexander Duyck" <alexander.h.duyck@...el.com>,
	Toke Høiland-Jørgensen <toke@...e.dk>,
	Florian Westphal <fw@...len.de>,
	Jamal Hadi Salim <jhs@...atatu.com>,
	Dave Taht <dave.taht@...il.com>,
	John Fastabend <john.r.fastabend@...el.com>,
	"Daniel Borkmann" <dborkman@...hat.com>,
	Hannes Frederic Sowa <hannes@...essinduktion.org>
Subject: RE: [net-next PATCH 1/1 V4] qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE

From: Eric Dumazet
> On Wed, 2014-09-24 at 19:12 -0700, Eric Dumazet wrote:
...
> It turned out the problem I noticed was caused by compiler trying to be
> smart, but involving a bad MESI transaction.
>
>    0.05   mov    0xc0(%rax),%edi    // LOAD dql->num_queued
>    0.48   mov    %edx,0xc8(%rax)    // STORE dql->last_obj_cnt = count
>   58.23   add    %edx,%edi
>    0.58   cmp    %edi,0xc4(%rax)
>    0.76   mov    %edi,0xc0(%rax)    // STORE dql->num_queued += count
>    0.72   js     bd8
>
>
> I get an incredible 10 % gain by making sure cpu wont get the cache line
> in Shared mode.

That is a stunning difference between requesting 'exclusive' access
and upgrading 'shared' to exclusive. Stinks of a cpu bug?

Or is the reported stall a side effect of waiting for the earlier
'cache line read' to complete in order to issue the 'upgrade to exclusive'?
In that case gcc's instruction scheduler probably needs to be taught
to schedule writes before reads.

> (I also tried a barrier() in netdev_tx_sent_queue() between the
>
> 	dql_queued(&dev_queue->dql, bytes);
> -
> +	barrier();
> 	if (likely(dql_avail(&dev_queue->dql) >= 0))
>
> But following patch seems cleaner
>
> diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
> index 5621547d631b..978fbe332090 100644
> --- a/include/linux/dynamic_queue_limits.h
> +++ b/include/linux/dynamic_queue_limits.h
> @@ -80,7 +80,7 @@ static inline void dql_queued(struct dql *dql, unsigned int count)
>  /* Returns how many objects can be queued, < 0 indicates over limit. */
>  static inline int dql_avail(const struct dql *dql)
>  {
> -	return dql->adj_limit - dql->num_queued;
> +	return ACCESS_ONCE(dql->adj_limit) - ACCESS_ONCE(dql->num_queued);

Dunno, that could have an impact on other calls where the values are
already in registers - I suspect ACCESS_ONCE() forces an access.

	David
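For reference, here is a minimal, self-contained userspace sketch of the access
pattern under discussion. It is not the kernel code: the struct is trimmed to
the three fields named in the profile above, ACCESS_ONCE() is defined locally
to mirror the kernel macro of that era, and the sketch_*() helpers are
illustrative names only.

/* Build (assumed): gcc -O2 -o dql_sketch dql_sketch.c */
#include <stdio.h>

/* Local mirror of the kernel's ACCESS_ONCE(): a single volatile access, so
 * the compiler can neither keep a stale register copy nor merge/reorder the
 * load at will. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Trimmed stand-in for struct dql (include/linux/dynamic_queue_limits.h). */
struct dql {
	unsigned int num_queued;    /* total objects queued             */
	unsigned int adj_limit;     /* limit + num_completed            */
	unsigned int last_obj_cnt;  /* size of the most recent batch    */
};

/* Plain queued-side update (field order here is illustrative): nothing stops
 * the compiler from scheduling the LOAD of num_queued before the STORE to
 * last_obj_cnt, which is the ordering visible in the profile above - the
 * cache line is first pulled in Shared state and then upgraded for the
 * stores. */
static void sketch_dql_queued(struct dql *dql, unsigned int count)
{
	dql->last_obj_cnt = count;
	dql->num_queued += count;
}

/* Availability check as in Eric's patch: both fields are read through
 * ACCESS_ONCE(), forcing fresh loads instead of reusing register copies. */
static int sketch_dql_avail(const struct dql *dql)
{
	return ACCESS_ONCE(dql->adj_limit) - ACCESS_ONCE(dql->num_queued);
}

int main(void)
{
	struct dql dql = { .num_queued = 0, .adj_limit = 1500, .last_obj_cnt = 0 };

	sketch_dql_queued(&dql, 64);
	printf("avail after queueing 64: %d\n", sketch_dql_avail(&dql));
	return 0;
}

The sketch only shows where the loads and stores sit; whether the forced loads
in dql_avail() cost anything on call sites that already hold the values in
registers (David's concern above) can only be judged from the generated code.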