Message-ID: <063D6719AE5E284EB5DD2968C1650D6D1749E380@AcuExch.aculab.com>
Date: Fri, 26 Sep 2014 09:23:57 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Eric Dumazet' <eric.dumazet@...il.com>,
Tom Herbert <therbert@...gle.com>
CC: Jesper Dangaard Brouer <brouer@...hat.com>,
Linux Netdev List <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
"Alexander Duyck" <alexander.h.duyck@...el.com>,
Toke Høiland-Jørgensen <toke@...e.dk>,
Florian Westphal <fw@...len.de>,
Jamal Hadi Salim <jhs@...atatu.com>,
Dave Taht <dave.taht@...il.com>,
John Fastabend <john.r.fastabend@...el.com>,
"Daniel Borkmann" <dborkman@...hat.com>,
Hannes Frederic Sowa <hannes@...essinduktion.org>
Subject: RE: [net-next PATCH 1/1 V4] qdisc: bulk dequeue support for qdiscs
with TCQ_F_ONETXQUEUE
From: Eric Dumazet
> On Wed, 2014-09-24 at 19:12 -0700, Eric Dumazet wrote:
...
> It turned out the problem I noticed was caused by compiler trying to be
> smart, but involving a bad MESI transaction.
>
> 0.05 mov 0xc0(%rax),%edi // LOAD dql->num_queued
> 0.48 mov %edx,0xc8(%rax) // STORE dql->last_obj_cnt = count
> 58.23 add %edx,%edi
> 0.58 cmp %edi,0xc4(%rax)
> 0.76 mov %edi,0xc0(%rax) // STORE dql->num_queued += count
> 0.72 js bd8
>
>
> I get an incredible 10 % gain by making sure cpu wont get the cache line
> in Shared mode.
That is a stunning difference between requesting 'exclusive' access
and upgrading 'shared' to exclusive.
Stinks of a cpu bug?
Or is the reported stall a side effect of waiting for the earlier
'cache line read' to complete in order to issue the 'upgrade to exclusive'?
In which case gcc's instruction scheduler probably needs to be taught
to schedule writes before reads.
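A rough user-space sketch of what 'writes before reads' would mean for this
sequence. Everything here is an assumption for illustration: struct dql_sketch
only mimics the three fields visible in the listing (offsets 0xc0/0xc4/0xc8),
compiler_barrier() is a GCC/Clang compiler-only barrier, and
queued_store_first() is a made-up name, not a kernel function.

```c
/* Hypothetical stand-in for the three fields seen in the quoted listing. */
struct dql_sketch {
	unsigned int num_queued;   /* 0xc0 in the listing */
	unsigned int adj_limit;    /* 0xc4 in the listing */
	unsigned int last_obj_cnt; /* 0xc8 in the listing */
};

/* Compiler-only barrier (GCC/Clang syntax); emits no instruction. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/*
 * Touch the cache line with a plain store first, then do the
 * read-modify-write and the limit check. The barrier only stops the
 * compiler from hoisting the load of num_queued above the store;
 * it does not constrain the cpu.
 */
static inline int queued_store_first(struct dql_sketch *q, unsigned int count)
{
	q->last_obj_cnt = count;    /* STORE first */
	compiler_barrier();
	q->num_queued += count;     /* LOAD + STORE on the same line */
	return (int)(q->adj_limit - q->num_queued);
}
```

Whether the first access being a store actually avoids the Shared-state fetch
depends on the cpu, so this only shows the ordering, not a measured fix.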
> (I also tried a barrier() in netdev_tx_sent_queue() between the
>
> dql_queued(&dev_queue->dql, bytes);
> -
> + barrier();
> if (likely(dql_avail(&dev_queue->dql) >= 0))
>
> But following patch seems cleaner
>
> diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
> index 5621547d631b..978fbe332090 100644
> --- a/include/linux/dynamic_queue_limits.h
> +++ b/include/linux/dynamic_queue_limits.h
> @@ -80,7 +80,7 @@ static inline void dql_queued(struct dql *dql, unsigned int count)
> /* Returns how many objects can be queued, < 0 indicates over limit. */
> static inline int dql_avail(const struct dql *dql)
> {
> - return dql->adj_limit - dql->num_queued;
> + return ACCESS_ONCE(dql->adj_limit) - ACCESS_ONCE(dql->num_queued);
Dunno, that could have an impact on other calls where the values
are already in registers - I suspect ACCESS_ONCE() forces an access.
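To make the concern concrete, a user-space sketch. The macro is the usual
volatile-cast form of ACCESS_ONCE(); struct dql_sketch, avail_plain() and
avail_once() are made-up names that only mirror the two variants of
dql_avail() in the diff above.

```c
/* Volatile-cast form of ACCESS_ONCE(), using the GCC/Clang __typeof__. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for the two fields dql_avail() reads. */
struct dql_sketch {
	unsigned int num_queued;
	unsigned int adj_limit;
};

/* Plain version: the compiler may reuse values it already holds in registers. */
static inline int avail_plain(const struct dql_sketch *q)
{
	return (int)(q->adj_limit - q->num_queued);
}

/* ACCESS_ONCE version: each call must perform both loads from memory. */
static inline int avail_once(const struct dql_sketch *q)
{
	return (int)(ACCESS_ONCE(q->adj_limit) - ACCESS_ONCE(q->num_queued));
}
```

Both return the same value; the difference should only be visible in the
generated code, for example when a caller has just written num_queued and
then asks for the available count.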
David