[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1411711496.16953.92.camel@edumazet-glaptop2.roam.corp.google.com>
Date: Thu, 25 Sep 2014 23:04:56 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: Tom Herbert <therbert@...gle.com>
Cc: Jesper Dangaard Brouer <brouer@...hat.com>,
Linux Netdev List <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
Alexander Duyck <alexander.h.duyck@...el.com>,
Toke Høiland-Jørgensen <toke@...e.dk>,
Florian Westphal <fw@...len.de>,
Jamal Hadi Salim <jhs@...atatu.com>,
Dave Taht <dave.taht@...il.com>,
John Fastabend <john.r.fastabend@...el.com>,
Daniel Borkmann <dborkman@...hat.com>,
Hannes Frederic Sowa <hannes@...essinduktion.org>
Subject: [PATCH net-next] dql: dql_queued() should write first to reduce bus
transactions
From: Eric Dumazet <edumazet@...gle.com>
While doing high throughput test on a BQL enabled NIC,
I found a very high cost in ndo_start_xmit() when accessing BQL data.
It turned out the problem was caused by compiler trying to be
smart, but involving a bad MESI transaction :
0.05 │ mov 0xc0(%rax),%edi // LOAD dql->num_queued
0.48 │ mov %edx,0xc8(%rax) // STORE dql->last_obj_cnt = count
58.23 │ add %edx,%edi
0.58 │ cmp %edi,0xc4(%rax)
0.76 │ mov %edi,0xc0(%rax) // STORE dql->num_queued += count
0.72 │ js bd8
I got an incredible 10 % gain [1] by making sure cpu do not attempt
to get the cache line in Shared mode, but directly requests for
ownership.
New code :
mov %edx,0xc8(%rax) // STORE dql->last_obj_cnt = count
add %edx,0xc0(%rax) // RMW dql->num_queued += count
mov 0xc4(%rax),%ecx // LOAD dql->adj_limit
mov 0xc0(%rax),%edx // LOAD dql->num_queued
cmp %edx,%ecx
The TX completion was running from another cpu, with high interrupts
rate.
Note that I am using barrier() as a soft hint, as mb() here could be
too heavy cost.
[1] This was a netperf TCP_STREAM with TSO disabled, but GSO enabled.
Signed-off-by: Eric Dumazet <edumazet@...gle.com>
---
include/linux/dynamic_queue_limits.h | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
index 5621547d631b..a4be70398ce1 100644
--- a/include/linux/dynamic_queue_limits.h
+++ b/include/linux/dynamic_queue_limits.h
@@ -73,14 +73,22 @@ static inline void dql_queued(struct dql *dql, unsigned int count)
{
BUG_ON(count > DQL_MAX_OBJECT);
- dql->num_queued += count;
dql->last_obj_cnt = count;
+
+ /* We want to force a write first, so that cpu do not attempt
+ * to get cache line containing last_obj_cnt, num_queued, adj_limit
+ * in Shared state, but directly does a Request For Ownership
+ * It is only a hint, we use barrier() only.
+ */
+ barrier();
+
+ dql->num_queued += count;
}
/* Returns how many objects can be queued, < 0 indicates over limit. */
static inline int dql_avail(const struct dql *dql)
{
- return dql->adj_limit - dql->num_queued;
+ return ACCESS_ONCE(dql->adj_limit) - ACCESS_ONCE(dql->num_queued);
}
/* Record number of completed objects and recalculate the limit. */
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists