netdev - Re: [net-next PATCH 1/1 V4] qdisc: bulk dequeue support for qdiscs with TCQ_F

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Thu, 25 Sep 2014 16:40:09 -0700
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Tom Herbert <therbert@...gle.com>
Cc:	Jesper Dangaard Brouer <brouer@...hat.com>,
	Linux Netdev List <netdev@...r.kernel.org>,
	"David S. Miller" <davem@...emloft.net>,
	Alexander Duyck <alexander.h.duyck@...el.com>,
	Toke Høiland-Jørgensen <toke@...e.dk>,
	Florian Westphal <fw@...len.de>,
	Jamal Hadi Salim <jhs@...atatu.com>,
	Dave Taht <dave.taht@...il.com>,
	John Fastabend <john.r.fastabend@...el.com>,
	Daniel Borkmann <dborkman@...hat.com>,
	Hannes Frederic Sowa <hannes@...essinduktion.org>
Subject: Re: [net-next PATCH 1/1 V4] qdisc: bulk dequeue support for qdiscs
 with TCQ_F_ONETXQUEUE

On Wed, 2014-09-24 at 19:12 -0700, Eric Dumazet wrote:
> On Wed, 2014-09-24 at 12:22 -0700, Eric Dumazet wrote:
> > On Wed, 2014-09-24 at 11:34 -0700, Tom Herbert wrote:
> > > >
> > > I believe drivers typically use skb->len for BQL tracking. Since
> > > bytelimit is based on BQL here, it might be more correct to use
> > > skb->len.
> 
> Speaking of BQL, I wonder if we now should try to not wakeup queues as
> soon some room was made, and instead have a 50% threshold ?
> 
> This would probably increase probability to have bulk dequeues ;)

It turned out the problem I noticed was caused by compiler trying to be
smart, but involving a bad MESI transaction.

  0.05 │  mov    0xc0(%rax),%edi    // LOAD dql->num_queued
  0.48 │  mov    %edx,0xc8(%rax)    // STORE dql->last_obj_cnt = count
 58.23 │  add    %edx,%edi
  0.58 │  cmp    %edi,0xc4(%rax)
  0.76 │  mov    %edi,0xc0(%rax)    // STORE dql->num_queued += count
  0.72 │  js     bd8


I get an incredible 10 % gain by making sure cpu wont get the cache line
in Shared mode.

(I also tried a barrier() in netdev_tx_sent_queue() between the

	dql_queued(&dev_queue->dql, bytes);
-
+	barrier();
 	if (likely(dql_avail(&dev_queue->dql) >= 0))

But following patch seems cleaner

diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
index 5621547d631b..978fbe332090 100644
--- a/include/linux/dynamic_queue_limits.h
+++ b/include/linux/dynamic_queue_limits.h
@@ -80,7 +80,7 @@ static inline void dql_queued(struct dql *dql, unsigned int count)
 /* Returns how many objects can be queued, < 0 indicates over limit. */
 static inline int dql_avail(const struct dql *dql)
 {
-	return dql->adj_limit - dql->num_queued;
+	return ACCESS_ONCE(dql->adj_limit) - ACCESS_ONCE(dql->num_queued);
 }
 


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html