Message-ID: <1480552059.18162.239.camel@edumazet-glaptop3.roam.corp.google.com>
Date: Wed, 30 Nov 2016 16:27:39 -0800
From: Eric Dumazet <eric.dumazet@...il.com>
To: Jesper Dangaard Brouer <brouer@...hat.com>,
Tom Herbert <tom@...bertland.com>,
Willem de Bruijn <willemb@...gle.com>
Cc: Rick Jones <rick.jones2@....com>, netdev@...r.kernel.org,
Saeed Mahameed <saeedm@...lanox.com>,
Tariq Toukan <tariqt@...lanox.com>,
Achiad Shochat <achiad@...lanox.com>
Subject: Re: [WIP] net+mlx4: auto doorbell
Another issue I found during my tests over the last few days is a
problem with BQL, and more generally whenever the driver stops/starts
the queue.
Under stress, when BQL stops the queue, the driver's TX completion does
a lot of work, and the servicing CPU also takes over the subsequent
qdisc_run() calls.
The workflow is:
1) Collect up to 64 packets (256 for mlx4) from the TX ring, unmap
them, and queue the skbs for freeing.
2) Call netdev_tx_completed_queue(ring->tx_queue, packets, bytes),
which does:
    if (test_and_clear_bit(__QUEUE_STATE_STACK_XOFF, &dev_queue->state))
        netif_schedule_queue(dev_queue);
This leaves only a very tiny window in which other CPUs could grab
__QDISC_STATE_SCHED (in practice, they have absolutely no chance to
grab it).
So we end up with one CPU doing the ndo_start_xmit() calls, the TX
completions, and the RX work.
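The race described above can be illustrated with a small userspace model. This is only a sketch under simplifying assumptions, not kernel code: `struct model_queue`, `model_tx_completion()`, `model_other_cpu_xmit()` and the two flag constants are hypothetical stand-ins, and the queue and qdisc state bits are folded into one word for brevity. It shows why the CPU running TX completion wins the SCHED bit: it sets it immediately after clearing XOFF, so a CPU that was blocked while the queue was stopped finds SCHED already taken.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for __QUEUE_STATE_STACK_XOFF and
 * __QDISC_STATE_SCHED, folded into one state word for this model. */
enum { STACK_XOFF = 1u << 0, QDISC_SCHED = 1u << 1 };

struct model_queue {
    atomic_uint state;  /* combined queue + qdisc state bits */
    int sched_owner;    /* CPU that won QDISC_SCHED, -1 if none */
};

static void model_queue_init(struct model_queue *q)
{
    atomic_init(&q->state, STACK_XOFF);  /* BQL has stopped the queue */
    q->sched_owner = -1;
}

/* Like test_and_clear_bit(): returns true if the bit was set. */
static bool test_and_clear(atomic_uint *w, unsigned int bit)
{
    return (atomic_fetch_and(w, ~bit) & bit) != 0;
}

/* Like test_and_set_bit(): returns true if the bit was ALREADY set. */
static bool test_and_set(atomic_uint *w, unsigned int bit)
{
    return (atomic_fetch_or(w, bit) & bit) != 0;
}

/* Models the scheduling step: only the first caller gets the qdisc. */
static bool model_schedule(struct model_queue *q, int cpu)
{
    if (test_and_set(&q->state, QDISC_SCHED))
        return false;           /* someone else already owns it */
    q->sched_owner = cpu;
    return true;
}

/* Models step 2: clear XOFF and schedule the qdisc back to back. */
static bool model_tx_completion(struct model_queue *q, int cpu)
{
    if (test_and_clear(&q->state, STACK_XOFF))
        return model_schedule(q, cpu);
    return false;
}

/* Simplified view of another CPU: it cannot make progress while the
 * queue is stopped, and afterwards it must still win QDISC_SCHED. */
static bool model_other_cpu_xmit(struct model_queue *q, int cpu)
{
    if (atomic_load(&q->state) & STACK_XOFF)
        return false;
    return model_schedule(q, cpu);
}
```

In this model, a call to model_other_cpu_xmit() fails both before the completion (queue stopped) and after it (SCHED already owned by the completing CPU), which mirrors the "no chance to grab it" observation.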
This problem is magnified when XPS is used and a single-threaded
application deals with thousands of TCP sockets.
We should use an additional bit (__QDISC_STATE_PLEASE_GRAB_ME) or some
other way to allow another CPU to service the qdisc and spare us.