Message-ID: <1479751857.8455.419.camel@edumazet-glaptop3.roam.corp.google.com>
Date: Mon, 21 Nov 2016 10:10:57 -0800
From: Eric Dumazet <eric.dumazet@...il.com>
To: Jesper Dangaard Brouer <brouer@...hat.com>
Cc: Rick Jones <rick.jones2@....com>, netdev@...r.kernel.org,
Saeed Mahameed <saeedm@...lanox.com>,
Tariq Toukan <tariqt@...lanox.com>
Subject: Re: Netperf UDP issue with connected sockets
On Mon, 2016-11-21 at 17:03 +0100, Jesper Dangaard Brouer wrote:
> On Thu, 17 Nov 2016 10:51:23 -0800
> Eric Dumazet <eric.dumazet@...il.com> wrote:
>
> > On Thu, 2016-11-17 at 19:30 +0100, Jesper Dangaard Brouer wrote:
> >
> > > The point is I can see a socket Send-Q forming, thus we do know the
> > > application has something to send, and thus a possibility for
> > > non-opportunistic bulking. Allowing/implementing bulk enqueue from
> > > the socket layer into the qdisc layer should be fairly simple (and
> > > the rest of xmit_more is already in place).
> >
> >
> > As I said, you are fooled by TX completions.
>
> Obviously TX completions play a role, yes, and I bet I can adjust the
> TX completion to make xmit_more happen, at the expense of added
> latency.
>
> The point is that the "bloated" spinlock in __dev_queue_xmit is still
> caused by the MMIO tailptr/doorbell. The added cost occurs when
> enqueueing packets, and results in the inability to get enough packets
> into the qdisc for xmit_more to kick in (on my system). I argue that a
> bulk enqueue API would allow us to get past the hurdle of
> transitioning into xmit_more mode more easily.
>
This is very nice, but we already have bulk enqueue; it is called
xmit_more.

The kernel does not know whether your application will send another
packet after the one it just sent.
xmit_more is rarely used when applications/stacks send many small
packets: the qdisc is empty (an enqueued packet is immediately
dequeued, so skb->xmit_more is 0), and often bypassed entirely
(TCQ_F_CAN_BYPASS).
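
For reference, this is roughly how a driver consumes the flag in its
xmit path today (a sketch only; my_post_tx_descriptor() and
my_ring_doorbell() are placeholder names, not a real driver API):

#include <linux/netdevice.h>

static netdev_tx_t my_ndo_start_xmit(struct sk_buff *skb,
				     struct net_device *dev)
{
	struct netdev_queue *txq;

	txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));

	my_post_tx_descriptor(dev, skb);  /* place skb in the TX ring */

	/* Ring the doorbell only on the last packet of a burst, or
	 * when the queue was stopped underneath us.
	 */
	if (!skb->xmit_more || netif_xmit_stopped(txq))
		my_ring_doorbell(dev);

	return NETDEV_TX_OK;
}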
Not sure if this has been tried before, but the doorbell avoidance
could be done by the driver itself, because it knows a TX completion
will come shortly (well... if softirqs are not delayed too much!)
The doorbell would be forced only if:

( "skb->xmit_more is not set" AND "TX engine has not started yet" )
OR
( too many [1] packets were put in the TX ring buffer, no point
deferring more )
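
As code, the test could look something like this (sketch only;
ring_is_idle(), ring->pending and ring->tx_budget are invented names
standing for the "TX engine not started" state, the count of
doorbell-deferred packets, and the [1] limit):

/* Should ndo_start_xmit ring the doorbell itself, or may it
 * leave that to the TX completion handler?
 */
static bool must_ring_doorbell(const struct my_tx_ring *ring,
			       const struct sk_buff *skb)
{
	/* Nothing in flight yet: no completion will come to ring
	 * the doorbell for us, so start the pump.
	 */
	if (!skb->xmit_more && ring_is_idle(ring))
		return true;

	/* Too many packets sitting in the ring buffer; no point
	 * deferring any longer.  See [1] for the limit.
	 */
	if (ring->pending >= ring->tx_budget)
		return true;

	return false;
}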
Start the pump, but once it is started, let the doorbells be done by
TX completion.
ndo_start_xmit and the TX completion handler would have to maintain a
shared state describing whether packets were ready but the doorbell
was deferred.
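
That shared state could be as small as a counter plus a flag, e.g.
(again a sketch with invented field names; real code would need the TX
lock or memory barriers between the two sides):

struct my_tx_ring {
	/* ... descriptors, producer/consumer indexes ... */
	unsigned int	pending;	/* pkts queued, doorbell not rung */
	unsigned int	tx_budget;	/* deferral limit, see [1] */
	bool	doorbell_deferred;	/* set by ndo_start_xmit,
					 * cleared by TX completion */
};

/* TX completion side: after reclaiming descriptors, ring any
 * doorbell that ndo_start_xmit deferred.
 */
static void my_tx_complete(struct my_tx_ring *ring)
{
	/* ... free completed skbs, advance consumer index ... */

	if (ring->doorbell_deferred && ring->pending) {
		ring->doorbell_deferred = false;
		ring->pending = 0;
		my_ring_doorbell(ring);
	}
}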
Note that "TX completion" here means "at least one packet was
drained"; otherwise busy polling, constantly calling napi->poll(),
would force the doorbell too soon on devices sharing one NAPI context
for both RX and TX.

But then, maybe busy poll would like to force a doorbell...
I could try these ideas on mlx4 shortly.
[1] The limit could be derived from the active "ethtool -c" params,
e.g. tx-frames.
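
The driver's set_coalesce handler already sees that value, so the
budget could simply track it (sketch; priv->tx_ring.tx_budget is the
invented field from the struct above):

#include <linux/ethtool.h>

static int my_set_coalesce(struct net_device *dev,
			   struct ethtool_coalesce *ec)
{
	struct my_priv *priv = netdev_priv(dev);

	/* Reuse "ethtool -C ethX tx-frames N" as the number of
	 * packets whose doorbell we are willing to defer.
	 */
	priv->tx_ring.tx_budget = ec->tx_max_coalesced_frames;

	return 0;
}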