[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAF=yD-KKhNLFY=qMkioYOBWnM8KC_+Fdrjd77AbCWaiD8uEtVA@mail.gmail.com>
Date: Fri, 21 Apr 2017 14:10:06 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Jason Wang <jasowang@...hat.com>
Cc: Network Development <netdev@...r.kernel.org>,
"Michael S. Tsirkin" <mst@...hat.com>,
virtualization@...ts.linux-foundation.org,
David Miller <davem@...emloft.net>,
Willem de Bruijn <willemb@...gle.com>
Subject: Re: [PATCH net-next v2 2/5] virtio-net: transmit napi
On Thu, Apr 20, 2017 at 12:02 PM, Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>>> static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
>>> {
>>> struct virtio_net_hdr_mrg_rxbuf *hdr;
>>> @@ -1130,9 +1172,11 @@ static netdev_tx_t start_xmit(struct sk_buff *skb,
>>> struct net_device *dev)
>>> int err;
>>> struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
>>> bool kick = !skb->xmit_more;
>>> + bool use_napi = sq->napi.weight;
>>> /* Free up any pending old buffers before queueing new ones. */
>>> - free_old_xmit_skbs(sq);
>>> + if (!use_napi)
>>> + free_old_xmit_skbs(sq);
>>
>>
>> I'm not sure this is best or even correct. Consider we clean xmit packets
>> speculatively in virtnet_poll_tx(), we need call free_old_xmit_skbs()
>> unconditionally. This can also help to reduce the possible of napi
>> rescheduling in virtnet_poll_tx().
>
> Because of the use of trylock there. Absolutely, thanks! Perhaps I should
> only use trylock in the opportunistic clean path from the rx softirq and
> full locking in the tx softirq.
>
> I previously observed that cleaning here would, counterintuitively,
> reduce efficiency. It reverted the improvements of cleaning transmit
> completions from the receive softirq. Going through my data, I did
> not observe this regression anymore on the latest patchset.
>
> Let me test again, with and without the new
> virtqueue_enable_cb_delayed patch. Perhaps that made a
> difference.
Neither cleaning in start_xmit nor converting the napi tx trylock to lock
shows a significant impact on loadtests, whether cpu affine or not.
I'll make both changes, as the first reduces patch size and code complexity
and the second is a more obviously correct codepath than than with trylock.
To be clear, the variant called from the rx napi handler will still
opportunistically use trylock.
Powered by blists - more mailing lists