Message-ID: <e606f6aa-5aba-0d47-8cc1-616cfead0faf@redhat.com>
Date: Thu, 17 Jun 2021 14:01:42 +0800
From: Jason Wang <jasowang@...hat.com>
To: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
Cc: "David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
"Michael S. Tsirkin" <mst@...hat.com>,
Björn Töpel <bjorn@...nel.org>,
Magnus Karlsson <magnus.karlsson@...el.com>,
Jonathan Lemon <jonathan.lemon@...il.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Jesper Dangaard Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
KP Singh <kpsingh@...nel.org>,
virtualization@...ts.linux-foundation.org, bpf@...r.kernel.org,
"dust.li" <dust.li@...ux.alibaba.com>, netdev@...r.kernel.org
Subject: Re: [PATCH net-next v5 14/15] virtio-net: xsk direct xmit inside xsk
wakeup
On 2021/6/17 1:55 PM, Xuan Zhuo wrote:
> On Thu, 17 Jun 2021 11:07:17 +0800, Jason Wang <jasowang@...hat.com> wrote:
>> On 2021/6/10 4:22 PM, Xuan Zhuo wrote:
>>> Calling virtqueue_napi_schedule() in wakeup results in napi running on
>>> the current cpu. If the application is not busy, there is no problem.
>>> But if the application itself is busy, the application and napi end up
>>> competing for the same cpu, causing a lot of scheduling.
>>>
>>> If the application sends packets continuously, the constant scheduling
>>> back and forth between the application and napi makes transmission
>>> uneven, with an obvious delay (visible with tcpdump). When a channel is
>>> pushed to 100% (vhost reaches 100%), the cpu where the application runs
>>> also reaches 100%.
>>>
>>> This patch sends a small amount of data directly in wakeup, with the
>>> purpose of triggering a tx interrupt. The tx interrupt fires on the cpu
>>> of its affinity and schedules napi there, so napi can keep consuming
>>> the xsk tx queue. Two cpus then share the work: cpu0 runs the
>>> application, while cpu1 runs napi to consume the data. Pushing a
>>> channel to 100% as before, cpu0 utilization is 12.7% and cpu1
>>> utilization is 2.9%.
>>>
>>> Signed-off-by: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
>>> ---
>>> drivers/net/virtio/xsk.c | 28 +++++++++++++++++++++++-----
>>> 1 file changed, 23 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio/xsk.c b/drivers/net/virtio/xsk.c
>>> index 36cda2dcf8e7..3973c82d1ad2 100644
>>> --- a/drivers/net/virtio/xsk.c
>>> +++ b/drivers/net/virtio/xsk.c
>>> @@ -547,6 +547,7 @@ int virtnet_xsk_wakeup(struct net_device *dev, u32 qid, u32 flag)
>>> {
>>> struct virtnet_info *vi = netdev_priv(dev);
>>> struct xsk_buff_pool *pool;
>>> + struct netdev_queue *txq;
>>> struct send_queue *sq;
>>>
>>> if (!netif_running(dev))
>>> @@ -559,11 +560,28 @@ int virtnet_xsk_wakeup(struct net_device *dev, u32 qid, u32 flag)
>>>
>>> rcu_read_lock();
>>> pool = rcu_dereference(sq->xsk.pool);
>>> - if (pool) {
>>> - local_bh_disable();
>>> - virtqueue_napi_schedule(&sq->napi, sq->vq);
>>> - local_bh_enable();
>>> - }
>>> + if (!pool)
>>> + goto end;
>>> +
>>> + if (napi_if_scheduled_mark_missed(&sq->napi))
>>> + goto end;
>>> +
>>> + txq = netdev_get_tx_queue(dev, qid);
>>> +
>>> + __netif_tx_lock_bh(txq);
>>> +
>>> +	/* Send some of the packets directly, to reduce the transmit delay;
>>> +	 * this also actively triggers the tx interrupt.
>>> +	 *
>>> +	 * If no packet is sent out, the device ring is full. In that case
>>> +	 * we will still get a tx interrupt, and the subsequent packet
>>> +	 * sending work will be handled from there.
>>> +	 */
>>> + virtnet_xsk_run(sq, pool, sq->napi.weight, false);
>>
>> This looks tricky, and it won't be efficient since there could be some
>> contention on the tx lock.
>>
>> I wonder if we can simulate the interrupt via an IPI, like what RPS does.
> Let me try.
>
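
FYI, for RPS the core queues a csd and calls
smp_call_function_single_async() so the work is raised on the remote
cpu. A rough, untested sketch of the same idea here (sq->csd and the
function names below are made up for illustration, they are not part of
this patch):

static void virtnet_xsk_remote_kick(void *data)
{
	struct send_queue *sq = data;

	/* Runs in IPI (hardirq) context on the target cpu: schedule
	 * tx napi there instead of on the application's cpu.
	 */
	virtqueue_napi_schedule(&sq->napi, sq->vq);
}

/* at init time: INIT_CSD(&sq->csd, virtnet_xsk_remote_kick, sq); */

static int virtnet_xsk_kick_cpu(struct send_queue *sq, int cpu)
{
	/* Fire an async IPI at the cpu the tx interrupt is affine to;
	 * the csd ensures at most one such IPI is in flight.
	 */
	return smp_call_function_single_async(cpu, &sq->csd);
}

This would avoid taking the tx lock in wakeup completely; the open
question is how to reliably pick the target cpu (the irq affinity of
the tx vq).
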
>> In the long run, we may want to extend the spec to support triggering
>> interrupts through the driver.
> Can we submit this together with queue reset?
They need to be separate features. And it looks to me it's not as urgent as reset.
Thanks
>
> Thanks.
>
>> Thanks
>>
>>
>>> +
>>> + __netif_tx_unlock_bh(txq);
>>> +
>>> +end:
>>> rcu_read_unlock();
>>> return 0;
>>> }