Message-ID: <e606f6aa-5aba-0d47-8cc1-616cfead0faf@redhat.com>
Date:   Thu, 17 Jun 2021 14:01:42 +0800
From:   Jason Wang <jasowang@...hat.com>
To:     Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
Cc:     "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        "Michael S. Tsirkin" <mst@...hat.com>,
        Björn Töpel <bjorn@...nel.org>,
        Magnus Karlsson <magnus.karlsson@...el.com>,
        Jonathan Lemon <jonathan.lemon@...il.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        John Fastabend <john.fastabend@...il.com>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        KP Singh <kpsingh@...nel.org>,
        virtualization@...ts.linux-foundation.org, bpf@...r.kernel.org,
        "dust.li" <dust.li@...ux.alibaba.com>, netdev@...r.kernel.org
Subject: Re: [PATCH net-next v5 14/15] virtio-net: xsk direct xmit inside xsk
 wakeup


On 2021/6/17 1:55 PM, Xuan Zhuo wrote:
> On Thu, 17 Jun 2021 11:07:17 +0800, Jason Wang <jasowang@...hat.com> wrote:
>> On 2021/6/10 4:22 PM, Xuan Zhuo wrote:
>>> Calling virtqueue_napi_schedule() from wakeup makes napi run on the
>>> current cpu. If the application is idle this is fine, but if the
>>> application itself is busy, it causes heavy back-and-forth scheduling
>>> between the application and napi.
>>>
>>> When the application sends packets continuously, this constant
>>> rescheduling makes transmission uneven, with a clearly visible delay
>>> (observable with tcpdump). When a channel is driven to 100% (vhost
>>> reaches 100%), the cpu the application runs on also reaches 100%.
>>>
>>> This patch instead sends a small amount of data directly from wakeup,
>>> which is enough to trigger a tx interrupt. The tx interrupt fires on
>>> the cpu of its affinity and schedules napi there, so napi keeps
>>> consuming the xsk tx queue on that cpu. Two cpus then share the work:
>>> cpu0 runs the application while cpu1 runs napi to consume the data.
>>> Driving a channel to 100% as before, cpu0 utilization is now 12.7%
>>> and cpu1 utilization is 2.9%.
>>>
>>> Signed-off-by: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
>>> ---
>>>    drivers/net/virtio/xsk.c | 28 +++++++++++++++++++++++-----
>>>    1 file changed, 23 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio/xsk.c b/drivers/net/virtio/xsk.c
>>> index 36cda2dcf8e7..3973c82d1ad2 100644
>>> --- a/drivers/net/virtio/xsk.c
>>> +++ b/drivers/net/virtio/xsk.c
>>> @@ -547,6 +547,7 @@ int virtnet_xsk_wakeup(struct net_device *dev, u32 qid, u32 flag)
>>>    {
>>>    	struct virtnet_info *vi = netdev_priv(dev);
>>>    	struct xsk_buff_pool *pool;
>>> +	struct netdev_queue *txq;
>>>    	struct send_queue *sq;
>>>
>>>    	if (!netif_running(dev))
>>> @@ -559,11 +560,28 @@ int virtnet_xsk_wakeup(struct net_device *dev, u32 qid, u32 flag)
>>>
>>>    	rcu_read_lock();
>>>    	pool = rcu_dereference(sq->xsk.pool);
>>> -	if (pool) {
>>> -		local_bh_disable();
>>> -		virtqueue_napi_schedule(&sq->napi, sq->vq);
>>> -		local_bh_enable();
>>> -	}
>>> +	if (!pool)
>>> +		goto end;
>>> +
>>> +	if (napi_if_scheduled_mark_missed(&sq->napi))
>>> +		goto end;
>>> +
>>> +	txq = netdev_get_tx_queue(dev, qid);
>>> +
>>> +	__netif_tx_lock_bh(txq);
>>> +
>>> +	/* Send some of the pending packets directly, to reduce tx
>>> +	 * latency and to actively trigger the tx interrupt.
>>> +	 *
>>> +	 * If no packet can be sent out, the device ring is full. In
>>> +	 * that case we still get a tx interrupt, and will handle the
>>> +	 * remaining packets from there.
>>> +	 */
>>> +	virtnet_xsk_run(sq, pool, sq->napi.weight, false);
>>
>> This looks tricky, and it won't be efficient since there could be some
>> contention on the tx lock.
>>
>> I wonder if we can simulate the interrupt via an IPI, like what RPS does.
> Let me try.
>
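For context: RPS queues a call_single_data for the remote cpu and raises an
IPI; the IPI handler then schedules NAPI on that cpu (see
rps_trigger_softirq() and net_rps_send_ipi() in net/core/dev.c). A minimal
sketch of how that shape might look here, assuming the same mechanism
(virtnet_xsk_ipi(), virtnet_xsk_kick() and sq->csd are hypothetical names,
not part of this series):

/* Runs in hardirq (IPI) context on the target cpu, where scheduling
 * NAPI is as legitimate as it is from a device irq handler.
 */
static void virtnet_xsk_ipi(void *data)
{
	struct send_queue *sq = data;

	virtqueue_napi_schedule(&sq->napi, sq->vq);
}

/* Kick napi on target_cpu instead of the caller's cpu. Assumes sq->csd
 * was initialized once with INIT_CSD(&sq->csd, virtnet_xsk_ipi, sq) and
 * is not reused while a previous IPI is still in flight (RPS handles
 * this with per-cpu lists).
 */
static void virtnet_xsk_kick(struct send_queue *sq, int target_cpu)
{
	int cpu = get_cpu();

	if (cpu == target_cpu) {
		local_bh_disable();
		virtqueue_napi_schedule(&sq->napi, sq->vq);
		local_bh_enable();
	} else {
		smp_call_function_single_async(target_cpu, &sq->csd);
	}

	put_cpu();
}

This would sidestep the tx lock contention pointed out above entirely; the
cost is one IPI per wakeup when napi is not already running.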
>> In the long run, we may want to extend the spec to support triggering
>> the interrupt through the driver.
> Can we submit this together with queue reset?


We need separate features, and it looks to me that this is not as urgent as queue reset.

Thanks


>
> Thanks.
>
>> Thanks
>>
>>
>>> +
>>> +	__netif_tx_unlock_bh(txq);
>>> +
>>> +end:
>>>    	rcu_read_unlock();
>>>    	return 0;
>>>    }
