netdev - Re: [PATCH net-next 2/3] virtio-net: batch dim request

Message-ID: <20240116194912.GE588419@kernel.org>
Date: Tue, 16 Jan 2024 19:49:12 +0000
From: Simon Horman <horms@...nel.org>
To: Heng Qi <hengqi@...ux.alibaba.com>
Cc: netdev@...r.kernel.org, virtualization@...ts.linux.dev,
	Jason Wang <jasowang@...hat.com>,
	"Michael S. Tsirkin" <mst@...hat.com>,
	Paolo Abeni <pabeni@...hat.com>, Jakub Kicinski <kuba@...nel.org>,
	Eric Dumazet <edumazet@...gle.com>,
	"David S. Miller" <davem@...emloft.net>,
	Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
Subject: Re: [PATCH net-next 2/3] virtio-net: batch dim request

On Tue, Jan 16, 2024 at 09:11:32PM +0800, Heng Qi wrote:
> Currently, each time the driver attempts to update the coalescing
> parameters for a vq, it needs to kick the device.
> The following path is observed:
>   1. Driver kicks the device;
>   2. After the device receives the kick, its CPU is scheduled and DMAs
>      multiple buffers multiple times;
>   3. The device completes processing and replies with a response.
> 
> When large-queue devices issue multiple requests and kick the device
> frequently, this often interrupts the work of the device-side CPU.
> In addition, each vq request is processed separately, causing more
> delays for the CPU to wait for the DMA request to complete.
> 
> These interruptions and overhead will strain the CPU responsible for
> controlling the path of the DPU, especially in multi-device and
> large-queue scenarios.
> 
> To solve the above problems, we internally tried batching requests,
> which merges requests from multiple queues and sends them at once.
> We conservatively tested sending 8 queue commands together.
> The DPU processing efficiency can be improved by 8 times, which
> greatly eases the DPU's support for multi-device and multi-queue DIM.
> 
> Suggested-by: Xiaoming Zhao <zxm377917@...baba-inc.com>
> Signed-off-by: Heng Qi <hengqi@...ux.alibaba.com>

...

> @@ -3546,16 +3552,32 @@ static void virtnet_rx_dim_work(struct work_struct *work)
>  		update_moder = net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
>  		if (update_moder.usec != rq->intr_coal.max_usecs ||
>  		    update_moder.pkts != rq->intr_coal.max_packets) {
> -			err = virtnet_send_rx_ctrl_coal_vq_cmd(vi, qnum,
> -							       update_moder.usec,
> -							       update_moder.pkts);
> -			if (err)
> -				pr_debug("%s: Failed to send dim parameters on rxq%d\n",
> -					 dev->name, qnum);
> -			dim->state = DIM_START_MEASURE;
> +			coal->coal_vqs[j].vqn = cpu_to_le16(rxq2vq(i));
> +			coal->coal_vqs[j].coal.max_usecs = cpu_to_le32(update_moder.usec);
> +			coal->coal_vqs[j].coal.max_packets = cpu_to_le32(update_moder.pkts);
> +			rq->intr_coal.max_usecs = update_moder.usec;
> +			rq->intr_coal.max_packets = update_moder.pkts;
> +			j++;
>  		}
>  	}
>  
> +	if (!j)
> +		goto ret;
> +
> +	coal->num_entries = cpu_to_le32(j);
> +	sg_init_one(&sgs, coal, sizeof(struct virtnet_batch_coal) +
> +		    j * sizeof(struct virtio_net_ctrl_coal_vq));
> +	if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_NOTF_COAL,
> +				  VIRTIO_NET_CTRL_NOTF_COAL_VQS_SET,
> +				  &sgs))
> +		dev_warn(&vi->vdev->dev, "Failed to add dim command\n.");
> +
> +	for (i = 0; i < j; i++) {
> +		rq = &vi->rq[(coal->coal_vqs[i].vqn) / 2];

Hi Heng Qi,

The type of .vqn is __le16, but here it is used as an
integer in host byte order. Perhaps this should be (completely untested!):

		rq = &vi->rq[le16_to_cpu(coal->coal_vqs[i].vqn) / 2];
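To illustrate why the conversion matters, here is a minimal userspace sketch (also untested against the actual driver). All `sketch_*` names are hypothetical stand-ins, not kernel API: `sketch_cpu_to_le16()` and `sketch_le16_to_cpu()` imitate the kernel helpers by storing the value as explicit little-endian bytes, and `sketch_raw_read_big_endian()` models what a big-endian host would see if it used the field without converting it.

```c
#include <stdint.h>

/* Hypothetical stand-in for __le16: two bytes, least significant first. */
typedef struct { uint8_t b[2]; } sketch_le16;

/* Imitates cpu_to_le16(): store the value as little-endian bytes. */
static sketch_le16 sketch_cpu_to_le16(uint16_t v)
{
	sketch_le16 r = { { (uint8_t)(v & 0xff), (uint8_t)(v >> 8) } };
	return r;
}

/* Imitates le16_to_cpu(): rebuild the host value from little-endian bytes. */
static uint16_t sketch_le16_to_cpu(sketch_le16 v)
{
	return (uint16_t)(v.b[0] | (v.b[1] << 8));
}

/* Models a big-endian host using the field directly, without conversion. */
static uint16_t sketch_raw_read_big_endian(sketch_le16 v)
{
	return (uint16_t)((v.b[0] << 8) | v.b[1]);
}

/* Same arithmetic as rxq2vq()/vq2rxq() in virtio_net.c. */
static unsigned int sketch_rxq2vq(unsigned int rxq) { return rxq * 2; }
static unsigned int sketch_vq2rxq(unsigned int vq) { return vq / 2; }
```

With rx queue 3, the stored vqn is 6 (bytes 06 00). Converting before dividing gives index 3 again, while the unconverted read on a big-endian host gives 0x0600 / 2 = 768, which would index far outside vi->rq[]. On little-endian hosts both reads agree, so the problem would not show up in testing there.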

> +		rq->dim.state = DIM_START_MEASURE;
> +	}
> +
> +ret:
>  	rtnl_unlock();
>  }
>