netdev - Re: [PATCH net-next v5] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8b853379-1601-4387-adaf-31f786f306ca@redhat.com>
Date: Tue, 14 Oct 2025 13:25:47 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>, netdev@...r.kernel.org
Cc: Andrew Lunn <andrew+netdev@...n.ch>, "David S. Miller"
 <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Wen Gu <guwen@...ux.alibaba.com>,
 Philo Lu <lulie@...ux.alibaba.com>, Lorenzo Bianconi <lorenzo@...nel.org>,
 Vadim Fedorenko <vadim.fedorenko@...ux.dev>,
 Lukas Bulwahn <lukas.bulwahn@...hat.com>,
 Geert Uytterhoeven <geert+renesas@...der.be>,
 Vivian Wang <wangruikang@...as.ac.cn>,
 Troy Mitchell <troy.mitchell@...ux.spacemit.com>,
 Dust Li <dust.li@...ux.alibaba.com>, alok.a.tiwari@...cle.com,
 kalesh-anakkur.purayil@...adcom.com
Subject: Re: [PATCH net-next v5] eea: Add basic driver framework for Alibaba
 Elastic Ethernet Adaptor

On 10/13/25 4:18 AM, Xuan Zhuo wrote:
> Add a driver framework for EEA that will be available in the future.
> 
> This driver is currently quite minimal, implementing only fundamental
> core functionalities. Key features include: I/O queue management via
> adminq, basic PCI-layer operations, and essential RX/TX data
> communication capabilities. It also supports the creation,
> initialization, and management of network devices (netdev). Furthermore,
> the ring structures for both I/O queues and adminq have been abstracted
> into a simple, unified, and reusable library implementation,
> facilitating future extension and maintenance.
> 
> Reviewed-by: Dust Li <dust.li@...ux.alibaba.com>
> Reviewed-by: Philo Lu <lulie@...ux.alibaba.com>
> Signed-off-by: Wen Gu <guwen@...ux.alibaba.com>
> Signed-off-by: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
> ---
> 
> v5: Thanks for the comments from Kalesh Anakkur Purayil, ALOK TIWARI
> v4: Thanks for the comments from Troy Mitchell, Przemek Kitszel, Andrew Lunn, Kalesh Anakkur Purayil
> v3: Thanks for the comments from Paolo Abenchi
> v2: Thanks for the comments from Simon Horman and Andrew Lunn
> v1: Thanks for the comments from Simon Horman and Andrew Lunn

You should add a synopsis of the major changes vs the previous revision
to make reiviewer's work easier.

> 
> This commit is indeed quite large, but further splitting it would not be
> meaningful. Historically, many similar drivers have been introduced with
> commits of similar size and scope, so we chose not to invest excessive
> effort into finer-grained splitting.

That also means that you require the reviewers to invest a lot of extra
effort here, which in turn does not help making progresses.

[...]> +/* resources: ring, buffers, irq */
> +int eea_reset_hw_resources(struct eea_net *enet, struct eea_net_tmp *tmp)
> +{
> +	struct eea_net_tmp _tmp = {};
> +	int err;
> +
> +	if (!tmp) {
> +		enet_mk_tmp_cfg(enet, &_tmp);
> +		tmp = &_tmp;
This is quite ugly. Let the caller always pass non zero 'tmp'. Also a
more describing name would help.

> +	}
> +
> +	if (!netif_running(enet->netdev)) {
> +		enet->cfg = tmp->cfg;
> +		return 0;
> +	}
> +
> +	err = eea_alloc_rxtx_q_mem(tmp);
> +	if (err) {
> +		netdev_warn(enet->netdev,
> +			    "eea reset: alloc q failed. stop reset. err %d\n",
> +			    err);
> +		return err;
> +	}
> +
> +	eea_netdev_stop(enet->netdev);
> +
> +	enet_bind_new_q_and_cfg(enet, tmp);
> +
> +	err = eea_active_ring_and_irq(enet);
> +	if (err) {
> +		netdev_warn(enet->netdev,
> +			    "eea reset: active new ring and irq failed. err %d\n",
> +			    err);
> +		return err;
> +	}
> +
> +	err = eea_start_rxtx(enet->netdev);
> +	if (err)
> +		netdev_warn(enet->netdev,
> +			    "eea reset: start queue failed. err %d\n", err);

I'm unsure why you ignore my feedback on v2 WRT errors generated here?

> +
> +	return err;
> +}
> +
> +int eea_queues_check_and_reset(struct eea_device *edev)
> +{
> +	struct eea_aq_dev_status *dstatus __free(kfree) = NULL;
> +	struct eea_aq_queue_status *qstatus;
> +	struct eea_aq_queue_status *qs;
> +	bool need_reset = false;
> +	int num, i, err = 0;
> +
> +	num = edev->enet->cfg.tx_ring_num * 2 + 1;

The above should probably moved under the RTNL lock or you could access
stale values.

> +
> +	rtnl_lock();
> +
> +	dstatus = eea_adminq_dev_status(edev->enet);
> +	if (!dstatus) {
> +		netdev_warn(edev->enet->netdev, "query queue status failed.\n");


		err = -ENOMEM;
		goto out_unlock;


> +		rtnl_unlock();
> +		return -ENOMEM;
> +	}
> +
> +	if (le16_to_cpu(dstatus->link_status) == EEA_LINK_DOWN_STATUS) {
> +		eea_netdev_stop(edev->enet->netdev);
> +		edev->enet->link_err = EEA_LINK_ERR_LINK_DOWN;
> +		netdev_warn(edev->enet->netdev, "device link is down. stop device.\n");
> +		rtnl_unlock();
> +		return 0;

		goto out_unlock;

> +	}
> +
> +	qstatus = dstatus->q_status;
> +
> +	for (i = 0; i < num; ++i) {
> +		qs = &qstatus[i];
> +
> +		if (le16_to_cpu(qs->status) == EEA_QUEUE_STATUS_NEED_RESET) {
> +			netdev_warn(edev->enet->netdev,
> +				    "queue status: queue %u needs to reset\n",
> +				    le16_to_cpu(qs->qidx));
> +			need_reset = true;
> +		}
> +	}
> +
> +	if (need_reset)
> +		err = eea_reset_hw_resources(edev->enet, NULL);
> +

out_unlock:> +	rtnl_unlock();
> +	return err;


[...]> +/* ha handle code */
> +static void eea_ha_handle_work(struct work_struct *work)
> +{
> +	struct eea_pci_device *ep_dev;
> +	struct eea_device *edev;
> +	struct pci_dev *pci_dev;
> +	u16 reset;
> +
> +	ep_dev = container_of(work, struct eea_pci_device, ha_handle_work);
> +	edev = &ep_dev->edev;
> +
> +	/* Ha interrupt is triggered, so there maybe some error, we may need to
> +	 * reset the device or reset some queues.
> +	 */
> +	dev_warn(&ep_dev->pci_dev->dev, "recv ha interrupt.\n");
> +
> +	if (ep_dev->reset_pos) {
> +		pci_read_config_word(ep_dev->pci_dev, ep_dev->reset_pos,
> +				     &reset);
> +		/* clear bit */
> +		pci_write_config_word(ep_dev->pci_dev, ep_dev->reset_pos,
> +				      0xFFFF);
> +
> +		if (reset & EEA_PCI_CAP_RESET_FLAG) {
> +			dev_warn(&ep_dev->pci_dev->dev,
> +				 "recv device reset request.\n");
> +
> +			pci_dev = ep_dev->pci_dev;
> +
> +			/* The pci remove callback may hold this lock. If the
> +			 * pci remove callback is called, then we can ignore the
> +			 * ha interrupt.
> +			 */
> +			if (mutex_trylock(&edev->ha_lock)) {
> +				edev->ha_reset = true;
> +
> +				__eea_pci_remove(pci_dev, false);
> +				__eea_pci_probe(pci_dev, ep_dev);
> +
> +				edev->ha_reset = false;
> +				mutex_unlock(&edev->ha_lock);
> +			} else {
> +				dev_warn(&ep_dev->pci_dev->dev,
> +					 "ha device reset: trylock failed.\n");

Nesting here is quite high, possibly move the above in a separate helper.

> +			}
> +			return;
> +		}
> +	}
> +
> +	eea_queues_check_and_reset(&ep_dev->edev);
> +}


I'm sorry, EPATCHISTOOBIG I can't complete the review even in with an
unreasonable amount of time.

/P