netdev - Re: [PATCH net-next v1] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1753414106.2053263-1-xuanzhuo@linux.alibaba.com>
Date: Fri, 25 Jul 2025 11:28:26 +0800
From: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
To: Andrew Lunn <andrew@...n.ch>
Cc: netdev@...r.kernel.org,
 Andrew Lunn <andrew+netdev@...n.ch>,
 "David S. Miller" <davem@...emloft.net>,
 Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>,
 Paolo Abeni <pabeni@...hat.com>,
 Wen Gu <guwen@...ux.alibaba.com>,
 Philo Lu <lulie@...ux.alibaba.com>,
 Lorenzo Bianconi <lorenzo@...nel.org>,
 Lukas Bulwahn <lukas.bulwahn@...hat.com>,
 Parthiban Veerasooran <Parthiban.Veerasooran@...rochip.com>,
 Geert Uytterhoeven <geert+renesas@...der.be>,
 Dust Li <dust.li@...ux.alibaba.com>
Subject: Re: [PATCH net-next v1] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor

On Thu, 24 Jul 2025 19:42:52 +0200, Andrew Lunn <andrew@...n.ch> wrote:
> > @@ -202,5 +202,6 @@ source "drivers/net/ethernet/wangxun/Kconfig"
> >  source "drivers/net/ethernet/wiznet/Kconfig"
> >  source "drivers/net/ethernet/xilinx/Kconfig"
> >  source "drivers/net/ethernet/xircom/Kconfig"
> > +source "drivers/net/ethernet/alibaba/Kconfig"
>
> This file is sorted. Please insert in the correct location.
>
> > +struct eea_aq_host_info_cfg {
> > +#ifndef EEA_OS_DISTRO
> > +#define EEA_OS_DISTRO		0
> > +#endif
> > +
> > +#ifndef EEA_DRV_TYPE
> > +#define EEA_DRV_TYPE		0
> > +#endif
>
> Please remove these #ifndef. You know they are not defined, because
> they are not defined.
>
> > +int eea_adminq_config_host_info(struct eea_net *enet)
> > +{
> > +	struct device *dev = enet->edev->dma_dev;
> > +	struct eea_aq_host_info_cfg *cfg;
> > +	struct eea_aq_host_info_rep *rep;
> > +	int rc = -ENOMEM;
> > +
> > +	cfg = kzalloc(sizeof(*cfg), GFP_KERNEL);
> > +	if (!cfg)
> > +		return rc;
> > +
> > +	rep = kzalloc(sizeof(*rep), GFP_KERNEL);
> > +	if (!rep)
> > +		goto free_cfg;
> > +
> > +	cfg->os_type = EEA_OS_LINUX;
> > +	cfg->os_dist = EEA_OS_DISTRO;
> > +	cfg->drv_type = EEA_DRV_TYPE;
> > +
> > +	if (sscanf(utsname()->release, "%hu.%hu.%hu", &cfg->kern_ver_major,
> > +		   &cfg->kern_ver_minor, &cfg->kern_ver_sub_minor) != 3) {
> > +		rc = -EINVAL;
> > +		goto free_rep;
> > +	}
> > +
>
> LINUX_VERSION_MAJOR 6
> LINUX_VERSION_PATCHLEVEL 11
> LINUX_VERSION_SUBLEVEL 0
>
> I still think you should not be doing this.

Please give us a chance to try.

>
> > +static int eea_netdev_stop(struct net_device *netdev)
> > +{
> > +	struct eea_net *enet = netdev_priv(netdev);
> > +
> > +	if (!enet->started) {
> > +		netdev_warn(netdev, "eea netdev stop: but dev is not started.\n");
> > +		return 0;
> > +	}
>
> How does this happen?

This function can be called from other contexts.

When we receive an HA interrupt, it may indicate that there is an error inside
the device, and this function may be called as a result.

>
> > +static int eea_netdev_open(struct net_device *netdev)
> > +{
> > +	struct eea_net *enet = netdev_priv(netdev);
> > +	struct eea_net_tmp tmp = {};
> > +	int err;
> > +
> > +	if (enet->link_err) {
> > +		netdev_err(netdev, "netdev open err, because link error: %d\n",
> > +			   enet->link_err);
>
> What is a link error? You should be able to admin an interface up if
> the cable is not plugged in.

The device may send an interrupt to the driver, and then the driver can query
the device. If there is an error inside the device, the driver will stop
the device. If the user tries to bring up the network device, we will
attempt to prevent that operation.

>
> > +static int eea_netdev_init_features(struct net_device *netdev,
> > +				    struct eea_net *enet,
> > +				    struct eea_device *edev)
> > +{
> > +	eea_update_cfg(enet, edev, cfg);
>
> > +	netdev->min_mtu = ETH_MIN_MTU;
> > +
> > +	mtu = le16_to_cpu(cfg->mtu);
> > +	if (mtu < netdev->min_mtu) {
> > +		dev_err(edev->dma_dev, "device MTU too small. %d < %d", mtu,
> > +			netdev->min_mtu);
>
> This message does not really make it clear the firmware is completely
> broken, and passing invalid values to Linux. I think it is good not to
> trust the firmware, but when the firmware is broken, you probably
> should make that clear to the user.
>
> > +static int __eea_pci_probe(struct pci_dev *pci_dev,
> > +			   struct eea_pci_device *ep_dev);
> > +static void __eea_pci_remove(struct pci_dev *pci_dev, bool flush_ha_work);
>
> No forward declarations. Put the code in the correct order so they are
> not needed.

Here A calls B, B calls C, and C calls A. Please believe me, I don't like this
approach either, but this is the simplest way and allows us to keep the related
code together.

>
> > +static irqreturn_t eea_pci_ha_handle(int irq, void *data)
> > +{
> > +	struct eea_device *edev = data;
> > +
> > +	schedule_work(&edev->ep_dev->ha_handle_work);
> > +
> > +	return IRQ_HANDLED;
> > +}
>
> A threaded interrupt handler might make this simpler.

In our case, the HA interrupt is mainly used to notify some abnormal conditions
or errors from the device side. I think this is a low-probability event. My
understanding is that threaded interrupt hqandler needs to create a thread in
advance, which I consider to be a waste, so I didn't choose to do it this way.

Thanks.

>
> 	Andrew
>