[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1752733075.7055798-1-xuanzhuo@linux.alibaba.com>
Date: Thu, 17 Jul 2025 14:17:55 +0800
From: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
To: Andrew Lunn <andrew@...n.ch>
Cc: netdev@...r.kernel.org,
Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Wen Gu <guwen@...ux.alibaba.com>,
Philo Lu <lulie@...ux.alibaba.com>,
Lorenzo Bianconi <lorenzo@...nel.org>,
Lukas Bulwahn <lukas.bulwahn@...hat.com>,
Parthiban Veerasooran <Parthiban.Veerasooran@...rochip.com>,
Geert Uytterhoeven <geert+renesas@...der.be>,
Alexander Duyck <alexanderduyck@...com>,
Dust Li <dust.li@...ux.alibaba.com>
Subject: Re: [PATCH net-next] eea: Add basic driver framework for Alibaba Elastic Ethernet Adaptor
On Wed, 16 Jul 2025 18:25:40 +0200, Andrew Lunn <andrew@...n.ch> wrote:
> On Wed, Jul 16, 2025 at 01:47:32PM +0800, Xuan Zhuo wrote:
> > Thank you for your valuable feedback. We've addressed most of the comments
> > and will include the fixes in the next version. A few remaining items are still
> > under discussion and listed below for reference.
> >
> > On Thu, 10 Jul 2025 15:45:38 +0200, Andrew Lunn <andrew@...n.ch> wrote:
> > > > +module_param(aq_timeout, uint, 0644);
> > >
> > > No module params please.
> > >
> > > > +struct eea_aq_host_info_cfg {
> > > > +#ifndef EEA_OS_DISTRO
> > > > +#define EEA_OS_DISTRO 0
> > > > +#endif
> > > > +
> > > > +#ifndef EEA_DRV_TYPE
> > > > +#define EEA_DRV_TYPE 0
> > > > +#endif
> > > > +
> > > > +#define EEA_OS_LINUX 1
> > > > +#define EEA_SPEC_VER_MAJOR 1
> > > > +#define EEA_SPEC_VER_MINOR 0
> > > > + __le16 os_type; /* Linux, Win.. */
> > > > + __le16 os_dist;
> > > > + __le16 drv_type;
> > > > +
> > > > + __le16 kern_ver_major;
> > > > + __le16 kern_ver_minor;
> > > > + __le16 kern_ver_sub_minor;
> > > > +
> > > > + __le16 drv_ver_major;
> > > > + __le16 drv_ver_minor;
> > > > + __le16 drv_ver_sub_minor;
> > > > +
> > > > + __le16 spec_ver_major;
> > > > + __le16 spec_ver_minor;
> > > > + __le16 pci_bdf;
> > > > + __le32 pci_domain;
> > > > +
> > > > + u8 os_ver_str[64];
> > > > + u8 isa_str[64];
> > >
> > > Why does it care about the OS, kernel version etc?
> >
> > Then the device can know the version, the dpu can do something for bug of the
> > driver.
>
> That is not a very good explanation. Do you see any other system in
> Linux were the firmware works around bug in Linux drivers using the
> kernel version?
Actually, there is one, we noticed that the ena driver has a similar mechanism.
struct ena_admin_host_info
>
> You also need to think about enterprise kernels, like RedHat,
> Oracle. They don't give a truthful kernel version, they have thousands
> of patches on top fixing, and creating bugs. How will you handle that?
>
> Please drop all this, and just fix the bugs in the driver.
Fixing bugs in Linux is, of course, the necessary work. However, if certain bugs
already exist and customers are using such drivers, there is a risk involved. We
can record these buggy versions in the DPU, and notify users via dmesg when they
initialize the driver. This way, customers will be aware that the current
version might have issues, and they can be guided to upgrade their kernel.
Version-related issues are indeed quite tricky, and we do encounter various
complications because of them. Therefore, our plan is to manually update our
version tracking with every code change, combined with some additional
information.
In the end, if we cannot confidently determine the situation, we will simply
choose to do nothing — after all, it won't make things any worse than they
already are.
>
> > > > + start = get_jiffies_64();
> > > > + while (!(cdesc = ering_cq_get_desc(enet->adminq.ring))) {
> > > > + cond_resched();
> > > > + cpu_relax();
> > > > +
> > > > + timeout = secs_to_jiffies(READ_ONCE(aq_timeout));
> > > > + if (time_after64(get_jiffies_64(), start + timeout)) {
> > > > + netdev_err(enet->netdev, "admin queue timeout. timeout %d\n",
> > > > + READ_ONCE(aq_timeout));
> > > > + return -1;
> > > > + }
> > > > + }
> > >
> > > See if you can one of the macros from iopoll.h
> >
> > Here we do not access the pci register directly, if we use the iopoll.h
> > we need to break the api ering_cq_get_desc. So I think we should not use
> > the api of iopoll.h here.
>
> #define read_poll_timeout(op, val, cond, sleep_us, timeout_us, \
> sleep_before_read, args...) \
> ({ \
> u64 __timeout_us = (timeout_us); \
> unsigned long __sleep_us = (sleep_us); \
> ktime_t __timeout = ktime_add_us(ktime_get(), __timeout_us); \
> might_sleep_if((__sleep_us) != 0); \
> if (sleep_before_read && __sleep_us) \
> usleep_range((__sleep_us >> 2) + 1, __sleep_us); \
> for (;;) { \
> (val) = op(args); \
> if (cond) \
> break; \
>
>
> op: ering_cq_get_desc.
> val: cdesc.
> args: enet->adminq.ring
> cond: !cdesc
>
> I might be wrong, but i think you can make this work.
Ohhh, You're right. I hadn't thought through this issue in detail before, and
yes, doing it this way is indeed feasible. During testing in certain scenarios
— especially in model environments — the execution speed can be quite slow, so
we sometimes set a larger timeout value to accommodate that.
However, once we've identified the problem, we would prefer for the operation to
time out and exit, so that we can reload the new .ko module. In this process, we
may adjust the module parameters to reduce the originally large timeout value,
forcing it to exit faster. This use case is actually very helpful during our
development process and significantly improves our efficiency.
That's why you see me using READ_ONCE() when comparing each timeout value — to
ensure we always read the latest updated value.
>
> > > > +static void eea_get_drvinfo(struct net_device *netdev,
> > > > + struct ethtool_drvinfo *info)
> > > > +{
> > > > + struct eea_net *enet = netdev_priv(netdev);
> > > > + struct eea_device *edev = enet->edev;
> > > > +
> > > > + strscpy(info->driver, KBUILD_MODNAME, sizeof(info->driver));
> > > > + strscpy(info->bus_info, eea_pci_name(edev), sizeof(info->bus_info));
> > > > + snprintf(info->version, sizeof(info->version), "%d.%d.%d",
> > > > + EEA_VER_MAJOR, EEA_VER_MINOR, EEA_VER_SUB_MINOR);
> > >
> > > A hard coded version is pointless, because it never changes, yet the
> > > kernel around the driver changes every week. Don't set version, and
> > > the core will fill in the git hash, which is useful.
> >
> > In our plan, we will increase this version when we change the code.
>
> So you will be submitting a patch for GregKH for every single stable
> kernel? That will be around 5 patches, every two weeks, for the next
> 30 years?
Of course we won't be doing that. Our plan is that whenever we update the code
— for example, fixing a bug and updating the version from 1.0.0 to 1.0.1, or
introducing a new feature and bumping the version to 1.0.2 — then when this
change is backported to stable releases, the version should also be backported
accordingly.
>
> As i said, the driver is not standalone, it is embedded within the
> kernel. Changes to the kernel can break the driver. When you get a bug
> report from a customer, and they say version v42 of the driver is
> broken, isn't the first thing you are going to ask is what kernel
> version? Is it a vendor kernel? Which vendor?
>
> If you leave version unfilled, ethtool will report something like:
>
> version: 6.12.29-amd64
>
> which is much more useful.
I actually think the approach you mentioned — printing the hash value — is
also useful. We may want to reconsider that method as well. But in my opinion,
this doesn’t affect the current patch. We might adjust that part later.
>
> > > > + mtu = le16_to_cpu(cfg->mtu);
> > > > + if (mtu < netdev->min_mtu) {
> > > > + dev_err(edev->dma_dev, "device MTU too small. %d < %d", mtu, netdev->min_mtu);
> > > > + return -EINVAL;
> > > > + }
> > > > +
> > > > + netdev->mtu = mtu;
> > > > + netdev->max_mtu = mtu;
> > >
> > > Setting mtu the same as max_mtu is unusual? Are you defaulting to jumbo?
> >
> > In the cloud the dpu controls this.
>
> No, linux controls this. This driver controls this, using these line
> here. Userspace can change it later, but this is the default. And 99%
> of Linux systems default to 1500. Is max_mtu 1500? Or is it some jumbo
> value? You at last need to add a comment you are ignoring what
> everybody else does and are setting the MTU to something larger,
> because that is what your use case is.
>
> You are aiming to write an Ethenet driver which looks pretty much like
> every other Ethernet driver in Linux. When you do something different,
> you need to justify it, add a comment why your device is special and
> needs to do something different.
No problem, I'll add a comment to explain this. I didn't realize there was
anything special about it at first. However, in cloud scenarios, this issue is
relatively straightforward — it's a bit different from physical NIC drivers.
When users purchase a machine or NIC, they’ve already decided whether the
working environment uses 1500 or jumbo frames. Our hardware will return either
1500 or 9000 based on whether jumbo frames are enabled in hardware.
The reason we wrote the driver this way is to automatically configure an MTU of
9000 in jumbo frame scenarios, so that users don’t have to do much manual
configuration.
Thanks.
>
> Andrew
Powered by blists - more mailing lists