Message-ID: <161e69d8-eb8e-4a5d-9b4e-875fa6253c67@lunn.ch>
Date: Thu, 17 Jul 2025 19:27:03 +0200
From: Andrew Lunn <andrew@...n.ch>
To: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>
Cc: netdev@...r.kernel.org, Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Wen Gu <guwen@...ux.alibaba.com>,
Philo Lu <lulie@...ux.alibaba.com>,
Lorenzo Bianconi <lorenzo@...nel.org>,
Lukas Bulwahn <lukas.bulwahn@...hat.com>,
Parthiban Veerasooran <Parthiban.Veerasooran@...rochip.com>,
Geert Uytterhoeven <geert+renesas@...der.be>,
Alexander Duyck <alexanderduyck@...com>,
Dust Li <dust.li@...ux.alibaba.com>
Subject: Re: [PATCH net-next] eea: Add basic driver framework for Alibaba
Elastic Ethernet Adaptor
> > That is not a very good explanation. Do you see any other system in
> > Linux were the firmware works around bug in Linux drivers using the
> > kernel version?
>
> Actually, there is one, we noticed that the ena driver has a similar mechanism.
>
> struct ena_admin_host_info
>
> >
> > You also need to think about enterprise kernels, like RedHat,
> > Oracle. They don't give a truthful kernel version, they have thousands
> > of patches on top fixing, and creating bugs. How will you handle that?
> >
> > Please drop all this, and just fix the bugs in the driver.
>
>
> Fixing bugs in Linux is, of course, the necessary work. However, if certain bugs
> already exist and customers are using such drivers, there is a risk involved. We
> can record these buggy versions in the DPU, and notify users via dmesg when they
> initialize the driver.
This then references the next point. What does 5.4.296 actually mean?
Is it mainline 5.4.296? Is it Debian 5.4.296 with just a few patches
on top? Is it Red Hat with 1000s of patches on top? Is it a vendor
patch which broke it, or is mainline broken? If the vendor broke it,
are you going to apply workarounds in your DPU for mainline which is
not broken? Does your DPU tell the world it is applying a workaround,
so somebody trying to debug the issue knows the DPU is working against
them?

As you pointed out, there might be one driver amongst hundreds which
reports the kernel version to the firmware. Does ENA actually do
anything with it? I don't know. But since less than 1% of drivers
actually do this, it cannot be a useful feature, because others would
already be doing it.
> However, once we've identified the problem, we would prefer for the operation to
> time out and exit, so that we can reload the new .ko module. In this process, we
> may adjust the module parameters to reduce the originally large timeout value,
> forcing it to exit faster. This use case is actually very helpful during our
> development process and significantly improves our efficiency.
No module parameters. You are doing development work, just use $EDITOR
and change the timeout.
> > So you will be submitting a patch for GregKH for every single stable
> > kernel? That will be around 5 patches, every two weeks, for the next
> > 30 years?
>
> Of course we won't be doing that. Our plan is that whenever we update the code
> — for example, fixing a bug and updating the version from 1.0.0 to 1.0.1, or
> introducing a new feature and bumping the version to 1.0.2 — then when this
> change is backported to stable releases, the version should also be backported
> accordingly.
So the version is useless. This has long been agreed, and we have been
NACKing such versions for years.
Andrew