Message-ID: <0b96edcc-6b5f-447f-8023-440427a9fff2@leemhuis.info>
Date: Fri, 5 Jul 2024 18:50:40 +0200
From: "Linux regression tracking (Thorsten Leemhuis)"
 <regressions@...mhuis.info>
To: torvalds@...ux-foundation.org
Cc: davem@...emloft.net, netdev@...r.kernel.org,
 linux-kernel@...r.kernel.org, pabeni@...hat.com,
 Linux kernel regressions list <regressions@...ts.linux.dev>,
 Jakub Kicinski <kuba@...nel.org>
Subject: e1000e regressions reg. suspend and resume (was: Re: [GIT PULL]
 Networking for v6.10-rc7)

On 04.07.24 17:33, Jakub Kicinski wrote:
> 
> The following changes since commit fd19d4a492af77b1e8fb0439781a3048d1d1f554:
> 
>   Merge tag 'net-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (2024-06-27 10:05:35 -0700)
> 
> [...] 
>
> There's one fix for power management with Intel's e1000e here,
> Thorsten tells us there's another problem that started in v6.9.
> We're trying to wrap that up but I don't think it's blocking.

Linus, in the scope of the topics I recently brought up on the ksummit
list I'd really like to know how you feel about the particular situation
Jakub hinted at above, as I wonder if you would have preferred to see
the culprits reverted weeks ago.

I agree with Jakub that the problem might not qualify as "blocking", as
it seems to only affect users with certain ethernet chips. But OTOH it's
not one, but two stacked regressions -- and one has been in proper releases
for a few weeks now. And both afaics could have been solved
weeks ago by quick reverts (while reintroducing an old(?) problem the
first of the two culprits tried to fix); the author of the second
culprit even submitted a revert weeks ago and suggested to revert the
other change, too.

That was the long story short, here are the details.

The first culprit is 861e8086029e00 ("e1000e: move force SMBUS from
enable ulp function to avoid PHY loss issue") [v6.9-rc3, v6.8.5,
v6.6.26]. Because of it, ethernet stopped working after a suspend and
resume for some users. This is something that bothers people, as
https://lore.kernel.org/all/ZmfcJsyCB6M3wr84@pirotess/ shows.

This regression was something the second culprit bfd546a552e140
("e1000e: move force SMBUS near the end of enable_ulp function")
[v6.10-rc2] tried to fix. It has been known since two days after that rc
came out that this change prevents some systems from even entering suspend.
For details see https://bugzilla.kernel.org/show_bug.cgi?id=218936 and
https://bugzilla.kernel.org/show_bug.cgi?id=218940 . Side note: commit
bfd546a552e140 nearly entered stable kernels as well, but I told Greg
about the problem, who then decided to wait:
https://lore.kernel.org/all/2024061406-refreeze-flatfoot-f33a@gregkh/

It quickly became known that both regressions can be fixed with reverts;
the author of bfd546a552e140 even submitted one and suggested to revert
861e8086029e00 as well:
https://lore.kernel.org/all/20240610013222.12082-1-hui.wang@canonical.com/
https://lore.kernel.org/all/20240611062416.16440-1-hui.wang@canonical.com/

But another developer wanted to fix the root cause. The last version of
the patch to do so is from 2024-06-20 afaics:
https://lore.kernel.org/all/20240620063645.4151337-1-vitaly.lifshits@intel.com/
The discussion about it stalled until I pointed the -net maintainers to
it two days ago in private, as afterwards there was one more reply.

All that makes me wonder if both commits should have been reverted in
mainline weeks ago; yes, sure, the problem that 861e8086029e00 tried to
fix would be back. But its Fixes: tag points to a change in 4.2-rc1, so
maybe that would not be that bad (hard to say without knowing more about
what motivated the development of that change).

That way Greg then could have reverted 861e8086029e00 as well to resolve
this in 6.9.y and 6.6.y (the latter contains this commit since
2024-04-10 and thus likely also shows the regression that bfd546a552e140
was meant to fix).

Ciao, Thorsten
