Message-ID: <f0552842-7824-424e-af21-ac9eb3c5c14d@kernel.org>
Date: Wed, 21 Aug 2024 07:09:20 +0200
From: Jiri Slaby <jirislaby@...nel.org>
To: Bjorn Helgaas <helgaas@...nel.org>, Petr Valenta <petr@...klidu.cz>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>, Len Brown <lenb@...nel.org>,
 "linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
 Linux kernel mailing list <linux-kernel@...r.kernel.org>,
 Linux regressions mailing list <regressions@...ts.linux.dev>,
 Tony Nguyen <anthony.l.nguyen@...el.com>, przemyslaw.kitszel@...el.com,
 intel-wired-lan@...ts.osuosl.org, "Rafael J. Wysocki" <rafael@...nel.org>
Subject: Re: ACPI IRQ storm with 6.10

On 20. 08. 24, 23:30, Bjorn Helgaas wrote:
> On Tue, Aug 20, 2024 at 11:13:54PM +0200, Petr Valenta wrote:
>> On 20. 08. 24 at 20:09, Bjorn Helgaas wrote:
>>> On Mon, Aug 19, 2024 at 07:23:42AM +0200, Jiri Slaby wrote:
>>>> On 19. 08. 24, 6:50, Jiri Slaby wrote:
>>>>> CC e1000e guys + Jesse (due to 75a3f93b5383) + Bjorn (due to b2c289415b2b)
>>>>
>>>> Bjorn,
>>>>
>>>> I am confused by these changes:
>>>> ==========================================
>>>> @@ -291,16 +288,13 @@ static int e1000_set_link_ksettings(struct net_device *netdev,
>>>>            * duplex is forced.
>>>>            */
>>>>           if (cmd->base.eth_tp_mdix_ctrl) {
>>>> -               if (hw->phy.media_type != e1000_media_type_copper) {
>>>> -                       ret_val = -EOPNOTSUPP;
>>>> -                       goto out;
>>>> -               }
>>>> +               if (hw->phy.media_type != e1000_media_type_copper)
>>>> +                       return -EOPNOTSUPP;
>>>>
>>>>                   if ((cmd->base.eth_tp_mdix_ctrl != ETH_TP_MDI_AUTO) &&
>>>>                       (cmd->base.autoneg != AUTONEG_ENABLE)) {
>>>>                           e_err("forcing MDI/MDI-X state is not supported when link speed and/or duplex are forced\n");
>>>> -                       ret_val = -EINVAL;
>>>> -                       goto out;
>>>> +                       return -EINVAL;
>>>>                   }
>>>>           }
>>>>
>>>> @@ -347,7 +341,6 @@ static int e1000_set_link_ksettings(struct net_device *netdev,
>>>>           }
>>>>
>>>>    out:
>>>> -       pm_runtime_put_sync(netdev->dev.parent);
>>>>           clear_bit(__E1000_RESETTING, &adapter->state);
>>>>           return ret_val;
>>>>    }
>>>> ==========================================
>>>>
>>>> So the failure paths above no longer reach clear_bit(__E1000_RESETTING,
>>>> ...). Is that intentional?
>>>
>>> Not intentional.  Petr, do you have the ability to test the patch
>>> below?  I'm not sure it's the correct fix, but it reverts the pieces
>>> of b2c289415b2b that Jiri pointed out.
>>
>> I tested the patch below but it didn't help. After the first boot with new
>> kernel it looked promising as the irq storm only appeared for a few seconds,
>> but with subsequent reboots it was the same as without the patch.
> 
> Thank you very much for testing that!


>> To be sure, here is the md5sum of ethtool.c after applying the patch:
>>
>> a25c003257538f16994b4d7c4260e894 ethtool.c
> 
> Thanks, that matches what I get when applying the patch on v6.10.
> 
> I'm at a loss.  You could try reverting the entire b2c289415b2b commit
> (patch for that is below).

FWIW he already tested with b2c289415b2b reverted (I provided him with a 
built kernel). It behaves the same. So your commit is not the culprit.

> If that doesn't help, I guess you could try reverting the other
> commits Jiri mentioned:
> 
>    76a0a3f9cc2f e1000e: fix force smbus during suspend flow
>    c93a6f62cb1b e1000e: Fix S0ix residency on corporate systems
>    bfd546a552e1 e1000e: move force SMBUS near the end of enable_ulp function
>    6918107e2540 net: e1000e & ixgbe: Remove PCI_HEADER_TYPE_MFD duplicates
>    1eb2cded45b3 net: annotate writes on dev->mtu from ndo_change_mtu()
>    b2c289415b2b e1000e: Remove redundant runtime resume for ethtool_ops
>    75a3f93b5383 net: intel: implement modern PM ops declarations
> 
> If you do this, I would revert 76a0a3f9cc2f, test, then revert
> c93a6f62cb1b in addition, test, then revert bfd546a552e1 in addition,
> etc.

Or perhaps easier, bisect directly:
   git bisect start v6.10 v6.9 -- drivers/net/ethernet/intel/e1000e/
But that assumes one of the above commits broke it. If none of them 
did, as a last resort, you can still do a full bisect (without the "-- 
drivers/..." part).
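
For reference, the path-limited bisect and its full-tree fallback could look 
roughly like the sketch below. It assumes a kernel git checkout in the 
current directory with both the v6.9 and v6.10 tags present. The helper 
names (run, bisect_e1000e, bisect_full, mark_good, mark_bad) and the 
DRY_RUN switch are made up for illustration and are not part of git.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of git.
# Run inside a kernel git tree that has the v6.9 and v6.10 tags.

E1000E_PATH=drivers/net/ethernet/intel/e1000e/

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Bisect only commits touching e1000e: v6.10 is bad, v6.9 is good.
bisect_e1000e() {
    run git bisect start v6.10 v6.9 -- "$E1000E_PATH"
}

# Last resort: bisect every commit between the two tags.
bisect_full() {
    run git bisect start v6.10 v6.9
}

# After each test boot, record the result so git picks the next candidate.
mark_good() { run git bisect good; }
mark_bad()  { run git bisect bad; }
```

After each step, build and boot the commit git checked out, then call 
mark_bad if the IRQ storm shows up and mark_good otherwise. "git bisect 
reset" returns the tree to where it started once the first bad commit is 
reported.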

I would take SUSE's v6.10 config, boot 6.10, and then:
   lsmod > /tmp/lsmod
   make LSMOD=/tmp/lsmod localyesconfig
   make bzImage
and use the resulting bzImage.

Note that graphics, wireless and other stuff will not work unless you 
build in firmware for them (EXTRA_FIRMWARE config option). Alternatively, 
use localmodconfig and also build and install the modules (now limited to 
those your machine uses).
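
The build steps above can be collected into a small script, sketched here 
under a few assumptions: it is run from the top of the kernel tree while 
the 6.10 distro kernel is booted, and the distro config is assumed to live 
at /boot/config-$(uname -r), which may differ on your system. The helper 
names and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Sketch only: paths and helper names are assumptions, adjust as needed.
# Run from the top of the kernel tree with the 6.10 distro kernel booted.

CONFIG_SRC="/boot/config-$(uname -r)"   # assumed location of the distro config
LSMOD_FILE=/tmp/lsmod

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Save the list of currently loaded modules for localyesconfig.
save_lsmod() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "lsmod > $LSMOD_FILE"
    else
        lsmod > "$LSMOD_FILE"
    fi
}

build_test_kernel() {
    # Start from the distro config.
    run cp "$CONFIG_SRC" .config &&
    save_lsmod &&
    # Turn every module that is loaded right now into a built-in.
    run make LSMOD="$LSMOD_FILE" localyesconfig &&
    # Build only the kernel image; no modules are needed with localyesconfig.
    run make bzImage
}
```

Running "DRY_RUN=1 build_test_kernel" first prints the four commands 
without touching the tree, which is a cheap way to check the config path 
before starting a long build.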

thanks,
-- 
js
suse labs

