Message-ID: <20190111213601.GA18741@lerouge>
Date:   Fri, 11 Jan 2019 22:36:02 +0100
From:   Frederic Weisbecker <frederic@...nel.org>
To:     Heiner Kallweit <hkallweit1@...il.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Anna-Maria Gleixner <anna-maria@...utronix.de>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Grygorii Strashko <grygorii.strashko@...com>
Subject: Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may
 have revealed another problem

On Wed, Jan 09, 2019 at 11:20:50PM +0100, Heiner Kallweit wrote:
> On 28.12.2018 07:39, Heiner Kallweit wrote:
> > On 28.12.2018 07:34, Heiner Kallweit wrote:
> >> On 28.12.2018 02:31, Frederic Weisbecker wrote:
> >>> On Fri, Dec 28, 2018 at 12:11:12AM +0100, Heiner Kallweit wrote:
> >>>>
> >> [...]
> >>>
> >>> Interesting, the softirq is raised from hardirq but it's not handled at the end of
> >>> the IRQ. Are you running threaded IRQs by any chance? If so I would expect ksoftirqd
> >>> to handle the pending work before we go idle. However I can imagine a small window
> >>> where such an expectation may not be met: if the softirq is raised after the ksoftirqd
> >>> thread is parked (CPUHP_AP_SMPBOOT_THREADS), which is right before we disable the CPU
> >>> (CPUHP_TEARDOWN_CPU).
> >>>
> >> I have a network driver (r8169) using NAPI which runs in softirq context AFAIK.
> >> For testing purposes I sometimes trigger system suspend via network, so there is
> >> network adapter activity when system suspends. Apart from that nothing really
> >> exciting:
> >>             CPU0       CPU1       CPU2       CPU3
> >>    0:         43          0          0          0   IO-APIC    2-edge      timer
> >>    1:          4          0          0          0   IO-APIC    1-edge      i8042
> >>    8:          0          1          0          0   IO-APIC    8-fasteoi   rtc0
> >>    9:          0          0          0          0   IO-APIC    9-fasteoi   acpi
> >>   12:          0          0          0          5   IO-APIC   12-edge      i8042
> >>  120:          0          0          0          0   PCI-MSI 311296-edge      PCIe PME
> >>  121:          0          0          0          0   PCI-MSI 315392-edge      PCIe PME
> >>  122:          0          0          0          0   PCI-MSI 327680-edge      PCIe PME
> >>  123:          0          0       3328          0   PCI-MSI 294912-edge      ahci[0000:00:12.0]
> >>  124:          0        133          0          0   PCI-MSI 344064-edge      xhci_hcd
> >>  125:          0          0         32          0   PCI-MSI 245760-edge      mei_me
> >>  127:        381          0          0          0   PCI-MSI 1572864-edge      enp3s0
> >>  128:          0          0          0        236   PCI-MSI 32768-edge      i915
> >>  129:          0        374          0          0   PCI-MSI 229376-edge      snd_hda_intel:card0
> >>
> >>> I don't know if we can afford to ignore a softirq even at this late stage. We should
> >>> probably avoid leaking any. So here is a possible fix, if you don't mind trying:
> >>>
> >> I tested your patch and at least in the first minutes of testing couldn't reproduce
> >> the issue any longer. I tested manual system suspend and the following script you
> >> sent when we started to analyze the issue.
> >>
> > 
> > Also after some more time the issue didn't occur again. So it seems your analysis
> > was right and also the approach to fix it. Thanks!
> > Will let you know in case the issue should pop up again under special
> > circumstances.
> > 
> Frederic, so far this fix hasn't appeared in linux-next. Are you going to submit it?

Yep, I'll cook up a proper changelog and let Thomas judge whether the change is worthwhile.
