Message-ID: <d9647487-28db-d138-ae3e-3fd0d2fbe589@gmail.com>
Date: Fri, 28 Dec 2018 07:34:10 +0100
From: Heiner Kallweit <hkallweit1@...il.com>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: Thomas Gleixner <tglx@...utronix.de>,
Anna-Maria Gleixner <anna-maria@...utronix.de>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Grygorii Strashko <grygorii.strashko@...com>
Subject: Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may have revealed another problem

On 28.12.2018 02:31, Frederic Weisbecker wrote:
> On Fri, Dec 28, 2018 at 12:11:12AM +0100, Heiner Kallweit wrote:
>>
[...]
>
> Interesting, the softirq is raised from hardirq but it's not handled at the end of
> the IRQ. Are you running threaded IRQs by any chance? If so I would expect ksoftirqd
> to handle the pending work before we go idle. However I can imagine a small window
> where such an expectation may not be met: if the softirq is raised after the ksoftirqd
> thread is parked (CPUHP_AP_SMPBOOT_THREADS), which is right before we disable the CPU
> (CPUHP_TEARDOWN_CPU).
>
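
For readers following along, a minimal sketch of the window described above,
assuming the mainline hotplug teardown order (the helper is invented; only
local_softirq_pending() and the CPUHP_* state names are real kernel symbols):

#include <linux/interrupt.h>

/*
 * Illustrative only, not actual hotplug code. During teardown the
 * hotplug states are unwound, so ksoftirqd is parked
 * (CPUHP_AP_SMPBOOT_THREADS) before the CPU is disabled
 * (CPUHP_TEARDOWN_CPU). A hardirq firing in between can raise a
 * softirq that no thread is left to service.
 */
static bool softirq_leaked_in_window(void)
{
        /* A nonzero pending mask after ksoftirqd is parked is the leak. */
        return local_softirq_pending() != 0;
}
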
I have a network driver (r8169) that uses NAPI, so its receive processing runs in
softirq context AFAIK (a sketch of that hardirq-to-softirq path follows the interrupt
listing below). For testing purposes I sometimes trigger system suspend via the
network, so there is network adapter activity while the system suspends. Apart from
that, nothing really exciting:
            CPU0       CPU1       CPU2       CPU3
   0:         43          0          0          0  IO-APIC    2-edge      timer
   1:          4          0          0          0  IO-APIC    1-edge      i8042
   8:          0          1          0          0  IO-APIC    8-fasteoi   rtc0
   9:          0          0          0          0  IO-APIC    9-fasteoi   acpi
  12:          0          0          0          5  IO-APIC   12-edge      i8042
 120:          0          0          0          0  PCI-MSI  311296-edge   PCIe PME
 121:          0          0          0          0  PCI-MSI  315392-edge   PCIe PME
 122:          0          0          0          0  PCI-MSI  327680-edge   PCIe PME
 123:          0          0       3328          0  PCI-MSI  294912-edge   ahci[0000:00:12.0]
 124:          0        133          0          0  PCI-MSI  344064-edge   xhci_hcd
 125:          0          0         32          0  PCI-MSI  245760-edge   mei_me
 127:        381          0          0          0  PCI-MSI 1572864-edge   enp3s0
 128:          0          0          0        236  PCI-MSI   32768-edge   i915
 129:          0        374          0          0  PCI-MSI  229376-edge   snd_hda_intel:card0
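
As background for the NAPI remark above, here is a hedged sketch of how such a
driver's hardirq handler defers its work to softirq context (the handler and
struct are invented; napi_schedule_prep()/__napi_schedule() are the real kernel
APIs):

#include <linux/interrupt.h>
#include <linux/netdevice.h>

/* Invented driver state; only the napi_struct member matters here. */
struct my_priv {
        struct napi_struct napi;
};

static irqreturn_t my_isr(int irq, void *dev_id)
{
        struct my_priv *priv = dev_id;

        /*
         * Scheduling NAPI raises NET_RX_SOFTIRQ; the driver's poll
         * callback then runs in softirq context (or in ksoftirqd under
         * load), which is why suspending with network traffic in
         * flight can leave a softirq pending at an awkward moment.
         */
        if (napi_schedule_prep(&priv->napi))
                __napi_schedule(&priv->napi);

        return IRQ_HANDLED;
}
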
> I don't know if we can afford to ignore a softirq even at this late stage. We should
> probably avoid leaking any. So here is a possible fix, if you don't mind trying:
>
I tested your patch, and at least during the first minutes of testing I could no
longer reproduce the issue. I tested both manual system suspend and the following
script you sent when we started analyzing the issue.
Heiner
--------------------------------------------------------------------------
#!/bin/bash

# Repeatedly offline and online all secondary CPUs. CPU0 is skipped,
# as it typically cannot be taken offline on x86.
# $1 = 0 (offline) or 1 (online), $2 = highest CPU index.
do_hotplug()
{
    for i in $(seq 1 $2)
    do
        echo $1 > /sys/devices/system/cpu/cpu$i/online
    done
}

LAST_CPU=$(($(nproc)-1))

while true
do
    do_hotplug 0 $LAST_CPU
    do_hotplug 1 $LAST_CPU
done