linux-kernel - Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB


Message-ID: <20250617100013.1o5lsPLq@linutronix.de>
Date: Tue, 17 Jun 2025 12:00:13 +0200
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Marc Strämke <marc.straemke@...ropuls.de>
Cc: linux-kernel@...r.kernel.org, linux-rt-users@...r.kernel.org
Subject: Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel?
 IGB

On 2025-06-17 11:45:52 [+0200], Marc Strämke wrote:
> Hi Sebastian,
Hi Marc,

> On 17.06.25 11:28, Sebastian Andrzej Siewior wrote:
> > Between those two functions you have 800us delay. Interrupts are not
> > disabled so the CPU stalls. As explained earlier, I expect the read on
> > the bus flushes the writes causing the spike.
> > 
> So the delay you think is really the hardware(CPU) being stalled on the bus
> for so long? Or do you mean that this is the reason for the long runtime of
Yes.

> the IGB function only?
Both. The bus is flushed and the CPU stalls until the flush completes,
so the function takes long _and_, due to the stall, the interrupt cannot
fire any sooner.
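
To make the mechanism concrete, here is a toy model of posted writes followed
by a read. It is only a sketch under stated assumptions: `struct posted_bus`,
`bus_write()`, `bus_read()` and the per-access costs are hypothetical names and
numbers for illustration, not igb or PCI code, and real bus ordering is more
involved. The point it shows is that writes return immediately while their
cost piles up, and the next read pays for all of them at once.

```c
#include <stdint.h>

/* Hypothetical toy model (not igb/PCI code): posted writes queue up on the
 * bus, and a later read cannot complete before they have all drained. */
struct posted_bus {
	uint64_t now_ns;        /* the CPU's view of time */
	uint64_t drain_done_ns; /* when all queued writes will have drained */
	uint64_t write_cost_ns; /* assumed time to drain one posted write */
	uint64_t read_cost_ns;  /* assumed round trip of an uncontended read */
};

/* Posted write: the CPU continues immediately (now_ns does not advance),
 * but the pending work on the bus grows. */
static void bus_write(struct posted_bus *b)
{
	uint64_t start = b->drain_done_ns > b->now_ns ? b->drain_done_ns
						       : b->now_ns;

	b->drain_done_ns = start + b->write_cost_ns;
}

/* Non-posted read: the CPU waits until every earlier write has drained and
 * the read itself has completed. Returns the stall seen by the CPU in ns. */
static uint64_t bus_read(struct posted_bus *b)
{
	uint64_t start = b->drain_done_ns > b->now_ns ? b->drain_done_ns
						       : b->now_ns;
	uint64_t done = start + b->read_cost_ns;
	uint64_t stall = done - b->now_ns;

	b->now_ns = done;
	return stall;
}
```

With an assumed 1000ns per write and 500ns per read, ten back-to-back
`bus_write()` calls followed by one `bus_read()` give a 10500ns stall in this
model, while a second read right after it costs only 500ns. The stall sits
inside a single load, so nothing else can run on that CPU until it returns.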

> Shouldn't the other core (it is a 2-core machine) still be able to handle the
> timer interrupt then? (I did some testing with isolating IP on one core and

You have two cores and two cyclictest threads - one on each CPU. This is
explicit per-CPU. No migration between those two cores. Timers for CPU0
are always handled by CPU0.
Even if CPU1 handled CPU0's timers, it would wake cyclictest on CPU0,
but that thread would have to wait until CPU0 is done with the PCI
bus. CPU1 knows nothing about it.
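
For illustration, the per-CPU measurement can be sketched roughly as below.
This is not cyclictest's code, only a minimal approximation of the idea:
`max_wakeup_latency_ns()` and `ts_to_ns()` are made-up helper names. The
calling thread pins itself to one CPU, sleeps until absolute deadlines and
records how late it woke up. Since the thread cannot migrate, a stall on its
CPU shows up directly in its own result, no matter what the other CPU does.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for sched_setaffinity() and the CPU_* macros */
#endif
#include <assert.h>
#include <sched.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Convert a timespec to nanoseconds. */
static int64_t ts_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/* Pin the calling thread to @cpu, then sleep until @loops absolute deadlines
 * spaced @interval_ns apart. Returns the worst wake-up lateness in ns, or -1
 * if pinning or sleeping failed. */
static int64_t max_wakeup_latency_ns(int cpu, int loops, long interval_ns)
{
	struct timespec next, now;
	cpu_set_t set;
	int64_t max = 0;

	assert(loops > 0 && interval_ns > 0);

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	if (sched_setaffinity(0, sizeof(set), &set) != 0)
		return -1;

	if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
		return -1;

	for (int i = 0; i < loops; i++) {
		int64_t lat;

		/* Advance the absolute deadline by one interval. */
		next.tv_nsec += interval_ns;
		while (next.tv_nsec >= NSEC_PER_SEC) {
			next.tv_nsec -= NSEC_PER_SEC;
			next.tv_sec++;
		}

		if (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) != 0)
			return -1;
		if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
			return -1;

		/* Lateness: how long after the deadline we actually ran. */
		lat = ts_to_ns(&now) - ts_to_ns(&next);
		if (lat > max)
			max = lat;
	}
	return max;
}
```

Running one such loop per CPU, e.g. `max_wakeup_latency_ns(0, 1000, 1000000)`
in one thread and the same with CPU 1 in another, gives two independent
numbers. A real measurement would also use a realtime scheduling policy and
locked memory, which are left out here to keep the sketch short.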

> cyclictest on the other, I am not sure if I moved the timer IRQ to the
> second core)

> My mistake was to search for where the interrupt was being disabled in
> kernel space (I did not think that the hardware could introduce an IRQ delay
> of 800us..)

Yeah. 

> Marc

Sebastian