linux-kernel - Re: This is the fourth time I’ve tried to find what led to the regression of outgoing network speed and each time I find the merge commit 8c94ccc7cd691472461448f98e2372c75849406c

Message-ID: <2d87509a-1515-520c-4b9e-bba4cd4fa2c6@linux.intel.com>
Date: Wed, 7 Feb 2024 12:40:57 +0200
From: Mathias Nyman <mathias.nyman@...ux.intel.com>
To: Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>
Cc: "Christian A. Ehrhardt" <lk@...e.de>, niklas.neronin@...ux.intel.com,
 Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
 Greg KH <gregkh@...uxfoundation.org>, linux-usb@...r.kernel.org
Subject: Re: This is the fourth time I’ve tried to find what led to the regression of outgoing network speed and each time I find the merge commit 8c94ccc7cd691472461448f98e2372c75849406c

On 6.2.2024 18.12, Mikhail Gavrilov wrote:
> On Tue, Feb 6, 2024 at 4:24 PM Mathias Nyman
> <mathias.nyman@...ux.intel.com> wrote:
> 
> I confirm that after reverting all listed commits and 57e153dfd0e7,
> network performance returned to the theoretical maximum.
> 
>> That patch changes how we request MSI/MSI-X interrupt(s) for xhci.
>>
>> Is there any change in /proc/interrupts between a good and bad case?
>> Such as xhci_hcd using MSI-X instead of MSI, or eth0 and xhci_hcd
>> interrupting on the same CPU?
> 
> On the good kernel I have 32 xhci_hcd entries, and on the bad one only 4.
> In both scenarios using PCI-MSIX.
> I attached both interrupt output as archives to this message.
> 

Thanks,

Looks like your network adapter ends up interrupting CPU0 in the bad case due
to the change in how many interrupts are requested by xhci_hcd before it.

bad case:
	CPU0	CPU1	...	CPU31
87:	18213809 0	... 	0	IR-PCI-MSIX-0000:0e:00.0    0-edge      enp14s0

Does manually moving it to some other CPU help? Pick one that doesn't already
handle a lot of interrupts. CPU0 could also be busier in general, possibly spending
more time with interrupts disabled.
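
One rough way to pick a quiet CPU is to sum the per-CPU columns of
/proc/interrupts. This is only a sketch: irq_totals_per_cpu is a hypothetical
helper name, not an existing tool, and it assumes the usual layout (a header
row of CPU names, then one row per interrupt with one count per CPU).

```shell
# irq_totals_per_cpu is a hypothetical helper, not an existing tool.
# Usage: irq_totals_per_cpu [FILE]; FILE defaults to /proc/interrupts.
# Prints "CPUn total" for each CPU, summing every row's per-CPU counts.
irq_totals_per_cpu() {
    awk '
        # Header row: remember the CPU names and how many there are.
        NR == 1 { ncpu = NF; for (i = 1; i <= NF; i++) cpu[i] = $i; next }
        # Rows with a count for every CPU (skips short rows such as ERR:).
        NF > ncpu {
            for (i = 1; i <= ncpu; i++)
                if ($(i + 1) ~ /^[0-9]+$/) total[i] += $(i + 1)
        }
        END { for (i = 1; i <= ncpu; i++) print cpu[i], total[i] + 0 }
    ' "${1:-/proc/interrupts}"
}
```

Running "irq_totals_per_cpu | sort -k2,2n | head" lists the CPUs with the
fewest interrupts since boot first.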

For example, to move it to CPU23 in the bad case:

echo 800000 > /proc/irq/87/smp_affinity
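
The value is a hexadecimal bitmask with one bit per CPU, so CPU23 is bit 23,
i.e. 0x800000. A minimal sketch for computing the mask for some other CPU;
cpu_to_mask is a hypothetical helper name, and it only covers CPU numbers
below 32 (larger masks are written as comma-separated 32-bit groups).

```shell
# cpu_to_mask is a hypothetical helper, not an existing tool.
# Prints the hex smp_affinity mask with only bit N set (for N < 32).
cpu_to_mask() {
    printf '%x\n' "$((1 << $1))"
}

cpu_to_mask 23    # prints 800000

# As root, the result could then be written to the IRQ from the example:
#   cpu_to_mask 23 > /proc/irq/87/smp_affinity
```

Alternatively, /proc/irq/87/smp_affinity_list takes a plain CPU list, so
"echo 23 > /proc/irq/87/smp_affinity_list" should have the same effect.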

Check in /proc/interrupts that the enp14s0 interrupts actually go to CPU23 after this.
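
A sketch of that check, again assuming the usual /proc/interrupts layout;
irq_cpu_counts is a hypothetical helper name, not an existing tool.

```shell
# irq_cpu_counts is a hypothetical helper, not an existing tool.
# Usage: irq_cpu_counts IRQ [FILE]; FILE defaults to /proc/interrupts.
# Prints "CPUn count" for every CPU with a non-zero count on that IRQ.
irq_cpu_counts() {
    awk -v irq="$1:" '
        # Header row: remember the CPU names and how many there are.
        NR == 1 { ncpu = NF; for (i = 1; i <= NF; i++) cpu[i] = $i; next }
        # Matching IRQ row: field 1 is "87:", counts start at field 2.
        $1 == irq {
            for (i = 1; i <= ncpu; i++)
                if ($(i + 1) + 0 > 0) print cpu[i], $(i + 1)
        }
    ' "${2:-/proc/interrupts}"
}
```

The counts are cumulative since boot, so CPU0 will still show its old total.
Run "irq_cpu_counts 87" twice a few seconds apart while the network is under
load; only the CPU23 count should be growing.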

Thanks
Mathias