Message-ID: <20210406102207.0000485c@intel.com>
Date:   Tue, 6 Apr 2021 10:22:07 -0700
From:   Jesse Brandeburg <jesse.brandeburg@...el.com>
To:     Nitesh Narayan Lal <nitesh@...hat.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Marcelo Tosatti <mtosatti@...hat.com>,
        Robin Murphy <robin.murphy@....com>,
        linux-kernel@...r.kernel.org, linux-api@...r.kernel.org,
        frederic@...nel.org, juri.lelli@...hat.com, abelits@...vell.com,
        bhelgaas@...gle.com, linux-pci@...r.kernel.org,
        rostedt@...dmis.org, mingo@...nel.org, peterz@...radead.org,
        davem@...emloft.net, akpm@...ux-foundation.org,
        sfr@...b.auug.org.au, stephen@...workplumber.org,
        rppt@...ux.vnet.ibm.com, jinyuqi@...wei.com,
        zhangshaokun@...ilicon.com
Subject: Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to
 houskeeping CPUs

Continuing a thread from a bit ago...

Nitesh Narayan Lal wrote:

> > After a little more digging, I found out why cpumask_local_spread change
> > affects the general/initial smp_affinity for certain device IRQs.
> >
> > After the introduction of the commit:
> >
> >     e2e64a932 genirq: Set initial affinity in irq_set_affinity_hint()
> >
> 
> Continuing the conversation about the above commit and adding Jesse.
> I was trying to understand the problem that the commit message explains
> "The default behavior of the kernel is somewhat undesirable as all
> requested interrupts end up on CPU0 after registration.", I have also been
> trying to reproduce this behavior without the patch but I failed in doing
> so, maybe because I am missing something here.
> 
> @Jesse Can you please explain? FWIU IRQ affinity should be decided based on
> the default affinity mask.

The original issue was that if you rmmod/insmod a driver *without*
irqbalance running, the default IRQ affinity mask is -1, which means
any CPU. Older kernels (this issue was patched in 2014) used that
affinity mask, but the value actually programmed into the interrupt
registers (the "actual affinity") would end up delivering all
interrupts to CPU0. If the machine was under incoming traffic load when
the driver loaded, CPU0 would start polling all the different netdev
queues by itself.

The above then leads to the condition that the device is stuck polling
even if the affinity gets updated from user space, and the polling will
continue until traffic stops.

> The problem with the commit is that when we overwrite the affinity mask
> based on the hinting mask we completely ignore the default SMP affinity
> mask. If we do want to overwrite the affinity based on the hint mask we
> should at least consider the default SMP affinity.

Maybe the right thing is to fix which CPUs are passed in as the valid
mask, or to make sure the kernel cross-checks that what the driver asks
for is a "valid CPU"?
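
To make the cross-check idea concrete, here is a minimal user-space
sketch, not kernel code. It models a CPU mask as a 64-bit word (one bit
per CPU) and assumes a hypothetical helper, pick_initial_affinity(),
that intersects the driver's hint with the default SMP affinity and the
online CPUs, falling back to the default when the hint names no allowed
CPU. The type and function names are illustrative only; the real code
would operate on struct cpumask inside genirq.

```c
#include <stdint.h>

/* Hypothetical model: one bit per CPU, at most 64 CPUs. Not kernel code. */
typedef uint64_t cpu_mask_t;

/*
 * Choose the initial affinity for an IRQ when a driver supplies a hint.
 *
 * hint             - CPUs the driver asked for
 * default_affinity - the default SMP affinity mask (all ones == any CPU)
 * online           - CPUs currently online
 *
 * The hint is honoured only where it overlaps the allowed CPUs, so it
 * can never override the default mask.
 */
static cpu_mask_t pick_initial_affinity(cpu_mask_t hint,
                                        cpu_mask_t default_affinity,
                                        cpu_mask_t online)
{
    cpu_mask_t allowed = default_affinity & online;
    cpu_mask_t picked = hint & allowed;

    if (picked)
        return picked;   /* hint contains at least one allowed CPU */
    if (allowed)
        return allowed;  /* hint is entirely outside the default mask */
    return online;       /* default mask names no online CPU */
}
```

With eight CPUs online (0xff) and a default mask of 0xf0, a hint of
0x01 (CPU0 only) would be rejected in favour of 0xf0, while a hint of
0x30 would be used as-is.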
