Message-ID: <MWHPR2101MB0729643A3EB596B3AF701C43CECE0@MWHPR2101MB0729.namprd21.prod.outlook.com>
Date: Thu, 1 Nov 2018 16:39:18 +0000
From: Long Li <longli@...rosoft.com>
To: Thomas Gleixner <tglx@...utronix.de>
CC: Michael Kelley <mikelley@...rosoft.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [Patch v2] genirq/matrix: Choose CPU for assigning interrupts
based on allocated IRQs
> Subject: Re: [Patch v2] genirq/matrix: Choose CPU for assigning interrupts
> based on allocated IRQs
>
> Long,
>
> On Thu, 1 Nov 2018, Long Li wrote:
> > On a large system with multiple devices of the same class (e.g. NVMe
> > disks, using managed IRQs), the kernel tends to concentrate their IRQs
> > on several CPUs.
> >
> > The issue is that when NVMe calls irq_matrix_alloc_managed(), the
> > assigned CPU tends to be one of the first several CPUs in the
> > cpumask, because the allocator checks cpumap->available, which does
> > not change after managed IRQs are reserved.
> >
> > In irq_matrix->cpumap, "available" is set when IRQs are allocated
> > earlier in the IRQ allocation process. This value is caculated based
> > on
>
> calculated
>
> > 1. how many unmanaged IRQs are allocated on this CPU
> > 2. how many managed IRQs are reserved on this CPU
> >
> > But "available" is not accurate in accounting for the real IRQ load on a given CPU.
> >
> > For a managed IRQ, more than one CPU is typically reserved, based on
> > the cpumask passed to irq_matrix_reserve_managed(). But later, when
> > the IRQ is actually allocated, only one CPU is used. Because
> > "available" is calculated at the time the managed IRQ is reserved, it
> > tends to indicate that a CPU carries more IRQs than are actually
> > assigned to it.
> >
> > When a managed IRQ is assigned to a CPU in irq_matrix_alloc_managed(),
> > it decreases "allocated" based on the actual assignment of this IRQ to
> > this CPU.
>
> decreases?
>
> > Unmanaged IRQs also decrease "allocated" after an IRQ is allocated on
> > this CPU.
>
> ditto
>
> > For this reason, checking "allocated" is more accurate than checking
> > "available" for a given CPU, and results in a more even distribution
> > of IRQs across all CPUs.
>
> Again, this approach is only correct for managed interrupts. Why?
>
> Assume that total vector space size = 10
>
> CPU 0:
> allocated = 8
> available = 1
>
> i.e. there are 2 managed reserved, but not assigned interrupts
>
> CPU 1:
> allocated = 7
> available = 0
>
> i.e. there are 3 managed reserved, but not assigned interrupts
>
> Now allocate a non managed interrupt:
>
> irq_matrix_alloc()
>
> cpu = find_best_cpu() <-- returns CPU1
>
> ---> FAIL
>
> The allocation fails because it cannot allocate from the managed reserved
> space. The managed reserved space is guaranteed even if the vectors are not
> assigned. This is required to make hotplug work and to allow late activation
> without breaking the guarantees.
>
> Non managed has no guarantees, it's a best effort approach, so it can fail.
> But the fail above is just wrong.
>
> You really need to treat managed and unmanaged CPU selection differently.
Thank you for the explanation. I will send another patch to do it properly.
Long
>
> Thanks,
>
> tglx