Message-ID: <alpine.DEB.2.21.1811211419260.1665@nanos.tec.linutronix.de>
Date:   Wed, 21 Nov 2018 14:26:20 +0100 (CET)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Josh Hunt <johunt@...mai.com>
cc:     saeedm@...lanox.com, linux-kernel@...r.kernel.org,
        "Ozen, Gurhan" <guozen@...mai.com>
Subject: Re: vector space exhaustion on 4.14 LTS kernels

Josh,

On Mon, 19 Nov 2018, Josh Hunt wrote:
> We have a class of machines that appear to be exhausting the vector space on
> cpus 0 and 1 which causes some breakage later on when trying to set the
> affinity. The boxes are running the 4.14 LTS kernel.
> 
> [   39.531385] __assign_irq_vector: irq:512 cpu:128 searched:00,00000001 vector:00,00000000 continue
> [   39.531386] apic_set_affinity: irq:512 mask:00,00000001 err:-28
> 
> The affinity values:
> 
> root@....25.48.208:/proc/irq/512# grep . *
> affinity_hint:00,00000001
> effective_affinity:00,00000004
> effective_affinity_list:2
> grep: mlx5_comp0@pci:0000:65:00.1: Is a directory
> node:0
> smp_affinity:ff,ffffffff
> smp_affinity_list:0-39
> spurious:count 3
> spurious:unhandled 0
> spurious:last_unhandled 0 ms
> 
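
For anyone reading along, the masks quoted above are comma-separated
32-bit hex groups, most significant group first. A minimal sketch of
decoding them into CPU numbers (parse_irq_mask is a made-up helper name,
not a kernel or procfs interface):

```python
def parse_irq_mask(mask: str) -> list[int]:
    """Decode a /proc/irq/<n>/smp_affinity style mask into CPU numbers."""
    # Groups after the first are zero-padded to 8 hex digits, so dropping
    # the commas yields one contiguous hex number.
    value = int(mask.replace(",", ""), 16)
    # Bit n set means CPU n is part of the mask.
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


print(parse_irq_mask("00,00000001"))  # affinity_hint
print(parse_irq_mask("00,00000004"))  # effective_affinity
print(len(parse_irq_mask("ff,ffffffff")))  # smp_affinity
```

So the hint asks for CPU 0 only, the interrupt currently sits on CPU 2,
and smp_affinity covers all 40 CPUs, which agrees with the *_list files.
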
> I noticed your change, a0c9259dc4e1 "irq/matrix: Spread interrupts on
> allocation", and this sounds like what we're hitting. Booting 4.19 does not
> have this problem. I haven't booted 4.15 yet, but can do it to confirm the
> above commit is what resolves this.

Might be, but in 4.15 the whole vector allocation got rewritten. One of the
reasons was the exhaustion issue. Some of that is caused by massive
over-allocation by certain device drivers. The new allocator mechanism
handles that way better.
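
To illustrate the difference, here is a toy model in Python. It is not
kernel code and not how either allocator is actually implemented; the
pool size, the helper names and the two policies (first fit versus
"pick the CPU with the most free vectors") are assumptions made up for
the sketch. It only shows why piling allocations onto the first CPUs
makes a later request for CPU 0 fail with -ENOSPC (the err:-28 above),
while spreading leaves room.

```python
import errno

VECTORS_PER_CPU = 4  # hypothetical pool size, tiny on purpose


def allocate(free, allowed, spread):
    """Take one vector from a CPU in `allowed`; return the CPU or -ENOSPC."""
    candidates = [cpu for cpu in allowed if free[cpu] > 0]
    if not candidates:
        return -errno.ENOSPC
    if spread:
        # Spread: choose the CPU with the most free vectors.
        cpu = max(candidates, key=lambda c: free[c])
    else:
        # First fit: choose the lowest-numbered CPU that still has room.
        cpu = candidates[0]
    free[cpu] -= 1
    return cpu


def simulate(spread, ncpus=4, nirqs=8):
    """Allocate `nirqs` unpinned vectors, then request one on CPU 0 only."""
    free = [VECTORS_PER_CPU] * ncpus
    for _ in range(nirqs):
        allocate(free, range(ncpus), spread)
    return allocate(free, [0], spread)


print("first fit:", simulate(spread=False))
print("spread:   ", simulate(spread=True))
```

With first fit the eight unpinned allocations fill CPUs 0 and 1
completely, so the request restricted to CPU 0 has nowhere to go. With
spreading every CPU still has free vectors and the same request succeeds.
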

> Since 4.14 doesn't have the matrix allocator it's not a trivial backport. I
> was wondering a) if you agree with my assessment and b) if there's any plans
> on resolving this on the 4.14 allocator? If not I can attempt to backport the
> idea to 4.14 to spread the interrupts around on allocation.

No plans. Good luck with trying to fix that on the 4.14 code. I'd recommend
to switch to 4.19 LTS :)

Thanks,

	tglx
