Message-ID: <598457c6-4bea-50f5-efe9-6a2af3405ff5@akamai.com>
Date:   Mon, 19 Nov 2018 14:35:07 -0800
From:   Josh Hunt <johunt@...mai.com>
To:     tglx@...utronix.de
Cc:     saeedm@...lanox.com, linux-kernel@...r.kernel.org,
        "Ozen, Gurhan" <guozen@...mai.com>
Subject: vector space exhaustion on 4.14 LTS kernels

Hi Thomas

We have a class of machines that appears to be exhausting the vector
space on cpus 0 and 1, which later causes failures when trying to set
the IRQ affinity. The boxes are running the 4.14 LTS kernel.

I instrumented 4.14 and here's what I see:

[   28.328849] __assign_irq_vector: irq:512 cpu:0 mask:ff,ffffffff 
onlinemask:ff,ffffffff vector:0
[   28.329847] __assign_irq_vector: irq:512 cpu:2 vector:222 cfgvect:0 
off:14 old_domain:00,00000000 domain:00,00000000 
vector_search:00,00000004 update
[   28.329847] default_cpu_mask_to_apicid: irq:512 mask:00,00000004
...
[   31.729154] __assign_irq_vector: irq:512 cpu:0 mask:ff,ffffffff 
onlinemask:ff,ffffffff vector:222
[   31.729154] __assign_irq_vector: irq:512 cpu:0 mask:ff,ffffffff 
vector_cpumask:00,00000001 vector:222
...
[   31.729154] __assign_irq_vector: irq:512 cpu:2 vector:00,00000004 
domain:00,00000004 success
[   31.729154] default_cpu_mask_to_apicid: irq:512 hwirq:512 
mask:00,00000004
[   31.729154] apic_set_affinity: irq:512 mask:ff,ffffffff err:0
...
[   32.818152] mlx5_irq_set_affinity_hint: 0: irq:512 mask:00,00000001
...
[   39.531242] __assign_irq_vector: irq:512 cpu:0 mask:00,00000001 
onlinemask:ff,ffffffff vector:222
[   39.531244] __assign_irq_vector: irq:512 cpu:0 mask:00,00000001 
vector_cpumask:00,00000001 vector:222
[   39.531245] __assign_irq_vector: irq:512 cpu:0 vector:00,00000001 
domain:00,00000004
...
[   39.531384] __assign_irq_vector: irq:512 cpu:0 vector:37 
current_vector:37 next_cpu2
[   39.531385] __assign_irq_vector: irq:512 cpu:128 searched:00,00000001 
vector:00,00000000 continue
[   39.531386] apic_set_affinity: irq:512 mask:00,00000001 err:-28
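
(For reference, err:-28 above is -ENOSPC.) My understanding of the
4.14-era behaviour, expressed as a toy userspace model rather than the
real __assign_irq_vector() code, is that the allocator walks the CPUs
in the requested mask and scans each CPU's vector table for a free
slot; once every CPU in the mask is full, the assignment fails with
-ENOSPC. All of the names and sizes below are illustrative, not kernel
definitions:

/* Toy model: per-CPU vector tables and a first-fit search. */
#include <errno.h>
#include <stdio.h>

#define NR_CPUS      4
#define NR_VECTORS   256
#define FIRST_VECTOR 32        /* vectors below this are reserved */

/* 0 = free, non-zero = irq already bound to that vector */
static int vector_table[NR_CPUS][NR_VECTORS];

static int assign_vector(int irq, const int *cpu_mask)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (!cpu_mask[cpu])
            continue;
        for (int vec = FIRST_VECTOR; vec < NR_VECTORS; vec++) {
            if (!vector_table[cpu][vec]) {
                vector_table[cpu][vec] = irq;
                return 0;
            }
        }
        /* this CPU is exhausted, try the next one in the mask */
    }
    return -ENOSPC;    /* -28: no CPU in the mask has a free vector */
}

int main(void)
{
    int mask_cpu0_only[NR_CPUS] = { 1, 0, 0, 0 };

    /* Fill cpu 0 completely to mimic the exhausted state. */
    for (int vec = FIRST_VECTOR; vec < NR_VECTORS; vec++)
        vector_table[0][vec] = 1;

    /* An affinity of "cpu 0 only" can then no longer be honoured. */
    printf("assign_vector() = %d\n", assign_vector(512, mask_cpu0_only));
    return 0;
}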

The affinity values:

root@....25.48.208:/proc/irq/512# grep . *
affinity_hint:00,00000001
effective_affinity:00,00000004
effective_affinity_list:2
grep: mlx5_comp0@pci:0000:65:00.1: Is a directory
node:0
smp_affinity:ff,ffffffff
smp_affinity_list:0-39
spurious:count 3
spurious:unhandled 0
spurious:last_unhandled 0 ms
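
To see the failure from userspace, writing the affinity hint back into
smp_affinity should reproduce it, assuming the error from
irq_set_affinity() is propagated through the procfs write (a minimal
sketch, with the irq number and mask taken from the output above):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/proc/irq/512/smp_affinity";
    const char *mask = "00,00000001\n"; /* cpu 0 only, as in affinity_hint */
    int fd = open(path, O_WRONLY);

    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* On the affected boxes this write is expected to fail with ENOSPC. */
    if (write(fd, mask, strlen(mask)) < 0)
        fprintf(stderr, "write %s: %s\n", path, strerror(errno));
    close(fd);
    return 0;
}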

I noticed your change, a0c9259dc4e1 ("irq/matrix: Spread interrupts on
allocation"), and this sounds like what we're hitting. A 4.19 kernel
does not show this problem. I haven't booted 4.15 yet, but I can do
that to confirm that the above commit is what resolves this.

Since 4.14 doesn't have the matrix allocator, it's not a trivial
backport. I was wondering a) whether you agree with my assessment, and
b) whether there are any plans to resolve this in the 4.14 allocator?
If not, I can attempt to backport the idea to 4.14 and spread the
interrupts around at allocation time.
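
To be clear about what I'd try: the idea would be to stop always
starting the vector search at the first CPU in the mask (which piles
everything onto cpu 0 and then cpu 1) and instead pick the CPU in the
mask with the fewest vectors already allocated. Below is only a rough
sketch of that idea as a standalone toy, not kernel code or an actual
backport patch, and the counters and helper names are made up:

#include <limits.h>
#include <stdio.h>

#define NR_CPUS 8

static int vectors_in_use[NR_CPUS]; /* how many vectors each CPU holds */

/* Return the least-loaded CPU from the requested mask, or -1. */
static int pick_spread_cpu(const int *cpu_mask)
{
    int best_cpu = -1;
    int best_load = INT_MAX;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (!cpu_mask[cpu])
            continue;
        if (vectors_in_use[cpu] < best_load) {
            best_load = vectors_in_use[cpu];
            best_cpu = cpu;
        }
    }
    return best_cpu;
}

int main(void)
{
    int mask_all[NR_CPUS] = { 1, 1, 1, 1, 1, 1, 1, 1 };

    /* Simulate 24 allocations with the default "all CPUs" mask;
     * they end up evenly spread instead of stacked on cpu 0. */
    for (int i = 0; i < 24; i++)
        vectors_in_use[pick_spread_cpu(mask_all)]++;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        printf("cpu %d: %d vectors\n", cpu, vectors_in_use[cpu]);
    return 0;
}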

Thanks
Josh






