Message-ID: <20150518124833.GA7512@peter-bsd.cuba.int>
Date: Mon, 18 May 2015 15:48:33 +0300
From: <p_kosyh@...tor-ts.ru>
To: netdev@...r.kernel.org
Subject: __assign_irq_vector (x86) and irq vectors exhaust
Hello all!
While experimenting with 10Gb network adapters and /proc/irq/<nr>/smp_affinity,
we found that sometimes we cannot move an interrupt to the selected CPU.
After digging through the source code, we found that __assign_irq_vector()
in arch/x86/kernel/apic/vector.c (4.0 kernel) allocates vectors in a
suboptimal way.
For example, take a 32-CPU system with a lot of 10Gb cards, each of which
has 32 MSI-X IRQs. Even if a card is not in use, it allocates its IRQ
vectors as soon as it is probed (pci_enable_msix()). On x86 there is a
limit of roughly 200 vectors per CPU, and __assign_irq_vector() hands them
out by filling CPUs one by one (note the cpumask_first_and() calls):
...
	cpumask_clear(cfg->old_domain);
	cpu = cpumask_first_and(mask, cpu_online_mask);
	/* <-- takes the first (lowest-numbered) set bit */
	while (cpu < nr_cpu_ids) {
		int new_cpu, vector, offset;

		apic->vector_allocation_domain(cpu, tmp_mask, mask);
		...
		if (unlikely(current_vector == vector)) {
			cpumask_or(cfg->old_domain, cfg->old_domain,
				   tmp_mask);
			cpumask_andnot(tmp_mask, mask, cfg->old_domain);
			cpu = cpumask_first_and(tmp_mask,
						cpu_online_mask);
			/* <-- again the lowest CPU not yet tried */
			continue;
		}
		...
So, once the system is up, some CPUs have no free vectors at all, while
others have all of their vectors free. Userspace knows nothing about this
exhaustion! So after writing a mask to smp_affinity, the IRQ may simply
not be moved. Silently.
This is not critical when you do everything by hand, but with an IRQ
balancer such as birq (http://birq.libcode.org), or any other, the problem
becomes critical: the balancer has no idea why the IRQ still has not moved!
By the way, another problem is NAPI and softirq "sticking"
(http://comments.gmane.org/gmane.linux.network/322914), but I have already
written about that problem and a possible solution.
Anyway, it seems like a bad idea to fill CPUs one after another instead of
spreading IRQ vectors across them.
The solution is simple: instead of using cpumask_first_and(), pick a
RANDOM set bit. I wrote a quick-and-dirty implementation that works for me.
Of course, it needs to be done properly, but I have attached the patch for
illustration.
Hope it helps someone else...
Thank you!
--
Peter Kosyh
Attachment: "linux-3.10-irq-rr.patch" (text/x-diff, 1915 bytes)