Date:	Mon, 18 May 2015 15:48:33 +0300
From:	<p_kosyh@...tor-ts.ru>
To:	netdev@...r.kernel.org
Subject: __assign_irq_vector (x86) and irq vectors exhaust

Hello all!

While playing with 10Gb network adapters and /proc/irq/<nr>/smp_affinity,
we found that sometimes we cannot move an interrupt to the selected CPU.

After digging through the source code, we found that
arch/x86/kernel/apic/vector.c: __assign_irq_vector (4.0 kernel)
allocates vectors in a suboptimal way.

For example, we have a 32-CPU system with a lot of 10Gb cards (each of
them has 32 MSI-X irqs). Even if a card is not used, it allocates irq
vectors after probing (pci_enable_msix()). We have a limit of about 200
vectors per CPU (on x86), and __assign_irq_vector allocates them by
filling CPUs one by one (see cpumask_first_and()):

	...

	cpumask_clear(cfg->old_domain);
	cpu = cpumask_first_and(mask, cpu_online_mask);
	/* here we get the first non-zero bit <----------- */
	while (cpu < nr_cpu_ids) {
		int new_cpu, vector, offset;

		apic->vector_allocation_domain(cpu, tmp_mask, mask);

		...

		if (unlikely(current_vector == vector)) {
			cpumask_or(cfg->old_domain, cfg->old_domain, tmp_mask);
			cpumask_andnot(tmp_mask, mask, cfg->old_domain);
			cpu = cpumask_first_and(tmp_mask, cpu_online_mask);
			/* get next non zero bit <------------ */
			continue;
		}

		...

So, after our system is up, we end up in a situation where some CPUs
have no free vectors at all, while other CPUs have all their vectors free.
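
To illustrate the effect, here is a small userspace model (not kernel
code, and much simpler than __assign_irq_vector). The number of cards
(32), the exact limit of 200 vectors per CPU, and the names
first_fit_alloc() and simulate_first_fit() are assumptions made only for
this sketch:

```c
#include <assert.h>

#define NCPUS           32   /* CPUs in the example system */
#define VECTORS_PER_CPU 200  /* assumed round figure for "about 200" */

/*
 * Hypothetical model of first-fit allocation: used[cpu] counts the
 * vectors already taken on that CPU. Returns the CPU that received
 * the vector, or -1 if every CPU is exhausted.
 */
static int first_fit_alloc(int used[NCPUS])
{
	for (int cpu = 0; cpu < NCPUS; cpu++) {
		if (used[cpu] < VECTORS_PER_CPU) {
			used[cpu]++;
			return cpu;
		}
	}
	return -1;
}

/*
 * Allocate nirqs vectors first-fit, starting from an empty system.
 * Returns the number of CPUs left with no free vector.
 */
static int simulate_first_fit(int nirqs, int used[NCPUS])
{
	int full = 0;

	assert(nirqs >= 0);

	for (int cpu = 0; cpu < NCPUS; cpu++)
		used[cpu] = 0;

	for (int i = 0; i < nirqs; i++)
		first_fit_alloc(used);

	for (int cpu = 0; cpu < NCPUS; cpu++)
		if (used[cpu] == VECTORS_PER_CPU)
			full++;

	return full;
}
```

With 32 cards of 32 irqs each (1024 vectors), this model leaves the
first five CPUs completely full, the sixth partly used, and all the
remaining CPUs untouched.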

Userspace knows nothing about this exhaustion. So after writing a
mask to smp_affinity, we can end up in a situation where the irq cannot
be moved. Silently.

This is not critical when you are doing everything by hand, but if
we are using an irq balancer, like birq (http://birq.libcode.org) or
any other, the problem becomes critical: the balancer has no idea why
the irq still has not moved. Btw, the other problem is napi and softirq
sticking (http://comments.gmane.org/gmane.linux.network/322914), but I
already wrote about that problem and a possible solution.

Anyway, it looks like a bad idea to fill CPUs one after another instead
of spreading irq vectors across them.

The solution is simple. Instead of using cpumask_first_and(), try to get
a RANDOM bit. I wrote a quick-and-dirty implementation that works for me.
Of course, it must be done the right way, but I have attached the patch
for illustration.
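
To show the idea only (this is not the attached patch, and not kernel
code), here is a userspace sketch that picks a random set bit from a
mask instead of the first one. The 32-bit mask type, the use of rand(),
and the names pick_first_cpu(), popcount32() and pick_random_cpu() are
all assumptions for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Lowest set bit, or -1 for an empty mask (what "first" selection does). */
static int pick_first_cpu(uint32_t mask)
{
	for (int cpu = 0; cpu < 32; cpu++)
		if (mask & (UINT32_C(1) << cpu))
			return cpu;
	return -1;
}

/* Number of set bits in the mask. */
static int popcount32(uint32_t mask)
{
	int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/*
 * A roughly uniformly chosen set bit, or -1 for an empty mask.
 * rand() % nbits has a small modulo bias, which does not matter here.
 */
static int pick_random_cpu(uint32_t mask)
{
	int nbits = popcount32(mask);
	int skip;

	if (nbits == 0)
		return -1;

	skip = rand() % nbits;
	assert(skip >= 0 && skip < nbits);

	/* Walk the set bits and stop at the skip-th one. */
	for (int cpu = 0; cpu < 32; cpu++) {
		if (mask & (UINT32_C(1) << cpu)) {
			if (skip-- == 0)
				return cpu;
		}
	}
	return -1; /* not reached for a non-empty mask */
}
```

For a mask of 0x50 (CPUs 4 and 6), pick_first_cpu() always returns 4,
while pick_random_cpu() returns either 4 or 6, so repeated allocations
get spread over the allowed CPUs.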

Hope it helps someone else.

Thank you!



-- 
Peter Kosyh

View attachment "linux-3.10-irq-rr.patch" of type "text/x-diff" (1915 bytes)
