Message-ID: <20150403065557.GA12815@gmail.com>
Date: Fri, 3 Apr 2015 08:55:57 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Jesse Brandeburg <jesse.brandeburg@...el.com>
Cc: torvalds@...ux-foundation.org,
Thomas Gleixner <tglx@...utronix.de>,
linux-kernel@...r.kernel.org, John <jw@...learfallout.net>
Subject: Re: [PATCH] irq: revert non-working patch to affinity defaults
* Jesse Brandeburg <jesse.brandeburg@...el.com> wrote:
> I've seen a couple of reports of issues since commit e2e64a932556 ("genirq:
> Set initial affinity in irq_set_affinity_hint()") where an affinity
> programmed via /proc/irq/<nnn>/smp_affinity does not stick: it changes
> back to some previous value at the next interrupt on that IRQ.
>
> The original intent was to fix the broken default behavior of all IRQs
> for a device starting up on CPU0. With a network card with 64 or more
> queues, all 64 queues' interrupt vectors end up on CPU0, which can have
> bad side effects, and has to be fixed by the irqbalance daemon, or by
> the user at every boot with some kind of affinity script.
>
> The symptom is that after a driver calls irq_set_affinity_hint(), the
> affinity will be set for that interrupt (and readable via /proc/...),
> but on the first irq for that vector, the affinity for CPU0 or CPU1
> resets to the default. The rest of the irq affinities seem to work and
> everything is fine.
>
> Impact if we don't fix this for 4.0.0:
> On some CPUs, users won't be able to set irq affinity as
> expected.
>
> I've spent a chunk of time trying to debug this with no luck, and
> suggest that we revert the change if no-one else can help me debug
> what is going wrong; we can pick up the change later.
>
> This commit would also revert commit 4fe7ffb7e17ca ("genirq: Fix null pointer
> reference in irq_set_affinity_hint()") which was a bug fix to the original
> patch.
So the original commit also has the problem that it unnecessarily
drops/retakes the descriptor lock:
> 	irq_put_desc_unlock(desc, flags);
> -	/* set the initial affinity to prevent every interrupt being on CPU0 */
> -	if (m)
> -		__irq_set_affinity(irq, m, false);
i.e. why not just call into irq_set_affinity_locked() while we still
have the descriptor locked?
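A minimal sketch of that alternative, assuming the v3.19-era helpers in
kernel/irq/manage.c (untested, just to illustrate the locking point):

	/*
	 * Sketch only, not a tested patch: set the initial affinity via
	 * irq_set_affinity_locked() while desc->lock is still held,
	 * instead of unlocking first and letting __irq_set_affinity()
	 * re-take the lock.
	 */
	int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
	{
		unsigned long flags;
		struct irq_desc *desc = irq_get_desc_lock(irq, &flags,
							  IRQ_GET_DESC_CHECK_GLOBAL);

		if (!desc)
			return -EINVAL;
		desc->affinity_hint = m;
		/* still holding desc->lock here */
		if (m)
			irq_set_affinity_locked(irq_desc_get_irq_data(desc),
						m, false);
		irq_put_desc_unlock(desc, flags);
		return 0;
	}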
Now this is just a small annoyance that should not really matter - it
would be nice to figure out the real reason why the irqs move back
to CPU#0.
In theory the same could happen to the irqbalance daemon as well, if it
sets the affinity shortly after an irq was registered - so this is not
a bug we want to ignore.
Also, worst case we are back to where v3.19 was, right? So could we
try to analyze this a bit more?
Thanks,
Ingo