Message-ID: <20150403065557.GA12815@gmail.com>
Date: Fri, 3 Apr 2015 08:55:57 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Jesse Brandeburg <jesse.brandeburg@...el.com>
Cc: torvalds@...ux-foundation.org,
Thomas Gleixner <tglx@...utronix.de>,
linux-kernel@...r.kernel.org, John <jw@...learfallout.net>
Subject: Re: [PATCH] irq: revert non-working patch to affinity defaults
* Jesse Brandeburg <jesse.brandeburg@...el.com> wrote:
> I've seen a couple of reports of issues since commit e2e64a932556 ("genirq:
> Set initial affinity in irq_set_affinity_hint()") where an affinity
> programmed via /proc/irq/<nnn>/smp_affinity does not stick: it changes
> back to some previous value at the next interrupt on that IRQ.
>
> The original intent was to fix the broken default behavior of all IRQs
> for a device starting up on CPU0. With a network card with 64 or more
> queues, all 64 queues' interrupt vectors end up on CPU0, which can have
> bad side effects, and has to be fixed by the irqbalance daemon, or by
> the user at every boot with some kind of affinity script.
>
> The symptom is that after a driver calls irq_set_affinity_hint(), the
> affinity will be set for that interrupt (and readable via /proc/...),
> but on the first irq for that vector, the affinity for CPU0 or CPU1
> resets to the default. The rest of the irq affinities seem to work and
> everything is fine.
>
> Impact if we don't fix this for 4.0.0:
> On some CPUs, users won't be able to set irq affinity as
> expected.
>
> I've spent a chunk of time trying to debug this with no luck, and
> suggest that we revert the change if no-one else can help me debug
> what is going wrong; we can pick up the change later.
>
> This commit would also revert commit 4fe7ffb7e17ca ("genirq: Fix null pointer
> reference in irq_set_affinity_hint()") which was a bug fix to the original
> patch.
So the original commit also has the problem that it unnecessarily
drops/retakes the descriptor lock:
> 	irq_put_desc_unlock(desc, flags);
> -	/* set the initial affinity to prevent every interrupt being on CPU0 */
> -	if (m)
> -		__irq_set_affinity(irq, m, false);
i.e. why not just call into irq_set_affinity_locked() while we still
have the descriptor locked?
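A minimal sketch of that alternative, assuming the v3.19-era helpers in
kernel/irq/manage.c (untested, just to illustrate the locking point):

	/*
	 * Sketch only, not a tested patch: set the initial affinity via
	 * irq_set_affinity_locked() while desc->lock is still held,
	 * instead of unlocking first and letting __irq_set_affinity()
	 * re-take the lock.
	 */
	int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
	{
		unsigned long flags;
		struct irq_desc *desc = irq_get_desc_lock(irq, &flags,
							  IRQ_GET_DESC_CHECK_GLOBAL);

		if (!desc)
			return -EINVAL;
		desc->affinity_hint = m;
		/* still holding desc->lock here */
		if (m)
			irq_set_affinity_locked(irq_desc_get_irq_data(desc),
						m, false);
		irq_put_desc_unlock(desc, flags);
		return 0;
	}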
Now this is just a small annoyance that should not really matter - it
would be nice to figure out the real reason why the irqs move back
to CPU#0.
In theory the same could happen to the irqbalance daemon as well, if it
sets the affinity shortly after an irq was registered - so this is not
a bug we want to ignore.
Also, worst case we are back to where v3.19 was, right? So could we
try to analyze this a bit more?
Thanks,
Ingo