Message-ID: <20091106075820.GA28227@elte.hu>
Date: Fri, 6 Nov 2009 08:58:20 +0100
From: Ingo Molnar <mingo@...e.hu>
To: Tejun Heo <tj@...nel.org>, Nick Piggin <npiggin@...e.de>
Cc: Jiri Kosina <jkosina@...e.cz>,
Peter Zijlstra <peterz@...radead.org>,
Yinghai Lu <yhlu.kernel@...il.com>,
Thomas Gleixner <tglx@...utronix.de>, cl@...ux-foundation.org,
linux-kernel@...r.kernel.org
Subject: Re: irq lock inversion
* Tejun Heo <tj@...nel.org> wrote:
> Ingo Molnar wrote:
> >>> This warning is bogus -- sched_init() is being called very early with IRQs
> >>> disabled, and the irqsave/restore code paths in pcpu_alloc() are only for early
> >>> init. The path can never be called from irq context once the early init
> >>> finishes. Rationale for this is explained in changelog of the commit mentioned
> >>> above.
> >>>
> >>> This problem can be encountered generally in any other early code running
> >>> with IRQs off and using irqsave/irqrestore.
> >>>
> >>> Reported-by: Yinghai Lu <yhlu.kernel@...il.com>
> >>> Signed-off-by: Jiri Kosina <jkosina@...e.cz>
> >> Looks good to me. Ingo, what do you think?
> >
> > Ugh, this explanation is _BOGUS_. As i said, taking a lock with irqs
> > disabled does _NOT_ mark a lock as 'irq safe' - if it did, we'd have
> > false positives left and right.
> >
> > Read the lockdep message please, consider all the backtraces it prints,
> > it says something different.
>
> Ah... okay, the pcpu_free() path is correctly marking the lock
> irqsafe. I assumed this was caused by recent pcpu_alloc() change.
> Sorry about that. The lock inversion problem has always been there,
> it just never showed up because no one has used an allocation map that
> large, I suppose.
>
> So, the correct fix would be either 1. push irqsafeness down to
> vmalloc locks or 2. the rather ugly unlock-lock dancing in
> pcpu_extend_area_map() I posted earlier. For 2.6.32, I guess we'll
> have to go with #2. For longer term, we'll probably have to do #1 as
> it's required to implement atomic percpu allocations too.
>
> I'll try to reproduce the problem here and verify the previous locking
> dance patch.

I haven't looked deeply, but at first sight i'm not 100% sure that even
the lock dance hack is safe: doesn't vfree() do TLB flushes, which must
be done with irqs enabled in general? If yes, then the whole notion of
using the allocator from irqs-off sections is wrong and the flags
save/restore is misguided (or at least incomplete).
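
For readers following along, the general shape of such an unlock-lock
dance is roughly the following. This is only a userspace sketch using
pthreads, not the patch posted earlier and not the kernel code: struct
area_map and extend_area_map() are hypothetical names, the mutex stands
in for pcpu_lock, and calloc()/free() stand in for the vmalloc()/vfree()
calls that must not run under the lock.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a chunk's area map (not the kernel struct). */
struct area_map {
	pthread_mutex_t lock;	/* plays the role of pcpu_lock */
	int *map;		/* the growable map itself */
	size_t alloc;		/* number of slots currently allocated */
};

/*
 * Grow m->map to at least 'want' slots.
 * Called with m->lock held and returns with it held.
 * Returns 0 on success, -1 if the allocation failed.
 */
int extend_area_map(struct area_map *m, size_t want)
{
	int *new_map, *old_map = NULL;

	if (want <= m->alloc)
		return 0;

	/* Drop the lock around the allocation, then take it again. */
	pthread_mutex_unlock(&m->lock);
	new_map = calloc(want, sizeof(*new_map));
	pthread_mutex_lock(&m->lock);
	if (!new_map)
		return -1;

	/* The map may have been grown while the lock was dropped: recheck. */
	if (want > m->alloc) {
		if (m->alloc)
			memcpy(new_map, m->map, m->alloc * sizeof(*new_map));
		old_map = m->map;
		m->map = new_map;
		m->alloc = want;
		new_map = NULL;
	}

	/* Free whichever buffer lost, again without holding the lock. */
	pthread_mutex_unlock(&m->lock);
	free(old_map);
	free(new_map);
	pthread_mutex_lock(&m->lock);
	return 0;
}
```

Note what the sketch does not solve: dropping the lock only avoids
calling the allocator under that one lock. If the caller itself runs
with irqs off or in a BH section, the alloc/free still happens in that
context.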

So the real problem right now, i think, is the use of the pcpu allocator
from within a BH section (and from irqs-off sections) - that usage
should be eliminated from .32, or the allocator should be fixed. (which
looks non-trivial: vmalloc/vfree was never really intended to be used in
irq-atomic contexts)

	Ingo