Message-ID: <20110529095134.GB9489@e102109-lin.cambridge.arm.com>
Date:	Sun, 29 May 2011 10:51:34 +0100
From:	Catalin Marinas <catalin.marinas@....com>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Russell King - ARM Linux <linux@....linux.org.uk>,
	Peter Zijlstra <peterz@...radead.org>,
	Marc Zyngier <Marc.Zyngier@....com>,
	Frank Rowand <frank.rowand@...sony.com>,
	Oleg Nesterov <oleg@...hat.com>, linux-kernel@...r.kernel.org,
	Yong Zhang <yong.zhang0@...il.com>,
	linux-arm-kernel@...ts.infradead.org
Subject: Re: [BUG] "sched: Remove rq->lock from the first half of ttwu()"
 locks up on ARM

On Fri, May 27, 2011 at 01:06:29PM +0100, Ingo Molnar wrote:
> * Catalin Marinas <catalin.marinas@....com> wrote:
> > > How much time does that take on contemporary ARM hardware,
> > > typically (and worst-case)?
> >
> > On newer ARMv6 and ARMv7 hardware, we no longer flush the caches at
> > context switch as we got VIPT (or PIPT-like) caches.
> >
> > But modern ARM processors use something called ASID to tag the TLB
> > entries and we are limited to 256. The switch_mm() code checks for
> > whether we ran out of them to restart the counting. This ASID
> > roll-over event needs to be broadcast to the other CPUs and issuing
> > IPIs with the IRQs disabled isn't always safe. Of course, we could
> > briefly re-enable them at the ASID roll-over time but I'm not sure
> > what the expectations of the code calling switch_mm() are.
> 
> The expectations are to have irqs off (we are holding the runqueue
> lock if !__ARCH_WANT_INTERRUPTS_ON_CTXSW), so that's not workable, I
> suspect.
> 
> But in theory we could drop the rq lock and restart the scheduler
> task-pick and balancing sequence when the ARM TLB tag rolls over. So
> instead of this fragile and asymmetric method we'd have a
> straightforward retry-in-rare-cases method.

During switch_mm(), we check whether the task being scheduled in has an
ASID from an old generation and, if so, take cpu_asid_lock, the global
spinlock protecting the ASID counter. If two CPUs context-switch at the
same time, one of them spins on cpu_asid_lock. If the CPU holding the
lock hits an ASID roll-over, it must broadcast the event to the other
CPUs via IPI. But the other CPU is spinning on cpu_asid_lock with
interrupts disabled, so it can never service the IPI and we deadlock.
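The interleaving looks roughly like this (a schematic of the scenario,
not the actual code paths):

```
CPU0                                    CPU1
----                                    ----
switch_mm()                             switch_mm()
  IRQs off (holding rq->lock)             IRQs off (holding rq->lock)
  spin_lock(&cpu_asid_lock)  // won       spin_lock(&cpu_asid_lock)  // spins
  next ASID wraps past 255
  -> roll-over: IPI all CPUs
  wait for CPU1 to handle IPI ...         ... never handled: IRQs are off
                    => deadlock
```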

An option could be to drop cpu_asid_lock and use atomic operations on
the global ASID tracking variable, but that needs some thinking. The
requirements on the ASID tag are that it must be unique across all the
CPUs in the system, and that two threads sharing the same mm must have
the same ASID (hence the IPI to the other CPUs).

Maybe Russell's idea of moving the page table switching out of
switch_mm() into some post task-switch hook would be easier to
implement.

-- 
Catalin

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
