linux-kernel - Re: sched: softlockups in multi_cpu_stop

Message-ID: <CA+55aFzuaUyAtbCAHrUZ1Prew1Dn4DJquH1LtCX_6A5fUK4Mqw@mail.gmail.com>
Date:	Fri, 6 Mar 2015 11:32:56 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Davidlohr Bueso <dave@...olabs.net>
Cc:	Jason Low <jason.low2@...com>, Ingo Molnar <mingo@...nel.org>,
	Sasha Levin <sasha.levin@...cle.com>,
	Peter Zijlstra <peterz@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Dave Jones <davej@...emonkey.org.uk>
Subject: Re: sched: softlockups in multi_cpu_stop

On Fri, Mar 6, 2015 at 11:20 AM, Davidlohr Bueso <dave@...olabs.net> wrote:
>
> I obviously agree with all those points, however fyi most of the testing
> on rwsems I do includes scaling address space ops stressing the
> mmap_sem, which is a real world concern. So while it does include
> microbenchmarks, it is not guided by them.

So I agree that mmap_sem is problematic.

We probably still end up holding it over many actual IO operations,
for example. The whole "FAULT_RETRY" thing should have helped a lot,
in that hopefully at least a fair amount of the time we now end up
waiting for the IO without holding the semaphore, but I bet many other
cases remain.

And I also suspect that we could try to be even more aggressive, and
allow some entirely unlocked cases. For example, long long ago we used
to have a completely SMP-unsafe model where we would do things
optimistically - doing IO without holding any locks, and then before
we "committed" to it, we'd re-try.  And I wonder if we might want to
re-introduce that for the cases where we hit in caches and could use
RCU.
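
The "work optimistically, then re-check before committing" idea in the paragraph above can be sketched in user-space C. This is only an illustration under stated assumptions, not the old kernel code: `struct mapping`, the generation counter `gen`, `optimistic_commit()` and the `demo_io` helper are all hypothetical names, and a C11 `atomic_flag` stands in for a real lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state: a lock, a generation counter that is bumped on
 * every change, and a slot that receives the committed result. */
struct mapping {
    atomic_flag lock;
    atomic_uint gen;
    int committed;
};

void mapping_init(struct mapping *m)
{
    atomic_flag_clear(&m->lock);
    atomic_init(&m->gen, 0);
    m->committed = 0;
}

void mapping_lock(struct mapping *m)
{
    while (atomic_flag_test_and_set_explicit(&m->lock, memory_order_acquire))
        ; /* spin */
}

void mapping_unlock(struct mapping *m)
{
    atomic_flag_clear_explicit(&m->lock, memory_order_release);
}

/* A writer: changes the mapping and bumps the generation under the lock. */
void mapping_change(struct mapping *m)
{
    mapping_lock(m);
    atomic_fetch_add(&m->gen, 1);
    mapping_unlock(m);
}

/* Do the slow work with no lock held, then take the lock only to
 * re-check the generation and commit. Returns the number of attempts. */
int optimistic_commit(struct mapping *m, int (*slow_work)(void *), void *arg)
{
    for (int attempts = 1; ; attempts++) {
        unsigned seen = atomic_load(&m->gen);
        int value = slow_work(arg);          /* unlocked "IO" */

        mapping_lock(m);
        bool unchanged = (atomic_load(&m->gen) == seen);
        if (unchanged)
            m->committed = value;            /* commit */
        mapping_unlock(m);

        if (unchanged)
            return attempts;
        /* The mapping changed underneath us: throw the work away, retry. */
    }
}

/* Demo stand-in for IO: its first call also simulates a concurrent
 * change to the mapping, which forces exactly one retry. */
struct demo_io {
    struct mapping *m;
    int calls;
};

int demo_io_work(void *arg)
{
    struct demo_io *io = arg;
    if (io->calls++ == 0)
        mapping_change(io->m);
    return 42;
}
```

With `demo_io_work` as the slow step, the first attempt is discarded because the generation moved, and the second attempt commits. The lock is held only for the compare-and-commit, never across the slow work itself.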

IOW, I wonder if we could special-case the common non-IO
fault-handling path something along the lines of:

 - look up the vma in the vma lookup cache
 - look up the page in the page cache
 - get the page table spinlock
 - re-check the vma now (it ends up being stable if it can't be torn
   down due to the page table spinlock)

because I suspect that page faults are the biggest users of that
mmap_sem, and we could probably handle a fairly large common case
(making it simpler by special-casing it and punting in any even
_slightly_ complicated situations) without even getting the semaphore
at all, since we have to serialize on the actual page table *anyway*.
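
The four steps listed above can be modeled as a toy in user-space C. Everything here is an assumption made for illustration: `struct mm`, `struct vma`, `fast_fault()` and the `FAULT_FAST`/`FAULT_PUNT` results are hypothetical and are not kernel APIs, and a C11 `atomic_flag` stands in for the page table spinlock. In real concurrent code the unlocked reads in steps 1 and 2 would need RCU or something similar; the sketch only shows the control flow and the "punt on anything complicated" policy.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SHIFT 12
#define NPAGES     8   /* size of the toy address space, in pages */

struct vma {
    unsigned long start, end;  /* covers [start, end) */
    bool valid;                /* cleared when the mapping is torn down */
};

struct mm {
    atomic_flag ptl;            /* stand-in for the page table spinlock */
    struct vma *vma_cache;      /* stand-in for the vma lookup cache */
    void *page_cache[NPAGES];   /* NULL means "not cached, would need IO" */
    void *pte[NPAGES];          /* toy page table: one slot per page */
};

enum fault_result { FAULT_FAST, FAULT_PUNT };

void mm_init(struct mm *mm)
{
    assert(mm != NULL);
    atomic_flag_clear(&mm->ptl);
    mm->vma_cache = NULL;
    for (size_t i = 0; i < NPAGES; i++) {
        mm->page_cache[i] = NULL;
        mm->pte[i] = NULL;
    }
}

bool vma_covers(const struct vma *v, unsigned long addr)
{
    return v != NULL && v->valid && addr >= v->start && addr < v->end;
}

/* Fast path that never touches the big semaphore. FAULT_PUNT means the
 * caller should fall back to the ordinary, fully locked slow path. */
enum fault_result fast_fault(struct mm *mm, unsigned long addr)
{
    unsigned long idx = addr >> PAGE_SHIFT;
    if (idx >= NPAGES)
        return FAULT_PUNT;

    /* 1. look up the vma in the vma lookup cache (no lock held) */
    struct vma *v = mm->vma_cache;
    if (!vma_covers(v, addr))
        return FAULT_PUNT;

    /* 2. look up the page in the page cache (no lock held) */
    void *page = mm->page_cache[idx];
    if (page == NULL)
        return FAULT_PUNT;      /* would need IO: not the simple case */

    /* 3. get the page table spinlock */
    while (atomic_flag_test_and_set_explicit(&mm->ptl, memory_order_acquire))
        ; /* spin */

    /* 4. re-check the vma now that teardown is excluded by the lock */
    bool ok = (mm->vma_cache == v) && vma_covers(v, addr);
    if (ok)
        mm->pte[idx] = page;    /* install the mapping */

    atomic_flag_clear_explicit(&mm->ptl, memory_order_release);
    return ok ? FAULT_FAST : FAULT_PUNT;
}
```

A caller would try `fast_fault()` first and take the semaphore only when it returns `FAULT_PUNT`, so a cache hit on a stable vma completes under the page table lock alone.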

Basically, to me, the whole "if a lock is so contended that we need to
play locking games, then we should look at why we *use* the lock,
rather than at the lock itself" is a religion.

                         Linus