linux-kernel - RFC on fixing mutex spinning on owner

Message-ID: <CAD=GYpbHjcsuQSnNPL4SCv-r=Me6oiH7dJ98a64udbakWLaUjQ@mail.gmail.com>
Date:	Wed, 16 Mar 2016 16:22:17 -0700
From:	Joel Fernandes <agnel.joel@...il.com>
To:	linux-rt-users@...r.kernel.org,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	kernelnewbies <kernelnewbies@...linux.org>
Cc:	Steven Rostedt <rostedt@...dmis.org>,
	Ingo Molnar <mingo@...hat.com>,
	Greg Kroah-Hartman <greg@...ah.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: RFC on fixing mutex spinning on owner

Hi,

On a fairly recent kernel with an Android userspace, I am seeing the
i915 driver sit in a spin loop waiting for the mutex owner to release
the lock (mutex_spin_on_owner). I believe this is because the owner of
the mutex is running on another CPU, and the expectation is that the
owner will release the mutex or go to sleep soon. So when we fail to
acquire the mutex, we avoid sleeping and instead keep spinning and
retrying, much like a spinlock (with preemption disabled throughout
the spinning).

My question is: what if the owner cannot or does not want to sleep,
and keeps running for a while with the mutex held? (Let's also assume
that all other tasks on the mutex owner's CPU are sleeping, so it's
not preempted.)

In this case, does it make sense to time out the spinning after a
while? Preemption is disabled for the whole duration of the spin, so
unbounded spinning seems like a very bad thing.

Should the code that holds the mutex (the owner) be fixed so that it
does not hold the mutex for so long? Or would a patch introducing a
timeout of a certain threshold on the spinning be welcome?
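
To make the timeout idea concrete, here is a rough userspace sketch of
spinning on an owner with a time budget. It is only a model under
stated assumptions, not kernel code and not a proposed patch: the
names toy_owner, toy_mutex, toy_spin_on_owner and budget_ns are all
hypothetical, C11 atomics stand in for the kernel's primitives, and
the on_cpu flag loosely mirrors the "owner is still running" check
that the real spin loop performs. The only new element is the
deadline test.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for a task; not the kernel's task_struct. */
struct toy_owner {
    atomic_bool on_cpu;                 /* true while the owner is running */
};

/* Hypothetical stand-in for a mutex; owner is NULL when unlocked. */
struct toy_mutex {
    _Atomic(struct toy_owner *) owner;
};

/* Monotonic clock in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Spin while `owner` still holds `lock` and is still running.
 *
 * Returns true when the owner released the lock (or the owner changed),
 * meaning the caller should retry the acquire.
 * Returns false when the caller should stop spinning and sleep instead:
 * either the owner went off-CPU, or the spin budget (an assumption of
 * this sketch) ran out.
 */
static bool toy_spin_on_owner(struct toy_mutex *lock,
                              struct toy_owner *owner,
                              uint64_t budget_ns)
{
    uint64_t deadline;

    assert(lock != NULL && owner != NULL);
    deadline = now_ns() + budget_ns;

    while (atomic_load(&lock->owner) == owner) {
        if (!atomic_load(&owner->on_cpu))
            return false;               /* owner is not running: sleep */
        if (now_ns() >= deadline)
            return false;               /* budget exhausted: sleep */
    }
    return true;                        /* lock was released: retry */
}
```

With a budget in place, the worst-case preempt-off window for a waiter
is bounded by roughly budget_ns rather than by how long the owner
chooses to hold the lock. What the budget should be, and whether
reading a clock inside the spin loop is acceptable, are open questions
this sketch does not answer.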

To give numbers: I am seeing spins of up to 20 ms in the worst case,
while the mutex owner holds the mutex for 22 ms. This is long enough
to trip the ftrace preemptoff tracer.

Thanks for any advice on what the right fix for the problem should be.

Best,
Joel