linux-kernel - [PATCH v2 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-Id: <1480601214-26583-1-git-send-email-nhaehnle@gmail.com>
Date:   Thu,  1 Dec 2016 15:06:43 +0100
From:   Nicolai Hähnle <nhaehnle@...il.com>
To:     linux-kernel@...r.kernel.org
Subject: [PATCH v2 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes

Changes to patches 1 & 5 based on feedback. I've also updated the branch
at https://cgit.freedesktop.org/~nh/linux/log/?h=mutex.

There's been the question of using a balanced tree rather than a list.
Frankly, I'd say the 99% use case doesn't need it. Also, dealing with
waiters without a context is easy in the list, but becomes trickier with a
tree. I think one can do it with an additional counter on the lock itself
to establish the FIFO order of context-less waiters, but it'd make the
code harder to follow. I doubt that it's worth it.

(original cover letter below)
The basic idea is to make sure that:

1. All waiters that have a ww_ctx appear in stamp order in the wait list.
Waiters without a ww_ctx are still supported and appear in FIFO order as
before.

2. At most one of the waiters can be in a state where it has to check for
back off (i.e., ww_ctx->acquire > 0). Technically, there are short time
windows in which more than one such waiter can be on the list, but all
but the first one are running. This happens when a new waiter with
ww_ctx->acquire > 0 adds itself at the front of the list and wakes up the
previous head of the list, and of course multiple such chained cases can
be in-flight simultaneously.

Then we only ever have to wake up one task at a time. This is _not_ always
the head of the wait list, since there may be waiters without a context. But
among waiters with a context, we only ever have to wake the first one.

To achieve all this, the series adds a new field to mutex_waiter which is
only used for the w/w lock case. As a consequence, calling mutex_lock
directly on w/w locks is now definitely incorrect. That was likely the
intention previously anyway, but grepping through the source I did find one
place that had slipped through.

I've included timings taken from a contention-heavy stress test to some of
the patches. The stress test performs actual GPU operations which take a
good chunk of the wall time, but even so, the series still manages to
improve the wall time quite a bit.

Cheers,
Nicolai