linux-kernel - [PATCH RT 1/2] seqlock: make sure that raw_seqlock_t retries readers while writes are pending


Message-ID: <20080819091918.21725.39839.stgit@dev.haskins.net>
Date:	Tue, 19 Aug 2008 05:19:18 -0400
From:	Gregory Haskins <ghaskins@...ell.com>
To:	mingo@...e.hu, rostedt@...dmis.org, tglx@...utronix.de
Cc:	ghaskins@...ell.com, linux-kernel@...r.kernel.org,
	linux-rt-users@...r.kernel.org
Subject: [PATCH RT 1/2] seqlock: make sure that raw_seqlock_t retries readers
	while writes are pending

The seqlock protocol is broken in -rt for raw_seqlock_t objects.  This
manifested in my 2.6.26-rt1 kernel as a 500ms (yes, millisecond) spike
which was traced out with ftrace/preemptirqsoff to be originating in
the HRT (hrtimer_interrupt, to be precise).  It would occasionally
spin processing the same CLOCK_MONOTONIC timer (the scheduler-tick)
in a tight loop with interrupts disabled.  Investigating, it turned out
that the time-basis recorded for "now" early in the interrupt had
momentarily jumped 500ms into the future.  This caused all timers with
correct expiration times to appear to have expired a long time ago.
Even rescheduling the timer via hrtimer_forward ultimately placed the
timer in an "expired" state since the "now" basis was in the future.

So I began investigating how this time-basis (derived from ktime_get())
could have done this.  I observed that ktime_get() readers were able to
successfully read a time value even while another core held a
write-lock on the xtime_lock.  Therefore the fundamental issue was
that ktime_get was able to return transitional states of the
xtime/clocksource infrastructure, which is clearly not intended.
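
For reference, the reader/writer contract that ktime_get() relies on
can be sketched in userspace roughly as follows.  This is only an
illustrative model using C11 atomics: the model_clock names are
made up for this example and are not the kernel's xtime_lock or
clocksource code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace model, not kernel code. */
struct model_clock {
	atomic_uint sequence;		/* odd while an update is in progress */
	_Atomic int64_t sec;
	_Atomic int64_t nsec;
};

/* Writer: bump the count to odd, update both fields, bump it back to even. */
static void model_clock_write(struct model_clock *c, int64_t sec, int64_t nsec)
{
	atomic_fetch_add(&c->sequence, 1);
	atomic_store(&c->sec, sec);
	atomic_store(&c->nsec, nsec);
	atomic_fetch_add(&c->sequence, 1);
}

/* Reader: only accept a snapshot taken entirely outside a write section. */
static int64_t model_clock_read_ns(struct model_clock *c)
{
	unsigned seq;
	int64_t sec, nsec;

	do {
		/* Wait until no write is pending (even count). */
		while ((seq = atomic_load(&c->sequence)) & 1)
			;
		sec = atomic_load(&c->sec);
		nsec = atomic_load(&c->nsec);
		/* Retry if a writer ran while we were copying. */
	} while (atomic_load(&c->sequence) != seq);

	return sec * 1000000000LL + nsec;
}

/* Usage: one completed write followed by one read. */
static void model_clock_demo(void)
{
	struct model_clock c = {0};

	model_clock_write(&c, 2, 500);
	assert(model_clock_read_ns(&c) == 2000000500LL);
}
```

A reader that skips either check can return a mix of old and new
fields, which is the kind of transitional state described above.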

I root-caused the issue to the raw_seqlock_t implementation.  It was
missing support for retrying a reader that observes a pending write
(an odd sequence count).  Investigating further, I think I can
speculate why.

Back in April, Ingo and Thomas checked in a fix to mainline for seqlocks,
referenced here:

	commit 88a411c07b6fedcfc97b8dc51ae18540bd2beda0
	Author: Ingo Molnar <mingo@...e.hu>
	Date:   Thu Apr 3 09:06:13 2008 +0200

	seqlock: livelock fix

	Thomas Gleixner debugged a particularly ugly seqlock related livelock:
	do not process the seq-read section if we know it beforehand that the
	test at the end of the section will fail ...

	Signed-off-by: Ingo Molnar <mingo@...e.hu>

Of course, mainline only has seqlock_t.  In -rt, we have both seqlock_t
and raw_seqlock_t.  It would appear that the merge-resolution for
commit 88a411c07b6 to the -rt branch inadvertently applied one hunk
of the fix to seqlock_t, and the other to raw_seqlock_t.  The normal
seqlocks now have two checks for a pending write, while the raw_seqlocks
have none.  This lack of a check is what causes the protocol failure,
which ultimately caused the bad clock info and the latency spike.
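
To illustrate the failure mode, here is a minimal single-threaded
model of a reader that both starts and finishes inside one write
section.  The model_* names are hypothetical and exist only for this
sketch; the real helpers are the ones touched by the diff below.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model, not kernel code. */
struct model_seqlock {
	unsigned sequence;	/* odd while a writer is mid-update */
};

/* Reader entry with no odd check: just sample the count. */
static unsigned model_begin_unchecked(const struct model_seqlock *sl)
{
	return sl->sequence;
}

/*
 * Reader entry with the odd check.  Returns false when a write is
 * pending, meaning the caller must relax and re-sample.
 */
static bool model_begin_checked(const struct model_seqlock *sl, unsigned *iv)
{
	*iv = sl->sequence;
	return (*iv & 1) == 0;
}

/* Reader exit: retry only if the count moved since entry. */
static bool model_retry(const struct model_seqlock *sl, unsigned iv)
{
	return sl->sequence != iv;
}

/*
 * Returns true if a reader running entirely inside one write section
 * would accept its (possibly torn) snapshot.
 */
static bool model_torn_read_accepted(bool check_odd_on_entry)
{
	struct model_seqlock sl = { .sequence = 1 };	/* writer active */
	unsigned iv;

	if (check_odd_on_entry) {
		if (!model_begin_checked(&sl, &iv))
			return false;	/* reader waits instead of reading */
	} else {
		iv = model_begin_unchecked(&sl);
	}
	/* The reader would copy the protected data here. */
	return !model_retry(&sl, iv);
}

/* Usage: compare the two entry variants. */
static void model_self_check(void)
{
	assert(model_torn_read_accepted(false));	/* no check: torn read accepted */
	assert(!model_torn_read_accepted(true));	/* odd check: reader held off */
}
```

With neither an odd check on entry nor one on exit, the sampled count
is odd both times and compares equal, so the reader never retries.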

This patch corrects the above condition by applying the conceptual change
from 88a411c07b6 to both seqlock_t and raw_seqlock_t equally.  The observed
problems with the HRT spike are confirmed to no longer be reproducible as
a result.

Signed-off-by: Gregory Haskins <ghaskins@...ell.com>
CC: Ingo Molnar <mingo@...e.hu>
CC: Thomas Gleixner <tglx@...utronix.de>
---

 include/linux/seqlock.h |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index e6ecb46..345d726 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -145,7 +145,7 @@ static inline int __read_seqretry(seqlock_t *sl, unsigned iv)
 	int ret;

 	smp_rmb();
-	ret = (iv & 1) | (sl->sequence ^ iv);
+	ret = (sl->sequence != iv);
 	/*
 	 * If invalid then serialize with the writer, to make sure we
 	 * are not livelocking it:
@@ -228,8 +228,16 @@ static __always_inline int __write_tryseqlock_raw(raw_seqlock_t *sl)

 static __always_inline unsigned __read_seqbegin_raw(const raw_seqlock_t *sl)
 {
-	unsigned ret = sl->sequence;
+	unsigned ret;
+
+repeat:
+	ret = sl->sequence;
 	smp_rmb();
+	if (unlikely(ret & 1)) {
+		cpu_relax();
+		goto repeat;
+	}
+
 	return ret;
 }
