Message-ID: <20080819091918.21725.39839.stgit@dev.haskins.net>
Date: Tue, 19 Aug 2008 05:19:18 -0400
From: Gregory Haskins <ghaskins@...ell.com>
To: mingo@...e.hu, rostedt@...dmis.org, tglx@...utronix.de
Cc: ghaskins@...ell.com, linux-kernel@...r.kernel.org,
linux-rt-users@...r.kernel.org
Subject: [PATCH RT 1/2] seqlock: make sure that raw_seqlock_t retries readers
while writes are pending

The seqlock protocol is broken in -rt for raw_seqlock_t objects. This
manifested in my 2.6.26-rt1 kernel as a 500ms (yes, millisecond) latency
spike, which ftrace/preemptirqsoff traced back to the HRT code
(hrtimer_interrupt, to be precise). It would occasionally spin processing
the same CLOCK_MONOTONIC timer (the scheduler-tick) in a tight loop with
interrupts disabled. Investigating, it turned out that the time-basis
recorded for "now" early in the interrupt had momentarily jumped 500ms
into the future. This caused all timers with correct expiration times to
appear to have expired long ago. Even rescheduling the timer via
hrtimer_forward ultimately placed the timer in an "expired" state, since
the "now" basis was in the future.

So I began investigating how this time-basis (derived from ktime_get())
could have jumped like this. I observed that ktime_get() readers were able to
successfully read a time value even while another core held a
write-lock on the xtime_lock. Therefore the fundamental issue was
that ktime_get was able to return transitional states of the
xtime/clocksource infrastructure, which is clearly not intended.
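For reference, a ktime_get()-style reader is expected to follow the usual
seqlock read pattern, roughly like this (a simplified sketch; the actual
xtime/clocksource sampling is replaced by a made-up sample_time_state()
placeholder):

        unsigned seq;
        ktime_t now;

        do {
                seq = read_seqbegin(&xtime_lock);
                now = sample_time_state();
        } while (read_seqretry(&xtime_lock, seq));

When the protocol works, read_seqbegin() refuses to hand back a sequence
taken while a writer is active, and read_seqretry() forces another pass if
a writer ran during the section, so a transitional state can never escape
to the caller. That is exactly the guarantee that was being violated here.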
I root-caused the issue to the raw_seqlock_t implementation: it was
missing support for retrying a reader that observes a pending write.
Investigating further, I can speculate as to why.

Back in April, Ingo and Thomas checked in a fix to mainline for seqlocks,
referenced here:

    commit 88a411c07b6fedcfc97b8dc51ae18540bd2beda0
    Author: Ingo Molnar <mingo@...e.hu>
    Date:   Thu Apr 3 09:06:13 2008 +0200

        seqlock: livelock fix

        Thomas Gleixner debugged a particularly ugly seqlock related livelock:
        do not process the seq-read section if we know it beforehand that the
        test at the end of the section will fail ...

        Signed-off-by: Ingo Molnar <mingo@...e.hu>

Of course, mainline only has seqlock_t. In -rt, we have both seqlock_t
and raw_seqlock_t. It would appear that the merge resolution of
commit 88a411c07b6 into the -rt branch inadvertently applied one hunk
of the fix to seqlock_t and the other to raw_seqlock_t. The normal
seqlocks now check for a pending write twice (once in seqbegin and once
in seqretry), while the raw_seqlocks no longer check for one at all.
This missing check is what breaks the protocol, and is ultimately what
caused the bad clock info and the latency spike.
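To spell out what the complete reader-side protocol needs, here is a
minimal user-space sketch (illustration only, not the kernel
implementation; the demo_* names are made up). Both of the following
checks have to be present:

        #include <stdatomic.h>
        #include <stdbool.h>

        struct demo_seqcount {
                /* Odd while a writer is inside its critical section. */
                atomic_uint sequence;
        };

        /* Check #1: never start a read while a write is pending (odd count). */
        static unsigned demo_read_begin(struct demo_seqcount *s)
        {
                unsigned seq;

                do {
                        seq = atomic_load_explicit(&s->sequence,
                                                   memory_order_acquire);
                } while (seq & 1);

                return seq;
        }

        /* Check #2: retry if the count changed at any point during the read. */
        static bool demo_read_retry(struct demo_seqcount *s, unsigned start)
        {
                atomic_thread_fence(memory_order_acquire);
                return atomic_load_explicit(&s->sequence,
                                            memory_order_relaxed) != start;
        }

Without the odd-count test, a read that starts and finishes entirely inside
a writer's critical section sees the same (odd) value at both ends, passes
the retry comparison, and returns data sampled mid-update, which is exactly
the transitional state that leaked into ktime_get() above.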
This patch corrects the above condition by applying the conceptual change
from 88a411c07b6 to both seqlock_t and raw_seqlock_t equally. The observed
problems with the HRT spike are confirmed to no longer be reproducible
as a result.
Signed-off-by: Gregory Haskins <ghaskins@...ell.com>
CC: Ingo Molnar <mingo@...e.hu>
CC: Thomas Gleixner <tglx@...utronix.de>
---
include/linux/seqlock.h | 12 ++++++++++--
1 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index e6ecb46..345d726 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -145,7 +145,7 @@ static inline int __read_seqretry(seqlock_t *sl, unsigned iv)
 	int ret;
 
 	smp_rmb();
-	ret = (iv & 1) | (sl->sequence ^ iv);
+	ret = (sl->sequence != iv);
 	/*
 	 * If invalid then serialize with the writer, to make sure we
 	 * are not livelocking it:
@@ -228,8 +228,16 @@ static __always_inline int __write_tryseqlock_raw(raw_seqlock_t *sl)
 
 static __always_inline unsigned __read_seqbegin_raw(const raw_seqlock_t *sl)
 {
-	unsigned ret = sl->sequence;
+	unsigned ret;
+
+repeat:
+	ret = sl->sequence;
 	smp_rmb();
+	if (unlikely(ret & 1)) {
+		cpu_relax();
+		goto repeat;
+	}
+
 	return ret;
 }
 
--