Message-ID: <20140402232948.GA4725@psi-dev26.jf.intel.com>
Date: Wed, 2 Apr 2014 16:29:48 -0700
From: David Cohen <david.a.cohen@...ux.intel.com>
To: linux-kernel@...r.kernel.org
Cc: peterz@...radead.org, mingo@...nel.org,
kpreempt-tech@...ts.sourceforge.net
Subject: Soft lockup regression since kernel 3.13
Hi,
I've detected a regression in upstream kernels (on an Intel Merrifield
device) starting with 3.13 and still present in 3.14, which I hadn't had
time to investigate until now. The symptom: the device boots and works
fine for a while, then silently hangs.
I finally bisected v3.12..v3.13 and found the exact commit that
introduced the issue:
commit f27dde8deef33c9e58027df11ceab2198601d6a6
Author: Peter Zijlstra <peterz@...radead.org>
Date: Wed Aug 14 14:55:31 2013 +0200
sched: Add NEED_RESCHED to the preempt_count
In order to combine the preemption and need_resched test we need to
fold the need_resched information into the preempt_count value.
Since the NEED_RESCHED flag is set across CPUs this needs to be an
atomic operation, however we very much want to avoid making
preempt_count atomic, therefore we keep the existing TIF_NEED_RESCHED
infrastructure in place but at 3 sites test it and fold its value into
preempt_count; namely:
- resched_task() when setting TIF_NEED_RESCHED on the current task
- scheduler_ipi() when resched_task() sets TIF_NEED_RESCHED on a
  remote task it follows it up with a reschedule IPI
  and we can modify the cpu local preempt_count from
  there.
- cpu_idle_loop() for when resched_task() found tsk_is_polling().
We use an inverted bitmask to indicate need_resched so that a 0 means
both need_resched and !atomic.
Also remove the barrier() in preempt_enable() between
preempt_enable_no_resched() and preempt_check_resched() to avoid
having to reload the preemption value and allow the compiler to use
the flags of the previous decrement. I couldn't come up with any sane
reason for this barrier() to be there as preempt_enable_no_resched()
already has a barrier() before doing the decrement.
Suggested-by: Ingo Molnar <mingo@...nel.org>
Signed-off-by: Peter Zijlstra <peterz@...radead.org>
Link: http://lkml.kernel.org/n/tip-7a7m5qqbn5pmwnd4wko9u6da@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@...nel.org>
--
I dumped the contents of __log_buf directly from memory when the device
crashed (see the attached text file), since the serial console shows nothing.
I also found an external report of something similar:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1269404
Is this a known issue that is currently being worked on?
Br, David Cohen
View attachment "syslog.txt" of type "text/plain" (3372 bytes)