Message-ID: <20140402232948.GA4725@psi-dev26.jf.intel.com>
Date:	Wed, 2 Apr 2014 16:29:48 -0700
From:	David Cohen <david.a.cohen@...ux.intel.com>
To:	linux-kernel@...r.kernel.org
Cc:	peterz@...radead.org, mingo@...nel.org,
	kpreempt-tech@...ts.sourceforge.net
Subject: Soft lockup regression since kernel 3.13

Hi,

I've detected a regression from upstream (using an Intel Merrifield
device) since 3.13 (it still exists in 3.14) which I haven't had time
to investigate until now. The symptoms are: the device boots and works
fine for a while, then silently hangs.

I finally bisected v3.12..v3.13 and found exactly which commit created
the issue:

commit f27dde8deef33c9e58027df11ceab2198601d6a6
Author: Peter Zijlstra <peterz@...radead.org>
Date:   Wed Aug 14 14:55:31 2013 +0200

    sched: Add NEED_RESCHED to the preempt_count

    In order to combine the preemption and need_resched test we need to
    fold the need_resched information into the preempt_count value.

    Since the NEED_RESCHED flag is set across CPUs this needs to be an
    atomic operation, however we very much want to avoid making
    preempt_count atomic, therefore we keep the existing TIF_NEED_RESCHED
    infrastructure in place but at 3 sites test it and fold its value into
    preempt_count; namely:

     - resched_task() when setting TIF_NEED_RESCHED on the current task
     - scheduler_ipi() when resched_task() sets TIF_NEED_RESCHED on a
                       remote task it follows it up with a reschedule IPI
                       and we can modify the cpu local preempt_count from
                       there.
     - cpu_idle_loop() for when resched_task() found tsk_is_polling().

    We use an inverted bitmask to indicate need_resched so that a 0 means
    both need_resched and !atomic.

    Also remove the barrier() in preempt_enable() between
    preempt_enable_no_resched() and preempt_check_resched() to avoid
    having to reload the preemption value and allow the compiler to use
    the flags of the previous decrement. I couldn't come up with any sane
    reason for this barrier() to be there as preempt_enable_no_resched()
    already has a barrier() before doing the decrement.

    Suggested-by: Ingo Molnar <mingo@...nel.org>
    Signed-off-by: Peter Zijlstra <peterz@...radead.org>
    Link: http://lkml.kernel.org/n/tip-7a7m5qqbn5pmwnd4wko9u6da@git.kernel.org
    Signed-off-by: Ingo Molnar <mingo@...nel.org>
--

I dumped the content of __log_buf directly from memory when the device
crashes (refer to the attached text file), since the serial console is
unable to show anything.

I also found an external reference to something similar:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1269404

Is this a known issue that is currently being worked on?

Br, David Cohen

View attachment "syslog.txt" of type "text/plain" (3372 bytes)
