Message-ID: <20180901022125.GO4941@tuon.disenchant.local>
Date:   Sat, 1 Sep 2018 11:51:26 +0930
From:   Kevin Shanahan <kevin@...nahan.id.au>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Siegfried Metz <frame@...lbox.org>, linux-kernel@...r.kernel.org,
        tglx@...utronix.de, rafael.j.wysocki@...el.com,
        len.brown@...el.com, rjw@...ysocki.net, diego.viola@...il.com,
        rui.zhang@...el.com, viktor_jaegerskuepper@...enet.de
Subject: Re: REGRESSION: boot stalls on several old dual core Intel CPUs

On Thu, Aug 30, 2018 at 03:04:39PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 30, 2018 at 12:55:30PM +0200, Siegfried Metz wrote:
> > Dear kernel developers,
> > 
> > since mainline kernel 4.18 (up to the latest mainline kernel 4.18.5)
> > Intel Core 2 Duo processors are affected by boot stalling early in the
> > boot process. As it is so early there is no dmesg output (or any log).
> > 
> > A few users in the Arch Linux community used git bisect and tracked the
> > issue down to this the bad commit:
> > 7197e77abcb65a71d0b21d67beb24f153a96055e clocksource: Remove kthread
> 
> I just dug out my core2duo laptop (Lenovo T500) and build a tip/master
> kernel for it (x86_64 debian distro .config).
> 
> Seems to boot just fine.. 3/3 so far.
> 
> Any other clues?

One additional data point: my affected system is a Dell Latitude E6400
laptop, which has a P8400 CPU:

  vendor_id     : GenuineIntel
  cpu family    : 6
  model         : 23
  model name    : Intel(R) Core(TM)2 Duo CPU     P8400  @ 2.26GHz
  stepping      : 6
  microcode     : 0x610

Judging from what is being discussed in the Arch forums, it does seem
to be related to the CPU having an unstable TSC and the kernel
transitioning to another clock source.  Workarounds that seem to be
reliable are either booting with clocksource=<something_not_tsc> or
with nosmp.
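
For anyone who wants to see which clocksources a machine offers before
picking a value for the workaround, here is a rough sketch.  It assumes
the usual sysfs entries under
/sys/devices/system/clocksource/clocksource0/ and has to be run from a
boot that actually comes up (an older kernel, or one of the workarounds
above).  The helper names are made up for illustration and are not
taken from any existing tool.

```python
from pathlib import Path
from typing import Optional

# Assumed location of the clocksource sysfs entries.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def pick_non_tsc(available: str) -> Optional[str]:
    """Return the first listed clocksource that is not 'tsc', or None."""
    for name in available.split():
        if name != "tsc":
            return name
    return None


def boot_param(available: str) -> Optional[str]:
    """Build a clocksource= boot parameter from a space-separated list."""
    name = pick_non_tsc(available)
    return f"clocksource={name}" if name else None


def read_sysfs(entry: str) -> Optional[str]:
    """Read one sysfs entry, returning None if it cannot be read."""
    try:
        return (SYSFS_DIR / entry).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    current = read_sysfs("current_clocksource")
    available = read_sysfs("available_clocksource")
    print("current:  ", current)
    print("available:", available)
    if available:
        print("suggested boot parameter:", boot_param(available))
```

This only picks the first non-tsc entry in the order the kernel lists
them; whether that particular clocksource avoids the stall on a given
machine is something I have not verified.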

One person did point out that the commit that introduced the kthread
did so to remove a deadlock - is the circular locking dependency
mentioned in that commit still relevant?

commit 01548f4d3e8e94caf323a4f664eb347fd34a34ab
Author: Martin Schwidefsky <schwidefsky@...ibm.com>
Date:   Tue Aug 18 17:09:42 2009 +0200

    clocksource: Avoid clocksource watchdog circular locking dependency

    stop_machine from a multithreaded workqueue is not allowed because
    of a circular locking dependency between cpu_down and the workqueue
    execution. Use a kernel thread to do the clocksource downgrade.

    Signed-off-by: Martin Schwidefsky <schwidefsky@...ibm.com>
    Cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>
    Cc: john stultz <johnstul@...ibm.com>
    LKML-Reference: <20090818170942.3ab80c91@...base>
    Signed-off-by: Thomas Gleixner <tglx@...utronix.de>

Thanks,
Kevin.
