Message-ID: <alpine.DEB.2.20.1712281531120.1899@nanos>
Date:   Thu, 28 Dec 2017 15:48:15 +0100 (CET)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Alexandru Chirvasitu <achirvasub@...il.com>
cc:     Dou Liyang <douly.fnst@...fujitsu.com>,
        Pavel Machek <pavel@....cz>,
        kernel list <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>,
        "Maciej W. Rozycki" <macro@...ux-mips.org>,
        Mikael Pettersson <mikpelinux@...il.com>,
        Josh Poulson <jopoulso@...rosoft.com>,
        Mihai Costache <v-micos@...rosoft.com>,
        Stephen Hemminger <sthemmin@...rosoft.com>,
        Marc Zyngier <marc.zyngier@....com>, linux-pci@...r.kernel.org,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        Dexuan Cui <decui@...rosoft.com>,
        Simon Xiao <sixiao@...rosoft.com>,
        Saeed Mahameed <saeedm@...lanox.com>,
        Jork Loeser <Jork.Loeser@...rosoft.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        devel@...uxdriverproject.org, KY Srinivasan <kys@...rosoft.com>
Subject: Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop

On Thu, 28 Dec 2017, Alexandru Chirvasitu wrote:
> On Thu, Dec 28, 2017 at 12:00:47PM +0100, Thomas Gleixner wrote:
> > Ok, let's take a step back. The bisect/kexec attempts led us away from the
> > initial problem which is the machine locking up after login, right?
> >
> 
> Yes; sorry about that..

Nothing to be sorry about.

>     x86/vector: Replace the raw_spin_lock() with
> 
> diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
> index 7504491..e5bab02 100644
> --- a/arch/x86/kernel/apic/vector.c
> +++ b/arch/x86/kernel/apic/vector.c
> @@ -726,6 +726,7 @@ static int apic_set_affinity(struct irq_data *irqd,
>                              const struct cpumask *dest, bool force)
>  {
>         struct apic_chip_data *apicd = apic_chip_data(irqd);
> +       unsigned long flags;
>         int err;
>  
>         /*
> @@ -740,13 +741,13 @@ static int apic_set_affinity(struct irq_data *irqd,
>             (apicd->is_managed || apicd->can_reserve))
>                 return IRQ_SET_MASK_OK;
>  
> -       raw_spin_lock(&vector_lock);
> +       raw_spin_lock_irqsave(&vector_lock, flags);
>         cpumask_and(vector_searchmask, dest, cpu_online_mask);
>         if (irqd_affinity_is_managed(irqd))
>                 err = assign_managed_vector(irqd, vector_searchmask);
>         else
>                 err = assign_vector_locked(irqd, vector_searchmask);
> -       raw_spin_unlock(&vector_lock);
> +       raw_spin_unlock_irqrestore(&vector_lock, flags);
>         return err ? err : IRQ_SET_MASK_OK;
>  }
> 
> With this, I still get the lockup messages after login, but not the
> freezes!

That's really interesting. There should be no code path which calls into
apic_set_affinity() with interrupts enabled. I assume you never ran that
kernel with CONFIG_PROVE_LOCKING=y.

Find below a debug patch which should show us the call chain for that
case. Please apply that on top of Dou's patch so the machine stays
accessible. Plain output from dmesg is sufficient.

> The lockups register in the log, which I am attaching (see below for
> attachment naming conventions).

Hmm. Those are RCU stalls, and the backtrace on the CPU which reports the
stall looks very familiar. I'd like to see the output of the debug patch
first, and then I'll send you another pile of patches which might cure that
RCU issue.

Thanks,

	tglx

8<-------------------
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -729,6 +729,8 @@ static int apic_set_affinity(struct irq_
 	unsigned long flags;
 	int err;
 
+	WARN_ON_ONCE(!irqs_disabled());
+
 	/*
 	 * Core code can call here for inactive interrupts. For inactive
 	 * interrupts which use managed or reservation mode there is no
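
For readers unfamiliar with the check added by the debug patch, the following
is a rough userspace sketch of the "warn once" behaviour, not the kernel
implementation. All names prefixed with sim_ or SIM_ are hypothetical
stand-ins invented for this illustration, and the macro relies on GNU C
statement expressions (gcc or clang). Unlike the debug patch, the sketch
returns -1 when the check trips so that the effect is observable.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the CPU interrupt-enable state; not a kernel API. */
static bool sim_irqs_off;

/* Counts how many warnings were actually printed. */
static int warn_count;

/*
 * Simplified imitation of WARN_ON_ONCE(): prints a message the first time
 * the condition is true at this call site, stays silent afterwards, and
 * always evaluates to the truth value of the condition.
 */
#define SIM_WARN_ON_ONCE(cond) ({                               \
	static bool warned_;                                    \
	bool c_ = !!(cond);                                     \
	if (c_ && !warned_) {                                   \
		warned_ = true;                                 \
		warn_count++;                                   \
		fprintf(stderr, "WARNING at %s:%d: %s\n",       \
			__FILE__, __LINE__, #cond);             \
	}                                                       \
	c_;                                                     \
})

/* Hypothetical counterpart of irqs_disabled(). */
static bool sim_irqs_disabled(void)
{
	return sim_irqs_off;
}

/*
 * Hypothetical caller with the check placed at function entry, as in the
 * debug patch. Returning -1 on a tripped check is an addition made for
 * this sketch only.
 */
static int sim_set_affinity(void)
{
	bool bad = SIM_WARN_ON_ONCE(!sim_irqs_disabled());

	return bad ? -1 : 0;
}
```

Calling sim_set_affinity() repeatedly while sim_irqs_off is false reports the
condition on every call through the return value, but prints the warning only
on the first call. In the kernel, the one-time warning also comes with a
backtrace, which is what should reveal the call chain here.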



