Message-ID: <20200414062454.GA84326@gmail.com>
Date:   Tue, 14 Apr 2020 08:24:54 +0200
From:   Ingo Molnar <mingo@...nel.org>
To:     Leon Romanovsky <leon@...nel.org>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>, x86 <x86@...nel.org>,
        Suresh Siddha <suresh.b.siddha@...el.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86/apic: Fix circular locking dependency between
 console and hrtimer locks


* Leon Romanovsky <leon@...nel.org> wrote:

> Hi,
> 
> Any feedback?
> https://lore.kernel.org/lkml/20200407170925.1775019-1-leon@kernel.org/

The fix definitely looks legit: lockdep is right that we shouldn't take 
the console_sem.lock, even under trylock.

It's only a printk_once(), yet I'm wondering why this never triggered in 
the last ~8 years. Did nobody ever run with lockdep and the debug console 
level enabled on such hardware, or did something else change?

One possibility would be that apic_check_deadline_errata() marked almost 
all Intel systems as broken and the TSC-deadline hardware never actually 
got activated. In that case you have triggered rarely tested code and 
might see other weirdnesses. Just saying. :-)

Or booting with "debug" specified is much rarer on production systems, 
hence the 8-year-old bug.

> > It is far away from my main expertise and I'm not sure that the solution
> > is correct, but it definitely fixed our regression.
> > ---
> >  arch/x86/kernel/apic/apic.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> > index d254cebdd3c3..6706b2cd9aec 100644
> > --- a/arch/x86/kernel/apic/apic.c
> > +++ b/arch/x86/kernel/apic/apic.c
> > @@ -353,7 +353,7 @@ static void __setup_APIC_LVTT(unsigned int clocks, int oneshot, int irqen)
> >  		 */
> >  		asm volatile("mfence" : : : "memory");
> >
> > -		printk_once(KERN_DEBUG "TSC deadline timer enabled\n");
> > +		printk_deferred_once(KERN_DEBUG "TSC deadline timer enabled\n");

I think we should move this essentially initialization-time message much 
earlier during bootup, when we are not holding any hrtimer locks.

One good place would be apic_check_deadline_errata(). This place:

        if (boot_cpu_data.microcode >= rev)
                return;

        setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
        pr_err(FW_BUG "TSC_DEADLINE disabled due to Errata; "
               "please update microcode to version: 0x%x (or later)\n", rev);

Could be something like:

        if (boot_cpu_data.microcode >= rev) {
                pr_debug("x86/apic: TSC deadline timer enabled.\n");
                return;
        }

        setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
        pr_err(FW_BUG "TSC_DEADLINE disabled due to CPU errata, please update microcode to version: 0x%x (or later)\n", rev);

(Note the small fixes I did to the errata message - we should do that and 
also move all user-facing messages into a single line while at it.)
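
To illustrate the split suggested above, here is a rough userspace sketch, 
not kernel code. The names model_check_deadline_errata() and 
model_setup_lvtt() are hypothetical stand-ins, printf() stands in for 
pr_debug()/pr_err(), and the LVTT value is a made-up flag. The only point 
is that all messages are emitted by the init-time check, while the timer 
setup path, which may run with hrtimer locks held, prints nothing:

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical userspace model of the boot-time errata check.
 * Returns true when the deadline timer stays enabled, i.e. the
 * microcode revision is at least the required one.
 */
bool model_check_deadline_errata(unsigned int microcode, unsigned int rev)
{
    if (microcode >= rev) {
        /* Init-time message: no timer locks are held at this point. */
        printf("x86/apic: TSC deadline timer enabled.\n");
        return true;
    }

    /* User-facing message kept on a single line, as suggested above. */
    printf("TSC_DEADLINE disabled due to CPU errata, please update microcode to version: 0x%x (or later)\n", rev);
    return false;
}

/*
 * Hypothetical model of the timer setup path. It only records the
 * chosen mode (1 = deadline mode, 0 = other, values invented for this
 * sketch) and never prints, so it is safe to call with locks held.
 */
void model_setup_lvtt(bool deadline_enabled, unsigned int *lvtt)
{
    *lvtt = deadline_enabled ? 1u : 0u;
}
```

In this model the check would be called once during early boot and its 
result passed to every later model_setup_lvtt() call.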

Thanks,

	Ingo
