Message-ID: <20110325161628.GB12393@erda.amd.com>
Date: Fri, 25 Mar 2011 17:16:28 +0100
From: Robert Richter <robert.richter@....com>
To: Ingo Molnar <mingo@...e.hu>
CC: Eric Dumazet <eric.dumazet@...il.com>,
Andi Kleen <andi@...stfloor.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Jack Steiner <steiner@....com>,
Jan Beulich <JBeulich@...ell.com>,
Borislav Petkov <bp@...64.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Nick Piggin <npiggin@...nel.dk>,
"x86@...nel.org" <x86@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Ingo Molnar <mingo@...hat.com>, "tee@....com" <tee@....com>,
Nikanth Karthikesan <knikanth@...e.de>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH RFC] x86: avoid atomic operation in
test_and_set_bit_lock if possible
On 25.03.11 05:32:28, Ingo Molnar wrote:
>
> * Eric Dumazet <eric.dumazet@...il.com> wrote:
>
> > On Friday, 25 March 2011 at 00:56 +0100, Andi Kleen wrote:
> > > > never EVER seen any good explanation of why that particular sh*t
> > > > argument would be true. It seems to be purely about politics, where
> > > > some idiotic vendor (namely HP) has convinced Intel that they really
> > > > need it. To the point where some engineers seem to have bought into
> > > > the whole thing and actually believe that fairy tale ("firmware can do
> > > > better" - hah! They must be feeding people some bad drugs at the
> > > > cafeteria)
> > >
> > > For the record I don't think it's a good idea for the BIOS to do
> > > this (and I'm not aware of any engineer who does),
> > > but I think Linux should do better than just disabling PMU use when
> > > this happens.
> > >
> > > However I suspect taking over SCI would cause endless problems
> > > and is very likely not a good idea.
> >
> > I tried many different changes in the BIOS and all failed (the machine is
> > damn slow at boot, it takes ages).
> >
> > I am stuck :(
>
> Could you please try the patch below?
>
> Thanks,
>
> Ingo
>
> ------------------->
> From 14df27334ac47a5cec67fb2238d14499346acc38 Mon Sep 17 00:00:00 2001
> From: Ingo Molnar <mingo@...e.hu>
> Date: Fri, 25 Mar 2011 10:24:23 +0100
> Subject: [PATCH] perf, x86: Complain louder about BIOSen corrupting CPU/PMU state and continue
>
> Eric Dumazet reported that hardware PMU events do not work on his
> system, due to the BIOS corrupting PMU state:
>
> Performance Events: PEBS fmt0+, Core2 events, Broken BIOS detected, using software events only.
> [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 186 is 43003c)
>
> Linus suggested that we continue in the face of such BIOS-induced CPU
> state corruption:
>
> http://lkml.org/lkml/2011/3/24/608
>
> Such BIOSes will have to be fixed - developers rely on a working and fully
> capable PMU and BIOS interfering with CPU state is simply not acceptable.
>
> So this patch changes perf to continue when it detects such BIOS
> interaction. Some hardware events may be unreliable due to the BIOS writing
> and re-writing them - there's not much the kernel can do about that.
>
> Reported-by: Eric Dumazet <eric.dumazet@...il.com>
> Suggested-by: Linus Torvalds <torvalds@...ux-foundation.org>
> Cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>
> Cc: Arnaldo Carvalho de Melo <acme@...hat.com>
> Cc: Frederic Weisbecker <fweisbec@...il.com>
> Cc: Mike Galbraith <efault@....de>
> Cc: Steven Rostedt <rostedt@...dmis.org>
> LKML-Reference: <new-submission>
> Signed-off-by: Ingo Molnar <mingo@...e.hu>
> ---
> arch/x86/kernel/cpu/perf_event.c | 9 +++++++--
> 1 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index ec46eea..eb00677 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -500,12 +500,17 @@ static bool check_hw_exists(void)
>  	return true;
>
>  bios_fail:
> -	printk(KERN_CONT "Broken BIOS detected, using software events only.\n");
> +	/*
> +	 * We still allow the PMU driver to operate:
> +	 */
> +	printk(KERN_CONT "Broken BIOS detected, complain to your hardware vendor.\n");
>  	printk(KERN_ERR FW_BUG "the BIOS has corrupted hw-PMU resources (MSR %x is %Lx)\n", reg, val);
> -	return false;
> +
> +	return true;
Ingo, you jump out of the loop here. This skips the checks on the
remaining registers and the MSR access check. And if we want to continue
anyway, checking MSR access becomes more important, as the BIOS may block it.
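
To illustrate the concern, here is a minimal user-space sketch, not the
actual kernel code. It assumes a simulated register file, and every sim_*
name is hypothetical. The idea is to note a BIOS-owned counter and keep
scanning, and only then do the write/read-back test that detects blocked
MSR access:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SIM_NUM_COUNTERS    4
#define SIM_EVENTSEL_ENABLE (1ULL << 22)   /* enable bit in an event select MSR */

/* Simulated register file; purely hypothetical, for illustration only. */
struct sim_pmu {
	uint64_t eventsel[SIM_NUM_COUNTERS];
	uint64_t perfctr0;
	bool     wr_blocked;    /* models firmware silently dropping MSR writes */
};

/* Stand-in for a checked MSR read; returns 0 on success. */
static int sim_rdmsr(const uint64_t *reg, uint64_t *val)
{
	*val = *reg;
	return 0;
}

/* Stand-in for a checked MSR write; the write is dropped when blocked. */
static int sim_wrmsr(struct sim_pmu *pmu, uint64_t *reg, uint64_t val)
{
	if (!pmu->wr_blocked)
		*reg = val;
	return 0;
}

/*
 * Returns true if the (simulated) PMU is usable. A counter that the
 * firmware has already enabled is reported and counted in *bios_hits,
 * but it neither aborts the scan nor skips the access test below.
 */
static bool sim_check_hw_exists(struct sim_pmu *pmu, int *bios_hits)
{
	const uint64_t probe = 0xabcdULL;
	uint64_t val, readback = 0;
	int i;

	assert(pmu && bios_hits);
	*bios_hits = 0;

	for (i = 0; i < SIM_NUM_COUNTERS; i++) {
		if (sim_rdmsr(&pmu->eventsel[i], &val))
			return false;           /* read failed: give up */
		if (val & SIM_EVENTSEL_ENABLE) {
			printf("firmware owns counter %d (eventsel is %llx)\n",
			       i, (unsigned long long)val);
			(*bios_hits)++;         /* complain, but keep checking */
		}
	}

	/* Write a probe value and read it back to verify MSR access. */
	if (sim_wrmsr(pmu, &pmu->perfctr0, probe) ||
	    sim_rdmsr(&pmu->perfctr0, &readback) ||
	    readback != probe)
		return false;

	return true;
}
```

With this shape, a register file holding 43003c in one event select
register still passes, while one whose writes are dropped fails even
though the BIOS complaint was printed first.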
-Robert
>
>  msr_fail:
>  	printk(KERN_CONT "Broken PMU hardware detected, using software events only.\n");
> +
>  	return false;
>  }
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
--
Advanced Micro Devices, Inc.
Operating System Research Center