Message-ID: <491CA0DC.8070405@cosmosbay.com>
Date: Thu, 13 Nov 2008 22:49:16 +0100
From: Eric Dumazet <dada1@...mosbay.com>
To: Ingo Molnar <mingo@...e.hu>
CC: Jiri Kosina <jkosina@...e.cz>, Andi Kleen <andi@...stfloor.org>,
Robert Richter <robert.richter@....com>,
oprofile-list@...ts.sf.net, Jiri Benc <jbenc@...e.cz>,
Vilem Marsik <vmarsik@...e.cz>,
Pekka Enberg <penberg@...helsinki.fi>,
linux-kernel@...r.kernel.org
Subject: Re: Oprofile [still] doesn't work on 2.6.28-rc4 on certain CPU
Ingo Molnar wrote:
> * Jiri Kosina <jkosina@...e.cz> wrote:
>
>> On Thu, 13 Nov 2008, Ingo Molnar wrote:
>>
>>>> I haven't yet found a time to start bisecting this.
>>> Would be nice to identify a commit to revert - in case we run out of
>>> time fixing it.
>> Yup, I first wanted to make this known to the public in hope that it
>> will ring a bell somewhere.
>>
>> If no one sees an obvious reason for this, I will do my best to bisect
>> this tomorrow.
>
> We've got the one patch below pending, but that's not for AMD CPUs so
> it shouldn't impact your case.
>
> But ... some change made it all much more fragile. I'm curious why
> things became more fragile.
>
> Ingo
>
> --------------->
> Subject: oprofile: un-mask APIC before resetting counter in ppro_check_ctrs()
> From: Eric Dumazet <dada1@...mosbay.com>
> Date: Tue, 11 Nov 2008 09:32:12 +0100
>
> While using oprofile on my HP BL460c G1 (two quad-core Intel E5450 CPUs),
> I noticed that one CPU after the other stopped receiving NMIs.
>
> After a while, all cores were blocked (i.e. not generating events for oprofile).
> I tried all major Linux versions and all were affected by this freeze.
>
> I found that we have to un-mask APIC *before* writing to MSR counter
> when we get event notification, because we use APIC_LVTPC in edge triggered mode.
>
> Signed-off-by: Eric Dumazet <dada1@...mosbay.com>
> Signed-off-by: Ingo Molnar <mingo@...e.hu>
> ---
> arch/x86/oprofile/op_model_ppro.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> Index: tip/arch/x86/oprofile/op_model_ppro.c
> ===================================================================
> --- tip.orig/arch/x86/oprofile/op_model_ppro.c
> +++ tip/arch/x86/oprofile/op_model_ppro.c
> @@ -126,6 +126,12 @@ static int ppro_check_ctrs(struct pt_reg
> u64 val;
> int i;
>
> + /*
> + * We need to unmask the apic vector *before* writing reset_value
> + * to msr counter, because we use edge trigger
> + */
> + apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
> +
> for (i = 0 ; i < num_counters; ++i) {
> if (!reset_value[i])
> continue;
> @@ -136,10 +142,6 @@ static int ppro_check_ctrs(struct pt_reg
> }
> }
>
> - /* Only P6 based Pentium M need to re-unmask the apic vector but it
> - * doesn't hurt other P6 variant */
> - apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
> -
> /* We can't work out if we really handled an interrupt. We
> * might have caught a *second* counter just after overflowing
> * the interrupt for this counter then arrives

Just to clarify, I found this patch necessary on previous Linux versions as well.
Maybe new Intel CPUs trigger a software bug, I don't know.

Also, I posted a patch about the kmalloc() of reset_value; I am not sure that patch was pushed.
This one is a real bug.

[PATCH] oprofile: fix an overflow in ppro code

reset_value was changed from long to u64 in commit b99170288421c79f0c2efa8b33e26e65f4bb7fb8
(oprofile: Implement Intel architectural perfmon support).

But the dynamic allocation of this array still uses the wrong type (long instead of u64),
so the buffer is too small wherever long is narrower than u64.

Signed-off-by: Eric Dumazet <dada1@...mosbay.com>
---
arch/x86/oprofile/op_model_ppro.c | 2 +-
1 files changed, 1 insertion(+), 1 deletion(-)
View attachment "oprofile_ppro.patch" of type "text/plain" (482 bytes)
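
For readers without the attachment, here is a rough user-space sketch of the kind of size mismatch described above. It is not the attached patch and not the kernel code: the helper names are made up for illustration, calloc() stands in for the kernel allocator, and u64 is a local typedef.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t u64; /* stand-in for the kernel's u64 (assumption) */

/* Hypothetical helper: bytes requested when the element type is spelled
 * out by hand and left as long after the array itself became u64. */
static size_t bytes_with_stale_type(size_t num_counters)
{
	return sizeof(long) * num_counters;
}

/* Hypothetical helper: bytes an array of num_counters u64 slots needs. */
static size_t bytes_needed(size_t num_counters)
{
	return sizeof(u64) * num_counters;
}

/* Hypothetical helper: derive the element size from the pointer itself,
 * so the allocation follows the array type if it changes again. */
static u64 *alloc_reset_values(size_t num_counters)
{
	u64 *values = calloc(num_counters, sizeof(*values));

	return values; /* NULL on failure, zero-filled otherwise */
}
```

Where long is 32 bits wide, bytes_with_stale_type(n) is half of bytes_needed(n), so writes to the upper half of the array land past the end of the buffer. Where long is 64 bits wide the two sizes happen to match, which may be why such a mismatch can go unnoticed.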