Message-ID: <491CA0DC.8070405@cosmosbay.com>
Date:	Thu, 13 Nov 2008 22:49:16 +0100
From:	Eric Dumazet <dada1@...mosbay.com>
To:	Ingo Molnar <mingo@...e.hu>
CC:	Jiri Kosina <jkosina@...e.cz>, Andi Kleen <andi@...stfloor.org>,
	Robert Richter <robert.richter@....com>,
	oprofile-list@...ts.sf.net, Jiri Benc <jbenc@...e.cz>,
	Vilem Marsik <vmarsik@...e.cz>,
	Pekka Enberg <penberg@...helsinki.fi>,
	linux-kernel@...r.kernel.org
Subject: Re: Oprofile [still] doesn't work on 2.6.28-rc4 on certain CPU

Ingo Molnar wrote:
> * Jiri Kosina <jkosina@...e.cz> wrote:
> 
>> On Thu, 13 Nov 2008, Ingo Molnar wrote:
>>
>>>> I haven't yet found a time to start bisecting this.
>>> Would be nice to identify a commit to revert - in case we run out of 
>>> time fixing it.
>> Yup, I first wanted to make this known to the public in hope that it 
>> will ring a bell somewhere.
>>
>> If no one sees an obvious reason for this, I will do my best to bisect 
>> this tomorrow.
> 
> We've got the one patch below pending, but that's not for AMD CPUs so 
> it shouldn't impact your case.
> 
> But ... some change made it all much more fragile. I'm curious why 
> things became more fragile.
> 
> 	Ingo
> 
> --------------->
> Subject: oprofile: un-mask APIC before resetting counter in ppro_check_ctrs()
> From: Eric Dumazet <dada1@...mosbay.com>
> Date: Tue, 11 Nov 2008 09:32:12 +0100
> 
> While using oprofile on my HP BL460c G1 (two quad-core Intel E5450 CPUs),
> I noticed that one CPU after the other stopped receiving NMIs.
> 
> After a while, all cores were blocked (i.e. not generating events for oprofile).
> I tried all major Linux versions and all were affected by this freeze.
> 
> I found that we have to unmask the APIC LVTPC entry *before* writing to the
> MSR counter when we get an event notification, because we use APIC_LVTPC in
> edge-triggered mode.
> 
> Signed-off-by: Eric Dumazet <dada1@...mosbay.com>
> Signed-off-by: Ingo Molnar <mingo@...e.hu>
> ---
>  arch/x86/oprofile/op_model_ppro.c |   10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> Index: tip/arch/x86/oprofile/op_model_ppro.c
> ===================================================================
> --- tip.orig/arch/x86/oprofile/op_model_ppro.c
> +++ tip/arch/x86/oprofile/op_model_ppro.c
> @@ -126,6 +126,12 @@ static int ppro_check_ctrs(struct pt_reg
>  	u64 val;
>  	int i;
>  
> +	/*
> +	 * We need to unmask the apic vector *before* writing reset_value
> +	 * to msr counter, because we use edge trigger
> +	 */
> +	apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
> +
>  	for (i = 0 ; i < num_counters; ++i) {
>  		if (!reset_value[i])
>  			continue;
> @@ -136,10 +142,6 @@ static int ppro_check_ctrs(struct pt_reg
>  		}
>  	}
>  
> -	/* Only P6 based Pentium M need to re-unmask the apic vector but it
> -	 * doesn't hurt other P6 variant */
> -	apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
> -
>  	/* We can't work out if we really handled an interrupt. We
>  	 * might have caught a *second* counter just after overflowing
>  	 * the interrupt for this counter then arrives

Just to clarify, I found this patch necessary for previous Linux versions as well.

Maybe new Intel CPUs trigger a software bug; I don't know.
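
To illustrate why the order matters in the quoted patch, here is a toy userspace model, not kernel code. It assumes two things: the LVT entry gets masked when the NMI is taken, and an overflow edge that arrives while the entry is masked is simply dropped. All names (toy_pmu, toy_overflow, toy_handler, toy_run) are hypothetical and only mimic the shape of ppro_check_ctrs(); real APIC behaviour may differ in detail.

```c
#include <stdbool.h>

/* Toy model only: every name here is made up for illustration. */
struct toy_pmu {
	bool masked;    /* stands in for APIC_LVT_MASKED in the LVTPC entry */
	int  delivered; /* overflow edges that produced an NMI */
	int  lost;      /* overflow edges dropped because the entry was masked */
};

/* One counter overflow produces one edge. Assumption: a masked edge is lost. */
static void toy_overflow(struct toy_pmu *p)
{
	if (p->masked) {
		p->lost++;
	} else {
		p->delivered++;
		p->masked = true; /* assumption: entry is masked when the NMI is taken */
	}
}

/*
 * Model of the handler. overflow_in_window says whether the counter,
 * once re-armed with its reset value, overflows again before the
 * handler has finished.
 */
static void toy_handler(struct toy_pmu *p, bool unmask_first,
			bool overflow_in_window)
{
	if (unmask_first)
		p->masked = false;      /* new order: unmask, then re-arm */

	/* the counter is re-armed at this point */
	if (overflow_in_window)
		toy_overflow(p);

	if (!unmask_first)
		p->masked = false;      /* old order: re-arm, then unmask */
}

/* First overflow is delivered, second one lands inside the handler. */
static struct toy_pmu toy_run(bool unmask_first)
{
	struct toy_pmu p = { .masked = false, .delivered = 0, .lost = 0 };

	toy_overflow(&p);
	toy_handler(&p, unmask_first, true);
	return p;
}
```

In this model, toy_run(false) ends with one lost edge, which would leave that counter silent, while toy_run(true) loses none.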


Also, I posted a patch for the kmalloc() of reset_value; I am not sure that patch was pushed.

This one is a real bug.

[PATCH] oprofile: fix an overflow in ppro code

reset_value was changed from long to u64 in commit b99170288421c79f0c2efa8b33e26e65f4bb7fb8
(oprofile: Implement Intel architectural perfmon support).

But the dynamic allocation of this array uses the wrong type (long instead of u64).

Signed-off-by: Eric Dumazet <dada1@...mosbay.com>
---
arch/x86/oprofile/op_model_ppro.c |    2 +-
1 files changed, 1 insertion(+), 1 deletion(-)
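
The size mismatch described in the commit message above can be shown with a small userspace sketch. This is not the attached patch: calloc() stands in for the kernel allocator, the typedef stands in for the kernel's u64, and bytes_by_type() and alloc_reset_values() are hypothetical helpers. The element size is passed as a parameter so that a 4-byte long, as on 32-bit x86, can be modelled on any host.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t u64; /* userspace stand-in for the kernel's u64 */

/*
 * Bytes requested when the element type is written out by hand.
 * With a 4-byte long and n counters this is half of what n u64
 * values need, so writes to the upper entries run past the buffer.
 */
static size_t bytes_by_type(size_t elem_size, size_t n)
{
	return elem_size * n;
}

/*
 * Sizing the allocation from the pointer's own element type keeps it
 * correct even if the element type changes again later.
 */
static u64 *alloc_reset_values(size_t n)
{
	u64 *v = calloc(n, sizeof(*v));

	return v; /* NULL on allocation failure; caller must free() */
}
```

With two counters, a 4-byte long gives an 8-byte buffer where 16 bytes are needed. Where long is 8 bytes, as on x86-64 Linux, the two sizes happen to match and the bug stays hidden.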


View attachment "oprofile_ppro.patch" of type "text/plain" (482 bytes)
