Message-ID: <20110106092753.GA32343@ywang-moblin2.bj.intel.com>
Date:	Thu, 6 Jan 2011 17:27:53 +0800
From:	Yong Wang <yong.y.wang@...ux.intel.com>
To:	Youquan Song <youquan.song@...el.com>
Cc:	linux-kernel@...r.kernel.org, hpa@...ux.intel.com,
	suresh.b.siddha@...el.com, arjan@...ux.intel.com, trenn@...e.de,
	kent.liu@...el.com, chaohong.guo@...el.com,
	Youquan Song <youquan.song@...ux.intel.com>
Subject: Re: [PATCH 1/2] apic: Fix error interrupt report at all APs

On Thu, Jan 06, 2011 at 11:28:51AM +0800, Youquan Song wrote:
> Recently, a customer reported that once the machine boots, many error
> interrupts are reported, and their count exactly matches the number of APs.
> 
> The root cause is that the Local APIC generates an error interrupt when it
> detects an illegal vector (one in 0 ~ 15) in an interrupt message it receives,
> or in an interrupt generated from the local vector table or a self IPI. See
> SDM 3A, chapter 10.
> 
> The thermal sensor register is reset to 0x10000. The current thermal throttling
> driver first restores the AP's thermal sensor register with the value read
> from the BSP, but the BSP's thermal sensor register is also set to 0x10000. The
> value 0x10000 means the interrupt vector is zero. After 0x10000 is written to
> the thermal sensor LVT, the processor receives an error interrupt report if the
> APIC error interrupt is also set.
> 

If the thermal interrupt vector of all CPUs is 0 and you are seeing
LAPIC error interrupts, the correct way to fix it is to disable digital
thermal sensors from generating interrupts.
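
To make the 0x10000 value discussed above concrete, here is a minimal sketch of
how such an LVT value decodes. The macro and function names are illustrative
assumptions, not the kernel's own definitions; the bit positions (vector in
bits 0-7, mask in bit 16) and the 0 ~ 15 illegal range follow the description
quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names only; these are not the kernel's macros. */
#define LVT_VECTOR_MASK 0x000000FFu /* bits 0-7: interrupt vector */
#define LVT_MASKED_BIT  0x00010000u /* bit 16: LVT entry is masked */

/* Extract the interrupt vector programmed in an LVT entry. */
static inline uint8_t lvt_vector(uint32_t lvt)
{
	return (uint8_t)(lvt & LVT_VECTOR_MASK);
}

/* True when the mask bit is set, i.e. the entry is masked. */
static inline bool lvt_is_masked(uint32_t lvt)
{
	return (lvt & LVT_MASKED_BIT) != 0;
}

/* Vectors 0 ~ 15 are the illegal range named in the patch description. */
static inline bool lvt_vector_is_legal(uint32_t lvt)
{
	return lvt_vector(lvt) >= 16;
}
```

With these helpers, 0x10000 decodes as "masked, vector 0": the mask bit is set
and the vector falls in the illegal range.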

> Restoring the BSP's thermal sensor value is useless because the register will
> soon be set to the correct value, which includes legal vector information. This
> patch removes the restore step, so the annoying error interrupt noise goes
> away at boot.
> 
> Signed-off-by: Youquan Song <youquan.song@...el.com>
> ---
>  arch/x86/kernel/cpu/mcheck/therm_throt.c |   11 -----------
>  1 files changed, 0 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> index 4b68326..1658483 100644
> --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
> +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
> @@ -405,17 +405,6 @@ void intel_init_thermal(struct cpuinfo_x86 *c)
>  	 */
>  	rdmsr(MSR_IA32_MISC_ENABLE, l, h);
>  
> -	/*
> -	 * The initial value of thermal LVT entries on all APs always reads
> -	 * 0x10000 because APs are woken up by BSP issuing INIT-SIPI-SIPI
> -	 * sequence to them and LVT registers are reset to 0s except for
> -	 * the mask bits which are set to 1s when APs receive INIT IPI.
> -	 * Always restore the value that BIOS has programmed on AP based on
> -	 * BSP's info we saved since BIOS is always setting the same value
> -	 * for all threads/cores
> -	 */
> -	apic_write(APIC_LVTTHMR, lvtthmr_init);
> -

NACK. Please take a look at the commit message of the change that added the
code you are trying to delete. If the code is deleted, the bug that
a2202aa29289db64ca7988b12343158b67b27f10 solved will pop up again.

Btw, please try your best to find and copy the person whose code you
want to delete in your patch.

-Yong