Message-ID: <alpine.LFD.2.02.1205230859190.3231@ionos>
Date:	Wed, 23 May 2012 12:09:37 +0200 (CEST)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Chen Gong <gong.chen@...ux.intel.com>
cc:	Tony Luck <tony.luck@...el.com>, bp@...64.org, x86@...nel.org,
	LKML <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] x86: auto poll/interrupt mode switch for CMC to stop
 CMC storm

On Wed, 23 May 2012, Chen Gong wrote:

> This idea is inspired by the IA64 implementation. It is similar to
> NAPI in the network stack. When there are too many CMCIs to handle,
> the interrupt can be disabled and poll mode takes over event
> handling. When no more events occur in the system, the CMC
> interrupt can be re-enabled automatically.

>  /*
>   * CPU/chipset specific EDAC code can register a notifier call here to print
>   * MCE errors in a human-readable form.
> @@ -582,6 +603,37 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
>  {
>  	struct mce m;
>  	int i;
> +	unsigned long flag;
> +
> +	spin_lock_irqsave(&cmc_poll_lock, flag);
> +	if (cmci_storm_detected == 0) {
> +		unsigned long now = jiffies;
> +		int *count = &__get_cpu_var(cmci_storm_warning);
> +		unsigned long *history = &__get_cpu_var(first_cmci_jiffie);
> +
> +		if (time_before_eq(now, *history + HZ))
> +			(*count)++;

So that means if you get more than 5 MCEs per second you switch to
polling mode and risk losing data. How was this number chosen?
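
For reference, the storm check in the hunk above boils down to a
per-second event window, roughly like this (a distilled sketch for
illustration only, not the patch itself; CMC_STORM is the threshold
from the patch, apparently 5):

#define CMC_STORM	5	/* events per second before we call it a storm */

struct cmci_rate {
	unsigned long	window_start;	/* jiffy of the event that opened the window */
	int		count;		/* events seen since then */
};

/* Count events; report a storm once count reaches CMC_STORM within one HZ window */
static bool cmci_storm_check(struct cmci_rate *r, unsigned long now)
{
	if (time_before_eq(now, r->window_start + HZ)) {
		r->count++;
	} else {
		/* window expired: restart it at the current event */
		r->count = 0;
		r->window_start = now;
	}
	return r->count >= CMC_STORM;
}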

> +		else {
> +			*count = 0;
> +			*history = now;
> +		}
> +
> +		if (*count >= CMC_STORM) {
> +			cmci_storm_detected = 1;

So this variable is global and does what? Signalling all the per cpu
poll timers to check for silence. How is that supposed to work? See
below.

> +			/* If we're being hit with CMC interrupts, we won't
> +			 * ever execute the schedule_work() below.  Need to
> +			 * disable CMC interrupts on this processor now.
> +			 */
> +			mce_disable_cmci(NULL);
> +			if (!work_pending(&cmc_disable_work))
> +				schedule_work(&cmc_disable_work);

What's the point of doing this work? Why can't we just do that on the
CPU which got hit by the MCE storm and leave the others alone? They
either detect it themselves or are just not affected.
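
Something along these lines would keep it local (illustrative only;
cmci_storm_on is a made-up per-CPU flag and mce_disable_cmci() is the
helper from your patch):

		if (*count >= CMC_STORM) {
			/*
			 * Only this CPU is storming: disable CMCI locally
			 * and let the existing per-CPU poll timer pick up
			 * the events.  No global flag, no work item poking
			 * the other CPUs.
			 */
			__get_cpu_var(cmci_storm_on) = 1;
			mce_disable_cmci(NULL);
			printk(KERN_WARNING "CPU%d: CMCI storm detected, "
			       "switching to poll mode\n", smp_processor_id());
		}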

> +			spin_unlock_irqrestore(&cmc_poll_lock, flag);
> +			printk(KERN_WARNING "WARNING: Switching to polling "\
> +				"CMC handler; error records may be lost\n");
>  
> @@ -1253,8 +1320,19 @@ static void mce_start_timer(unsigned long data)
>  	n = &__get_cpu_var(mce_next_interval);
>  	if (mce_notify_irq())
>  		*n = max(*n/2, HZ/100);
> -	else
> +	else {
>  		*n = min(*n*2, (int)round_jiffies_relative(check_interval*HZ));
> +		/* if no CMC event, switch out of polling mode */
> +		spin_lock_irqsave(&cmc_poll_lock, flags);
> +		if (cmci_storm_detected == 1) {
> +			printk(KERN_WARNING "Returning to interrupt driven "\
> +					"CMC handler\n");
> +			if (!work_pending(&cmc_enable_work))
> +				schedule_work(&cmc_enable_work);
> +			cmci_storm_detected = 0;
> +		}
> +		spin_unlock_irqrestore(&cmc_poll_lock, flags);
> +	}

This is really brilliant.

CPU0                                 CPU1
lock(poll_lock);
cmci_storm_detected = 1
mce_disable_cmci();
schedule_work(disable_work);
unlock(poll_lock);
                                     mce_timer runs
                                       lock(poll_lock);
                                       if (cmci_storm_detected) {
                                          schedule_work(cmc_enable_work);
                                          cmci_storm_detected = 0;
                                       }
                                       unlock(poll_lock);
.....

disable_work()                       enable_work()
   on_each_cpu(mce_disable_cmci);       on_each_cpu(mce_enable_cmci);

The approach for this polling mode is horribly broken. You are mixing
global and per cpu state; how is that supposed to work reliably?

What's wrong with doing that strictly per cpu and avoiding the whole
global state horror?
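
Roughly something like this, all of it strictly on the local CPU (just
a sketch; the names are made up except mce_disable_cmci() and
mce_enable_cmci() from your patch):

static DEFINE_PER_CPU(bool, cmci_storm_active);

/* Called from the CMCI path on the CPU that is being flooded */
static void cmci_storm_enter(void)
{
	if (__get_cpu_var(cmci_storm_active))
		return;
	__get_cpu_var(cmci_storm_active) = true;
	mce_disable_cmci(NULL);
	/* the per-CPU mce_timer keeps polling the banks from here on */
}

/* Called from the per-CPU poll timer after a quiet interval */
static void cmci_storm_exit(void)
{
	if (!__get_cpu_var(cmci_storm_active))
		return;
	__get_cpu_var(cmci_storm_active) = false;
	mce_enable_cmci(NULL);
}

No global variable, no work items, no cross-CPU calls, and the race
above cannot happen because enter/exit only ever touch the CPU they
run on.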

Thanks,

	tglx
