Message-ID: <20070427090917.GA24922@muc.de>
Date:	Fri, 27 Apr 2007 11:09:17 +0200
From:	Andi Kleen <ak@....de>
To:	Tim Hockin <thockin@...gle.com>
Cc:	vojtech@...e.cz, linux-kernel@...r.kernel.org, akpm@...gle.com
Subject: Re: [PATCH] x86_64: dynamic MCE poll interval

On Thu, Apr 26, 2007 at 06:02:52PM -0700, Tim Hockin wrote:
> Description:
>  This patch makes the MCE poller adjust the polling interval dynamically.
>  If we find an MCE, poll 2x faster (down to 10 ms).  When we stop finding
>  MCEs, poll 2x slower (up to check_interval seconds).  The check_interval
>  tunable becomes the max polling interval.

Can you please fix the documentation then?
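The adjustment described in the patch is a clamped multiplicative backoff: halve the interval when an MCE was logged, double it when none was, and keep it between 10 ms and check_interval. Below is a minimal sketch in plain C. It assumes millisecond units instead of jiffies. `next_poll_interval()`, `MIN_INTERVAL_MS` and `check_interval_ms` are hypothetical names, and the 5-minute ceiling is an assumed value for check_interval, not taken from the patch.

```c
/* Hypothetical sketch of the dynamic MCE poll interval, in milliseconds. */
#define MIN_INTERVAL_MS 10UL   /* floor: the patch's HZ/100, i.e. 10 ms */

/* Ceiling: assumed 5-minute check_interval, expressed in ms. */
static const unsigned long check_interval_ms = 5UL * 60UL * 1000UL;

/* Return the next polling interval, given the current one and whether
 * the poll that just ran found (and logged) a machine check event. */
static unsigned long next_poll_interval(unsigned long cur_ms, int found_mce)
{
    if (found_mce) {
        /* Errors are arriving: poll twice as often, but not below 10 ms. */
        unsigned long half = cur_ms / 2;
        return half > MIN_INTERVAL_MS ? half : MIN_INTERVAL_MS;
    }
    /* Quiet: back off by 2x, but never beyond check_interval. */
    unsigned long dbl = cur_ms * 2;
    return dbl < check_interval_ms ? dbl : check_interval_ms;
}
```

Starting from the ceiling, three consecutive polls that each find an MCE give 150000, 75000 and 37500 ms; once the errors stop, each quiet poll doubles the interval until it is back at 300000 ms.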

> 
> Result:
>  If you start to take a lot of correctable errors (not exceptions), you
>  log them faster and more accurately (less chance of overflowing the MCA
>  registers).  If you don't take a lot of errors, you will see no change.

Makes sense.

AMD RevF can also do this for DIMM errors, without any delay, using the
threshold interrupts. Perhaps it would make sense to configure that by
default so that it always triggers on all DIMM errors. Right now it is
only an option in /sys.

> @@ -349,17 +349,24 @@ static void mcheck_timer(struct work_str
> 	 * writes.
> 	 */
> 	if (notify_user && console_logged) {
> +		/* if we logged an MCE, reduce the polling interval */
> +		next_interval = max(next_interval/2, HZ/100);
> 		notify_user = 0;
> 		clear_bit(0, &console_logged);
> 		printk(KERN_INFO "Machine check events logged\n");

The printk should not happen too often. Can you add some hardcoded
limit there so that it doesn't happen more often than every hour or so
(or perhaps use an exponential backoff here too)?
It is only there to tell users to check the mcelog output.
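One way to combine both suggestions is an exponential backoff on the notification itself, capped at one hour. Here is a minimal sketch under stated assumptions: time is passed in as seconds, and `struct notify_limiter`, `notify_allowed()` and the 60-second starting gap are hypothetical choices, not anything from the patch or the kernel.

```c
#include <stdbool.h>

/* Hypothetical limits for the "Machine check events logged" message. */
#define NOTIFY_MIN_GAP_S 60UL    /* assumed: first repeat allowed after 1 min */
#define NOTIFY_MAX_GAP_S 3600UL  /* cap: at most one message per hour */

struct notify_limiter {
    bool ever_notified;     /* has the message been printed at all yet? */
    unsigned long last_s;   /* time (s) the message was last printed */
    unsigned long gap_s;    /* required gap (s) before the next message */
};

/* Return true if the message may be printed at time now_s. Every printed
 * message doubles the required gap, up to NOTIFY_MAX_GAP_S. */
static bool notify_allowed(struct notify_limiter *l, unsigned long now_s)
{
    if (!l->ever_notified) {
        /* The first event is always reported. */
        l->ever_notified = true;
        l->last_s = now_s;
        l->gap_s = NOTIFY_MIN_GAP_S;
        return true;
    }
    if (now_s - l->last_s < l->gap_s)
        return false;       /* too soon since the last message: suppress */
    l->last_s = now_s;
    l->gap_s = (l->gap_s * 2 < NOTIFY_MAX_GAP_S) ? l->gap_s * 2
                                                  : NOTIFY_MAX_GAP_S;
    return true;
}
```

The caller in mcheck_timer() would check this before the printk. Events are still logged for mcelog either way; only the console hint is throttled.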


-Andi