Message-ID: <cc96a5040704271403p5b0d833v8c28a9dae7f4d110@mail.gmail.com>
Date: Fri, 27 Apr 2007 14:03:22 -0700
From: "Tim Hockin" <thockin@...gle.com>
To: ak@....de, vojtech@...e.cz
Cc: linux-kernel@...r.kernel.org, akpm@...gle.com
Subject: Re: [PATCH] x86_64: dynamic MCE poll interval
Sorry, Gmail mangles whitespace unless you do just the right thing.
Let me work around it. Proper patch coming.
On 4/27/07, Tim Hockin <thockin@...gle.com> wrote:
> From: Tim Hockin <thockin@...gle.com>
>
> Background:
> We've found that MCEs (specifically DRAM SBEs) tend to come in bunches,
> especially when we are deliberately stressing the system. The current
> MCE poller uses a static polling interval which does not adapt to
> whether it has found MCEs recently.
>
> Description:
> This patch makes the MCE poller adjust the polling interval dynamically.
> If we find an MCE, poll 2x faster (down to 10 ms). When we stop finding
> MCEs, poll 2x slower (up to check_interval seconds). The check_interval
> tunable becomes the max polling interval. The "Machine check events
> logged" printk() is rate limited to the check_interval, which should be
> identical behavior to the old functionality.
>
> Result:
> If you start to take a lot of correctable errors (not exceptions), you
> log them faster and more accurately (less chance of overflowing the MCA
> registers). If you don't take a lot of errors, you will see no change.
>
> Alternatives:
> I considered simply reducing the polling interval to 10 ms immediately
> and keeping it there as long as we continue to find errors. This felt a
> bit heavy-handed, but it does perform significantly better for the default
> check_interval of 5 minutes (we're using a few seconds when testing for
> DRAM errors). I could be convinced to go with this, if anyone felt it
> was not too aggressive.
>
> Testing:
> I used an error-injecting DIMM to create lots of correctable DRAM errors
> and verified that the polling interval accelerates. The printk() fires
> at most once per check_interval seconds.
>
> Patch:
> This patch is against 2.6.21-rc7.
>
> Signed-off-by: Tim Hockin <thockin@...gle.com>
>
> ---
>
> This is the third version of this patch. The only change from the prior
> version is to use time_after_eq().
>
> diff -pruN linux-2.6.20/Documentation/x86_64/machinecheck linux-2.6.20+th/Documentation/x86_64/machinecheck
> --- linux-2.6.20/Documentation/x86_64/machinecheck	2007-04-24 23:36:03.000000000 -0700
> +++ linux-2.6.20+th/Documentation/x86_64/machinecheck	2007-04-27 10:11:10.000000000 -0700
> @@ -36,7 +36,12 @@ between all CPUs.
>
>  check_interval
>  	How often to poll for corrected machine check errors, in seconds
> -	(Note output is hexademical). Default 5 minutes.
> +	(Note output is hexademical). Default 5 minutes. When the poller
> +	finds MCEs it triggers an exponential speedup (poll more often) on
> +	the polling interval. When the poller stops finding MCEs, it
> +	triggers an exponential backoff (poll less often) on the polling
> +	interval. The check_interval variable is both the initial and
> +	maximum polling interval.
>
>  tolerant
>  	Tolerance level. When a machine check exception occurs for a non
> diff -pruN linux-2.6.20/arch/x86_64/kernel/mce.c linux-2.6.20+th/arch/x86_64/kernel/mce.c
> --- linux-2.6.20/arch/x86_64/kernel/mce.c 2007-04-27 10:01:02.000000000 -0700
> +++ linux-2.6.20+th/arch/x86_64/kernel/mce.c 2007-04-27 10:41:02.000000000 -0700
> @@ -323,10 +323,13 @@ void mce_log_therm_throt_event(unsigned
>  #endif /* CONFIG_X86_MCE_INTEL */
>
>  /*
> - * Periodic polling timer for "silent" machine check errors.
> + * Periodic polling timer for "silent" machine check errors. If the
> + * poller finds an MCE, poll 2x faster. When the poller finds no more
> + * errors, poll 2x slower (up to check_interval seconds).
>   */
>
>  static int check_interval = 5 * 60; /* 5 minutes */
> +static int next_interval; /* in jiffies */
>  static void mcheck_timer(struct work_struct *work);
>  static DECLARE_DELAYED_WORK(mcheck_work, mcheck_timer);
>
> @@ -339,7 +342,6 @@ static void mcheck_check_cpu(void *info)
>  static void mcheck_timer(struct work_struct *work)
>  {
>  	on_each_cpu(mcheck_check_cpu, NULL, 1, 1);
> -	schedule_delayed_work(&mcheck_work, check_interval * HZ);
>
>  	/*
>  	 * It's ok to read stale data here for notify_user and
> @@ -349,17 +351,30 @@ static void mcheck_timer(struct work_str
>  	 * writes.
>  	 */
>  	if (notify_user && console_logged) {
> +		static unsigned long last_print = 0;
> +		unsigned long now = jiffies;
> +
> +		/* if we logged an MCE, reduce the polling interval */
> +		next_interval = max(next_interval/2, HZ/100);
>  		notify_user = 0;
>  		clear_bit(0, &console_logged);
> -		printk(KERN_INFO "Machine check events logged\n");
> +		if (time_after_eq(now, last_print + (check_interval*HZ))) {
> +			last_print = now;
> +			printk(KERN_INFO "Machine check events logged\n");
> +		}
> +	} else {
> +		next_interval = min(next_interval*2, check_interval*HZ);
>  	}
> +
> +	schedule_delayed_work(&mcheck_work, next_interval);
>  }
>
>
>  static __init int periodic_mcheck_init(void)
>  {
> -	if (check_interval)
> -		schedule_delayed_work(&mcheck_work, check_interval*HZ);
> +	next_interval = check_interval * HZ;
> +	if (next_interval)
> +		schedule_delayed_work(&mcheck_work, next_interval);
>  	return 0;
>  }
>  __initcall(periodic_mcheck_init);
> @@ -597,12 +612,13 @@ static int mce_resume(struct sys_device
>  /* Reinit MCEs after user configuration changes */
>  static void mce_restart(void)
>  {
> -	if (check_interval)
> +	if (next_interval)
>  		cancel_delayed_work(&mcheck_work);
>  	/* Timer race is harmless here */
>  	on_each_cpu(mce_init, NULL, 1, 1);
> -	if (check_interval)
> -		schedule_delayed_work(&mcheck_work, check_interval*HZ);
> +	next_interval = check_interval * HZ;
> +	if (next_interval)
> +		schedule_delayed_work(&mcheck_work, next_interval);
>  }
>
>  static struct sysdev_class mce_sysclass = {
>