Message-ID: <4F6AE48D.4070508@linux.vnet.ibm.com>
Date: Thu, 22 Mar 2012 14:06:29 +0530
From: "Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
To: Borislav Petkov <bp@...64.org>
CC: Frederic Weisbecker <fweisbec@...il.com>,
Ingo Molnar <mingo@...e.hu>,
Peter Zijlstra <peterz@...radead.org>,
Steven Rostedt <rostedt@...dmis.org>,
LKML <linux-kernel@...r.kernel.org>,
Borislav Petkov <borislav.petkov@....com>,
"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
Subject: Re: [PATCH 2/2] x86, mce: Add persistent MCE event
On 03/21/2012 08:04 PM, Borislav Petkov wrote:
> From: Borislav Petkov <borislav.petkov@....com>
>
> Add the necessary glue to enable the mce_record tracepoint on boot,
> turning it into a persistent event. This exports the MCE buffer through
> a debugfs per-CPU file which a userspace daemon can read and then
> process the received error data further.
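Just so I understand the userspace side correctly: the daemon would basically
treat these per-CPU debugfs files like regular perf ring buffers, i.e. mmap
them and follow data_head, right? Below is a purely hypothetical sketch of
what I have in mind -- the /sys/kernel/debug/mce/mce_record0 path and the
mmap semantics are my guesses based on the description above, not something
this patch spells out:

/*
 * Hypothetical userspace consumer: mmap one per-CPU debugfs file and look
 * at the perf buffer header. Path, buffer size and mmap behaviour are
 * assumptions, not part of this patch.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <linux/perf_event.h>

#define MCE_BUF_PAGES	4	/* matches the buffer size used in the patch */

int main(void)
{
	long ps = sysconf(_SC_PAGESIZE);
	size_t len = (MCE_BUF_PAGES + 1) * ps;	/* header page + data pages */
	struct perf_event_mmap_page *pg;
	int fd;

	fd = open("/sys/kernel/debug/mce/mce_record0", O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	pg = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (pg == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* data_head tells us how far the kernel has written into the buffer */
	printf("buffered bytes: %llu\n", (unsigned long long)pg->data_head);

	munmap(pg, len);
	close(fd);
	return 0;
}
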
>
> Signed-off-by: Borislav Petkov <borislav.petkov@....com>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 53 ++++++++++++++++++++++++++++++++++++++
> 1 file changed, 53 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 5a11ae2e9e91..791c4633d771 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -95,6 +95,13 @@ static DECLARE_WAIT_QUEUE_HEAD(mce_chrdev_wait);
> static DEFINE_PER_CPU(struct mce, mces_seen);
> static int cpu_missing;
>
> +static struct perf_event_attr pattr = {
> + .type = PERF_TYPE_TRACEPOINT,
> + .size = sizeof(pattr),
> + .sample_type = PERF_SAMPLE_RAW,
> + .persistent = 1,
> +};
> +
> /* MCA banks polled by the period polling timer for corrected events */
> DEFINE_PER_CPU(mce_banks_t, mce_poll_banks) = {
> [0 ... BITS_TO_LONGS(MAX_NR_BANKS)-1] = ~0UL
> @@ -102,6 +109,8 @@ DEFINE_PER_CPU(mce_banks_t, mce_poll_banks) = {
>
> static DEFINE_PER_CPU(struct work_struct, mce_work);
>
> +static DEFINE_PER_CPU(struct pers_event_desc, mce_ev);
> +
> /*
> * CPU/chipset specific EDAC code can register a notifier call here to print
> * MCE errors in a human-readable form.
> @@ -2109,6 +2118,50 @@ static void __cpuinit mce_reenable_cpu(void *h)
> }
> }
>
> +static __init int mcheck_init_persistent_event(void)
> +{
> +
> +#define MCE_RECORD_FNAME_SZ 14
> +#define MCE_BUF_PAGES 4
> +
> + int cpu, err = 0;
> + char buf[MCE_RECORD_FNAME_SZ];
> +
> + pattr.config = event_mce_record.event.type;
> + pattr.sample_period = 1;
> + pattr.wakeup_events = 1;
> +
> + get_online_cpus();
> +
> + for_each_online_cpu(cpu) {
> + struct pers_event_desc *d = &per_cpu(mce_ev, cpu);
> +
> + snprintf(buf, MCE_RECORD_FNAME_SZ, "mce_record%d", cpu);
> + d->dfs_name = buf;
> + d->pattr = &pattr;
> +
> + if (perf_add_persistent_on_cpu(cpu, d, mce_get_debugfs_dir(),
> + MCE_BUF_PAGES))
> + goto err_unwind;
> + }
> + goto unlock;
> +
> +err_unwind:
> + err = -EINVAL;
> + for (--cpu; cpu >= 0; cpu--)
> + perf_rm_persistent_on_cpu(cpu, &per_cpu(mce_ev, cpu));
> +
*Totally* theoretical question: how do you know that cpu_online_mask isn't
sparse? In other words, what if some CPUs weren't booted? Then this unwind
loop walks plain CPU numbers downwards and would call
perf_rm_persistent_on_cpu() on CPUs we never set up in the first place
(the kind of unwind I had in mind is sketched below)..

Oh, now I see that perf_rm_persistent_on_cpu() probably handles that case
gracefully.. So no issues, I guess?
(Moreover, we will probably have bigger issues at hand if some CPU didn't
boot..)
(The code looked funny, so I thought of pointing it out, whether or not it
actually is worrisome. Sorry for the noise, if any).
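FWIW, the unwind I had in mind (untested, purely illustrative, assuming
'cpu' still holds the CPU that just failed and an extra iterator 'i' is
declared at the top of the function) would be something like:

err_unwind:
	err = -EINVAL;
	/* only undo the CPUs the forward loop actually registered */
	for_each_online_cpu(i) {
		if (i == cpu)		/* 'cpu' is the one that just failed */
			break;
		perf_rm_persistent_on_cpu(i, &per_cpu(mce_ev, i));
	}

That way a sparse online mask wouldn't matter, since we only ever visit
CPUs that the forward loop could have touched.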
> +unlock:
> + put_online_cpus();
> +
> + return err;
> +}
> +
> +/*
> + * This has to run after event_trace_init()
> + */
> +device_initcall(mcheck_init_persistent_event);
> +
> /* Get notified when a cpu comes on/off. Be hotplug friendly. */
> static int __cpuinit
> mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
Regards,
Srivatsa S. Bhat