linux-kernel - Re: [PATCH 2/2] x86, mce: Add persistent MCE event

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4F6AE48D.4070508@linux.vnet.ibm.com>
Date:	Thu, 22 Mar 2012 14:06:29 +0530
From:	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
To:	Borislav Petkov <bp@...64.org>
CC:	Frederic Weisbecker <fweisbec@...il.com>,
	Ingo Molnar <mingo@...e.hu>,
	Peter Zijlstra <peterz@...radead.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Borislav Petkov <borislav.petkov@....com>,
	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
Subject: Re: [PATCH 2/2] x86, mce: Add persistent MCE event

On 03/21/2012 08:04 PM, Borislav Petkov wrote:

> From: Borislav Petkov <borislav.petkov@....com>
> 
> Add the necessary glue to enable the mce_record tracepoint on boot,
> turning it into a persistent event. This exports the MCE buffer through
> a debugfs per-CPU file which a userspace daemon can read and then
> process the received error data further.
> 
> Signed-off-by: Borislav Petkov <borislav.petkov@....com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c |   53 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 53 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 5a11ae2e9e91..791c4633d771 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -95,6 +95,13 @@ static DECLARE_WAIT_QUEUE_HEAD(mce_chrdev_wait);
>  static DEFINE_PER_CPU(struct mce, mces_seen);
>  static int			cpu_missing;
> 
> +static struct perf_event_attr pattr = {
> +	.type           = PERF_TYPE_TRACEPOINT,
> +	.size           = sizeof(pattr),
> +	.sample_type    = PERF_SAMPLE_RAW,
> +	.persistent     = 1,
> +};
> +
>  /* MCA banks polled by the period polling timer for corrected events */
>  DEFINE_PER_CPU(mce_banks_t, mce_poll_banks) = {
>  	[0 ... BITS_TO_LONGS(MAX_NR_BANKS)-1] = ~0UL
> @@ -102,6 +109,8 @@ DEFINE_PER_CPU(mce_banks_t, mce_poll_banks) = {
> 
>  static DEFINE_PER_CPU(struct work_struct, mce_work);
> 
> +static DEFINE_PER_CPU(struct pers_event_desc, mce_ev);
> +
>  /*
>   * CPU/chipset specific EDAC code can register a notifier call here to print
>   * MCE errors in a human-readable form.
> @@ -2109,6 +2118,50 @@ static void __cpuinit mce_reenable_cpu(void *h)
>  	}
>  }
> 
> +static __init int mcheck_init_persistent_event(void)
> +{
> +
> +#define MCE_RECORD_FNAME_SZ 14
> +#define MCE_BUF_PAGES 4
> +
> +	int cpu, err = 0;
> +	char buf[MCE_RECORD_FNAME_SZ];
> +
> +	pattr.config = event_mce_record.event.type;
> +	pattr.sample_period = 1;
> +	pattr.wakeup_events = 1;
> +
> +	get_online_cpus();
> +
> +	for_each_online_cpu(cpu) {
> +		struct pers_event_desc *d = &per_cpu(mce_ev, cpu);
> +
> +		snprintf(buf, MCE_RECORD_FNAME_SZ, "mce_record%d", cpu);
> +		d->dfs_name = buf;
> +		d->pattr = &pattr;
> +
> +		if (perf_add_persistent_on_cpu(cpu, d, mce_get_debugfs_dir(),
> +					       MCE_BUF_PAGES))
> +			goto err_unwind;
> +	}
> +	goto unlock;
> +
> +err_unwind:
> +	err = -EINVAL;
> +	for (--cpu; cpu >= 0; cpu--)
> +		perf_rm_persistent_on_cpu(cpu, &per_cpu(mce_ev, cpu));
> +


*Totally* theoretical question: How do you know that the cpu_online_mask isn't
sparse? In other words, what if some CPUs weren't booted? Then this for-loop
wouldn't be very good..

Oh, now I see that perf_rm_persistent_on_cpu() probably handles that case well..
So no issues I guess.. ?

(Moreover, we will probably have bigger issues at hand if some CPU didn't
boot..)

(The code looked funny, so I thought of pointing it out, whether or not it
actually is worrisome. Sorry for the noise, if any).

> +unlock:
> +	put_online_cpus();
> +
> +	return err;
> +}
> +
> +/*
> + * This has to run after event_trace_init()
> + */
> +device_initcall(mcheck_init_persistent_event);
> +
>  /* Get notified when a cpu comes on/off. Be hotplug friendly. */
>  static int __cpuinit
>  mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)


Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/