linux-kernel - Re: [PATCH 1/2] SMI Detector: a simple module for detecting System Management Interrupts

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <20090409131230.dd3f2df7.akpm@linux-foundation.org>
Date:	Thu, 9 Apr 2009 13:12:30 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Jon Masters <jcm@...masters.org>
Cc:	linux-kernel@...r.kernel.org, jcm@...hat.com, williams@...hat.com,
	tglx@...utronix.de, rostedt@...dmis.org, lgoncalv@...hat.com,
	jcm@...masters.org
Subject: Re: [PATCH 1/2] SMI Detector: a simple module for detecting System
 Management Interrupts

On Thu,  2 Apr 2009 12:50:04 -0400
Jon Masters <jcm@...masters.org> wrote:

> System Management Interrupts (SMIs) are a feature of modern x86-compatible
> microprocessors. They allow BIOS provided management code to "borrow" the
> system CPU in order to perform certain activities - device emulation,
> thermal throttling, fan control, and similar. The CPU exposes the SMI
> functionality through a dedicated pin and special purpose logic. When an
> SMI is asserted, the CPU suspends the overlying running Operating System
> and executes special SMI management routines within a unique context.
> 
> SMI routines are usually provided by the system BIOS, and the SMI assertion
> is usually made by logic in the system's South Bridge. The Operating System
> is typically completely unaware of the presence of an SMI, much less the
> amount of time spent handling them. Therein lies the problem. Some of these
> SMI routines can take many milliseconds to complete (or even longer,
> depending upon the implementation of the routines). Although it is possible
> to disable SMIs (typically by poking at the South Bridge LPC or similar),
> that can be *EXTREMELY DANGEROUS* because it can result in a system
> overheating, or even worse. Therefore, we simply try to detect them. We do
> not attempt to disable them and discourage other efforts at doing so. We
> instead prefer to collate data and ask BIOS vendors to fix their code.
> 
> We have been increasingly concerned about the amount of time certain systems
> spend inside System Management Interrupt (SMI) routines, especially on those
> systems intended for "real time" applications. This module can be used to
> detect SMIs as they occur by periodically grabbing the CPU(s) within a
> stop_machine context, and sitting in a tight loop reading the timestamp
> counter during the monitoring phase. If we experience unexplained
> discontiguous readings in the CPU timestamps, we are able to assert the
> presence of an SMI condition (the CPU is doing nothing else at the time).
> 
> Since we steal the CPU for relatively long periods of time, this module is
> intended primarily for diagnostic use. It should not be used in a normal
> production environment unless "enable" is set to zero or the acquisition
> parameters are set sufficiently low to avoid performance impact.
> 
> There are a number of pending feature improvements to this code but we
> would like to get it out there and in use tracking down SMIs.
> 
>
> ...
>
> +config SMI_DETECTOR
> +	tristate "Test module for detecting time gaps caused by SMIs"
> +	default m
> +	---help---
> +	  A simple SMI detector. Use this module to detect large system
> +	  latencies introduced by the presence of vendor BIOS SMI
> +	  (System Management Interrupts) somehow gone awry. We do this
> +	  by hogging all of the CPU(s) for configurable time intervals,
> +	  looking to see if something stole time from us. Therefore,
> +	  obviously, you should NEVER use this module in a production
> +	  environment.
> +
> +	  If unsure, say N

Does this have enough dependencies?  I have a feeling that the driver
won't work well on mips..  CONFIG_DEBUGFS?  Others?

>
> ...
>
> +module_param(debug, int, 0);
> +module_param(enabled, int, 0);
> +module_param(threshold, int, 0);
> +
> +DEFINE_MUTEX(enable_mutex);

static, please.

> +struct task_struct *smi_kthread;

ditto.

> +struct smdata_struct {
> +
> +	u64	last_sample;
> +	u64	max_sample;
> +	u64	smi_count;
> +	u64	threshold;
> +	ktime_t last_spike;
> +	u64	frequency;
> +
> +	atomic_t pending;
> +	wait_queue_head_t wq;
> +
> +} smdata;

Carefully documenting each of the above fields would be of assistance
to others.

> +static void smi_init_stats(void)
> +{
> +	smdata.last_sample = 0;
> +	smdata.max_sample = 0;
> +	smdata.smi_count = 0;
> +	smdata.frequency = 0;
> +	smdata.threshold = threshold ? threshold : DEFAULT_SMI_THRESHOLD;
> +}
> +
> +
> +struct dentry *smi_debug_dir;		/* SMI debugfs directory */
> +struct dentry *smi_debug_max;		/* maximum TSC delta */
> +struct dentry *smi_debug_smi;		/* SMI detect count */
> +struct dentry *smi_debug_sample_ms;	/* sample size ms */
> +struct dentry *smi_debug_interval_ms;	/* interval size ms */
> +struct dentry *smi_debug_sample;	/* raw SMI samples (us) */
> +struct dentry *smi_debug_threshold;	/* latency threshold (us) */
> +struct dentry *smi_debug_frequency_us;	/* avg smi spike interval (us) */
> +struct dentry *smi_debug_enable;	/* enable/disable smi detection */
> +
> +u32 smi_sample_ms   = DEFAULT_MS_PER_SAMPLE;    /* sample size ms */
> +u32 smi_interval_ms = DEFAULT_MS_SEP_SAMPLE;    /* interval size ms */

Did these all need to be global symbols?

> +/*
> + * smi_get_sample -	Used to repeatedly capture the CPU TSC (or similar),
> + *                      looking for potential SMIs. Called under stop_machine.
> + */
> +static int smi_get_sample(void *data)
> +{
> +	ktime_t start, t1, t2, spike;
> +	s64 diff, total = 0;
> +	u64 sample = 0;
> +	struct smdata_struct *smi_data = (struct smdata_struct *)data;

Unneeded and undesirable cast of void*.

> +	start = ktime_get(); /* start timestamp */
> +
> +	do {
> +
> +		t1 = ktime_get();
> +		t2 = ktime_get();

I'd have expected to see a local_irq_disable() around these two samplings.

> +		total = ktime_to_us(ktime_sub(t2, start));
> +
> +		diff = ktime_to_us(ktime_sub(t2, t1));
> +		if (diff < 0) {
> +			printk(KERN_ERR SMI_BANNER "time running backwards\n");
> +			return 1;
> +		}
> +		if (diff > sample)
> +			sample = diff; /* only want highest value per sample */
> +
> +		if (diff > smi_data->threshold)
> +			spike = t1;
> +
> +	} while (total <= USEC_PER_MSEC*smi_sample_ms);

Naive question: should we be using nanoseconds here rather than rather
coarse microseconds?


> +	smi_data->last_sample = sample;
> +
> +	if (sample > smi_data->threshold) {
> +		u64 tmp;

eek.  `tmp' is a wicked identifier.  Can we choose something more meaningful?

> +		smi_data->smi_count++;
> +		tmp = ktime_to_us(ktime_sub(spike, smi_data->last_spike));
> +
> +		if (smi_data->smi_count > 2)
> +			smi_data->frequency = (smi_data->frequency + tmp) / 2;
> +		else
> +			if (smi_data->smi_count == 2)
> +				smi_data->frequency = tmp;
> +
> +		smi_data->last_spike = spike;
> +	}
> +
> +	atomic_set(&smi_data->pending, 1);

See, this code would be more easy to review if you'd told us what
`pending' does.

> +	if (sample > smi_data->max_sample)
> +		smi_data->max_sample = sample;
> +
> +	return 0;
> +}
> +
> +/*
> + * smi_kthread_fn - Used to periodically sample the CPU TSC via smi_get_sample.
> + *                  We use stop_machine, which intentionally introduces latency.
> + */
> +
> +static int smi_kthread_fn(void *data)
> +{
> +	int err = 0;
> +	struct smdata_struct *smi_data = (struct smdata_struct *)data;

Unneeded cast of void*.

> +	while (!kthread_should_stop()) {
> +		err = stop_machine(smi_get_sample, smi_data, 0);
> +
> +		wake_up(&smi_data->wq);
> +
> +		if (msleep_interruptible(smi_interval_ms))
> +			goto out;

hm, how can this happen?  Can root kill the kernel thread?  I think
that gets ignored..

> +	}
> +
> +out:
> +	return 0;
> +}
> +
> +struct task_struct *start_smi_thread(void)
> +{
> +	return smi_kthread = kthread_run(smi_kthread_fn, &smdata,
> +					 "smi_detector");
> +}

All callers of this function ignore its return value.


> +void stop_smi_thread(void)
> +{
> +	if (smi_kthread)
> +		kthread_stop(smi_kthread);
> +}

This seems to assume that kthread_run() returns NULL on failure.  But
it returns an ERR_PTR.

> +static int smi_debug_sample_fopen(struct inode *inode, struct file *filp)
> +{
> +
> +	filp->private_data = (void *)&smdata;

unneeded cast.

> +	return 0;
> +}

missing newline.

> +static ssize_t smi_debug_sample_fread(struct file *filp, char __user *ubuf,
> +					size_t cnt, loff_t *ppos)
> +{
> +	int len;
> +	char buf[64];
> +	struct smdata_struct *smi_data = filp->private_data;
> +
> +	if (!enabled)
> +		return 0;
> +
> +	wait_event_interruptible(smi_data->wq, atomic_read(&smi_data->pending));
> +	atomic_set(&smi_data->pending, 0);

Is this code racy?  (I couldn't be bothered working it out, but it
smells racy ;))

> +	len = sprintf(buf, "%lld\n", smi_data->last_sample);

%llu would be more accurate.

> +	if (copy_to_user(ubuf, buf, len))
> +		return -EFAULT;

Perhaps the kernel should have an snprintf_to_user() utility.  Maybe it
already has and I forgot.

> +	return len;
> +}
> +
> +static const struct file_operations smi_sample_fops = {
> +	.open		= smi_debug_sample_fopen,
> +	.read		= smi_debug_sample_fread,
> +	.owner		= THIS_MODULE,
> +};
> +
> +static int smi_debug_enable_fopen(struct inode *inode, struct file *filp)
> +{
> +	return 0;
> +}
> +
> +
> +static ssize_t smi_debug_enable_fread(struct file *filp, char __user *ubuf,
> +				      size_t cnt, loff_t *ppos)
> +{
> +	char buffer[4];
> +
> +	if (!cnt || *ppos)
> +		return 0;
> +
> +	buffer[0] = enabled ? '1' : '0';
> +	buffer[1] = '\n';
> +	buffer[2] = '\0';

This is unneeded.

> +	if (copy_to_user(ubuf, buffer, 2))
> +		return -EFAULT;

another snprintf_to_user() user!

> +	return *ppos = 2;
> +}
> +
> +static ssize_t  smi_debug_enable_fwrite(struct file *file,
> +					const char __user *user_buffer,
> +					size_t user_size,
> +					loff_t *offset)
> +{
> +	char buffer[4];
> +	int copy_size = min(user_size, sizeof(buffer));
> +	long val;
> +
> +	if (copy_from_user(buffer, user_buffer, copy_size))
> +		return -EFAULT;
> +#if 1
> +	val = simple_strtol(buffer, NULL, 10);
> +#else
> +	if (strict_strtol(buffer, 10, &val))
> +		return -EINVAL;
> +#endif

strict_strtoul() is preferred.

> +	mutex_lock(&enable_mutex);
> +	if (val) {
> +		if (enabled)
> +			goto unlock;
> +		smi_init_stats();
> +		enabled = 1;
> +		start_smi_thread();
> +	} else {
> +		if (!enabled)
> +			goto unlock;
> +		enabled = 0;
> +		stop_smi_thread();
> +	}
> +unlock:
> +	mutex_unlock(&enable_mutex);
> +	printk(SMI_BANNER "smi_detector thread %s\n",
> +		enabled ? "started" : "stopped");
> +	return user_size;
> +}
> +
> +static const struct file_operations smi_enable_fops = {
> +	.open		= smi_debug_enable_fopen,
> +	.read		= smi_debug_enable_fread,
> +	.write		= smi_debug_enable_fwrite,
> +	.owner		= THIS_MODULE,
> +};
> +
> +/* debugfs_create_bool */
> +int smi_detector_init(void)
> +{
> +
> +	printk(KERN_INFO SMI_BANNER "version %s\n", smi_version);
> +
> +	smi_init_stats();
> +
> +	init_waitqueue_head(&smdata.wq);
> +	atomic_set(&smdata.pending, 0);

These could have been done at compile time.

> +	smi_debug_dir = debugfs_create_dir("smi_detector", NULL);
> +
> +	if (smi_debug_dir == NULL) {
> +		printk(KERN_INFO SMI_BANNER "debugfs not mounted!\n");
> +		return -1;
> +	}
> +
> +	smi_debug_sample_ms = debugfs_create_u32("ms_per_sample",
> +				0644, smi_debug_dir, &smi_sample_ms);
> +	smi_debug_interval_ms = debugfs_create_u32("ms_between_samples",
> +				0644, smi_debug_dir, &smi_interval_ms);
> +	smi_debug_max = debugfs_create_u64("max_sample_us",
> +				0644, smi_debug_dir, &smdata.max_sample);
> +	smi_debug_smi = debugfs_create_u64("smi_count",
> +				0644, smi_debug_dir, &smdata.smi_count);
> +	smi_debug_sample = debugfs_create_file("sample_us",
> +				0444, smi_debug_dir, &smdata, &smi_sample_fops);
> +	smi_debug_frequency_us = debugfs_create_u64("avg_smi_interval_us",
> +				0444, smi_debug_dir, &smdata.frequency);
> +	smi_debug_threshold = debugfs_create_u64("latency_threshold_us",
> +				0444, smi_debug_dir, &smdata.threshold);
> +
> +	smi_debug_enable = debugfs_create_file("enable",
> +				0644, smi_debug_dir,
> +				&enabled, &smi_enable_fops);
> +
> +
> +	if (enabled)
> +		start_smi_thread();
> +
> +	return 0;
> +}
> +
> +void smi_detector_exit(void)
> +{
> +	if (enabled) {
> +		mutex_lock(&enable_mutex);
> +		enabled = 0;
> +		stop_smi_thread();
> +		mutex_unlock(&enable_mutex);
> +	}
> +	if (smi_debug_sample_ms)
> +		debugfs_remove(smi_debug_sample_ms);
> +	if (smi_debug_interval_ms)
> +		debugfs_remove(smi_debug_interval_ms);
> +	if (smi_debug_max)
> +		debugfs_remove(smi_debug_max);
> +	if (smi_debug_smi)
> +		debugfs_remove(smi_debug_smi);
> +	if (smi_debug_frequency_us)
> +		debugfs_remove(smi_debug_frequency_us);
> +	if (smi_debug_threshold)
> +		debugfs_remove(smi_debug_threshold);
> +	if (smi_debug_sample)
> +		debugfs_remove(smi_debug_sample);
> +	if (smi_debug_enable)
> +		debugfs_remove(smi_debug_enable);
> +	if (smi_debug_dir)
> +		debugfs_remove(smi_debug_dir);

debugfs_remove(NULL) is legal, so all the if()s can be removed.

> +}
> +
> +module_init(smi_detector_init);
> +module_exit(smi_detector_exit);
> diff --git a/scripts/smidetect b/scripts/smidetect
> new file mode 100644
> index 0000000..81593a1
> --- /dev/null
> +++ b/scripts/smidetect
> @@ -0,0 +1,282 @@
> +#!/usr/bin/python

Some documentation here would be nice.  An overview of what it does,
command line options, pointer to the .txt file.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/