lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090223083645.GA9582@elte.hu>
Date:	Mon, 23 Feb 2009 09:36:45 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	"Rafael J. Wysocki" <rjw@...k.pl>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Jeremy Fitzhardinge <jeremy@...p.org>,
	pm list <linux-pm@...ts.linux-foundation.org>,
	Len Brown <lenb@...nel.org>,
	Jesse Barnes <jbarnes@...tuousgeek.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts during
	suspend-resume


* Rafael J. Wysocki <rjw@...k.pl> wrote:

> On Sunday 22 February 2009, Rafael J. Wysocki wrote:
> > On Sunday 22 February 2009, Linus Torvalds wrote:
> > > 
> > > On Sun, 22 Feb 2009, Rafael J. Wysocki wrote:
> [--snip--]
> > 
> > Thanks a lot for your comments, I'll send an updated patch shortly.
> 
> The updated patch is appended.
> 
> It has been initially tested, but requires more testing, 
> especially with APM, XEN, kexec jump etc.

>  arch/x86/kernel/apm_32.c  |   20 ++++++++++++----
>  drivers/xen/manage.c      |   32 +++++++++++++++----------
>  include/linux/interrupt.h |    3 ++
>  include/linux/irq.h       |    1 
>  kernel/irq/manage.c       |   57 ++++++++++++++++++++++++++++++++++++++++++++++
>  kernel/kexec.c            |   10 ++++----
>  kernel/power/disk.c       |   46 +++++++++++++++++++++++++++++--------
>  kernel/power/main.c       |   20 +++++++++++-----
>  8 files changed, 152 insertions(+), 37 deletions(-)
> 
> Index: linux-2.6/kernel/irq/manage.c
> ===================================================================
> --- linux-2.6.orig/kernel/irq/manage.c
> +++ linux-2.6/kernel/irq/manage.c
> @@ -746,3 +746,60 @@ int request_irq(unsigned int irq, irq_ha
>  	return retval;
>  }
>  EXPORT_SYMBOL(request_irq);
> +
> +#ifdef CONFIG_PM_SLEEP
> +/**
> + *	suspend_device_irqs - disable all currently enabled interrupt lines

Code placement nit: please dont put new #ifdef blocks into the 
core IRQ code, add a kernel/irq/power.c file instead and make 
the kbuild rule depend on PM_SLEEP.

The new suspend_device_irqs() and resume_device_irqs() doesnt 
use any manage.c internals so this should work straight away.

> + *
> + *	During system-wide suspend or hibernation device interrupts need to be
> + *	disabled at the chip level and this function is provided for this
> + *	purpose.  It disables all interrupt lines that are enabled at the
> + *	moment and sets the IRQ_SUSPENDED flag for them.
> + */
> +void suspend_device_irqs(void)
> +{
> +	struct irq_desc *desc;
> +	int irq;
> +
> +	for_each_irq_desc(irq, desc) {
> +		unsigned long flags;
> +
> +		spin_lock_irqsave(&desc->lock, flags);
> +
> +		if (!desc->depth && desc->action
> +		    && !(desc->action->flags & IRQF_TIMER)) {
> +			desc->depth++;
> +			desc->status |= IRQ_DISABLED | IRQ_SUSPENDED;
> +			desc->chip->disable(irq);
> +		}
> +
> +		spin_unlock_irqrestore(&desc->lock, flags);
> +	}
> +
> +	for_each_irq_desc(irq, desc) {
> +		if (desc->status & IRQ_SUSPENDED)
> +			synchronize_irq(irq);
> +	}

Optimization/code-flow nit: a possibility might be to do a 
single loop, i.e. i think it's safe to couple the disable+sync 
bits [as in 99.99% of the cases there will be no in-execution 
irq handlers when we execute this.]

Something like:

		int do_sync = 0;

		spin_lock_irqsave(&desc->lock, flags);

		if (!desc->depth && desc->action
		    && !(desc->action->flags & IRQF_TIMER)) {

			desc->depth++;
			desc->status |= IRQ_DISABLED | IRQ_SUSPENDED;
			desc->chip->disable(irq);
			do_sync = 1;
		}

		spin_unlock_irqrestore(&desc->lock, flags);

		if (do_sync)
			synchronize_irq(irq);

In fact i'd suggest to factor out this logic into a separate 
__suspend_irq(irq) / __resume_irq(irq) inline helper functions. 
(They should be inline for the time being as they are not 
shared-irq-safe so they shouldnt really be exposed to drivers in 
such a singular capacity.)

Doing so will also fix the line-break ugliness of the first 
branch - as in a standalone function the condition fits into a 
single line.

There's a performance reason as well: especially when we have a 
lot of IRQ descriptors that will be about twice as fast. (with a 
large iteration scope this function is cachemiss-limited and 
doing this passes doubles the cachemiss rate.)

> +}
> +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> +
> +/**
> + *	resume_device_irqs - enable interrupts disabled by suspend_device_irqs()
> + *
> + *	Enable all interrupt lines previously disabled by suspend_device_irqs()
> + *	that have the IRQ_SUSPENDED flag set.
> + */
> +void resume_device_irqs(void)
> +{
> +	struct irq_desc *desc;
> +	int irq;
> +
> +	for_each_irq_desc(irq, desc) {
> +		if (!(desc->status & IRQ_SUSPENDED))
> +			continue;
> +		desc->status &= ~IRQ_SUSPENDED;
> +		enable_irq(irq);
> +	}

Robustness+optimization nit: this will work but could be done in 
a nicer way: enable_irq() should auto-clear IRQ_SUSPENDED. (We 
already clear flags there so it's even a tiny bit faster this 
way.)

We definitely dont want IRQ_SUSPENDED to 'leak' out into an 
enabled line, should something call enable_irq() on a suspended 
line. So either make it auto-unsuspend in enable_irq(), or add 
an extra WARN_ON() to enable_irq(), to make sure IRQ_SUSPENDED 
is always off by that time.

> +     arch_suspend_disable_irqs();
> +     BUG_ON(!irqs_disabled());

Please. We just disabled all devices - a BUG_ON() is a very 
counter-productive thing to do here - chances are the user will 
never see anything but a hang. So please turn this into a nice 
WARN_ONCE().

> --- linux-2.6.orig/include/linux/interrupt.h
> +++ linux-2.6/include/linux/interrupt.h
> @@ -470,4 +470,7 @@ extern int early_irq_init(void);
>  extern int arch_early_irq_init(void);
>  extern int arch_init_chip_data(struct irq_desc *desc, int cpu);
>  
> +extern void suspend_device_irqs(void);
> +extern void resume_device_irqs(void);

Header cleanliness nit: please dont just throw new prototypes to 
the tail of headers, but think about where they fit in best, 
logically.

These two new prototypes should go straight after the normal irq 
line state management functions:

  extern void disable_irq_nosync(unsigned int irq);
  extern void disable_irq(unsigned int irq);
  extern void enable_irq(unsigned int irq);

Perhaps also with a comment like this:

/*
 * Note: dont use these functions in driver code - they are for 
 * core kernel use only.
 */

> +++ linux-2.6/kernel/power/main.c
[...]
> +
> + Unlock:
> +	resume_device_irqs();

Small drive-by style nit: while at it could you please fix the 
capitalization and the naming of the label (and all labels in 
this file)? The standard label is "out_unlock". [and 
"err_unlock" for failure cases - but this isnt a failure case.]

There's 43 such bad label names in kernel/power/*.c, see the 
output of:

  git grep '^ [A-Z][a-z].*:$' kernel/power/

> Index: linux-2.6/arch/x86/kernel/apm_32.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/apm_32.c
> +++ linux-2.6/arch/x86/kernel/apm_32.c

> +
> +	suspend_device_irqs();
>  	device_power_down(PMSG_SUSPEND);
> +
> +	local_irq_disable();

hm, this is a very repetitive pattern, all around the various 
suspend/resume variants. Might make sense to make:

  	device_power_down(PMSG_SUSPEND);

do the irq line disabling plus the local irq disabling 
automatically. That also means it cannot be forgotten. The 
symmetric action should happen for PMSG_RESUME.

Is there ever a case where we want a different pattern?

> Index: linux-2.6/drivers/xen/manage.c
> ===================================================================
> --- linux-2.6.orig/drivers/xen/manage.c
> +++ linux-2.6/drivers/xen/manage.c
> @@ -39,12 +39,6 @@ static int xen_suspend(void *data)

> -	if (!*cancelled) {
> -		xen_irq_resume();
> -		xen_console_resume();
> -		xen_timer_resume();

This change needs a second look. xen_suspend() is a 
stop_machine() handler and as such executes on specific CPUs, 
and your change modifies this. OTOH, i had a look at these 
handlers and it all looks safe. Jeremy?

> +resume_devices:
> +	resume_device_irqs();

Small style nit: labels should start with a space character. 
I.e. it should be:

> + resume_devices:
> +	resume_device_irqs();

> +++ linux-2.6/kernel/kexec.c
> @@ -1454,7 +1454,7 @@ int kernel_kexec(void)
>  		if (error)
>  			goto Resume_devices;
>  		device_pm_lock();
> -		local_irq_disable();
> +		suspend_device_irqs();
>  		/* At this point, device_suspend() has been called,
>  		 * but *not* device_power_down(). We *must*
>  		 * device_power_down() now.  Otherwise, drivers for
> @@ -1464,8 +1464,9 @@ int kernel_kexec(void)
>  		 */
>  		error = device_power_down(PMSG_FREEZE);
>  		if (error)
> -			goto Enable_irqs;
> +			goto Resume_irqs;
>  
> +		local_irq_disable();
>  		/* Suspend system devices */
>  		error = sysdev_suspend(PMSG_FREEZE);
>  		if (error)
> @@ -1484,9 +1485,10 @@ int kernel_kexec(void)
>  	if (kexec_image->preserve_context) {
>  		sysdev_resume();
>   Power_up_devices:
> -		device_power_up(PMSG_RESTORE);
> - Enable_irqs:
>  		local_irq_enable();
> +		device_power_up(PMSG_RESTORE);
> + Resume_irqs:
> +		resume_device_irqs();
>  		device_pm_unlock();
>  		enable_nonboot_cpus();
>   Resume_devices:

(same comment about label style applies here too.)

> Index: linux-2.6/include/linux/irq.h
> ===================================================================
> --- linux-2.6.orig/include/linux/irq.h
> +++ linux-2.6/include/linux/irq.h
> @@ -65,6 +65,7 @@ typedef	void (*irq_flow_handler_t)(unsig
>  #define IRQ_SPURIOUS_DISABLED	0x00800000	/* IRQ was disabled by the spurious trap */
>  #define IRQ_MOVE_PCNTXT		0x01000000	/* IRQ migration from process context */
>  #define IRQ_AFFINITY_SET	0x02000000	/* IRQ affinity was set from userspace*/
> +#define IRQ_SUSPENDED		0x04000000	/* IRQ has gone through suspend sequence */
>  
>  #ifdef CONFIG_IRQ_PER_CPU
>  # define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)

Note, you should probably make PM_SLEEP depend on 
GENERIC_HARDIRQS - as this change will break the build on all 
non-genirq architectures. (sparc, alpha, etc.)

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ