Message-ID: <50632767.5050607@linux.vnet.ibm.com>
Date:	Wed, 26 Sep 2012 21:33:51 +0530
From:	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
To:	Chuansheng Liu <chuansheng.liu@...el.com>
CC:	tglx@...utronix.de, mingo@...hat.com, x86@...nel.org,
	linux-kernel@...r.kernel.org, yanmin_zhang@...ux.intel.com,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Suresh Siddha <suresh.b.siddha@...el.com>,
	Peter Zijlstra <peterz@...radead.org>,
	"rusty@...tcorp.com.au" <rusty@...tcorp.com.au>
Subject: Re: [PATCH RESEND] x86/fixup_irq: Clean the offlining CPU from the
 irq affinity mask

On 09/27/2012 05:15 AM, Chuansheng Liu wrote:
> 
> When a CPU is going offline and fixup_irqs() re-sets the
> irq affinity in some cases, we should clear the offlining CPU from
> the irq affinity mask.
> 
> The reason is that keeping the offlining CPU in the affinity is useless.
> Moreover, the smp_affinity value will be confusing when the
> offlining CPU comes back online again.
> 
> Example:
> For irq 93 on a system with 4 CPUs, the default affinity is f (1111);
> in the normal case all 4 CPUs can receive irq 93 interrupts.
> 
> After echo 0 > /sys/devices/system/cpu/cpu3/online, only CPU0,1,2 will
> receive the interrupts.
> 
> But after CPU3 comes back online, the affinity is not re-set, and the
> result is: smp_affinity is f, but still only CPU0,1,2 can receive the
> interrupts.
> 
> So we should clear the offlining CPU from the irq affinity mask
> in fixup_irqs().
> 
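
If I read the patch correctly, the idea is simply to re-target each irq
with the outgoing CPU masked out before calling into the chip, roughly
like this (my paraphrase of the hunk further below, using the same
cpumask helpers the patch itself uses):

        int cpu = smp_processor_id();           /* the CPU going offline */
        struct cpumask mask;

        cpumask_copy(&mask, data->affinity);    /* current affinity, e.g. 0xf */
        cpumask_clear_cpu(cpu, &mask);          /* drop the outgoing CPU */
        if (chip->irq_set_affinity)
                chip->irq_set_affinity(data, &mask, true);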

I have some fundamental questions here:
1. Why was the CPU never removed from the affinity masks in the original
code? I find it hard to believe that it was just an oversight, because the
whole point of fixup_irqs() is to affine the interrupts to other CPUs, IIUC.
So, is this really a bug, or is the existing code correct for some reason
that I'm not aware of? (For reference, I've pasted the relevant pre-patch
flow below the second question.)

2. If this is indeed a bug, why are the warnings ratelimited when the
interrupts can't be affined to other CPUs (see the snippet below)? Are
they not serious enough to report? Put more strongly, why do we quietly
continue with just a warning instead of reporting that the CPU offline
operation failed? Is that because we have come too far in the hotplug
sequence to easily roll back? Or are we still actually OK in that
situation?
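
For reference, here is the pre-patch flow I am referring to in (1) and
(2), reconstructed from the context lines of the hunks quoted below (the
post-loop printk is from memory and may not match the exact wording in
the current tree). Note that 'affinity' may still contain the CPU that
is going down, and the static 'warned' counter means at most one
"cannot set affinity" failure is ever reported:

        affinity = data->affinity;
        if (!irq_has_action(irq) || irqd_is_per_cpu(data) ||
            cpumask_subset(affinity, cpu_online_mask)) {
                raw_spin_unlock(&desc->lock);
                continue;
        }
        ...
        if (chip->irq_set_affinity)
                /* offlining CPU may still be set in 'affinity' */
                chip->irq_set_affinity(data, affinity, true);
        else if (!(warned++))
                set_affinity = 0;       /* only the first failure is recorded */
        ...
        if (!set_affinity)
                printk("Cannot set affinity for irq %i\n", irq);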

Suresh, I'd be grateful if you could kindly throw some light on these
issues... I'm actually debugging an issue where an offline CPU gets apic timer
interrupts (and in one case, I even saw a device interrupt), which I have
reported in another thread at: https://lkml.org/lkml/2012/9/26/119
But this issue in fixup_irqs() that Liu has brought to light looks even
more surprising to me.

Regards,
Srivatsa S. Bhat

> ---
>  arch/x86/kernel/irq.c |   21 +++++++++++++++++----
>  1 files changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> index d44f782..ead0807 100644
> --- a/arch/x86/kernel/irq.c
> +++ b/arch/x86/kernel/irq.c
> @@ -239,10 +239,13 @@ void fixup_irqs(void)
>  	struct irq_desc *desc;
>  	struct irq_data *data;
>  	struct irq_chip *chip;
> +	int cpu = smp_processor_id();
> 
>  	for_each_irq_desc(irq, desc) {
>  		int break_affinity = 0;
>  		int set_affinity = 1;
> +		bool set_ret = false;
> +
>  		const struct cpumask *affinity;
> 
>  		if (!desc)
> @@ -256,7 +259,8 @@ void fixup_irqs(void)
>  		data = irq_desc_get_irq_data(desc);
>  		affinity = data->affinity;
>  		if (!irq_has_action(irq) || irqd_is_per_cpu(data) ||
> -		    cpumask_subset(affinity, cpu_online_mask)) {
> +		    cpumask_subset(affinity, cpu_online_mask) ||
> +		    !cpumask_test_cpu(cpu, data->affinity)) {
>  			raw_spin_unlock(&desc->lock);
>  			continue;
>  		}
> @@ -277,9 +281,18 @@ void fixup_irqs(void)
>  		if (!irqd_can_move_in_process_context(data) && chip->irq_mask)
>  			chip->irq_mask(data);
> 
> -		if (chip->irq_set_affinity)
> -			chip->irq_set_affinity(data, affinity, true);
> -		else if (!(warned++))
> +		if (chip->irq_set_affinity) {
> +			struct cpumask mask;
> +			cpumask_copy(&mask, affinity);
> +			cpumask_clear_cpu(cpu, &mask);
> +			switch (chip->irq_set_affinity(data, &mask, true)) {
> +			case IRQ_SET_MASK_OK:
> +				cpumask_copy(data->affinity, &mask);
> +			case IRQ_SET_MASK_OK_NOCOPY:
> +				set_ret = true;
> +			}
> +		}
> +		if ((!set_ret) && !(warned++))
>  			set_affinity = 0;
> 
>  		/*
> 

