lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170703135109.GA26935@yu-desktop-1.sh.intel.com>
Date:   Mon, 3 Jul 2017 21:51:09 +0800
From:   Chen Yu <yu.c.chen@...el.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Ingo Molnar <mingo@...hat.com>, "H. Peter Anvin" <hpa@...or.com>,
        Borislav Petkov <bp@...e.de>, Len Brown <len.brown@...el.com>,
        "Rafael J . Wysocki" <rafael.j.wysocki@...el.com>, x86@...nel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH][RFC] x86: Fix the irq affinity in fixup_cpus

On Sun, Jun 04, 2017 at 10:04:53PM +0200, Thomas Gleixner wrote:
> On Mon, 24 Apr 2017, Chen Yu wrote:
> 
> > fixup_cpus() is to set appropriate irq affinity once the CPU
> > has been brought down, however we should also adjust the
> > desc->irq_common_data.affinity otherwise we will get an
> > incorrect irqmask during cpu offline:
> > 
> > cat /proc/irq/31/smp_affinity
> > 00000000,80000000
> > echo 0 > /sys/devices/system/cpu/cpu31/online
> > cat /proc/irq/31/smp_affinity
> > 00000000,80000000
> > 
> > This might bring potential problems, as reported we
> > saw plenty of irq flood during hibernation restore:
> > do_IRQ: 1.51 No irq handler for vector
> > Maybe it is due to some drivers get incorrect irq mask
> > during hibernation.
> > 
> > Fix this by invoking the interface of irq_set_affinity_locked()
> > to also update the desc->irq_common_data.affinity.
> > 
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=188281
> > Reported-and-tested-by: Thomas Mitterfellner <thomas@...terfellner.at>
> > Cc: Thomas Gleixner <tglx@...utronix.de>
> > Cc: Ingo Molnar <mingo@...hat.com>
> > Cc: "H. Peter Anvin" <hpa@...or.com>
> > Cc: Borislav Petkov <bp@...e.de>
> > Cc: Len Brown <len.brown@...el.com>
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> > Cc: x86@...nel.org
> > Cc: linux-kernel@...r.kernel.org
> > Signed-off-by: Chen Yu <yu.c.chen@...el.com>
> > ---
> >  arch/x86/kernel/irq.c | 13 +++++++------
> >  1 file changed, 7 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> > index 4d8183b..a108ed2 100644
> > --- a/arch/x86/kernel/irq.c
> > +++ b/arch/x86/kernel/irq.c
> > @@ -480,13 +480,14 @@ void fixup_irqs(void)
> >  		if (!irqd_can_move_in_process_context(data) && chip->irq_mask)
> >  			chip->irq_mask(data);
> >  
> > -		if (chip->irq_set_affinity) {
> > -			ret = chip->irq_set_affinity(data, affinity, true);
> > -			if (ret == -ENOSPC)
> > +		ret = irq_set_affinity_locked(data, affinity, true);
> 
> This can't work. For interrupts which cannot set the affinity in normal
> context irq_set_affinity_locked() will queue the interrupt to move at the
> next arrival of an interrupt. So the irq stays affine to the dying
> CPU.
>
Ok, got it.
> After looking at the callsites, it's safe to change
> irq_set_affinity_locked() so that it uses the direct affinity setter
> function when force == true.
> 
Sorry it took me sometime to understand this point(this is why I did not reply
to you at the first time :-)
I thought the defination of the word 'safe' here means, we should
not adjust the irq affinity in the process context if the ISR is
still running, otherwise there might be a race condition.

Currently, there are four drivers would set the force flag to true(AKA,
invoking irq_force_affinity()).

1. exynos4_mct_starting_cpu()
   The irq affinity is set before the clockevent is registered,
   so there would be no interrupt triggered when adjusting
   the irq affinity in the process context. Safe.

2. sirfsoc_local_timer_starting_cpu()
   The same as above. Safe.

3. arm_perf_starting_cpu()
   During cpu offline, the pmu interrupt(non percpu pmu interrupt)
   might be migrated to other online cpus. Then once the same cpu
   is put online, the interrupt will be set back to this cpu again
   by invoking irq_force_affinity(), but currently the pmu interrupt
   might be still running on other cpus, so it would be unsafe to adjust
   its irq affinity in the process context?
   
4. sunhv_migrate_hvcons_irq()
   The cpu who encountered a panic needs to migrate the hvcons irq to the
   current alive cpu, and send ipi to stop other cpus. So at the time to
   adjust the irq affinity for the hvcons, the interrupt of the latter might
   be running and it might be unsafe to adjust the irq affinity in the
   process context?

Not sure if my understanding is correct, or do I miss something?

thanks,
Yu

> So we need that change first, before we can switch fixups_irqs() over.
> 
> Thanks,
> 
> 	tglx

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ