linux-kernel - Re: [PATCH 2/3] [BUGFIX] x86/x86_64: fix CPU offlining triggered inactive device IRQ interrruption

Message-ID: <20090408233758.GB14412@us.ibm.com>
Date:	Wed, 8 Apr 2009 16:37:58 -0700
From:	Gary Hade <garyhade@...ibm.com>
To:	Yinghai Lu <yhlu.kernel@...il.com>
Cc:	Gary Hade <garyhade@...ibm.com>, mingo@...e.hu, mingo@...hat.com,
	tglx@...utronix.de, hpa@...or.com, x86@...nel.org,
	linux-kernel@...r.kernel.org, lcm@...ibm.com
Subject: Re: [PATCH 2/3] [BUGFIX] x86/x86_64: fix CPU offlining triggered
	inactive device IRQ interrruption

On Wed, Apr 08, 2009 at 03:30:15PM -0700, Yinghai Lu wrote:
> On Wed, Apr 8, 2009 at 2:07 PM, Gary Hade <garyhade@...ibm.com> wrote:
> > Impact: Eliminates a race that can leave the system in an
> >        unusable state
> >
> > During rapid offlining of multiple CPUs there is a chance
> > that an IRQ affinity move destination CPU will be offlined
> > before the IRQ affinity move initiated during the offlining
> > of a previous CPU completes.  This can happen when the device
> > is not very active and thus fails to generate the IRQ that is
> > needed to complete the IRQ affinity move before the move
> > destination CPU is offlined.  When this happens there is an
> > -EBUSY return from __assign_irq_vector() during the offlining
> > of the IRQ move destination CPU which prevents initiation of
> > a new IRQ affinity move operation to an online CPU.  This
> > leaves the IRQ affinity set to an offlined CPU.
> >
> > I have been able to reproduce the problem on some of our
> > systems using the following script.  When the system is idle
> > the problem often reproduces during the first CPU offlining
> > sequence.
> >
> > #!/bin/sh
> >
> > SYS_CPU_DIR=/sys/devices/system/cpu
> > VICTIM_IRQ=25
> > IRQ_MASK=f0
> >
> > iteration=0
> > while true; do
> >  echo $iteration
> >  echo $IRQ_MASK > /proc/irq/$VICTIM_IRQ/smp_affinity
> >  for cpudir in $SYS_CPU_DIR/cpu[1-9] $SYS_CPU_DIR/cpu??; do
> >    echo 0 > $cpudir/online
> >  done
> >  for cpudir in $SYS_CPU_DIR/cpu[1-9] $SYS_CPU_DIR/cpu??; do
> >    echo 1 > $cpudir/online
> >  done
> >  iteration=`expr $iteration + 1`
> > done
> >
> > The proposed fix takes advantage of the fact that when all
> > CPUs in the old domain are offline there is nothing to be done
> > by send_cleanup_vector() during the affinity move completion.
> > So, we simply avoid setting cfg->move_in_progress preventing
> > the above mentioned -EBUSY return from __assign_irq_vector().
> > This allows initiation of a new IRQ affinity move to a CPU
> > that is not going offline.
> >
> > Signed-off-by: Gary Hade <garyhade@...ibm.com>
> >
> > ---
> >  arch/x86/kernel/apic/io_apic.c |   11 ++++++++---
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> >
> > Index: linux-2.6.30-rc1/arch/x86/kernel/apic/io_apic.c
> > ===================================================================
> > --- linux-2.6.30-rc1.orig/arch/x86/kernel/apic/io_apic.c        2009-04-08 09:23:00.000000000 -0700
> > +++ linux-2.6.30-rc1/arch/x86/kernel/apic/io_apic.c     2009-04-08 09:23:16.000000000 -0700
> > @@ -363,7 +363,8 @@ set_extra_move_desc(struct irq_desc *des
> >        struct irq_cfg *cfg = desc->chip_data;
> >
> >        if (!cfg->move_in_progress) {
> > -               /* it means that domain is not changed */
> > +               /* it means that domain has not changed or all CPUs
> > +                * in old domain are offline */
> >                if (!cpumask_intersects(desc->affinity, mask))
> >                        cfg->move_desc_pending = 1;
> >        }
> > @@ -1262,8 +1263,11 @@ next:
> >                current_vector = vector;
> >                current_offset = offset;
> >                if (old_vector) {
> > -                       cfg->move_in_progress = 1;
> >                        cpumask_copy(cfg->old_domain, cfg->domain);
> > +                       if (cpumask_intersects(cfg->old_domain,
> > +                                              cpu_online_mask)) {
> > +                               cfg->move_in_progress = 1;
> > +                       }
> >                }
> >                for_each_cpu_and(new_cpu, tmp_mask, cpu_online_mask)
> >                        per_cpu(vector_irq, new_cpu)[vector] = irq;
> > @@ -2492,7 +2496,8 @@ static void irq_complete_move(struct irq
> >                if (likely(!cfg->move_desc_pending))
> >                        return;
> >
> > -               /* domain has not changed, but affinity did */
> > +               /* domain has not changed or all CPUs in old domain
> > +                * are offline, but affinity changed */
> >                me = smp_processor_id();
> >                if (cpumask_test_cpu(me, desc->affinity)) {
> >                        *descp = desc = move_irq_desc(desc, me);
> > --
> 
> So you mean that cpu_online_mask gets updated during __assign_irq_vector()?

No, the CPU being offlined is removed from cpu_online_mask 
earlier via a call to remove_cpu_from_maps() from
cpu_disable_common().  This happens just before fixup_irqs()
is called.
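
To make the ordering concrete, here is a rough user-space sketch of the
decision the patch changes.  It is not kernel code: cpumask_model_t,
struct irq_cfg_model, cpu_bit() and assign_vector_model() are made-up
stand-ins, the CPU masks are plain 64-bit words, and the vector search
is reduced to "pick the lowest online CPU in the requested mask".  The
only point is that the online mask has already lost the dying CPU by
the time the old domain is checked.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the kernel's cpumask or struct irq_cfg. */
typedef uint64_t cpumask_model_t;

struct irq_cfg_model {
	cpumask_model_t domain;      /* CPUs currently serving the IRQ */
	cpumask_model_t old_domain;  /* CPUs that served it before the move */
	int move_in_progress;        /* set until an IRQ completes the move */
};

static cpumask_model_t cpu_bit(unsigned int cpu)
{
	assert(cpu < 64);
	return (cpumask_model_t)1 << cpu;
}

/*
 * Simplified model of the assignment step.  'online' is the online mask
 * as seen by the caller, i.e. with the CPU being offlined already
 * removed.  'patched' selects the behavior with (1) or without (0) the
 * proposed change.
 */
static int assign_vector_model(struct irq_cfg_model *cfg,
			       cpumask_model_t target,
			       cpumask_model_t online, int patched)
{
	cpumask_model_t candidates = target & online;

	if (cfg->move_in_progress)
		return -EBUSY;          /* previous move never completed */
	if (!candidates)
		return -ENOSPC;         /* no online CPU in the request */

	if (cfg->domain) {              /* an old vector exists */
		cfg->old_domain = cfg->domain;
		/* Patched: only wait for cleanup if someone can do it. */
		if (!patched || (cfg->old_domain & online))
			cfg->move_in_progress = 1;
	}
	/* Lowest set bit stands in for the real vector search. */
	cfg->domain = candidates & (~candidates + 1);
	return 0;
}
```

With CPUs 0-7 online and the IRQ on CPU 4, offlining CPU 4 and then
CPU 5 without an intervening interrupt gives -EBUSY on the second
call in the unpatched model and 0 in the patched one.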

> With your patch, what if that happens right after you check
> the second time?
> 
> It seems we are missing some lock_vector_lock() around removing the CPU
> from the online mask.

The remove_cpu_from_maps() call in cpu_disable_common() is protected
by the vector lock:

void cpu_disable_common(void)
{
	< snip >
	/* It's now safe to remove this processor from the online map */
	lock_vector_lock();
	remove_cpu_from_maps(cpu);
	unlock_vector_lock();
	fixup_irqs();
}
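
As a rough illustration of why that locking matters for the new check,
here is a small user-space model.  It assumes that the code testing
the old domain against the online mask runs under the same lock that
guards the mask update; the names (vector_lock_model,
online_mask_model, cpu_disable_model() and so on) are invented for the
sketch, and a C11 atomic_flag spin loop stands in for the kernel
spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model state, not kernel symbols. */
static atomic_flag vector_lock_model = ATOMIC_FLAG_INIT;
static uint64_t online_mask_model = 0xff;   /* CPUs 0-7 online */

static void lock_vector_lock_model(void)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&vector_lock_model,
						 memory_order_acquire))
		;
}

static void unlock_vector_lock_model(void)
{
	atomic_flag_clear_explicit(&vector_lock_model, memory_order_release);
}

/* Models the tail of cpu_disable_common(): mask update under the lock. */
static void cpu_disable_model(unsigned int cpu)
{
	assert(cpu < 64);
	lock_vector_lock_model();
	online_mask_model &= ~((uint64_t)1 << cpu);
	unlock_vector_lock_model();
	/* fixup_irqs() would run here, after the mask is updated. */
}

/* Models the patched check: does any CPU in old_domain remain online? */
static int old_domain_has_online_cpu_model(uint64_t old_domain)
{
	int ret;

	lock_vector_lock_model();
	ret = (old_domain & online_mask_model) != 0;
	unlock_vector_lock_model();
	return ret;
}
```

Because both the writer and the reader take the lock, the reader sees
the mask either entirely before or entirely after the removal, never
in between.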

Is this what you meant?

Gary

-- 
Gary Hade
System x Enablement
IBM Linux Technology Center
503-578-4503  IBM T/L: 775-4503
garyhade@...ibm.com
http://www.ibm.com/linux/ltc
