Message-ID: <87im5py1ty.fsf@nanos.tec.linutronix.de>
Date:   Wed, 17 Mar 2021 21:14:49 +0100
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Vitaly Kuznetsov <vkuznets@...hat.com>, x86@...nel.org
Cc:     Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC 1/2] x86/apic: Do not make an exception for PIC_CASCADE_IR when marking legacy irqs in irq_matrix

On Fri, Feb 19 2021 at 12:31, Vitaly Kuznetsov wrote:
> Trying to offline/online CPU0 seems to work only once:
>
>  # echo 0 > /sys/devices/system/cpu/cpu0/online
>  # echo 1 > /sys/devices/system/cpu/cpu0/online
>  # echo 0 > /sys/devices/system/cpu/cpu0/online
>  -bash: echo: write error: No space left on device
>
> with the following in dmesg:
>
>  [ ... ] CPU 0 has 4294967295 vectors, 589 available. Cannot disable CPU
>
> Clearly, cm->allocated in irq_matrix went negative, so we think that
> too many vectors require re-assigning.
>
> The problem turns out to be: lapic_assign_system_vectors(), called from
> native_init_IRQ(), makes an exception for PIC_CASCADE_IR and doesn't
> mark it in irq_matrix. Later, when x86_vector_alloc_irqs(), called
> from setup_IO_APIC(), does clear_irq_vector() for all legacy entries,
> it doesn't make an exception, so we go negative.
>
> CPU0 offlining still works for the first time because some other vectors
> get assigned and the overall balance remains positive (it's off-by-one, but
> the check passes). When we online CPU0 back, no vectors get assigned and
> the overall balance remains '-1'.
>
> The simplest solution seems to be to not make an exception for
> PIC_CASCADE_IR. Nothing seems to blow up immediately.

Well no. This does not make sense. Just a few lines above the code which
you are fiddling with is:

	if (nr_legacy_irqs() > 1)
		lapic_assign_legacy_vector(PIC_CASCADE_IR, false);

Which is there for a reason because this _MUST_ stay at exactly this
place and not move randomly around.

Even without looking at the machine I can tell you what's going on. The MP
config or ACPI has a pin assigned to IRQ 2, which I've not seen before.
The code there ignores IRQ 2 because that's how the original code
worked, and because it is reserved for PIC_CASCADE_IR, which should
never fire; we actually want to catch a spurious interrupt on it.

So depending on the overall configuration of that system and the
resulting delivery modes this might be ok, but I'm really nervous about
doing this wholesale as it might break old machines.

Out of paranoia I'd rather ignore that IO/APIC pin completely if it
claims to be IRQ2. I assume there is no device connected to it at all,
right?

Can you please provide a dmesg with apic=verbose on the command line?

Thanks,

        tglx
