lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87h8qk3htc.fsf@xmission.com>
Date:   Tue, 13 Feb 2018 11:44:47 -0600
From:   ebiederm@...ssion.com (Eric W. Biederman)
To:     Baoquan He <bhe@...hat.com>
Cc:     linux-kernel@...r.kernel.org, mingo@...nel.org, tglx@...utronix.de,
        x86@...nel.org, douly.fnst@...fujitsu.com, joro@...tes.org,
        uobergfe@...hat.com, prarit@...hat.com
Subject: Re: [PATCH v3 2/5] x86/apic: Fix restoring boot irq mode in reboot and kexec/kdump

Baoquan He <bhe@...hat.com> writes:

> Hi Eric,
>
> On 02/11/18 at 09:08pm, Eric W. Biederman wrote:
>> Baoquan He <bhe@...hat.com> writes:
>> 
>> > This is a regression fix.
>> >
>> > Before, to fix erratum AVR31, commit 522e66464467 ("x86/apic: Disable
>> > I/O APIC before shutdown of the local APIC") moved lapic_shutdown()
>> > calling after disable_IO_APIC(). This introdued a regression. The
>> > root cause is that disable_IO_APIC() not only clears IO_APIC, also
>> > restore boot irq mode by setting LAPIC/APIC/IMCR, lapic_shutdown()
>> > after disable_IO_APIC() will disable LAPIC and ruin the possible
>> > virtual wire mode setting which the code has been trying to do all
>> > along.
>> > To fix this, just break down disable_IO_APIC(), then call
>> > clear_IO_APIC() to stop IO_APIC where disable_IO_APIC() was called,
>> > and call restore_boot_irq_mode() to restore boot irq mode before
>> > reboot or kexec/kdump jump.
>> 
>> Two things here.
>> a) This is missing a fixes tag and a CC stable.
>> b) What makes your change to the KEXEC_JUMP code path safe?
>>    Have the lapic and ioapic already been shut down?
>> 
>> The KEXEC_JUMP changes to machine_kexec_32.c and machine_kexec_64.c
>> either need to be documented in the change long why they are safe
>> so that this change becomes obviously safe and correct.
>
> Re-read the code, I have to admit I didn't check the KEXEC_JUMP code
> path carefully. 
>
> kernel_kexec() {
> 	if (kexec_image->preserve_context) {
> 		...
> 		freeze_processes();
> 		...
> 		disable_nonboot_cpus();
> 		...
> 		
> 	else {
> 		...
> 		machine_shutdown();
> 		...
> 	}
> 	machine_kexec(kexec_image);
> 	...
> }
>
>   --machine_shutdown()
>     --native_machine_shutdown()
>       --disable_IO_APIC()
>       --lapic_shutdown()
>
> machine_kexec() {
> 	...
> 	if (image->preserve_context) {
> 		disable_IO_APIC();
> 	}
> 	...
> }
>
> KEXEC_JUMP code path is different than kexec/kdump, it doesn't call
> lapic_shutdown() before jump. So commit 522e66464467
> ("x86/apic: Disable I/O APIC before shutdown of the local APIC") didn't
> impact it. And here I break down disable_IO_APIC() and change to only
> call restore_boot_irq_mode() to make a possible danger. I am not an
> expert on KEXEC_JUMP, and don't know how to test it, so will keep the
> code implementation consistent as before. For now, I plan to change it
> as below if you don't object. As you pointed out, I will describe this
> in patch log. 
>
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 1f790cf9d38f..cb0c2d0a4c99 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -297,7 +297,7 @@ void machine_kexec(struct kimage *image)
>                  * one form or other. kexec jump path also need
>                  * one.
>                  */
> -               disable_IO_APIC();
> +		clear_IO_APIC();
> +               restore_boot_irq_mode();
>  #endif
>         }
>
>

Let me give a very concrete suggestion:
Patch 1) Replace "disable_IO_APIC();" with "clear_IO_APIC(); restore_boot_irq_mode();"
Patch 2) Move restore_boot_irq_mode(); to fix the regression.

I think that will be a slightly shorter patch sequence than what you are
dealing with and one that is slightly easier to read.

We need to sort out KEXEC_JUMP but that is something for another time.

Eric

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ