Message-ID: <xhsmhh74j7biy.mognet@vschneid.remote.csb>
Date: Fri, 17 Jun 2022 12:52:05 +0100
From: Valentin Schneider <vschneid@...hat.com>
To: Tao Zhou <tao.zhou@...ux.dev>
Cc: linux-kernel@...r.kernel.org, kexec@...ts.infradead.org,
linux-rt-users@...r.kernel.org,
Eric Biederman <ebiederm@...ssion.com>,
Arnd Bergmann <arnd@...db.de>, Petr Mladek <pmladek@...e.com>,
Thomas Gleixner <tglx@...utronix.de>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Juri Lelli <jlelli@...hat.com>,
"Luis Claudio R. Goncalves" <lgoncalv@...hat.com>,
Tao Zhou <tao.zhou@...ux.dev>
Subject: Re: [PATCH] panic, kexec: Don't mutex_trylock() in __crash_kexec()

Hi Tao,

On 17/06/22 18:42, Tao Zhou wrote:
> Hi Valentin,
>
> On Thu, Jun 16, 2022 at 01:37:09PM +0100, Valentin Schneider wrote:
>> @@ -964,24 +966,31 @@ late_initcall(kexec_core_sysctl_init);
>>   */
>>  void __noclone __crash_kexec(struct pt_regs *regs)
>>  {
>> -        /* Take the kexec_mutex here to prevent sys_kexec_load
>> -         * running on one cpu from replacing the crash kernel
>> -         * we are using after a panic on a different cpu.
>> +        /*
>> +         * This should be taking kexec_mutex before doing anything with the
>> +         * kexec_crash_image, but this code can be run in NMI context which
>> +         * means we can't even trylock.
>>           *
>> -         * If the crash kernel was not located in a fixed area
>> -         * of memory the xchg(&kexec_crash_image) would be
>> -         * sufficient. But since I reuse the memory...
>> +         * Pairs with smp_mb() in do_kexec_load() and sys_kexec_file_load()
>>           */
>> -        if (mutex_trylock(&kexec_mutex)) {
>> -                if (kexec_crash_image) {
>> -                        struct pt_regs fixed_regs;
>> -
>> -                        crash_setup_regs(&fixed_regs, regs);
>> -                        crash_save_vmcoreinfo();
>> -                        machine_crash_shutdown(&fixed_regs);
>> -                        machine_kexec(kexec_crash_image);
>> -                }
>> -                mutex_unlock(&kexec_mutex);
>> +        WRITE_ONCE(panic_wants_kexec, true);
>> +        smp_mb();
>> +        /*
>> +         * If we're panic'ing while someone else is messing with the crash
>> +         * kernel, this isn't going to end well.
>> +         */
>> +        if (READ_ONCE(kexec_loading)) {
>> +                WRITE_ONCE(panic_wants_kexec, false);
>> +                return;
>> +        }
>
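
(For reference, the pairing side in do_kexec_load() / kexec_file_load()
looks something like the below; a sketch of the idea rather than the
exact hunk from the patch:)

  WRITE_ONCE(kexec_loading, true);
  smp_mb(); /* pairs with the smp_mb() in __crash_kexec() above */
  if (READ_ONCE(panic_wants_kexec)) {
          /* a crashing CPU wants the crash image, back off */
          WRITE_ONCE(kexec_loading, false);
          /* ... bail out of the load ... */
  }
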
> So this is from NMI. The mutex guarantees that only one of
> kexec_file_load() or do_kexec_load() runs on a CPU at a time. NMIs can
> happen on more than one CPU. That means there should be cumulativity
> here IMHO.
>

If you look at __crash_kexec() in isolation, yes, but if you look at panic()
and nmi_panic(), only a single NMI can get in there. I think that is also
true for invocations via crash_kexec().
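
Roughly, from memory, the exclusion relies on the panic_cpu cmpxchg in
kernel/panic.c (simplified sketch, details may differ between kernel
versions):

  /* Only the first CPU to claim panic_cpu gets to run panic() */
  void nmi_panic(struct pt_regs *regs, const char *msg)
  {
          int old_cpu, cpu;

          cpu = raw_smp_processor_id();
          old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, cpu);

          if (old_cpu == PANIC_CPU_INVALID)
                  panic("%s", msg);
          else if (old_cpu != cpu)
                  nmi_panic_self_stop(regs);
  }

IIRC panic() and crash_kexec() do the same cmpxchg dance on panic_cpu
before reaching __crash_kexec(), so only one CPU makes it that far.
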
> kexec_file_load()/      NMI0                        NMI1..
> do_kexec_load()
>
> set kexec_loading=true
> smp_mb()                set panic_wants_kexec=true
>                         smp_mb()
>                         see kexec_loading=true and
>                         conditionally set
>                         panic_wants_kexec=false;
>                                                     set panic_wants_kexec=true
>                                                     smp_mb()
> see panic_wants_kexec=true
> conditionally set
> kexec_loading=false
>                                                     see kexec_loading=false
>                                                     do kexec nmi things.
>
> You see, the conditional set of kexec_loading or panic_wants_kexec has no
> barrier there, and for the cumulativity to have an effect there should be
> an acquire-release, if I am not wrong.
>
> __crash_kexec():
>
>         WRITE_ONCE(panic_wants_kexec, true);
>         smp_mb();
>         /*
>          * If we're panic'ing while someone else is messing with the crash
>          * kernel, this isn't going to end well.
>          */
>         if (READ_ONCE(kexec_loading)) {
>                 smp_store_release(&panic_wants_kexec, false);
>                 return;
>         }
>
> kexec_file_load()/do_kexec_load():
>
>         WRITE_ONCE(kexec_loading, true);
>         smp_mb();
>         if (smp_load_acquire(&panic_wants_kexec)) {
>                 WRITE_ONCE(kexec_loading, false);
>                 ...
>         }
>
> For those inputs, I'm sure I'm lost and feeling overwhelmed..
> I thought about changing the pattern to a load-store one and setting
> an initial value, but failed.
>

I'm not sure if further ordering is required here, the base case being

  WRITE_ONCE(panic_wants_kexec, true);       WRITE_ONCE(kexec_loading, true);
  smp_mb();                                  smp_mb();
  v0 = READ_ONCE(kexec_loading);             v1 = READ_ONCE(panic_wants_kexec);

(see SB+fencembonceonces litmus test)

Which ensures (!v0 && !v1) is never true. If modified to:

  WRITE_ONCE(panic_wants_kexec, true);       WRITE_ONCE(kexec_loading, true);
  smp_mb();                                  smp_mb();
  v0 = READ_ONCE(kexec_loading);             v1 = READ_ONCE(panic_wants_kexec);
  if (v0)                                    if (v1)
    WRITE_ONCE(panic_wants_kexec, false);      WRITE_ONCE(kexec_loading, false);

then "(!v0 && !v1) is never true" still holds, so the exclusivity is
maintained AFAICT.
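
FWIW, something along these lines should be checkable with herd7 and the
LKMM (a hypothetical adaptation of SB+fencembonceonces to the names used
here, untested):

  C SB+kexecloading+panicwantskexec

  (*
   * Hypothetical adaptation of SB+fencembonceonces: the "exists" clause
   * never being satisfied means the two sides can never both proceed.
   *)

  {}

  P0(int *panic_wants_kexec, int *kexec_loading)
  {
          int r0;

          WRITE_ONCE(*panic_wants_kexec, 1);
          smp_mb();
          r0 = READ_ONCE(*kexec_loading);
  }

  P1(int *panic_wants_kexec, int *kexec_loading)
  {
          int r1;

          WRITE_ONCE(*kexec_loading, 1);
          smp_mb();
          r1 = READ_ONCE(*panic_wants_kexec);
  }

  exists (0:r0=0 /\ 1:r1=0)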