lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fc202e90-00ff-a635-f298-c3ca293e9182@suse.com>
Date:   Mon, 12 Oct 2020 12:26:06 +0200
From:   Jürgen Groß <jgross@...e.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     xen-devel@...ts.xenproject.org, x86@...nel.org,
        linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>, Andy Lutomirski <luto@...nel.org>
Subject: Re: [PATCH] x86/alternative: don't call text_poke() in lazy TLB mode

On 12.10.20 12:13, Peter Zijlstra wrote:
> On Fri, Oct 09, 2020 at 04:42:25PM +0200, Juergen Gross wrote:
>> When running in lazy TLB mode the currently active page tables might
>> be the ones of a previous process, e.g. when running a kernel thread.
>>
>> This can be problematic in case kernel code is being modified via
>> text_poke() in a kernel thread, and on another processor exit_mmap()
>> is active for the process which was running on the first cpu before
>> the kernel thread.
>>
>> As text_poke() is using a temporary address space and the former
>> address space (obtained via cpu_tlbstate.loaded_mm) is restored
>> afterwards, there is a race possible in case the cpu on which
>> exit_mmap() is running wants to make sure there are no stale
>> references to that address space on any cpu active (this e.g. is
>> required when running as a Xen PV guest, where this problem has been
>> observed and analyzed).
>>
>> In order to avoid that, drop off TLB lazy mode before switching to the
>> temporary address space.
> 
> Oh man, that must've been 'fun' :/

Yeah.

> 
>> Fixes: cefa929c034eb5d ("x86/mm: Introduce temporary mm structs")
>> Signed-off-by: Juergen Gross <jgross@...e.com>
>> ---
>>   arch/x86/kernel/alternative.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
>> index cdaab30880b9..cd6be6f143e8 100644
>> --- a/arch/x86/kernel/alternative.c
>> +++ b/arch/x86/kernel/alternative.c
>> @@ -807,6 +807,15 @@ static inline temp_mm_state_t use_temporary_mm(struct mm_struct *mm)
>>   	temp_mm_state_t temp_state;
>>   
>>   	lockdep_assert_irqs_disabled();
>> +
>> +	/*
>> +	 * Make sure not to be in TLB lazy mode, as otherwise we'll end up
>> +	 * with a stale address space WITHOUT being in lazy mode after
>> +	 * restoring the previous mm.
>> +	 */
>> +	if (this_cpu_read(cpu_tlbstate.is_lazy))
>> +		leave_mm(smp_processor_id());
>> +
>>   	temp_state.mm = this_cpu_read(cpu_tlbstate.loaded_mm);
>>   	switch_mm_irqs_off(NULL, mm, current);
> 
> Would it make sense to write it like:
> 
> 	this_state.mm = this_cpu_read(cpu_tlbstate.is_lazy) ?
> 			&init_mm : this_cpu_read(cpu_tlbstate.loaded_mm);
> 
> Possibly with that wrapped in a conveniently named helper function.

Fine with me, but I don't think it matters that much.

For each batch of text_poke() it will be hit only once, and I'm not sure
it is really a good idea to use the knowledge that leave_mm() is just a
switch to init_mm here.

In case it is still the preferred way to do it I can send an update of
the patch.


Juergen

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ