Message-ID: <b0f3ffa8-faec-73b2-f6f2-ea37ece3a3b1@intel.com>
Date: Fri, 8 Mar 2019 11:01:25 -0800
From: Dave Hansen <dave.hansen@...el.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: linux-kernel@...r.kernel.org, x86@...nel.org,
Andy Lutomirski <luto@...nel.org>,
Paolo Bonzini <pbonzini@...hat.com>,
Radim Krčmář <rkrcmar@...hat.com>,
kvm@...r.kernel.org, "Jason A. Donenfeld" <Jason@...c4.com>,
Rik van Riel <riel@...riel.com>,
Dave Hansen <dave.hansen@...ux.intel.com>
Subject: Re: [PATCH 14/22] x86/fpu: Eager switch PKRU state
On 3/8/19 10:08 AM, Sebastian Andrzej Siewior wrote:
> On 2019-02-25 10:16:24 [-0800], Dave Hansen wrote:
>>> + if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
>>> + return;
>>> +
>>> + if (current->mm) {
>>> + pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
>>> + WARN_ON_ONCE(!pk);
>>
>> This can trip us up if the 'init optimization' is in play, because
>> get_xsave_addr() checks xsave->header.xfeatures. That's unlikely today
>> because we usually set PKRU to a restrictive value. But, it's also not
>> *guaranteed*.
>>
>> Userspace could easily do an XRSTOR that puts PKRU back in its init
>> state if it wanted to, then this would end up with pk==NULL.
>>
>> We might actually want a selftest that *does* that. I don't think we
>> have one.
>
> So you are saying that the above warning might trigger and be "okay"?
Nothing will break, but the warning will trigger, which isn't nice.
> My understanding is that the in-kernel XSAVE will always save everything
> so we should never "lose" the XFEATURE_PKRU no matter what user space
> does.
>
> So as test case you want
> xsave (-1 & ~XFEATURE_PKRU)
> xrestore (-1 & ~XFEATURE_PKRU)
>
> in userland and then a context switch to see if the warning above
> triggers?
I think you need an XRSTOR with RFBM=-1 (or at least with the PKRU bit
set) and the PKRU bit in the XFEATURES field in the XSAVE buffer set to 0.
>>> + if (pk)
>>> + pkru_val = pk->pkru;
>>> + }
>>> +
>>> + __write_pkru(pkru_val);
>>> }
>>
>> A comment above __write_pkru() would be nice to say that it only
>> actually does the slow instruction on changes to the value.
>
> Could we please not do this? It is a comment above one of the callers
> function and we have two or three. And we have that comment already
> within __write_pkru().
I looked at this code and thought "writing PKRU is slow", and "this
writes PKRU unconditionally", and "the __ version of the function
shouldn't have much logic in it".
I got 2/3 wrong. To me that means this site needs a 1-line comment.
Feel free to move one of the other comments to here if you think it's
over-commented, but this site needs one.
>> BTW, this has the implicit behavior of always trying to do a
>> __write_pkru(0) on switches to kernel threads. That seems a bit weird
>> and it is likely to impose WRPKRU overhead on switches between user and
>> kernel threads.
>>
>> The 0 value is also the most permissive, which is not great considering
>> that user mm's can be active in the page tables when running kernel
>> threads if we're being lazy.
>>
>> Seems like we should either leave PKRU alone or have 'init_pkru_value'
>> be the default. That gives good security properties and is likely to
>> match the application value, removing the WRPKRU overhead.
>
> Last time we talked about this we agreed (or this was my impression)
> that 0 should be written so that a kernel thread is always able to
> write to user space in case it borrowed a task's mm (otherwise it has
> no mm and the write would fail anyway).
We can't write to userspace when borrowing an mm. If the kernel borrows
an mm, we might as well be on the init_mm which has no userspace mappings.
> We didn't want to leave PKRU alone because the outcome (whether or not
> the write by the kernel thread succeeds) should not depend on the last
> running task (and be random) but deterministic.
Right, so let's make it deterministically restrictive: either
init_pkru_value, or -1 since kernel threads shouldn't be touching
userspace in the first place.