linux-kernel - Re: [RFC PATCH v2] x86/arch_prctl: Add ARCH_SET

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <8f95e8b4-415f-1652-bb02-0a7c631c72ac@intel.com>
Date:   Tue, 7 Apr 2020 06:14:33 -0700
From:   Dave Hansen <dave.hansen@...el.com>
To:     Keno Fischer <keno@...iacomputing.com>,
        linux-kernel@...r.kernel.org
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, x86@...nel.org,
        "H. Peter Anvin" <hpa@...or.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Andi Kleen <andi@...stfloor.org>,
        Kyle Huey <khuey@...ehuey.com>,
        Robert O'Callahan <robert@...llahan.org>
Subject: Re: [RFC PATCH v2] x86/arch_prctl: Add ARCH_SET_XCR0 to set XCR0
 per-thread

I didn't review the first attempt at this, but this looks like a really
bad idea to me.  I don't 100% buy the arguments that this shouldn't be
done in a VM.  The x86 virtualization architecture was literally
designed to hide hardware details like this.

I can't imagine ever merging this until using VMs or at least the KVM
API has been completely ruled out.

I don't doubt that this approach works in a controlled environment.
But, I can't imagine that we could ever wrap sane semantics around such
a beast.

For instance, what would ARCH_SET_XCR0 do in a signal handler?  The
components are mostly in their init states so xstate_is_initial() will
pass.  But the kernel's XRSTOR might #GP because it might try to restore
states that are not enabled in XCR0 now.

Also, this has to be done before XCR0's state is observed by userspace,
which the kernel can't easily do once the program is off and running.
This tells me that it really needs to be done at execve() time, even
before the first program instruction runs.

These are usually the kinds of things that you figure out when you try
to go write a manpage for one of these suckers.

How does this work with things like xstateregs_[gs]et() where the format
of the kernel buffer and thus the kernel XCR0 is exposed as part of our
ABI?  With this patch, wouldn't a debugger app see a state buffer that
looks invalid?

Where else in our ABI is the format of the XSAVE buffer exposed?

I'm extra wary of a v2 that's missing the CodingStyle and changelog
basics as well.

> +static int xcr0_is_legal(unsigned long xcr0)
> +{
> +	/* Conservatively disallow anything above bit 9,
> +	 * to avoid accidentally allowing the disabling of
> +	 * new features without updating these checks
> +	 */
> +	if (xcr0 & ~((1 << 10) - 1))
> +		return 0;

Yay, magic numbers!

This would be much better as a BUILD_BUG_ON().

> +	if (!(xcr0 & XFEATURE_MASK_FP))
> +		return 0;
> +	if ((xcr0 & XFEATURE_MASK_YMM) && !(xcr0 & XFEATURE_MASK_SSE))
> +		return 0;
> +	if ((!(xcr0 & XFEATURE_MASK_BNDREGS)) !=
> +		(!(xcr0 & XFEATURE_MASK_BNDCSR)))
> +		return 0;
> +	if (xcr0 & XFEATURE_MASK_AVX512) {
> +		if (!(xcr0 & XFEATURE_MASK_YMM))
> +			return 0;
> +		if ((xcr0 & XFEATURE_MASK_AVX512) != XFEATURE_MASK_AVX512)
> +			return 0;
> +	}
> +	return 1;
> +}

This appears to copy (instead of refactoring) code from __kvm_set_xcr(),
yet manages to get the indentation different and wrong.