linux-kernel - PKRU issue while using alternate signal stack

Message-ID: <SJ0PR10MB447870F586BFD2F326F55C819F572@SJ0PR10MB4478.namprd10.prod.outlook.com>
Date: Wed, 21 Feb 2024 19:54:42 +0000
From: Aruna Ramakrishna <aruna.ramakrishna@...cle.com>
To: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC: "x86@...nel.org" <x86@...nel.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        Keith Lucas <keith.lucas@...cle.com>,
        Andrew Brownsword <andrew.brownsword@...cle.com>,
        Dave Kleikamp <dave.kleikamp@...cle.com>,
        Joe Jin <joe.jin@...cle.com>
Subject: PKRU issue while using alternate signal stack

(Re-sending to the list; the previous email had some formatting issues. I apologize.)

Hello,

We’re running into an issue with the delayed PKRU update during signal delivery, for which we don’t yet have a proposed solution.

Our use case is this:

The application has many threads that run code deemed untrusted. Each thread has its stack and code protected by a non-zero pkey, and the PKRU register is set up so that only that particular non-zero pkey is enabled. Each thread also sets up an alternate signal stack for handling signals, which is protected by pkey zero. The pkeys man page documents that PKRU is reset to init_pkru when the signal handler is invoked, which means that pkey zero access is enabled; that is why the alternate signal stack is protected with pkey zero. But this reset happens only after the kernel attempts to push fpstate onto the alternate stack, which is not (yet) accessible to the kernel under the thread's current PKRU value. The write fails, and a new SIGSEGV is sent to the application, terminating it.
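
To make the PKRU values in this scenario concrete, here is a minimal sketch in plain C of how the register encodes access. It relies only on the architectural layout (16 keys, two bits per key, with the low bit of each pair being access-disable). The helper names pkru_allow_only() and pkru_allows_access() are hypothetical and exist only for this illustration; they are not kernel or glibc functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_PKEYS           16     /* x86 provides 16 protection keys */
#define PKRU_BITS_PER_PKEY 2      /* each key owns a 2-bit field in PKRU */
#define PKRU_AD_BIT        0x1u   /* low bit of the field: access disable */

/* Hypothetical helper: build a PKRU value that disables access for every
 * key except the given one. */
static uint32_t pkru_allow_only(int pkey)
{
    uint32_t pkru = 0;

    assert(pkey >= 0 && pkey < NR_PKEYS);
    for (int k = 0; k < NR_PKEYS; k++) {
        if (k != pkey)
            pkru |= PKRU_AD_BIT << (k * PKRU_BITS_PER_PKEY);
    }
    return pkru;
}

/* Hypothetical helper: true if the PKRU value permits access to pages
 * tagged with the given key. */
static bool pkru_allows_access(uint32_t pkru, int pkey)
{
    assert(pkey >= 0 && pkey < NR_PKEYS);
    return !(pkru & (PKRU_AD_BIT << (pkey * PKRU_BITS_PER_PKEY)));
}
```

With these helpers, pkru_allow_only(0) yields 0x55555554, the default value that enables pkey 0 only. A thread restricted to, say, pkey 1 runs with 0x55555551, under which pkru_allows_access(..., 0) is false, so a stack tagged with pkey zero cannot be written until PKRU is reset.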

This is the relevant snippet:

In handle_signal():

..
        failed = (setup_rt_frame(ksig, regs) < 0); <- pkru reset should happen before this
        if (!failed) {
                /*
                 * Clear the direction flag as per the ABI for function entry.
                 *
                 * Clear RF when entering the signal handler, because
                 * it might disable possible debug exception from the
                 * signal handler.
                 *
                 * Clear TF for the case when it wasn't set by debugger to
                 * avoid the recursive send_sigtrap() in SIGTRAP handler.
                 */
                regs->flags &= ~(X86_EFLAGS_DF|X86_EFLAGS_RF|X86_EFLAGS_TF);
                /*
                 * Ensure the signal handler starts with the new fpu state.
                 */
                fpu__clear_user_states(fpu); <- pkru resets here, via pkru_write_default()
        }
        signal_setup_done(failed, ksig, stepping);
..

Failure path: setup_rt_frame() -> x64_setup_rt_frame() -> get_sigframe() -> copy_fpstate_to_sigframe() -> __clear_user() -> fails, with SIGSEGV and si_code set to SEGV_PKUERR.

The PKRU value is reset to the default (enabling pkey 0 only) in fpu__clear_user_states().

If the pkru_write_default() call were moved earlier in this flow, before copy_fpstate_to_sigframe(), then signal handling would work as expected. But this code/flow is quite complicated, and we’d appreciate some expert opinion.
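
The ordering problem can be illustrated with a small user-space toy model in C. This is not kernel code and not a proposed patch: struct toy_thread and the toy_* functions are hypothetical stand-ins, and the model tracks only the PKRU value and whether a write to the alternate stack would be permitted. It leaves out everything else, in particular how the interrupted PKRU value is saved and later restored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PKRU_AD_BIT        0x1u        /* access-disable bit of a key's field */
#define PKRU_BITS_PER_PKEY 2
#define INIT_PKRU          0x55555554u /* default: only pkey 0 is accessible */

/* Hypothetical stand-in for a thread; only its PKRU value is modeled. */
struct toy_thread {
    uint32_t pkru;
};

/* Models a write to user memory: allowed only if PKRU does not disable
 * access for the key protecting the target page. */
static bool toy_write_allowed(const struct toy_thread *t, int page_pkey)
{
    assert(page_pkey >= 0 && page_pkey < 16);
    return !(t->pkru & (PKRU_AD_BIT << (page_pkey * PKRU_BITS_PER_PKEY)));
}

/* Order described in the email: the frame write (standing in for
 * setup_rt_frame()) comes first, the PKRU reset (standing in for
 * fpu__clear_user_states()) only on success. */
static bool toy_deliver_frame_then_reset(struct toy_thread *t, int altstack_pkey)
{
    bool ok = toy_write_allowed(t, altstack_pkey);

    if (ok)
        t->pkru = INIT_PKRU;
    return ok;
}

/* Alternative order: reset PKRU first, then write the frame. */
static bool toy_deliver_reset_then_frame(struct toy_thread *t, int altstack_pkey)
{
    t->pkru = INIT_PKRU;
    return toy_write_allowed(t, altstack_pkey);
}
```

For a thread whose PKRU is 0x55555551 (only pkey 1 enabled) and an alternate stack tagged with pkey 0, toy_deliver_frame_then_reset() returns false and leaves PKRU unchanged, matching the failure path above, while toy_deliver_reset_then_frame() returns true.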

Thanks,
Aruna