Message-Id: <20220419170649.1022246-1-ira.weiny@intel.com>
Date:   Tue, 19 Apr 2022 10:06:05 -0700
From:   ira.weiny@...el.com
To:     Dave Hansen <dave.hansen@...ux.intel.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        Dan Williams <dan.j.williams@...el.com>
Cc:     Ira Weiny <ira.weiny@...el.com>, Fenghua Yu <fenghua.yu@...el.com>,
        Rick Edgecombe <rick.p.edgecombe@...el.com>,
        "Shankar, Ravi V" <ravi.v.shankar@...el.com>,
        linux-kernel@...r.kernel.org
Subject: [PATCH V10 00/44] PKS/PMEM: Add Stray Write Protection

From: Ira Weiny <ira.weiny@...el.com>

I'm looking for Intel acks on the series prior to submitting to maintainers.

Because I did not get a lot of feedback on the previous version I've reworked
the order of the patches to lighten the review load.

I'd like to get comments from Peter and Dave on patches 1-24.

Patches 25-36 implement the PMEM use case.  I'd like to get Dan to look at those.

Patches 37-44 implement the PKS tests which can be deferred if review time is
short.

Code-wise there were no significant changes between V9 and V10, but a V10 was
required due to upstream changes and conflicts, one of which required dropping
a patch because a different fix landed upstream.

This series is now based on 5.18-rc3.


Changes for V10
	Rebased to 5.18-rc3
	Rearranged the patch series into 3 sections
		1-24 PKS core
		25-36 PMEM use case
		37-44 PKS core testing
	Drop the irqentry_exit_cond_resched() fixup patch as that was fixed by
		Mark Rutland in:
		4624a14f4daa ("sched/preempt: Simplify irqentry_exit_cond_resched() callers") 
	Adjust irqentry_exit_cond_resched() changes based on Mark's fix
	Fix test_pks cpu option processing
	Move memremap code to memremap.h



PKS/PMEM Stray write protection
===============================

This series is broken into 2 parts.

	1) Introduce Protection Key Supervisor (PKS), testing, and
	   documentation
	2) Use PKS to protect PMEM from stray writes

Introduce Protection Key Supervisor (PKS) [Patches 1-24]
--------------------------------------------------------

PKS enables protections on 'domains' of supervisor pages to limit supervisor
mode access to pages beyond the normal paging protections.  PKS works in a
similar fashion to user space pkeys, PKU.  As with PKU, supervisor pkeys are
checked in addition to normal paging protections.  And page mappings are
assigned to a domain by setting a 4 bit pkey in the PTE of that mapping.
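
As an illustration of the PTE encoding described above, the sketch below packs
a 4 bit pkey into a 64 bit PTE value and reads it back.  It assumes the x86_64
layout in which the protection key occupies PTE bits 59-62.  The helper names
pte_set_pkey() and pte_get_pkey() are hypothetical and are not part of this
series.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86_64 layout: the pkey lives in PTE bits 59-62. */
#define PTE_PKEY_SHIFT 59
#define PTE_PKEY_MASK  (UINT64_C(0xf) << PTE_PKEY_SHIFT)

/* Return pte with its pkey field replaced by pkey (0-15). */
uint64_t pte_set_pkey(uint64_t pte, uint8_t pkey)
{
	assert(pkey < 16);
	return (pte & ~PTE_PKEY_MASK) | ((uint64_t)pkey << PTE_PKEY_SHIFT);
}

/* Extract the 4 bit pkey from pte. */
uint8_t pte_get_pkey(uint64_t pte)
{
	return (uint8_t)((pte & PTE_PKEY_MASK) >> PTE_PKEY_SHIFT);
}
```

A PTE that was never assigned a key reads back as pkey 0 in this model.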

Unlike PKU, permissions are changed via an MSR update.  This update avoids TLB
flushes, making it an efficient way to alter protections compared with PTE
updates.

Also, unlike PTE updates, PKS permission changes apply only to the current
processor.  Therefore a permission change applies only to the running thread
and not to any other cpu/process.  This allows protections to remain in place
on other cpus for additional protection and isolation.
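
A permission change amounts to rewriting one 2 bit field of the 32 bit value
held in the PKRS MSR.  The sketch below models only that bit manipulation in
plain C.  It assumes the architectural layout of two bits per pkey, with
access-disable in the low bit and write-disable in the high bit.  The names
PK_AD, PK_WD and pkrs_update() are illustrative and are not the functions
added by this series; the MSR write itself is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define PK_AD 0x1u		/* access disable: no reads or writes */
#define PK_WD 0x2u		/* write disable: reads still allowed */
#define PK_BITS_PER_KEY 2

/* Return pkrs with the permission bits of pkey replaced by perm. */
uint32_t pkrs_update(uint32_t pkrs, uint8_t pkey, uint32_t perm)
{
	unsigned int shift = pkey * PK_BITS_PER_KEY;

	assert(pkey < 16 && perm <= (PK_AD | PK_WD));
	pkrs &= ~((PK_AD | PK_WD) << shift);	/* clear the old 2 bit field */
	return pkrs | (perm << shift);		/* install the new permissions */
}
```

In this model pkrs_update(pkrs, 1, 0) grants read/write access through pkey 1
and pkrs_update(pkrs, 1, PK_AD) revokes it again, without touching any PTE.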

Even though PKS updates are thread local, XSAVE is not supported for the PKRS
MSR.  Therefore this implementation saves and restores the MSR in software
across context switches and during exceptions.  Nested exceptions are supported
by giving each exception its own PKS state.
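
The save/restore behavior around exceptions can be modeled in user space as
follows.  This is a sketch only: model_pkrs stands in for the per-cpu MSR,
struct exc_frame stands in for the per-exception save area, and the default
value loaded on entry (all keys except pkey 0 access-disabled) is an
assumption made for the example rather than something stated in this letter.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed default: access-disable bit set for pkeys 1-15, pkey 0 open. */
#define MODEL_PKRS_INIT 0x55555554u

uint32_t model_pkrs;		/* stands in for the PKRS MSR */

struct exc_frame {
	uint32_t saved_pkrs;	/* value of the interrupted context */
};

/* Exception entry: remember the interrupted value, start from the default. */
void exc_enter(struct exc_frame *f)
{
	assert(f);
	f->saved_pkrs = model_pkrs;
	model_pkrs = MODEL_PKRS_INIT;
}

/* Exception exit: put back exactly what the interrupted context had. */
void exc_exit(const struct exc_frame *f)
{
	assert(f);
	model_pkrs = f->saved_pkrs;
}
```

Because every entry has its own frame, a nested exception restores the outer
handler's value and the outer handler later restores the thread's value.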

For consistent behavior with current paging protections, pkey 0 is reserved and
configured to allow full access via the pkey mechanism, thus preserving the
default paging protections because PTEs naturally have a pkey value of 0.

The other keys (1-15) are statically allocated by kernel consumers when
configured.  This is done by adding the appropriate PKS_NEW_KEY and
PKS_DECLARE_INIT_VALUE macros to pks-keys.h.
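
To give a flavor of config-dependent static allocation, here is a minimal
stand-alone sketch.  It is not the contents of pks-keys.h: every MODEL_* name
is hypothetical, the "config" values are plain 0/1 constants rather than
Kconfig symbols, and the real PKS_NEW_KEY and PKS_DECLARE_INIT_VALUE macros
may take different arguments.

```c
#include <assert.h>

/* Stand-ins for Kconfig options: 1 when the consumer is built in, else 0. */
#define MODEL_CONFIG_PKS_TEST   1
#define MODEL_CONFIG_CONSUMER_B 0
#define MODEL_CONFIG_PMEM       1

/* A consumer takes the next key only if it is configured in. */
#define MODEL_NEW_KEY(prev, enabled) ((prev) + (enabled))

#define MODEL_KEY_DEFAULT    0	/* pkey 0: reserved, full access */
#define MODEL_KEY_TEST \
	MODEL_NEW_KEY(MODEL_KEY_DEFAULT, MODEL_CONFIG_PKS_TEST)
#define MODEL_KEY_CONSUMER_B \
	MODEL_NEW_KEY(MODEL_KEY_TEST, MODEL_CONFIG_CONSUMER_B)
#define MODEL_KEY_PMEM \
	MODEL_NEW_KEY(MODEL_KEY_CONSUMER_B, MODEL_CONFIG_PMEM)
#define MODEL_KEY_LAST       MODEL_KEY_PMEM

/* Fail the build if the enabled consumers need more than keys 1-15. */
static_assert(MODEL_KEY_LAST <= 15, "too many PKS consumers enabled");
```

With these settings the test consumer gets key 1 and PMEM gets key 2.  The
disabled consumer's macro aliases the previous key, which is harmless in this
model because a disabled consumer has no code that uses it.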

Two PKS consumers, PKS_TEST and PMEM stray write protection, are included in
this series.  If the number of users grows beyond the available keys, key
sharing will need to be resolved according to the needs of the users at that
time.  Many methods have been contemplated, but the number of kernel users and
use cases envisioned is still quite small, far fewer than the 15 available keys.

To summarize, the following are key attributes of PKS.

	1) Fast switching of permissions
		1a) Prevents access without page table manipulations
		1b) No TLB flushes required
	2) Works on a per thread basis, thus allowing protections to be
	   preserved on threads which are not actively accessing data through
	   the mapping.

PKS is available with 4 and 5 level paging.  For this reason, and for
simplicity of implementation, the feature is restricted to x86_64.


Use PKS to protect PMEM from stray writes [Patches 25-36]
---------------------------------------------------------

DAX leverages the direct-map to enable 'struct page' services for PMEM.  Given
that PMEM capacity may be an order of magnitude larger than System RAM, it
presents a large vulnerability surface to stray writes.  Such a stray write
becomes a silent data corruption bug.

Stray pointers to System RAM may result in a crash or other undesirable
behavior which, while unfortunate, is usually recoverable with a reboot.
Stray writes to PMEM are permanent in nature and thus are more likely to result
in permanent user data loss.  Given that PMEM access from the kernel is limited
to a constrained set of locations (PMEM driver, Filesystem-DAX, direct-I/O, and
any properly kmap'ed page), it is amenable to PKS protection.

Set up an infrastructure for extra device access protection. Then implement the
protection using the new Protection Keys Supervisor (PKS) on architectures
which support it.

Because PMEM pages are all associated with a struct dev_pagemap, and because
flags in struct page are valuable, the protection flag can be stored in struct
dev_pagemap.  All PMEM is protected by the same pkey, so a single flag in each
dev_pagemap is all that is needed to indicate protection.
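
A minimal model of that single-flag check is sketched below.  The struct, the
bit position and the function are simplified stand-ins invented for this
example; they only borrow the idea behind the PGMAP_PROTECTION flag and
devmap_protected() named in the patch list and do not reproduce their actual
definitions.

```c
#include <stdbool.h>

/* Hypothetical bit position for the protection flag. */
#define MODEL_PGMAP_PROTECTION (1ul << 0)

/* Simplified stand-in for struct dev_pagemap. */
struct model_pagemap {
	unsigned long flags;
};

/* True if pages backed by this pagemap are covered by the PMEM pkey. */
bool model_devmap_protected(const struct model_pagemap *pgmap)
{
	return pgmap && (pgmap->flags & MODEL_PGMAP_PROTECTION);
}
```

Since every protected range shares one pkey, the flag only has to answer
"protected or not"; no per-range key number needs to be stored.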

General access in the kernel is supported by modifying the kmap infrastructure,
which can detect whether a page is PKS protected and enable access until the
corresponding unmap is called.

Because PKS is a thread local mechanism and because kmap was never really
intended to create a long term mapping, this implementation does not support
the kmap()/kunmap() calls.  Calling kmap() on a PMEM protected page is allowed
but accessing that mapping will cause a fault.

Originally this series modified many of the kmap call sites to indicate they
were thread local.[1]  And an attempt to support kmap()[2] was made.  But now
that kmap_local_page() has been developed[3] and is in more widespread use,
kmap() can safely be left unsupported.

How the fault is handled is configurable via a new module parameter
memremap.pks_fault_mode.  Two modes are supported.

	'relaxed' (default) -- WARN_ONCE, disable the protection and allow
	                       access

	'strict' -- prevent any unguarded access to a protected dev_pagemap
		    range
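
The two modes can be pictured with the small user-space model below.  It is a
sketch under assumptions: the enum, struct and function names are invented for
this example, the warning is a plain fprintf() standing in for WARN_ONCE, and
the real fault path in the series is not reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

enum model_fault_mode { MODEL_RELAXED, MODEL_STRICT };

struct model_pgmap_state {
	enum model_fault_mode mode;
	bool protection_enabled;
	bool warned;
};

/* Return true if the faulting access is allowed to continue. */
bool model_handle_pks_fault(struct model_pgmap_state *s)
{
	assert(s);
	if (s->mode == MODEL_STRICT)
		return false;		/* unguarded access is refused */

	if (!s->warned) {		/* mimic WARN_ONCE */
		fprintf(stderr, "WARN: unguarded access to protected range\n");
		s->warned = true;
	}
	s->protection_enabled = false;	/* relaxed: drop protection */
	return true;			/* and let the access proceed */
}
```

Going by the parameter name above, the mode would presumably be selected with
something like memremap.pks_fault_mode=strict on the kernel command line.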

This 'safety valve' feature has already been useful in the development of this
feature.


[1] https://lore.kernel.org/lkml/20201009195033.3208459-1-ira.weiny@intel.com/

[2] https://lore.kernel.org/lkml/87mtycqcjf.fsf@nanos.tec.linutronix.de/

[3] https://lore.kernel.org/lkml/20210128061503.1496847-1-ira.weiny@intel.com/
    https://lore.kernel.org/lkml/20210210062221.3023586-1-ira.weiny@intel.com/
    https://lore.kernel.org/lkml/20210205170030.856723-1-ira.weiny@intel.com/
    https://lore.kernel.org/lkml/20210217024826.3466046-1-ira.weiny@intel.com/


----------------------------------------------------------------------------
Changes for V9

Review and update all commit messages.
Update cover letter below

PKS Core
	Separate user and supervisor pkey code in the headers
		create linux/pks.h for supervisor calls
		This facilitated making the pmem code more efficient 
	Completely rearchitect the test code
		[After Dave Hansen and Rick Edgecombe found issues in the test
			code it was easier to rearchitect the code completely
			rather than attempt to fix it.]
		Remove pks_test_callback in favor of using fault hooks
			Fault hooks also isolate the fault callbacks from being
			false positives if non-test consumers are running
		Make additional PKS_TEST_RUN_ALL Kconfig option which is
			mutually exclusive to any non-test PKS consumer
			PKS_TEST_RUN_ALL takes over all pkey callbacks
		Ensure that each test runs within its own context and is
			mutually exclusive from running while any other test is
			running.
		Ensure test session and context memory is cleaned up on file
			close
		Use pr_debug() and dynamic debug for in kernel debug messages
		Enhance test_pks selftest
			Add the ability to run all tests not just the context
				switch test
			Standardize output [PASS][FAIL][SKIP]
			Add '-d' option to enable dynamic debug to see the kernel
				debug messages

	Incorporate feedback from Rick Edgecombe
		Update all pkey types to u8
		Fix up test code barriers
	Move patch declaring PKS_INIT_VALUE ahead of the patch which enables
		PKS so that PKS_INIT_VALUE can be used when pks_setup() is
		first created
	From Dan Williams
		Use macros instead of an enum for a pkey allocation scheme
			which is predicated on the config options of consumers
			This almost worked perfectly.  It required a bit of
			tweaking to be able to allocate all of the keys.

	From Dave Hansen
		Reposition some code to be near/similar to user pkeys
			s/pks_write_current/x86_pkrs_load
			s/pks_saved_pkrs/pkrs
		Update Documentation
		s/PKR_{RW,AD,WD}_KEY/PKR_{RW,AD,WD}_MASK
		Consistently use lower case for pkey
		Update commit messages
		Add Acks

PMEM Stray Write
	Building on the change to the pks_mk_*() function rename
		s/pgmap_mk_*/pgmap_set_*/
		s/dax_mk_*/dax_set_*/
	From Dan Williams
		Avoid adding new dax operations by teaching dax_device about pgmap
		Remove pgmap_protection_flag_invalid() patch (Just let
			kmap'ings fail)

Changes for V8

Feedback from Thomas
	* clean up noinstr mess
	* Fix static PKEY allocation mess
	* Ensure all functions are consistently named.
	* Split up patches to do 1 thing per patch
	* pkey_update_pkval() implementation
	* Streamline the use of pks_write_pkrs() by not disabling preemption
		- Leave this to the callers who require it.
		- Use documentation and lockdep to prevent errors
	* Clean up commit messages to explain in detail _why_ each patch is
		there.

Feedback from Dave H.
	* Leave out pks_mk_readonly() as it is not used by the PMEM use case

Feedback from Peter Anvin
	* Replace pks_abandon_pkey() with pks_update_exception()
		This is an even greater simplification in that it no longer
		attempts to shield users from faults.  The main use case for
		abandoning a key was to allow a system to continue running even
		with an error.  This should be a rare event so the performance
		should not be an issue.

* Simplify ARCH_ENABLE_SUPERVISOR_PKEYS

* Update PKS Test code
	- Add default value test
	- Split up the test code into patches which follow each feature
	  addition
	- simplify test code processing
	- ensure consistent reporting of errors.

* Ensure all entry points to the PKS code are protected by
	cpu_feature_enabled(X86_FEATURE_PKS)
	- At the same time make sure non-entry points or sub-functions to the
	  PKS code are not _unnecessarily_ protected by the feature check

* Update documentation
	- Use kernel docs to place the docs with the code for easier internal
	  developer use

* Adjust the PMEM use cases for the core changes

* Split the PMEM patches up to be 1 change per patch and help clarify review

* Review all header files and remove those no longer needed

* Review/update/clarify all commit messages

Fenghua Yu (1):
mm/pkeys: Define PKS page table macros

Ira Weiny (42):
Documentation/protection-keys: Clean up documentation for User Space pkeys
x86/pkeys: Clarify PKRU_AD_KEY macro
x86/pkeys: Make PKRU macros generic
x86/fpu: Refactor arch_set_user_pkey_access()
mm/pkeys: Add Kconfig options for PKS
x86/pkeys: Add PKS CPU feature bit
x86/fault: Adjust WARN_ON for pkey fault
Documentation/pkeys: Add initial PKS documentation
mm/pkeys: Provide for PKS key allocation
x86/pkeys: Enable PKS on cpus which support it
x86/pkeys: Introduce pks_write_pkrs()
x86/pkeys: Preserve the PKS MSR on context switch
mm/pkeys: Introduce pks_set_readwrite()
mm/pkeys: Introduce pks_set_noaccess()
x86/entry: Add auxiliary pt_regs space
entry: Pass pt_regs to irqentry_exit_cond_resched()
entry: Add calls for save/restore auxiliary pt_regs
x86/entry: Define arch_{save|restore}_auxiliary_pt_regs()
x86/pkeys: Preserve PKRS MSR across exceptions
x86/fault: Print PKS MSR on fault
mm/pkeys: Introduce pks_update_exception()
mm/pkeys: Add pks_available()
memremap_pages: Add Kconfig for DEVMAP_ACCESS_PROTECTION
memremap_pages: Introduce pgmap_protection_available()
memremap_pages: Introduce a PGMAP_PROTECTION flag
memremap_pages: Introduce devmap_protected()
memremap_pages: Reserve a PKS pkey for eventual use by PMEM
memremap_pages: Set PKS pkey in PTEs if requested
memremap_pages: Define pgmap_set_{readwrite|noaccess}() calls
memremap_pages: Add memremap.pks_fault_mode
kmap: Make kmap work for devmap protected pages
dax: Stray access protection for dax_direct_access()
nvdimm/pmem: Enable stray access protection
devdax: Enable stray access protection
mm/pkeys: PKS testing, add initial test code
x86/selftests: Add test_pks
mm/pkeys: PKS testing, add a fault call back
mm/pkeys: PKS testing, add pks_set_*() tests
mm/pkeys: PKS testing, test context switching
mm/pkeys: PKS testing, Add exception test
mm/pkeys: PKS testing, test pks_update_exception()
mm/pkeys: PKS testing, add test for all keys

Rick Edgecombe (1):
mm/pkeys: Introduce PKS fault callbacks

.../admin-guide/kernel-parameters.txt | 12 +
Documentation/core-api/protection-keys.rst | 130 ++-
arch/arm64/include/asm/preempt.h | 2 +-
arch/arm64/kernel/entry-common.c | 4 +-
arch/x86/Kconfig | 6 +
arch/x86/entry/calling.h | 20 +
arch/x86/entry/common.c | 2 +-
arch/x86/entry/entry_64.S | 22 +
arch/x86/entry/entry_64_compat.S | 6 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/entry-common.h | 15 +
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/pgtable_types.h | 22 +
arch/x86/include/asm/pkeys.h | 2 +
arch/x86/include/asm/pkeys_common.h | 18 +
arch/x86/include/asm/pkru.h | 20 +-
arch/x86/include/asm/pks.h | 46 ++
arch/x86/include/asm/processor.h | 15 +-
arch/x86/include/asm/ptrace.h | 21 +
arch/x86/include/uapi/asm/processor-flags.h | 2 +
arch/x86/kernel/asm-offsets_64.c | 15 +
arch/x86/kernel/cpu/common.c | 2 +
arch/x86/kernel/dumpstack.c | 32 +-
arch/x86/kernel/fpu/xstate.c | 22 +-
arch/x86/kernel/head_64.S | 6 +
arch/x86/kernel/process_64.c | 3 +
arch/x86/mm/fault.c | 17 +-
arch/x86/mm/pkeys.c | 320 +++++++-
drivers/dax/device.c | 2 +
drivers/dax/super.c | 60 ++
drivers/md/dm-writecache.c | 8 +-
drivers/nvdimm/pmem.c | 26 +
fs/dax.c | 8 +
fs/fuse/virtio_fs.c | 2 +
include/linux/dax.h | 5 +
include/linux/entry-common.h | 24 +-
include/linux/highmem-internal.h | 6 +
include/linux/memremap.h | 73 ++
include/linux/pgtable.h | 4 +
include/linux/pks-keys.h | 93 +++
include/linux/pks.h | 73 ++
include/linux/sched.h | 7 +
include/uapi/asm-generic/mman-common.h | 1 +
init/init_task.c | 3 +
kernel/entry/common.c | 29 +-
kernel/sched/core.c | 40 +-
lib/Kconfig.debug | 33 +
lib/Makefile | 3 +
lib/pks/Makefile | 3 +
lib/pks/pks_test.c | 755 ++++++++++++++++++
mm/Kconfig | 32 +
mm/memremap.c | 132 +++
tools/testing/selftests/x86/Makefile | 2 +-
tools/testing/selftests/x86/test_pks.c | 514 ++++++++++++
55 files changed, 2618 insertions(+), 112 deletions(-)
create mode 100644 arch/x86/include/asm/pkeys_common.h
create mode 100644 arch/x86/include/asm/pks.h
create mode 100644 include/linux/pks-keys.h
create mode 100644 include/linux/pks.h
create mode 100644 lib/pks/Makefile
create mode 100644 lib/pks/pks_test.c
create mode 100644 tools/testing/selftests/x86/test_pks.c


base-commit: b2d229d4ddb17db541098b83524d901257e93845
prerequisite-patch-id: a73f5ec8b3ecec9c95724106ccb5999c4f955b89
--
2.35.1
