[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YkXhIxfqjVBRTyjS@iweiny-desk3>
Date: Thu, 31 Mar 2022 10:13:07 -0700
From: Ira Weiny <ira.weiny@...el.com>
To: Dave Hansen <dave.hansen@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>,
Dan Williams <dan.j.williams@...el.com>
Cc: Fenghua Yu <fenghua.yu@...el.com>,
Rick Edgecombe <rick.p.edgecombe@...el.com>,
"Shankar, Ravi V" <ravi.v.shankar@...el.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH V9 00/45] PKS/PMEM: Add Stray Write Protection
On Thu, Mar 10, 2022 at 09:19:34AM -0800, Ira wrote:
> From: Ira Weiny <ira.weiny@...el.com>
>
>
> I'm looking for Intel acks on the series prior to submitting to maintainers.
> Most of the changes from V8 to V9 was in getting the tests straightened out.
> But there are some improvements in the actual code.
Is there any feedback on this?
Ira
>
>
> Changes for V9
>
> Review and update all commit messages.
> Update cover letter below
>
> PKS Core
> Separate user and supervisor pkey code in the headers
> create linux/pks.h for supervisor calls
> This facilitated making the pmem code more efficient
> Completely rearchitect the test code
> [After Dave Hansen and Rick Edgecombe found issues in the test
> code it was easier to rearchitect the code completely
> rather than attempt to fix it.]
> Remove pks_test_callback in favor of using fault hooks
> Fault hooks also isolate the fault callbacks from being
> false positives if non-test consumers are running
> Make additional PKS_TEST_RUN_ALL Kconfig option which is
> mutually exclusive to any non-test PKS consumer
> PKS_TEST_RUN_ALL takes over all pkey callbacks
> Ensure that each test runs within it's own context and is
> mutually exclusive from running while any other test is
> running.
> Ensure test session and context memory is cleaned up on file
> close
> Use pr_debug() and dynamic debug for in kernel debug messages
> Enhance test_pks selftest
> Add the ability to run all tests not just the context
> switch test
> Standardize output [PASS][FAIL][SKIP]
> Add '-d' option enables dynamic debug to see the kernel
> debug messages
>
> Incorporate feedback from Rick Edgecombe
> Update all pkey types to u8
> Fix up test code barriers
> Move patch declaring PKS_INIT_VALUE ahead of the patch which enables
> PKS so that PKS_INIT_VALUE can be used when pks_setup() is
> first created
> From Dan Williams
> Use macros instead of an enum for a pkey allocation scheme
> which is predicated on the config options of consumers
> This almost worked perfectly. It required a bit of
> tweeking to be able to allocate all of the keys.
>
> From Dave Hansen
> Reposition some code to be near/similar to user pkeys
> s/pks_write_current/x86_pkrs_load
> s/pks_saved_pkrs/pkrs
> Update Documentation
> s/PKR_{RW,AD,WD}_KEY/PKR_{RW,AD,WD}_MASK
> Consistently use lower case for pkey
> Update commit messages
> Add Acks
>
> PMEM Stray Write
> Building on the change to the pks_mk_*() function rename
> s/pgmap_mk_*/pgmap_set_*/
> s/dax_mk_*/dax_set_*/
> From Dan Williams
> Avoid adding new dax operations by teaching dax_device about pgmap
> Remove pgmap_protection_flag_invalid() patch (Just let
> kmap'ings fail)
>
>
> PKS/PMEM Stray write protection
> ===============================
>
> This series is broken into 2 parts.
>
> 1) Introduce Protection Key Supervisor (PKS), testing, and
> documentation
> 2) Use PKS to protect PMEM from stray writes
>
> Introduce Protection Key Supervisor (PKS)
> -----------------------------------------
>
> PKS enables protections on 'domains' of supervisor pages to limit supervisor
> mode access to pages beyond the normal paging protections. PKS works in a
> similar fashion to user space pkeys, PKU. As with PKU, supervisor pkeys are
> checked in addition to normal paging protections. And page mappings are
> assigned to a domain by setting a 4 bit pkey in the PTE of that mapping.
>
> Unlike PKU, permissions are changed via a MSR update. This update avoids TLB
> flushes making this an efficient way to alter protections vs PTE updates.
>
> Also, unlike PTE updates PKS permission changes apply only to the current
> processor. Therefore changing permissions apply only to that thread and not
> any other cpu/process. This allows protections to remain in place on other
> cpus for additional protection and isolation.
>
> Even though PKS updates are thread local, XSAVE is not supported for the PKRS
> MSR. Therefore this implementation saves and restores the MSR across context
> switches and during exceptions within software. Nested exceptions are
> supported by each exception getting a new PKS state.
>
> For consistent behavior with current paging protections, pkey 0 is reserved and
> configured to allow full access via the pkey mechanism, thus preserving the
> default paging protections because PTEs naturally have a pkey value of 0.
>
> Other keys, (1-15) are statically allocated by kernel consumers when
> configured. This is done by adding the appropriate PKS_NEW_KEY and
> PKS_DECLARE_INIT_VALUE macros to pks-keys.h.
>
> Two PKS consumers, PKS_TEST and PMEM stray write protection, are included in
> this series. When the number of users grows larger the sharing of keys will
> need to be resolved depending on the needs of the users at that time. Many
> methods have been contemplated but the number of kernel users and use cases
> envisioned is still quite small, much less than the 15 available keys.
>
> To summarize, the following are key attributes of PKS.
>
> 1) Fast switching of permissions
> 1a) Prevents access without page table manipulations
> 1b) No TLB flushes required
> 2) Works on a per thread basis, thus allowing protections to be
> preserved on threads which are not actively accessing data through
> the mapping.
>
> PKS is available with 4 and 5 level paging. For this and simplicity of
> implementation, the feature is restricted to x86_64.
>
>
> Use PKS to protect PMEM from stray writes
> -----------------------------------------
>
> DAX leverages the direct-map to enable 'struct page' services for PMEM. Given
> that PMEM capacity may be an order of magnitude higher capacity than System RAM
> it presents a large vulnerability surface to stray writes. Such a stray write
> becomes a silent data corruption bug.
>
> Stray pointers to System RAM may result in a crash or other undesirable
> behavior which, while unfortunate, are usually recoverable with a reboot.
> Stray writes to PMEM are permanent in nature and thus are more likely to result
> in permanent user data loss. Given that PMEM access from the kernel is limited
> to a constrained set of locations (PMEM driver, Filesystem-DAX, direct-I/O, and
> any properly kmap'ed page), it is amenable to PKS protection.
>
> Set up an infrastructure for extra device access protection. Then implement the
> protection using the new Protection Keys Supervisor (PKS) on architectures
> which support it.
>
> Because PMEM pages are all associated with a struct dev_pagemap and flags in
> struct page are valuable the flag of protecting memory can be stored in struct
> dev_pagemap. All PMEM is protected by the same pkey. So a single flag is all
> that is needed in each dev_pagemap to indicate protection.
>
> General access in the kernel is supported by modifying the kmap infrastructure
> which can detect if a page is pks protected and enable access until the
> corresponding unmap is called.
>
> Because PKS is a thread local mechanism and because kmap was never really
> intended to create a long term mapping, this implementation does not support
> the kmap()/kunmap() calls. Calling kmap() on a PMEM protected page is allowed
> but accessing that mapping will cause a fault.
>
> Originally this series modified many of the kmap call sites to indicate they
> were thread local.[1] And an attempt to support kmap()[2] was made. But now
> that kmap_local_page() has been developed[3] and in more wide spread use,
> kmap() can safely be left unsupported.
>
> How the fault is handled is configurable via a new module parameter
> memremap.pks_fault_mode. Two modes are supported.
>
> 'relaxed' (default) -- WARN_ONCE, disable the protection and allow
> access
>
> 'strict' -- prevent any unguarded access to a protected dev_pagemap
> range
>
> This 'safety valve' feature has already been useful in the development of this
> feature.
>
>
> [1] https://lore.kernel.org/lkml/20201009195033.3208459-1-ira.weiny@intel.com/
>
> [2] https://lore.kernel.org/lkml/87mtycqcjf.fsf@nanos.tec.linutronix.de/
>
> [3] https://lore.kernel.org/lkml/20210128061503.1496847-1-ira.weiny@intel.com/
> https://lore.kernel.org/lkml/20210210062221.3023586-1-ira.weiny@intel.com/
> https://lore.kernel.org/lkml/20210205170030.856723-1-ira.weiny@intel.com/
> https://lore.kernel.org/lkml/20210217024826.3466046-1-ira.weiny@intel.com/
>
>
> ----------------------------------------------------------------------------
> Changes for V8
>
> Feedback from Thomas
> * clean up noinstr mess
> * Fix static PKEY allocation mess
> * Ensure all functions are consistently named.
> * Split up patches to do 1 thing per patch
> * pkey_update_pkval() implementation
> * Streamline the use of pks_write_pkrs() by not disabling preemption
> - Leave this to the callers who require it.
> - Use documentation and lockdep to prevent errors
> * Clean up commit messages to explain in detail _why_ each patch is
> there.
>
> Feedback from Dave H.
> * Leave out pks_mk_readonly() as it is not used by the PMEM use case
>
> Feedback from Peter Anvin
> * Replace pks_abandon_pkey() with pks_update_exception()
> This is an even greater simplification in that it no longer
> attempts to shield users from faults. As the main use case for
> abandoning a key was to allow a system to continue running even
> with an error. This should be a rare event so the performance
> should not be an issue.
>
> * Simplify ARCH_ENABLE_SUPERVISOR_PKEYS
>
> * Update PKS Test code
> - Add default value test
> - Split up the test code into patches which follow each feature
> addition
> - simplify test code processing
> - ensure consistent reporting of errors.
>
> * Ensure all entry points to the PKS code are protected by
> cpu_feature_enabled(X86_FEATURE_PKS)
> - At the same time make sure non-entry points or sub-functions to the
> PKS code are not _unnecessarily_ protected by the feature check
>
> * Update documentation
> - Use kernel docs to place the docs with the code for easier internal
> developer use
>
> * Adjust the PMEM use cases for the core changes
>
> * Split the PMEM patches up to be 1 change per patch and help clarify review
>
> * Review all header files and remove those no longer needed
>
> * Review/update/clarify all commit messages
>
> Fenghua Yu (1):
> mm/pkeys: Define PKS page table macros
>
> Ira Weiny (43):
> entry: Create an internal irqentry_exit_cond_resched() call
> Documentation/protection-keys: Clean up documentation for User Space
> pkeys
> x86/pkeys: Clarify PKRU_AD_KEY macro
> x86/pkeys: Make PKRU macros generic
> x86/fpu: Refactor arch_set_user_pkey_access()
> mm/pkeys: Add Kconfig options for PKS
> x86/pkeys: Add PKS CPU feature bit
> x86/fault: Adjust WARN_ON for pkey fault
> Documentation/pkeys: Add initial PKS documentation
> mm/pkeys: Provide for PKS key allocation
> x86/pkeys: Enable PKS on cpus which support it
> mm/pkeys: PKS testing, add initial test code
> x86/selftests: Add test_pks
> x86/pkeys: Introduce pks_write_pkrs()
> x86/pkeys: Preserve the PKS MSR on context switch
> mm/pkeys: Introduce pks_set_readwrite()
> mm/pkeys: Introduce pks_set_noaccess()
> mm/pkeys: PKS testing, add a fault call back
> mm/pkeys: PKS testing, add pks_set_*() tests
> mm/pkeys: PKS testing, test context switching
> x86/entry: Add auxiliary pt_regs space
> entry: Split up irqentry_exit_cond_resched()
> entry: Add calls for save/restore auxiliary pt_regs
> x86/entry: Define arch_{save|restore}_auxiliary_pt_regs()
> x86/pkeys: Preserve PKRS MSR across exceptions
> x86/fault: Print PKS MSR on fault
> mm/pkeys: PKS testing, Add exception test
> mm/pkeys: Introduce pks_update_exception()
> mm/pkeys: PKS testing, test pks_update_exception()
> mm/pkeys: PKS testing, add test for all keys
> mm/pkeys: Add pks_available()
> memremap_pages: Add Kconfig for DEVMAP_ACCESS_PROTECTION
> memremap_pages: Introduce pgmap_protection_available()
> memremap_pages: Introduce a PGMAP_PROTECTION flag
> memremap_pages: Introduce devmap_protected()
> memremap_pages: Reserve a PKS pkey for eventual use by PMEM
> memremap_pages: Set PKS pkey in PTEs if requested
> memremap_pages: Define pgmap_set_{readwrite|noaccess}() calls
> memremap_pages: Add memremap.pks_fault_mode
> kmap: Make kmap work for devmap protected pages
> dax: Stray access protection for dax_direct_access()
> nvdimm/pmem: Enable stray access protection
> devdax: Enable stray access protection
>
> Rick Edgecombe (1):
> mm/pkeys: Introduce PKS fault callbacks
>
> .../admin-guide/kernel-parameters.txt | 12 +
> Documentation/core-api/protection-keys.rst | 130 ++-
> arch/x86/Kconfig | 6 +
> arch/x86/entry/calling.h | 20 +
> arch/x86/entry/common.c | 2 +-
> arch/x86/entry/entry_64.S | 22 +
> arch/x86/entry/entry_64_compat.S | 6 +
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/include/asm/disabled-features.h | 8 +-
> arch/x86/include/asm/entry-common.h | 15 +
> arch/x86/include/asm/msr-index.h | 1 +
> arch/x86/include/asm/pgtable_types.h | 22 +
> arch/x86/include/asm/pkeys.h | 2 +
> arch/x86/include/asm/pkeys_common.h | 18 +
> arch/x86/include/asm/pkru.h | 20 +-
> arch/x86/include/asm/pks.h | 46 ++
> arch/x86/include/asm/processor.h | 15 +-
> arch/x86/include/asm/ptrace.h | 21 +
> arch/x86/include/uapi/asm/processor-flags.h | 2 +
> arch/x86/kernel/asm-offsets_64.c | 15 +
> arch/x86/kernel/cpu/common.c | 2 +
> arch/x86/kernel/dumpstack.c | 32 +-
> arch/x86/kernel/fpu/xstate.c | 22 +-
> arch/x86/kernel/head_64.S | 6 +
> arch/x86/kernel/process_64.c | 3 +
> arch/x86/mm/fault.c | 17 +-
> arch/x86/mm/pkeys.c | 320 +++++++-
> drivers/dax/device.c | 2 +
> drivers/dax/super.c | 59 ++
> drivers/md/dm-writecache.c | 8 +-
> drivers/nvdimm/pmem.c | 26 +
> fs/dax.c | 8 +
> fs/fuse/virtio_fs.c | 2 +
> include/linux/dax.h | 5 +
> include/linux/entry-common.h | 15 +-
> include/linux/highmem-internal.h | 4 +
> include/linux/memremap.h | 1 +
> include/linux/mm.h | 72 ++
> include/linux/pgtable.h | 4 +
> include/linux/pks-keys.h | 92 +++
> include/linux/pks.h | 73 ++
> include/linux/sched.h | 7 +
> include/uapi/asm-generic/mman-common.h | 1 +
> init/init_task.c | 3 +
> kernel/entry/common.c | 44 +-
> kernel/sched/core.c | 40 +-
> lib/Kconfig.debug | 33 +
> lib/Makefile | 3 +
> lib/pks/Makefile | 3 +
> lib/pks/pks_test.c | 755 ++++++++++++++++++
> mm/Kconfig | 32 +
> mm/memremap.c | 132 +++
> tools/testing/selftests/x86/Makefile | 2 +-
> tools/testing/selftests/x86/test_pks.c | 514 ++++++++++++
> 54 files changed, 2617 insertions(+), 109 deletions(-)
> create mode 100644 arch/x86/include/asm/pkeys_common.h
> create mode 100644 arch/x86/include/asm/pks.h
> create mode 100644 include/linux/pks-keys.h
> create mode 100644 include/linux/pks.h
> create mode 100644 lib/pks/Makefile
> create mode 100644 lib/pks/pks_test.c
> create mode 100644 tools/testing/selftests/x86/test_pks.c
>
> --
> 2.35.1
>
Powered by blists - more mailing lists