Message-Id: <1507916344-3896-1-git-send-email-akrowiak@linux.vnet.ibm.com>
Date:   Fri, 13 Oct 2017 13:38:45 -0400
From:   Tony Krowiak <akrowiak@...ux.vnet.ibm.com>
To:     linux-s390@...r.kernel.org, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org
Cc:     freude@...ibm.com, schwidefsky@...ibm.com,
        heiko.carstens@...ibm.com, borntraeger@...ibm.com,
        cohuck@...hat.com, kwankhede@...dia.com,
        bjsdjshi@...ux.vnet.ibm.com, pbonzini@...hat.com,
        alex.williamson@...hat.com, pmorel@...ux.vnet.ibm.com,
        alifm@...ux.vnet.ibm.com, mjrosato@...ux.vnet.ibm.com,
        qemu-s390x@...gnu.org, jjherne@...ux.vnet.ibm.com,
        thuth@...hat.com, pasic@...ux.vnet.ibm.com,
        Tony Krowiak <akrowiak@...ux.vnet.ibm.com>
Subject: [RFC 00/19] KVM: s390/crypto/vfio: guest dedicated crypto adapters

Overview:
--------
An adjunct processor (AP) facility is an IBM Z cryptographic facility. The
AP facility consists of three AP instructions and from 1 to 256 AP
adapter cards. The design takes advantage of the interpretive execution mode
provided by the SIE architecture. With interpretive execution mode, the AP
instructions executed on the guest are interpreted by the hardware. This
allows guests direct access to AP adapter cards. The first goal of this
patch series is to provide direct access by a KVM guest to an AP as a
pass-through device. The second goal is to provide administrators with the
means to configure KVM guests to grant direct access to AP facilities
assigned to the LPAR in which the host Linux system is running.

To facilitate the comprehension of the design, let's present an overview of
the AP architecture.

AP Architectural Overview
-------------------------
Let's start with some definitions:

* AP adapter

  An AP adapter is an IBM Z adapter card that can perform cryptographic
  functions. There can be from 0 to 256 adapters assigned to an LPAR.
  Each adapter is identified by a number from 0 to 255. When installed,
  an AP adapter is accessed by AP instructions executed by any CPU.

* AP domain

  An adapter can be partitioned into domains. An adapter can hold up to 256 
  domains. Each domain is identified by a number from 0 to 255. Domains can 
  be further classified into two types: 
  
    * Usage domains are domains that can be accessed directly to process AP 
      commands
  
    * Control domains are domains that are accessed indirectly by AP 
      commands sent to a usage domain to control or change the domain.

* AP Queue

  An AP queue is the means by which an AP command is sent to an
  AP usage domain inside a specific AP adapter. An AP queue is identified
  by a tuple consisting of an AP adapter ID (APID) and a usage domain index
  (APQI) corresponding to a given usage domain within the adapter. This
  tuple forms an AP Queue Number (APQN) uniquely identifying an AP queue.
  AP instructions include a field containing the APQN to identify the AP
  queue to which the AP command is targeted.

* AP Instructions:

  There are three AP instructions:

  * NQAP: to enqueue an AP command-request message to a queue
  * DQAP: to dequeue an AP command-reply message from a queue
  * PQAP: to administer the queues
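
To make the APQN tuple concrete, here is a minimal sketch that formats an
(adapter, usage domain) pair the way AP queue devices are named in the sysfs
listings later in this message (for example 05.0004). The helper name
apqn_name is hypothetical and only illustrates the naming; it is not part of
the patch series.

```shell
# Hypothetical helper: print the "cc.dddd" name of an AP queue.
# $1 = adapter ID in hex, $2 = usage domain index in hex.
apqn_name() {
    printf '%02x.%04x\n' "0x$1" "0x$2"
}

apqn_name 5 4     # prints 05.0004
apqn_name 6 ab    # prints 06.00ab
```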

Let's now see how AP instructions are interpreted by the hardware.

Start Interpretive Execution (SIE) Instruction
---------------------------------------------- 
A KVM guest is started by executing the Start Interpretive Execution (SIE)
instruction. The SIE state description is a control block that contains the
state information for a KVM guest and is supplied as input to the SIE 
instruction. The SIE state description contains a field that references 
a Crypto Control Block (CRYCB). The CRYCB contains three bitmask fields 
identifying the adapters, usage domains and control domains assigned to the 
KVM guest: 

* The AP Mask (APM) field specifies the AP adapters assigned to the 
  KVM guest. The APM controls which adapters are valid for the KVM guest. 
  The bits in the mask, from left to right, correspond to APIDs 
  0 up to the number of adapters that can be assigned to the LPAR. If a bit 
  is set, the corresponding adapter is valid for use by the KVM guest.

* The AP Queue Mask (AQM) field specifies the AP usage domains assigned 
  to the KVM guest. The bits in the mask, from left to right, correspond
  to the usage domains, from 0 up to the number of domains that can be 
  assigned to the LPAR. If a bit is set, the corresponding usage domain is 
  valid for use by the KVM guest. 

* The AP Domain Mask (ADM) field specifies the AP control domains assigned
  to the KVM guest. The ADM controls which domains can be changed by an AP
  command-request message sent to a usage domain from the guest. The bits
  in the mask, from left to right, correspond to domain 0 up to the number
  of domains that can be assigned to the LPAR. If a bit is set, the
  corresponding domain can be modified by an AP command-request message
  sent to a usage domain configured for the KVM guest.
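
Since the bits in all three masks are numbered from left to right, bit 0 is
the most significant bit of the first byte. The sketch below shows where the
bit for a given adapter or domain number lands, assuming the mask is viewed
as a plain array of bytes. The helper name mask_position is hypothetical and
the byte-array view is an assumption made for illustration only.

```shell
# Hypothetical helper: locate bit N (hex) in a mask whose bits are
# numbered from left to right, starting with bit 0.
mask_position() {
    n=$(( 0x$1 ))
    printf 'byte %d, mask 0x%02x\n' $(( $n / 8 )) $(( 0x80 >> ($n % 8) ))
}

mask_position 5     # prints: byte 0, mask 0x04
mask_position ab    # prints: byte 21, mask 0x10
```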

Recall from the description of an AP queue that AP instructions include
an APQN to identify the AP adapter and the specific usage domain within
the adapter to which an AP command-request message is to be sent (NQAP
and PQAP instructions), or from which a command-reply message is to be
received (DQAP instruction). The validity of an APQN is defined by the
matrix calculated from the APM and AQM: an APQN is valid if its adapter
number is set in the APM and its usage domain number is set in the AQM, so
the valid APQNs are the cross product of the assigned adapters and the
assigned usage domains.
For example, if adapters 1 and 2 and usage domains 5 and 6 are assigned to 
a guest, the APQNs (1,5), (1,6), (2,5) and (2,6) will be valid for the 
guest. 
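
The matrix in this example can be enumerated with a short sketch. The
function name apqns is hypothetical; it simply pairs every assigned adapter
with every assigned usage domain.

```shell
# Hypothetical helper: list the APQNs valid for a guest.
# $1 = space-separated adapter numbers, $2 = space-separated usage domains.
apqns() {
    for a in $1; do
        for d in $2; do
            printf '(%d,%d)\n' "$a" "$d"
        done
    done
}

apqns "1 2" "5 6"
# prints (1,5), (1,6), (2,5) and (2,6), one per line
```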

The APQNs provide secure key functionality - i.e., the key is stored on the 
adapter card - so when the adapter card is not virtualized - i.e., the 
adapter is accessed directly by the guest - each APQN must be assigned to 
at most one guest.

   Example 1: Valid configuration:
   ------------------------------
   Guest1: adapters 1,2  domains 5,6
   Guest2: adapters 1,2  domain  7

   This is valid because both guests have a unique set of APQNs: Guest1 has
   APQNs (1,5), (1,6), (2,5) and (2,6); Guest2 has APQNs (1,7) and (2,7).

   Example 2: Invalid configuration:
   --------------------------------
   Guest1: adapters 1,2  domains 5,6
   Guest2: adapter  1    domains 6,7

   This is an invalid configuration because both guests have access to
   APQN (1,6).
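
The two examples can be checked mechanically. The following sketch lists the
APQNs that two guest configurations have in common; an empty result means the
configurations do not conflict. The function names are hypothetical, and the
check assumes that no adapter or domain is listed twice for the same guest.

```shell
# Hypothetical helper: list the APQNs valid for a guest.
apqns() {
    for a in $1; do
        for d in $2; do
            printf '(%d,%d)\n' "$a" "$d"
        done
    done
}

# Hypothetical helper: print the APQNs shared by two guests.
# $1/$2 = adapters/domains of guest 1, $3/$4 = adapters/domains of guest 2.
shared_apqns() {
    { apqns "$1" "$2"; apqns "$3" "$4"; } | sort | uniq -d
}

shared_apqns "1 2" "5 6" "1 2" "7"    # Example 1: prints nothing
shared_apqns "1 2" "5 6" "1" "6 7"    # Example 2: prints (1,6)
```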

Interruption architecture:

The AP interruption architecture may or may not generate interruptions to 
signal to the CPU the end of an AP transaction. The SIE interruption 
architecture, depending upon its configuration, may or may not redirect 
AP interrupts directly to a guest if the associated queue is valid for a 
guest, and may or may not report the interruption to the host.

Effective masking for guest-level 1 and 2:

A Linux host running in the LPAR operates at guest-level 1 and has its own
SIE state description. When operating at guest-level 1, the masks from the
host's state description are used directly. A Linux guest running in the
host operates at guest-level 2. When operating at guest-level 2, each mask
from the guest-level 1 (host) state description is combined with the
corresponding mask from the guest-level 2 (guest) state description by a
logical AND. The result is called an effective mask.

The effective mask algorithm is used for the APM, AQM and ADM to create 
an EAPM, EAQM and EADM respectively. Use of the EAPM, EAQM and EADM 
precludes a guest-level 1 host program from passing to a guest-level 2
program APQNs to which it does not have access.
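
As a small worked example of the effective mask calculation, the sketch
below ANDs two masks that are truncated to their first 16 bits for brevity.
The mask values are made up for illustration: the host is assumed to own
adapters 1, 2 and 3 (0x7000 with left-to-right bit numbering) and the guest
configuration requests adapters 2, 3 and 4 (0x3800). The function name
effective_mask is hypothetical.

```shell
# Hypothetical helper: effective mask = host mask AND guest mask.
# Only the first 16 bits of each mask are modelled here.
effective_mask() {
    printf '0x%04x\n' $(( $1 & $2 ))
}

effective_mask 0x7000 0x3800    # prints 0x3000, i.e. adapters 2 and 3
```

Adapter 4 drops out of the result because the host itself has no access to
it, which is exactly the protection the effective masks are meant to give.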

Linux cryptographic bus driver:

Linux already has a cryptographic bus driver that provides one AP device per
AP adapter and one device per AP queue. There is a device driver for each 
type of AP adapter device and each type of AP queue device. This design 
utilizes some of the interfaces and functionality provided by the AP bus 
driver.

Design Origin:
-------------

The original design was based on modelling AP Queue devices. The design
utilized the VFIO mediated device framework whereby a mediated AP queue
device would be created for each AP Queue bound to the VFIO AP Queue device
driver. This at first seemed like the most logical design choice for the 
following reasons:

* Securing access to an AP Queue device by unbinding it from its default 
  device driver and binding it to the VFIO device driver would not preclude 
  the host from having access to the other usage domains contained within 
  the same adapter card connected to the AP queue.

* An AP command is sent to a usage domain within a specific AP adapter via 
  an AP queue.

It became readily apparent that modelling the design on an AP queue was very 
convoluted for a number of reasons:

  * There is no convenient way to notify the VFIO device driver which guest
    will have access to a given mediated AP queue device until the mediated
    device's file descriptor is opened by the guest. Recall that the APQNs
    configured for the guest are the cross product of the bits set in the
    APM and the AQM, so the guest's APQNs cannot be validated, nor its SIE
    state description configured, until all of the guest's mediated AP
    queue device file descriptors have been opened.

    For example, suppose a guest opens file descriptors for mediated AP
    queue devices representing APQNs (3,5) and (4,6). If bits 3 and 4 are
    set in the guest's APM and bits 5 and 6 are set in the guest's AQM, then
    APQNs (3,5), (3,6), (4,5) and (4,6) will be valid for the guest, but
    mediated AP queue devices have been created only for APQNs (3,5) and
    (4,6). In this case, APQNs (3,6) and (4,5), which may still be assigned
    to the host, would also be available to the guest. This is a potential
    security breach.

  * Control domains are not devices and are not logically modelled as 
    mediated devices. In our original design, they were modelled as 
    attributes of a mediated AP queue device, but this was a clumsy use of
    the VFIO mediated device model.

  * The SIE state description models the assignment of AP resources as a
    matrix via the APM, AQM and ADM.   
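
The problem described in the first point above can be reproduced with a
short sketch. Given the APQNs for which queue devices were opened, it derives
the APM and AQM bits that would have to be set and prints the APQNs that
become valid even though no device was opened for them. The function name
unintended_apqns is hypothetical.

```shell
# Hypothetical helper: arguments are "adapter,domain" pairs (decimal).
# Prints every APQN of the resulting matrix that was not passed in.
unintended_apqns() {
    adapters=$(printf '%s\n' "$@" | cut -d, -f1 | sort -un)
    domains=$(printf '%s\n' "$@" | cut -d, -f2 | sort -un)
    for a in $adapters; do
        for d in $domains; do
            case " $* " in
                *" $a,$d "*) ;;                       # explicitly opened
                *) printf '(%s,%s)\n' "$a" "$d" ;;    # valid but never opened
            esac
        done
    done
}

unintended_apqns 3,5 4,6    # prints (3,6) and (4,5)
```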
     
The design we ultimately settled upon was modelled on the AP matrix as 
defined by the SIE state description. Supplying the complete AP matrix  
to SIE using bitmasks when starting a guest simplifies the code, is far 
easier to secure, and more closely matches the model employed by SIE. This 
is the design model implemented via this patch set. 

The Design
----------
This design introduces four new objects:

1. AP matrix bus

   The sysfs location of the AP matrix bus is /sys/bus/ap_matrix. This 
   bus will create a single AP matrix device (see below).

2. AP matrix device

   The AP matrix device is a singleton that hangs off of the AP matrix bus.
   This device holds the AP Queues that have been reserved for use by 
   KVM guests. The sysfs location of the AP matrix device is 
   /sys/devices/ap_matrix/matrix. It is also linked from the AP matrix
   bus at /sys/bus/ap_matrix/devices/matrix.  

3. VFIO AP matrix driver

   This driver is based on the VFIO mediated device framework. When the 
   driver is initialized, it will:

   * Get the AP matrix device created by the AP matrix bus

   * Register with the AP bus to indicate that it can control AP Queue 
     devices. This allows AP Queue devices unbound from AP device drivers
     to be bound to the VFIO AP matrix driver. The AP Queues bound to the 
     VFIO AP matrix driver will be stored by the driver in the AP matrix 
     device. 

   * Register the AP matrix device with the VFIO mediated device 
     framework (MDEV). Registration with MDEV will create the sysfs 
     structures needed to create mediated matrix devices. Each MDEV matrix
     device is used to configure the AP matrix for a KVM guest. The MDEV
     matrix device's file descriptor can be used by QEMU to communicate
     with the VFIO AP matrix device driver.

   The VFIO AP matrix driver:

   * Provides the interfaces the administrator can use to secure AP Queues
     for use by KVM guests. This is accomplished by unbinding the AP Queues
     needed by each KVM guest from their AP device driver and binding them
     to the VFIO AP matrix driver. This prevents the host Linux system from
     using these Queues.

   * Provides an ioctl that can be used by QEMU to configure the
     CRYCB referenced by the KVM guest's SIE state description. The ioctl
     will:

     * Create an EAPM, EAQM and EADM by performing a logical AND of the 
       APM, AQM and ADM configured via the MDEV matrix device's sysfs 
       attributes files (see below) with the APM, AQM and ADM of the host's 
       SIE state description respectively. 

     * Configure the SIE state description for the KVM guest using the 
       effective masks created in the previous step.

4. VFIO MDEV matrix passthrough device

   An MDEV matrix passthrough device must be created for each KVM guest that 
   will need access to AP facilities. An MDEV matrix passthrough device is
   used by QEMU to configure the APM, AQM and ADM fields of the CRYCB 
   referenced by the KVM guest's SIE state description. The file descriptor
   for the MDEV matrix passthrough device provides the communication pathway
   between QEMU and the VFIO AP matrix device driver. 

   The MDEV matrix passthrough device, like the CRYCB, contains three 
   bitmasks - an APM, AQM and ADM - for specifying the AP matrix for the 
   KVM guest. Three sets of attributes files will be provided to allow an 
   administrator to set the bits in the MDEV matrix device's APM, AQM and
   ADM: 
 
   * A file to assign an AP adapter 
   * A file to unassign an AP adapter
   * A file to display the adapters assigned

   * A file to assign an AP domain 
   * A file to unassign an AP domain
   * A file to display the domains assigned

   * A file to assign an AP control domain 
   * A file to unassign an AP control domain
   * A file to display the control domains assigned

Example:
-------
Let's now provide an example to illustrate how KVM guests may be given
access to AP facilities. For this example, we will show how to configure 
two guests such that executing the lszcrypt command on the guests would 
look like this:

Guest1
------
CARD.DOMAIN TYPE  MODE        
------------------------------
05          CEX5C CCA-Coproc  
05.0004     CEX5C CCA-Coproc
05.00ab     CEX5C CCA-Coproc  
06          CEX5A Accelerator 
06.0004     CEX5A Accelerator 
06.00ab     CEX5A Accelerator 

Guest2
------
CARD.DOMAIN TYPE  MODE        
------------------------------
05          CEX5C CCA-Coproc  
05.0047     CEX5C CCA-Coproc  
05.00ff     CEX5C CCA-Coproc  

One thing to notice in this example is that, within a guest, the AP queue
set of every adapter is identical. For example, the two AP queue sets for
Guest1 both contain APQIs 0004 and 00ab. It would be an invalid condition
if both queue sets did not contain the same set of queues. We could not,
for example, configure Guest1 with access to AP queue 05.00ff because the
AP queue set for adapter 06 does not contain AP queue 06.00ff. The point
is, one must be careful to reserve a valid set of AP queues for a given
guest.
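
One way to check a planned set of queues before reserving it is to count:
the set forms a complete matrix only if the number of queues equals the
number of distinct adapters times the number of distinct domains. The sketch
below does this for queue names in the cc.dddd form used by lszcrypt. The
function name is_full_matrix is hypothetical.

```shell
# Hypothetical helper: succeed only if the given queues (cc.dddd names)
# cover every adapter/domain combination that appears in the list.
is_full_matrix() {
    n_cards=$(( $(printf '%s\n' "$@" | cut -d. -f1 | sort -u | wc -l) ))
    n_doms=$(( $(printf '%s\n' "$@" | cut -d. -f2 | sort -u | wc -l) ))
    n_queues=$(( $(printf '%s\n' "$@" | sort -u | wc -l) ))
    [ "$n_queues" -eq $(( n_cards * n_doms )) ]
}

# Guest1 as configured in this example
is_full_matrix 05.0004 05.00ab 06.0004 06.00ab && echo valid
# Adding 05.00ff without 06.00ff leaves a hole in the matrix
is_full_matrix 05.0004 05.00ab 05.00ff 06.0004 06.00ab || echo invalid
```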

These are the steps for configuring Guest1 and Guest2:
   
1. The first thing that needs to be done is to secure the AP queues to be
   used by the two guests so that the host cannot access them. This is done
   by unbinding each AP Queue device from its respective AP driver. In our 
   example, these queues are bound to the cex4queue driver. This would be 
   the sysfs location of these devices: 

   /sys/bus/ap
   --- [drivers]
   ------ [cex4queue]
   --------- [05.0004]
   --------- [05.0047]
   --------- [05.00ab]
   --------- [05.00ff]
   --------- [06.0004]
   --------- [06.00ab]
   --------- unbind

   To unbind AP queue 05.0004 from the cex4queue device driver:

	echo 05.0004 > unbind

   This must also be done for AP queues 05.00ab, 05.0047, 05.00ff, 06.0004,
   and 06.00ab.

2. The next step is to reserve the queues for use by the two KVM guests. 
   This is accomplished by binding them to the VFIO AP matrix device driver. 
   This is the sysfs location of the VFIO AP matrix device driver:

   /sys/bus/ap
   ---[drivers]
   ------ [vfio_ap_matrix]
   ---------- bind

   To bind queue 05.0004 to the vfio_ap_matrix driver:

	echo 05.0004 > bind

   This must also be done for AP queues 05.00ab, 05.0047, 05.00ff, 06.0004,
   and 06.00ab.

3. Create the mediated devices needed to configure the AP matrices for the 
   two guests and to provide an interface to the vfio_ap_matrix driver for 
   use by the guests:

   /sys/devices/
   --- [ap_matrix]
   ------ [matrix] (this is the matrix device)
   --------- [mdev_supported_types]
   ------------ [ap_matrix-passthrough] (passthrough mediated device type)
   --------------- create
   --------------- [devices]

   To create the mediated devices for the two guests:

	uuidgen > create
	uuidgen > create

   This will create two mediated devices in the [devices] subdirectory named 
   with the UUID written to the create attribute file. We call them $uuid1
   and $uuid2:

   /sys/devices/
   --- [ap_matrix]
   ------ [matrix]
   --------- [mdev_supported_types]
   ------------ [ap_matrix-passthrough]
   --------------- [devices]
   ------------------ [$uuid1]
   --------------------- adapters
   --------------------- assign_adapter
   --------------------- assign_control_domain
   --------------------- assign_domain
   --------------------- control_domains
   --------------------- domains
   --------------------- unassign_adapter
   --------------------- unassign_control_domain
   --------------------- unassign_domain
   ------------------ [$uuid2]
   --------------------- adapters
   --------------------- assign_adapter
   --------------------- assign_control_domain
   --------------------- assign_domain
   --------------------- control_domains
   --------------------- domains
   --------------------- unassign_adapter
   --------------------- unassign_control_domain
   --------------------- unassign_domain

4. The administrator now needs to configure the matrices for mediated 
   devices $uuid1 (for Guest1) and $uuid2 (for Guest2). 

   This is how the matrix is configured for Guest1:

   echo 5 > assign_adapter
   echo 6 > assign_adapter 
   echo 4 > assign_domain
   echo ab > assign_domain

   When an assign_xxx file is written, the corresponding bit in the 
   respective MDEV matrix device's bitmask will be set. For example, when 
   adapter 5 is assigned, bit 5 - numbered from left to right starting with 
   bit 0 - will be set in the MDEV matrix device's APM. 

   By architectural convention, all usage domains - i.e., domains assigned 
   via the assign_domain attribute file - will also be configured in the ADM 
   field of the KVM guest's CRYCB, so there is no need to assign control 
   domains here unless you want to assign control domains that are not 
   assigned as usage domains. 

   If a mistake is made configuring an adapter, domain or control domain, 
   you can use the unassign_xxx files to unassign the adapter, domain or 
   control domain.

   To display the matrix configuration for Guest1:

   cat adapters
   cat domains
   cat control_domains

   This is how the matrix is configured for Guest2:

   echo 5 > assign_adapter 
   echo 47 > assign_domain
   echo ff > assign_domain
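
Steps 1 through 4 can be collected into a small script. This is only a
sketch built from the sysfs paths and attribute files shown above as proposed
by this RFC; the function names and the SYSFS variable are hypothetical, and
the script assumes it is run as root on a host with these patches applied.

```shell
#!/bin/sh
# Sketch only: paths follow the sysfs layout described in this message.
SYSFS=${SYSFS:-/sys}
AP_DRIVERS=$SYSFS/bus/ap/drivers
MDEV_TYPE=$SYSFS/devices/ap_matrix/matrix/mdev_supported_types/ap_matrix-passthrough

# Steps 1 and 2: take each queue away from the host driver and reserve it
# for KVM guests by binding it to the vfio_ap_matrix driver.
reserve_queues() {
    for q in "$@"; do
        echo "$q" > "$AP_DRIVERS/cex4queue/unbind"
        echo "$q" > "$AP_DRIVERS/vfio_ap_matrix/bind"
    done
}

# Steps 3 and 4: create a mediated matrix device and fill in its matrix.
# $1 = UUID, $2 = adapters, $3 = usage domains (space-separated, hex).
configure_guest() {
    echo "$1" > "$MDEV_TYPE/create"
    for a in $2; do
        echo "$a" > "$MDEV_TYPE/devices/$1/assign_adapter"
    done
    for d in $3; do
        echo "$d" > "$MDEV_TYPE/devices/$1/assign_domain"
    done
}

# Example invocation for the two guests described above:
#   reserve_queues 05.0004 05.00ab 05.0047 05.00ff 06.0004 06.00ab
#   configure_guest "$(uuidgen)" "5 6" "4 ab"     # Guest1
#   configure_guest "$(uuidgen)" "5" "47 ff"      # Guest2
```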

When a KVM guest is started, QEMU will open the file descriptor for its 
MDEV matrix device. The VFIO AP matrix device driver will be notified 
and will store the reference to the KVM guest's SIE state description. 
QEMU will then call the VFIO AP matrix ioctl requesting that the 
KVM guest's matrix be configured. The matrix driver will set the bits in the 
APM, AQM and ADM fields of the CRYCB referenced by the guest's SIE state 
description from the EAPM, EAQM and EADM created by performing a logical AND
of the AP masks configured in the MDEV matrix device and the masks 
configured in the host's SIE state description. When the guest comes up, it 
will have access to the APQNs identified in the AP matrix specified in the 
KVM guest's SIE state description. Programs running on the guest will then 
be able to use the cryptographic functions provided by the AP facilities 
configured for the guest.

Tony Krowiak (19):
  KVM: s390: SIE considerations for AP Queue virtualization
  KVM: s390: refactor crypto initialization
  s390/zcrypt: new AP matrix bus
  s390/zcrypt: create an AP matrix device on the AP matrix bus
  s390/zcrypt: base implementation of AP matrix device driver
  s390/zcrypt: register matrix device with VFIO mediated device
    framework
  KVM: s390: introduce AP matrix configuration interface
  s390/zcrypt: support for assigning adapters to matrix mdev
  s390/zcrypt: validate adapter assignment
  s390/zcrypt: sysfs interfaces supporting AP domain assignment
  s390/zcrypt: validate domain assignment
  s390/zcrypt: sysfs support for control domain assignment
  s390/zcrypt: validate control domain assignment
  KVM: s390: Connect the AP mediated matrix device to KVM
  s390/zcrypt: introduce ioctl access to VFIO AP Matrix driver
  KVM: s390: interface to configure KVM guest's AP matrix
  KVM: s390: validate input to AP matrix config interface
  KVM: s390: New ioctl to configure KVM guest's AP matrix
  s390/facilities: enable AP facilities needed by guest

 MAINTAINERS                                  |   13 +
 arch/s390/Kconfig                            |   13 +
 arch/s390/configs/default_defconfig          |    1 +
 arch/s390/configs/gcov_defconfig             |    1 +
 arch/s390/configs/performance_defconfig      |    1 +
 arch/s390/defconfig                          |    1 +
 arch/s390/include/asm/ap-config.h            |   32 +
 arch/s390/include/asm/kvm_host.h             |   26 +-
 arch/s390/kvm/Makefile                       |    2 +-
 arch/s390/kvm/ap-config.c                    |  224 ++++++++
 arch/s390/kvm/kvm-s390.c                     |   17 +-
 arch/s390/tools/gen_facilities.c             |    2 +
 drivers/s390/crypto/Makefile                 |    6 +-
 drivers/s390/crypto/ap_matrix_bus.c          |  115 ++++
 drivers/s390/crypto/ap_matrix_bus.h          |   25 +
 drivers/s390/crypto/vfio_ap_matrix_drv.c     |  107 ++++
 drivers/s390/crypto/vfio_ap_matrix_ops.c     |  790 ++++++++++++++++++++++++++
 drivers/s390/crypto/vfio_ap_matrix_private.h |   50 ++
 include/uapi/linux/vfio.h                    |   22 +
 19 files changed, 1438 insertions(+), 10 deletions(-)
 create mode 100644 arch/s390/include/asm/ap-config.h
 create mode 100644 arch/s390/kvm/ap-config.c
 create mode 100644 drivers/s390/crypto/ap_matrix_bus.c
 create mode 100644 drivers/s390/crypto/ap_matrix_bus.h
 create mode 100644 drivers/s390/crypto/vfio_ap_matrix_drv.c
 create mode 100644 drivers/s390/crypto/vfio_ap_matrix_ops.c
 create mode 100644 drivers/s390/crypto/vfio_ap_matrix_private.h
