Date:   Tue, 31 Oct 2017 15:39:09 -0400
From:   Tony Krowiak <akrowiak@...ux.vnet.ibm.com>
To:     linux-s390@...r.kernel.org, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org
Cc:     freude@...ibm.com, schwidefsky@...ibm.com,
        heiko.carstens@...ibm.com, borntraeger@...ibm.com,
        cohuck@...hat.com, kwankhede@...dia.com,
        bjsdjshi@...ux.vnet.ibm.com, pbonzini@...hat.com,
        alex.williamson@...hat.com, pmorel@...ux.vnet.ibm.com,
        alifm@...ux.vnet.ibm.com, mjrosato@...ux.vnet.ibm.com,
        qemu-s390x@...gnu.org, jjherne@...ux.vnet.ibm.com,
        thuth@...hat.com, pasic@...ux.vnet.ibm.com
Subject: Re: [RFC 00/19] KVM: s390/crypto/vfio: guest dedicated crypto
 adapters

On 10/13/2017 01:38 PM, Tony Krowiak wrote:
Ping
> Overview:
> --------
> An adjunct processor (AP) facility is an IBM Z cryptographic facility. The
> AP facility consists of three AP instructions and from 1 to 256 AP
> adapter cards. The design takes advantage of the interpretive execution mode
> provided by the SIE architecture. With interpretive execution mode, the AP
> instructions executed by the guest are interpreted by the hardware. This
> allows guests direct access to AP adapter cards. The first goal of this
> patch series is to provide direct access by a KVM guest to an AP as a
> pass-through device. The second goal is to provide administrators with the
> means to configure KVM guests to grant direct access to AP facilities
> assigned to the LPAR in which the host Linux system is running.
>
> To make the design easier to understand, let's start with an overview of
> the AP architecture.
>
> AP Architectural Overview
> -------------------------
> Let's start with some definitions:
>
> * AP adapter
>
>    An AP adapter is an IBM Z adapter card that performs cryptographic
>    functions. There can be from 0 to 256 adapters assigned to an LPAR.
>    Each adapter is identified by a number from 0 to 255. When installed,
>    an AP adapter is accessed by AP instructions executed by any CPU.
>
> * AP domain
>
>    An adapter can be partitioned into domains. An adapter can hold up to 256
>    domains. Each domain is identified by a number from 0 to 255. Domains can
>    be further classified into two types:
>    
>      * Usage domains are domains that can be accessed directly to process AP
>        commands
>    
>      * Control domains are domains that are accessed indirectly by AP
>        commands sent to a usage domain to control or change the domain.
>
> * AP queue
>
>    An AP queue is the means by which an AP command is sent to an
>    AP usage domain inside a specific AP adapter. An AP queue is identified
>    by a tuple comprising an AP adapter ID and a usage domain index
>    corresponding to a given usage domain within the adapter. This tuple
>    forms an AP Queue Number (APQN) uniquely identifying an AP queue. AP
>    instructions include a field containing the APQN to identify the AP
>    queue to which the AP command is targeted.
>
> * AP Instructions:
>
>    There are three AP instructions:
>
>    * NQAP: to enqueue an AP command-request message to a queue
>    * DQAP: to dequeue an AP command-reply message from a queue
>    * PQAP: to administer the queues
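To make the APQN tuple concrete, here is a minimal C sketch that packs an adapter ID and a usage domain index into one value and prints it in the CARD.DOMAIN hex notation that lszcrypt and the sysfs queue names use later in this letter (e.g. 05.0004). The 16-bit packing, the apqn_t type and all helper names are hypothetical illustrations, not the zcrypt driver's actual encoding.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical APQN encoding (not the driver's): adapter ID in the
 * high byte, usage domain index in the low byte. */
typedef uint16_t apqn_t;

static apqn_t apqn_make(uint8_t apid, uint8_t apqi)
{
	return (apqn_t)((apid << 8) | apqi);
}

static uint8_t apqn_apid(apqn_t q) { return (uint8_t)(q >> 8); }
static uint8_t apqn_apqi(apqn_t q) { return (uint8_t)(q & 0xff); }

/* Format as CARD.DOMAIN in hex, e.g. "05.00ab"; len should be >= 8. */
static void apqn_format(apqn_t q, char *buf, size_t len)
{
	snprintf(buf, len, "%02x.%04x", apqn_apid(q), apqn_apqi(q));
}

/* Parse "CARD.DOMAIN" (hex); returns 0 on success, -1 if malformed
 * or if either ID is outside 0..255. */
static int apqn_parse(const char *s, apqn_t *out)
{
	unsigned int apid, apqi;
	char extra;

	/* Exactly two conversions: any trailing character is an error. */
	if (sscanf(s, "%x.%x%c", &apid, &apqi, &extra) != 2)
		return -1;
	if (apid > 255 || apqi > 255)
		return -1;
	*out = apqn_make((uint8_t)apid, (uint8_t)apqi);
	return 0;
}
```

For instance, apqn_parse("05.00ab", &q) yields adapter 5 and domain 0xab, and formatting apqn_make(6, 4) gives "06.0004".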
>
> Let's now see how AP instructions are interpreted by the hardware.
>
> Start Interpretive Execution (SIE) Instruction
> ----------------------------------------------
> A KVM guest is started by executing the Start Interpretive Execution (SIE)
> instruction. The SIE state description is a control block that contains the
> state information for a KVM guest and is supplied as input to the SIE
> instruction. The SIE state description contains a field that references
> a Crypto Control Block (CRYCB). The CRYCB contains three bitmask fields
> identifying the adapters, usage domains and control domains assigned to the
> KVM guest:
>
> * The AP Mask (APM) field specifies the AP adapters assigned to the
>    KVM guest. The APM controls which adapters are valid for the KVM guest.
>    The bits in the mask, from left to right, correspond to adapter IDs
>    (APIDs) 0 up to the number of adapters that can be assigned to the LPAR.
>    If a bit is set, the corresponding adapter is valid for use by the KVM
>    guest.
>
> * The AP Queue Mask (AQM) field specifies the AP usage domains assigned
>    to the KVM guest. The bits in the mask, from left to right, correspond
>    to the usage domains, from 0 up to the number of domains that can be
>    assigned to the LPAR. If a bit is set, the corresponding usage domain is
>    valid for use by the KVM guest.
>
> * The AP Domain Mask (ADM) field specifies the AP control domains assigned
>    to the KVM guest. The ADM bitmask controls which domains can be changed
>    by an AP command-request message sent to a usage domain from the guest.
>    The bits in the mask, from left to right, correspond to domain 0 up to
>    the number of domains that can be assigned to the LPAR. If a bit is set,
>    the corresponding domain can be modified by an AP command-request
>    message sent to a usage domain configured for the KVM guest.
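Because the masks are numbered from left to right, bit 0 is the most significant bit of the first byte, not the least significant bit as in most C bitmap code. The following C sketch models one 256-bit mask with that convention. struct ap_mask and its helpers are hypothetical names, and the 256-bit width is taken from the 0 to 255 ID range described above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define AP_MASK_BITS  256
#define AP_MASK_BYTES (AP_MASK_BITS / 8)

/* Hypothetical 256-bit AP mask (APM, AQM or ADM), MSB-first:
 * bit 0 is the leftmost (most significant) bit of byte 0. */
struct ap_mask {
	uint8_t bytes[AP_MASK_BYTES];
};

static void ap_mask_clear_all(struct ap_mask *m)
{
	memset(m->bytes, 0, sizeof(m->bytes));
}

static void ap_mask_set(struct ap_mask *m, unsigned int bit)
{
	if (bit >= AP_MASK_BITS)
		return;	/* out of range: ignored in this sketch */
	m->bytes[bit / 8] |= (uint8_t)(0x80u >> (bit % 8));
}

static void ap_mask_clear(struct ap_mask *m, unsigned int bit)
{
	if (bit >= AP_MASK_BITS)
		return;
	m->bytes[bit / 8] &= (uint8_t)~(0x80u >> (bit % 8));
}

static bool ap_mask_test(const struct ap_mask *m, unsigned int bit)
{
	if (bit >= AP_MASK_BITS)
		return false;
	return (m->bytes[bit / 8] & (0x80u >> (bit % 8))) != 0;
}
```

Setting bit 5 therefore yields 0x04 in byte 0, and bit 255 lands in the low-order bit of byte 31.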
>
> Recall from the description of an AP queue that AP instructions include
> an APQN to identify the AP adapter and the specific usage domain within
> the adapter to which an AP command-request message is to be sent (NQAP
> and PQAP instructions), or from which a command-reply message is to be
> received (DQAP instruction). The set of valid APQNs is the matrix formed
> from the APM and AQM: the cross product of all assigned adapter numbers
> (APM) with all assigned usage domain numbers (AQM). For example, if
> adapters 1 and 2 and usage domains 5 and 6 are assigned to a guest, the
> APQNs (1,5), (1,6), (2,5) and (2,6) will be valid for the guest.
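The matrix rule can be sketched in C as a nested loop over the two masks. For readability the masks are modelled here as plain bool arrays indexed by ID rather than as packed bitfields, and valid_apqns() is a hypothetical helper, not part of this patch set.

```c
#include <stdbool.h>
#include <stddef.h>

#define AP_MAX_IDS 256

struct apqn {
	unsigned int apid;	/* adapter ID */
	unsigned int apqi;	/* usage domain index */
};

/* Enumerate the APQNs valid for a guest: every assigned adapter paired
 * with every assigned usage domain (the cross product of APM and AQM).
 * Stores at most max entries in out; returns the total number found. */
static size_t valid_apqns(const bool apm[AP_MAX_IDS],
			  const bool aqm[AP_MAX_IDS],
			  struct apqn *out, size_t max)
{
	size_t n = 0;

	for (unsigned int a = 0; a < AP_MAX_IDS; a++) {
		if (!apm[a])
			continue;
		for (unsigned int d = 0; d < AP_MAX_IDS; d++) {
			if (!aqm[d])
				continue;
			if (n < max)
				out[n] = (struct apqn){ a, d };
			n++;
		}
	}
	return n;
}
```

With adapters 1 and 2 and domains 5 and 6 set, it reports the four APQNs listed above, in the order (1,5), (1,6), (2,5), (2,6).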
>
> The APQNs provide secure key functionality (i.e., the key is stored on the
> adapter card), so when the adapter card is not virtualized (i.e., the
> adapter is accessed directly by the guest), each APQN must be assigned to
> at most one guest.
>
>     Example 1: Valid configuration:
>     ------------------------------
>     Guest1: adapters 1,2  domains 5,6
>     Guest2: adapters 1,2  domain 7
>
>     This is valid because the two guests' sets of APQNs do not overlap:
>     Guest1 has APQNs (1,5), (1,6), (2,5) and (2,6); Guest2 has APQNs (1,7)
>     and (2,7).
>
>     Example 2: Invalid configuration:
>     --------------------------------
>     Guest1: adapters 1,2  domains 5,6
>     Guest2: adapter  1    domains 6,7
>
>     This is an invalid configuration because both guests have access to
>     APQN (1,6).
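Checking a pair of guests for a shared APQN does not require enumerating both matrices: two cross products intersect exactly when the guests have at least one adapter in common and at least one usage domain in common. Below is a minimal C sketch of that test with hypothetical names and bool-array masks; it is not necessarily how the driver's validation patches perform the check.

```c
#include <stdbool.h>

#define AP_MAX_IDS 256

/* Hypothetical per-guest assignment: adapters (APM) and usage domains (AQM). */
struct guest_matrix {
	bool apm[AP_MAX_IDS];
	bool aqm[AP_MAX_IDS];
};

/* True if any ID is set in both masks. */
static bool masks_overlap(const bool *x, const bool *y)
{
	for (int i = 0; i < AP_MAX_IDS; i++)
		if (x[i] && y[i])
			return true;
	return false;
}

/* Two guests share at least one APQN exactly when they have an adapter
 * in common AND a usage domain in common. */
static bool guests_conflict(const struct guest_matrix *g1,
			    const struct guest_matrix *g2)
{
	return masks_overlap(g1->apm, g2->apm) &&
	       masks_overlap(g1->aqm, g2->aqm);
}
```

Applied to the examples above, it accepts Example 1 (the guests share adapters but no domain) and flags Example 2 (adapter 1 and domain 6 are common to both).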
>
> Interruption architecture:
>
> The AP interruption architecture may or may not generate interruptions to
> signal to the CPU the end of an AP transaction. The SIE interruption
> architecture, depending upon its configuration, may or may not redirect
> AP interrupts directly to a guest if the associated queue is valid for a
> guest, and may or may not report the interruption to the host.
>
> Effective masking for guest levels 1 and 2:
>
> A Linux host running in the LPAR operates at guest-level 1 and has its own
> SIE state description. When operating at guest-level 1, the masks from the
> host's state description are used directly. A Linux guest running in the
> host operates at guest-level 2. When operating at guest-level 2, the masks
> from the guest-level 1 (host) and guest-level 2 (guest) state descriptions
> are combined into a single mask, called an effective mask, by performing a
> logical AND of the corresponding masks from the two state descriptions.
>
> The effective mask algorithm is used for the APM, AQM and ADM to create
> an EAPM, EAQM and EADM respectively. Use of the EAPM, EAQM and EADM
> prevents a guest-level 1 host program from passing to a guest-level 2
> program any APQN to which the host program itself does not have access.
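The effective-mask computation amounts to a bytewise AND of each pair of masks. The C sketch below assumes 256-bit masks stored MSB-first in 32 bytes; struct ap_masks and effective_masks() are hypothetical names, not the kernel's data structures.

```c
#include <stdint.h>

#define AP_MASK_BYTES 32	/* 256 bits, bit 0 = MSB of byte 0 */

/* Hypothetical container for the three masks of one state description. */
struct ap_masks {
	uint8_t apm[AP_MASK_BYTES];
	uint8_t aqm[AP_MASK_BYTES];
	uint8_t adm[AP_MASK_BYTES];
};

static void and_mask(uint8_t *out, const uint8_t *a, const uint8_t *b)
{
	for (int i = 0; i < AP_MASK_BYTES; i++)
		out[i] = a[i] & b[i];
}

/* EAPM, EAQM and EADM for a guest-level 2 guest: the AND of the
 * guest-level 1 (host) masks with the guest-level 2 masks, so the guest
 * can never be granted a resource the host does not have. */
static void effective_masks(struct ap_masks *eff,
			    const struct ap_masks *host,
			    const struct ap_masks *guest)
{
	and_mask(eff->apm, host->apm, guest->apm);
	and_mask(eff->aqm, host->aqm, guest->aqm);
	and_mask(eff->adm, host->adm, guest->adm);
}
```

For example, if the host has adapters 1 and 2 (byte 0 = 0x60) and the guest-level 2 description asks for adapters 1, 2 and 3 (0x70), the EAPM contains only adapters 1 and 2.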
>
> Linux cryptographic bus driver:
>
> Linux already has a cryptographic bus driver that provides one AP device per
> AP adapter and one device per AP queue. There is a device driver for each
> type of AP adapter device and each type of AP queue device. This design
> utilizes some of the interfaces and functionality provided by the AP bus
> driver.
>
> Design Origin:
> -------------
>
> The original design was based on modelling AP Queue devices. The design
> utilized the VFIO mediated device framework whereby a mediated AP queue
> device would be created for each AP Queue bound to the VFIO AP Queue device
> driver. This at first seemed like the most logical design choice for the
> following reasons:
>
> * Securing access to an AP Queue device by unbinding it from its default
>    device driver and binding it to the VFIO device driver would still allow
>    the host to access the other usage domains on the same adapter card.
>
> * An AP command is sent to a usage domain within a specific AP adapter via
>    an AP queue.
>
> It became readily apparent that modelling the design on an AP queue was very
> convoluted for a number of reasons:
>
>    * There is no convenient way to notify the VFIO device driver which guest
>      will have access to a given mediated AP queue device until the mediated
>      device's file descriptor is opened by the guest. Recall that the APQNs
>      configured for the guest are the cross product of all of the bits set
>      in the APM and AQM, so the guest's APQNs cannot be validated, nor its
>      SIE state description configured, until all of the guest's mediated
>      AP queue device file descriptors have been opened.
>
>      For example, suppose a guest opens file descriptors for mediated AP
>      queue devices representing APQNs (3,5) and (4,6). If bits 3 and 4 are
>      set in the guest's APM and bits 5 and 6 are set in the guest's AQM,
>      then APQNs (3,5), (3,6), (4,5) and (4,6) will be valid for the guest,
>      but mediated AP queue devices have been created only for APQNs (3,5)
>      and (4,6). In this case, APQNs (3,6) and (4,5), which may still be
>      assigned to the host, would also be available to the guest, which is a
>      potential security breach.
>
>    * Control domains are not devices and are not logically modelled as
>      mediated devices. In our original design, they were modelled as
>      attributes of a mediated AP queue device, but this was a clumsy use of
>      the VFIO mediated device model.
>
>    * The SIE state description models the assignment of AP resources as a
>      matrix via the APM, AQM and ADM.
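The first problem can be made concrete with a short C sketch that derives the APM and AQM from the APQNs a guest actually opened and counts the extra APQNs the resulting matrix grants. The helper names and the bool-array masks are hypothetical, and IDs are assumed to be below 256.

```c
#include <stdbool.h>
#include <stddef.h>

#define AP_MAX_IDS 256

struct apqn {
	unsigned int apid;	/* adapter ID, assumed < 256 */
	unsigned int apqi;	/* usage domain index, assumed < 256 */
};

/* True if (a, d) is one of the n requested APQNs. */
static bool requested(const struct apqn *req, size_t n,
		      unsigned int a, unsigned int d)
{
	for (size_t i = 0; i < n; i++)
		if (req[i].apid == a && req[i].apqi == d)
			return true;
	return false;
}

/* Build the APM/AQM that SIE would need for the requested APQNs and
 * count the APQNs in the resulting cross product that were NOT
 * requested, i.e. queues the guest would gain access to unintentionally. */
static size_t unintended_apqns(const struct apqn *req, size_t n)
{
	bool apm[AP_MAX_IDS] = { false }, aqm[AP_MAX_IDS] = { false };
	size_t extra = 0;

	for (size_t i = 0; i < n; i++) {
		apm[req[i].apid] = true;
		aqm[req[i].apqi] = true;
	}
	for (unsigned int a = 0; a < AP_MAX_IDS; a++)
		for (unsigned int d = 0; d < AP_MAX_IDS; d++)
			if (apm[a] && aqm[d] && !requested(req, n, a, d))
				extra++;
	return extra;
}
```

For the APQNs (3,5) and (4,6) in the example above it reports two unintended APQNs, (3,6) and (4,5); for the full set of four it reports none.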
>       
> The design we ultimately settled upon was modelled on the AP matrix as
> defined by the SIE state description. Supplying the complete AP matrix
> to SIE using bitmasks when starting a guest simplifies the code, is far
> easier to secure, and more closely matches the model employed by SIE. This
> is the design model implemented via this patch set.
>
> The Design
> ----------
> This design introduces four new objects:
>
> 1. AP matrix bus
>
>     The sysfs location of the AP matrix bus is /sys/bus/ap_matrix. This
>     bus will create a single AP matrix device (see below).
>
> 2. AP matrix device
>
>     The AP matrix device is a singleton that hangs off of the AP matrix bus.
>     This device holds the AP Queues that have been reserved for use by
>     KVM guests. The sysfs location of the AP matrix device is
>     /sys/devices/ap_matrix/matrix. It is also linked from the AP matrix
>     bus at /sys/bus/ap_matrix/devices/matrix.
>
> 3. VFIO AP matrix driver
>
>     This driver is based on the VFIO mediated device framework. When the
>     driver is initialized, it will:
>
>     * Get the AP matrix device created by the AP matrix bus
>
>     * Register with the AP bus to indicate that it can control AP Queue
>       devices. This allows AP Queue devices unbound from AP device drivers
>       to be bound to the VFIO AP matrix driver. The AP Queues bound to the
>       VFIO AP matrix driver will be stored by the driver in the AP matrix
>       device.
>
>     * Register the AP matrix device with the VFIO mediated device
>       framework (MDEV). Registration with MDEV will create the sysfs
>       structures needed to create mediated matrix devices. Each MDEV matrix
>       device is used to configure the AP matrix for a KVM guest. The MDEV
>       matrix device's file descriptor can be used by QEMU to communicate
>       with the VFIO AP matrix device driver.
>
>     The VFIO AP matrix driver:
>
>     * Provides the interfaces the administrator can use to secure AP Queues
>       for use by KVM guests. This is accomplished by unbinding the AP Queues
>       needed by each KVM guest from their AP device drivers and binding them
>       to the VFIO AP matrix driver. This prevents the host Linux system from
>       using these queues.
>
>     * Provides an ioctl that can be used by QEMU to configure the
>       CRYCB referenced by the KVM guest's SIE state description. The ioctl
>       will:
>
>       * Create an EAPM, EAQM and EADM by performing a logical AND of the
>         APM, AQM and ADM configured via the MDEV matrix device's sysfs
>         attribute files (see below) with the APM, AQM and ADM of the host's
>         SIE state description, respectively.
>
>       * Configure the SIE state description for the KVM guest using the
>         effective masks created in the previous step.
>
> 4. VFIO MDEV matrix passthrough device
>
>     An MDEV matrix passthrough device must be created for each KVM guest that
>     will need access to AP facilities. An MDEV matrix passthrough device is
>     used by QEMU to configure the APM, AQM and ADM fields of the CRYCB
>     referenced by the KVM guest's SIE state description. The file descriptor
>     for the MDEV matrix passthrough device provides the communication pathway
>     between QEMU and the VFIO AP matrix device driver.
>
>     The MDEV matrix passthrough device, like the CRYCB, contains three
>     bitmasks (an APM, AQM and ADM) for specifying the AP matrix for the
>     KVM guest. Three sets of attribute files will be provided to allow an
>     administrator to set the bits in the MDEV matrix device's APM, AQM and
>     ADM:
>
>     * A file to assign an AP adapter
>     * A file to unassign an AP adapter
>     * A file to display the adapters assigned
>
>     * A file to assign an AP domain
>     * A file to unassign an AP domain
>     * A file to display the domains assigned
>
>     * A file to assign an AP control domain
>     * A file to unassign an AP control domain
>     * A file to display the control domains assigned
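A rough C model of the assign and unassign attribute files: each write sets or clears one bit in the corresponding mask. Everything here is hypothetical: the structure, the function names, the -EINVAL convention, and the choice to parse input as hexadecimal (inferred only from the example below, which writes ab to assign_domain). The real driver's range and validity checks are defined by its own validation patches and may differ.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define AP_MAX_IDS 256

enum ap_mask_kind { AP_ADAPTER, AP_DOMAIN, AP_CONTROL_DOMAIN };

/* Hypothetical in-memory model of one MDEV matrix passthrough device:
 * one mask per kind, mirroring the CRYCB's APM, AQM and ADM. */
struct mdev_matrix {
	bool mask[3][AP_MAX_IDS];
};

/* Parse an ID as written to an assign_xxx/unassign_xxx file. Hex is an
 * assumption; a single trailing newline (as from echo) is accepted. */
static int parse_id(const char *buf, unsigned long *id)
{
	char *end;

	errno = 0;
	*id = strtoul(buf, &end, 16);
	if (errno || end == buf || (*end != '\0' && *end != '\n') ||
	    *id >= AP_MAX_IDS)
		return -EINVAL;
	return 0;
}

/* Model of a write to assign_xxx (assign = true) or unassign_xxx. */
static int matrix_store(struct mdev_matrix *m, enum ap_mask_kind kind,
			const char *buf, bool assign)
{
	unsigned long id;
	int rc = parse_id(buf, &id);

	if (rc)
		return rc;
	m->mask[kind][id] = assign;
	return 0;
}
```

In this model, "echo 5 > assign_adapter" would correspond to matrix_store(&m, AP_ADAPTER, "5\n", true).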
>
> Example:
> -------
> Let's now provide an example to illustrate how KVM guests may be given
> access to AP facilities. For this example, we will show how to configure
> two guests such that executing the lszcrypt command on the guests would
> look like this:
>
> Guest1
> ------
> CARD.DOMAIN TYPE  MODE
> ------------------------------
> 05          CEX5C CCA-Coproc
> 05.0004     CEX5C CCA-Coproc
> 05.00ab     CEX5C CCA-Coproc
> 06          CEX5A Accelerator
> 06.0004     CEX5A Accelerator
> 06.00ab     CEX5A Accelerator
>
> Guest2
> ------
> CARD.DOMAIN TYPE  MODE
> ------------------------------
> 05          CEX5A Accelerator
> 05.0047     CEX5A Accelerator
> 05.00ff     CEX5A Accelerator
>
> One thing to notice in this example is that, for each guest, every
> adapter's AP queue set is identical. For example, the two AP queue sets
> for Guest1 (one per adapter) both contain usage domains 0004 and 00ab.
> The configuration would be invalid if the queue sets did not contain the
> same usage domains. We could not, for example, configure Guest1 with
> access to AP queue 05.00ff, because the AP queue set for adapter 06 does
> not contain AP queue 06.00ff; setting domain ff in Guest1's AQM would also
> grant it 06.00ff. The point is, one must be careful to reserve a valid set
> of AP queues for a given guest.
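The consistency rule above can be checked mechanically: group the reserved queues by adapter and compare each adapter's set of usage domains with the first adapter's set. A minimal C sketch, assuming IDs below 256; queue_sets_identical() is a hypothetical helper, and its static scratch table makes it non-reentrant.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define AP_MAX_IDS 256

struct apqn {
	unsigned int apid;	/* adapter ID, assumed < 256 */
	unsigned int apqi;	/* usage domain index, assumed < 256 */
};

/* True if every adapter that appears in q carries exactly the same set
 * of usage domains, i.e. the queues form a complete matrix. */
static bool queue_sets_identical(const struct apqn *q, size_t n)
{
	static bool doms[AP_MAX_IDS][AP_MAX_IDS];	/* [adapter][domain] */
	bool used[AP_MAX_IDS] = { false };
	int first = -1;

	memset(doms, 0, sizeof(doms));
	for (size_t i = 0; i < n; i++) {
		used[q[i].apid] = true;
		doms[q[i].apid][q[i].apqi] = true;
	}
	for (int a = 0; a < AP_MAX_IDS; a++) {
		if (!used[a])
			continue;
		if (first < 0)
			first = a;	/* reference set: first adapter seen */
		else if (memcmp(doms[a], doms[first], sizeof(doms[a])) != 0)
			return false;
	}
	return true;
}
```

Guest1's four queues pass; adding 05.00ff without 06.00ff makes the check fail.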
>
> These are the steps for configuring Guest1 and Guest2:
>     
> 1. The first thing that needs to be done is to secure the AP queues to be
>     used by the two guests so that the host cannot access them. This is done
>     by unbinding each AP Queue device from its respective AP driver. In our
>     example, these queues are bound to the cex4queue driver. This would be
>     the sysfs location of these devices:
>
>     /sys/bus/ap
>     --- [drivers]
>     ------ [cex4queue]
>     --------- [05.0004]
>     --------- [05.0047]
>     --------- [05.00ab]
>     --------- [05.00ff]
>     --------- [06.0004]
>     --------- [06.00ab]
>     --------- unbind
>
>     To unbind AP queue 05.0004 from the cex4queue device driver:
>
> 	echo 05.0004 > unbind
>
>     This must also be done for AP queues 05.00ab, 05.0047, 05.00ff, 06.0004,
>     and 06.00ab.
>
> 2. The next step is to reserve the queues for use by the two KVM guests.
>     This is accomplished by binding them to the VFIO AP matrix device driver.
>     This is the sysfs location of the VFIO AP matrix device driver:
>
>     /sys/bus/ap
>     ---[drivers]
>     ------ [vfio_ap_matrix]
>     ---------- bind
>
>     To bind queue 05.0004 to the vfio_ap_matrix driver:
>
> 	echo 05.0004 > bind
>
>     This must also be done for AP queues 05.00ab, 05.0047, 05.00ff, 06.0004,
>     and 06.00ab.
>
> 3. Create the mediated devices needed to configure the AP matrices for the
>     two guests and to provide an interface to the vfio_ap_matrix driver for
>     use by the guests:
>
>     /sys/devices/
>     --- [ap_matrix]
>     ------ [matrix] (this is the matrix device)
>     --------- [mdev_supported_types]
>     ------------ [ap_matrix-passthrough] (passthrough mediated device type)
>     --------------- create
>     --------------- [devices]
>
>     To create the mediated devices for the two guests:
>
> 	uuidgen > create
> 	uuidgen > create
>
>     This will create two mediated devices in the [devices] subdirectory,
>     each named with the UUID written to the create attribute file. We call
>     them $uuid1 and $uuid2:
>
>     /sys/devices/
>     --- [ap_matrix]
>     ------ [matrix]
>     --------- [mdev_supported_types]
>     ------------ [ap_matrix-passthrough]
>     --------------- [devices]
>     ------------------ [$uuid1]
>     --------------------- adapters
>     --------------------- assign_adapter
>     --------------------- assign_control_domain
>     --------------------- assign_domain
>     --------------------- control_domains
>     --------------------- domains
>     --------------------- unassign_adapter
>     --------------------- unassign_control_domain
>     --------------------- unassign_domain
>     ------------------ [$uuid2]
>     --------------------- adapters
>     --------------------- assign_adapter
>     --------------------- assign_control_domain
>     --------------------- assign_domain
>     --------------------- control_domains
>     --------------------- domains
>     --------------------- unassign_adapter
>     --------------------- unassign_control_domain
>     --------------------- unassign_domain
>
> 4. The administrator now needs to configure the matrices for mediated
>     devices $uuid1 (for Guest1) and $uuid2 (for Guest2).
>
>     This is how the matrix is configured for Guest1:
>
>     echo 5 > assign_adapter
>     echo 6 > assign_adapter
>     echo 4 > assign_domain
>     echo ab > assign_domain
>
>     When an assign_xxx file is written, the corresponding bit in the
>     respective MDEV matrix device's bitmask will be set. For example, when
>     adapter 5 is assigned, bit 5 (numbered from left to right starting with
>     bit 0) will be set in the MDEV matrix device's APM.
>
>     By architectural convention, all usage domains (i.e., domains assigned
>     via the assign_domain attribute file) will also be configured in the ADM
>     field of the KVM guest's CRYCB, so there is no need to assign control
>     domains here unless you want to assign control domains that are not
>     assigned as usage domains.
>
>     If a mistake is made configuring an adapter, domain or control domain,
>     you can use the unassign_xxx files to unassign the adapter, domain or
>     control domain.
>
>     To display the matrix configuration for Guest1:
>
>     cat adapters
>     cat domains
>     cat control_domains
>
>     This is how the matrix is configured for Guest2:
>
>     echo 5 > assign_adapter
>     echo 47 > assign_domain
>     echo ff > assign_domain
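The effect of the step 4 writes on the CRYCB masks can be sketched in C. The sketch follows the two conventions stated above: bits are numbered from the left starting at 0, and every usage domain is also placed in the ADM. struct crycb_masks, build_masks() and set_bit_msb0() are hypothetical names; this is not the driver's code.

```c
#include <stdint.h>
#include <string.h>

#define AP_MASK_BYTES 32	/* 256 bits, bit 0 = MSB of byte 0 */

/* Hypothetical copy of the three CRYCB masks for one guest. */
struct crycb_masks {
	uint8_t apm[AP_MASK_BYTES];
	uint8_t aqm[AP_MASK_BYTES];
	uint8_t adm[AP_MASK_BYTES];
};

static void set_bit_msb0(uint8_t *mask, unsigned int bit)
{
	if (bit >= AP_MASK_BYTES * 8)
		return;	/* out of range: ignored in this sketch */
	mask[bit / 8] |= (uint8_t)(0x80u >> (bit % 8));
}

/* Build the masks from assigned adapters, usage domains and any extra
 * control domains. Each usage domain is also set in the ADM. */
static void build_masks(struct crycb_masks *cm,
			const unsigned int *adapters, size_t na,
			const unsigned int *domains, size_t nd,
			const unsigned int *ctrl, size_t nc)
{
	memset(cm, 0, sizeof(*cm));
	for (size_t i = 0; i < na; i++)
		set_bit_msb0(cm->apm, adapters[i]);
	for (size_t i = 0; i < nd; i++) {
		set_bit_msb0(cm->aqm, domains[i]);
		set_bit_msb0(cm->adm, domains[i]);
	}
	for (size_t i = 0; i < nc; i++)
		set_bit_msb0(cm->adm, ctrl[i]);
}
```

For Guest1 (adapters 5 and 6, domains 0x04 and 0xab) this gives APM byte 0 = 0x06, and domain 0xab appears as the value 0x10 in byte 21 of both the AQM and the ADM.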
>
> When a KVM guest is started, QEMU will open its MDEV matrix device,
> obtaining a file descriptor. The VFIO AP matrix device driver will be
> notified and will store the reference to the KVM guest's SIE state description.
> QEMU will then call the VFIO AP matrix ioctl requesting that the
> KVM guest's matrix be configured. The matrix driver will set the bits in the
> APM, AQM and ADM fields of the CRYCB referenced by the guest's SIE state
> description from the EAPM, EAQM and EADM created by performing a logical AND
> of the AP masks configured in the MDEV matrix device and the masks
> configured in the host's SIE state description. When the guest comes up, it
> will have access to the APQNs identified in the AP matrix specified in the
> KVM guest's SIE state description. Programs running on the guest will then
> be able to use the cryptographic functions provided by the AP facilities
> configured for the guest.
>
> Tony Krowiak (19):
>    KVM: s390: SIE considerations for AP Queue virtualization
>    KVM: s390: refactor crypto initialization
>    s390/zcrypt: new AP matrix bus
>    s390/zcrypt: create an AP matrix device on the AP matrix bus
>    s390/zcrypt: base implementation of AP matrix device driver
>    s390/zcrypt: register matrix device with VFIO mediated device
>      framework
>    KVM: s390: introduce AP matrix configuration interface
>    s390/zcrypt: support for assigning adapters to matrix mdev
>    s390/zcrypt: validate adapter assignment
>    s390/zcrypt: sysfs interfaces supporting AP domain assignment
>    s390/zcrypt: validate domain assignment
>    s390/zcrypt: sysfs support for control domain assignment
>    s390/zcrypt: validate control domain assignment
>    KVM: s390: Connect the AP mediated matrix device to KVM
>    s390/zcrypt: introduce ioctl access to VFIO AP Matrix driver
>    KVM: s390: interface to configure KVM guest's AP matrix
>    KVM: s390: validate input to AP matrix config interface
>    KVM: s390: New ioctl to configure KVM guest's AP matrix
>    s390/facilities: enable AP facilities needed by guest
>
>   MAINTAINERS                                  |   13 +
>   arch/s390/Kconfig                            |   13 +
>   arch/s390/configs/default_defconfig          |    1 +
>   arch/s390/configs/gcov_defconfig             |    1 +
>   arch/s390/configs/performance_defconfig      |    1 +
>   arch/s390/defconfig                          |    1 +
>   arch/s390/include/asm/ap-config.h            |   32 +
>   arch/s390/include/asm/kvm_host.h             |   26 +-
>   arch/s390/kvm/Makefile                       |    2 +-
>   arch/s390/kvm/ap-config.c                    |  224 ++++++++
>   arch/s390/kvm/kvm-s390.c                     |   17 +-
>   arch/s390/tools/gen_facilities.c             |    2 +
>   drivers/s390/crypto/Makefile                 |    6 +-
>   drivers/s390/crypto/ap_matrix_bus.c          |  115 ++++
>   drivers/s390/crypto/ap_matrix_bus.h          |   25 +
>   drivers/s390/crypto/vfio_ap_matrix_drv.c     |  107 ++++
>   drivers/s390/crypto/vfio_ap_matrix_ops.c     |  790 ++++++++++++++++++++++++++
>   drivers/s390/crypto/vfio_ap_matrix_private.h |   50 ++
>   include/uapi/linux/vfio.h                    |   22 +
>   19 files changed, 1438 insertions(+), 10 deletions(-)
>   create mode 100644 arch/s390/include/asm/ap-config.h
>   create mode 100644 arch/s390/kvm/ap-config.c
>   create mode 100644 drivers/s390/crypto/ap_matrix_bus.c
>   create mode 100644 drivers/s390/crypto/ap_matrix_bus.h
>   create mode 100644 drivers/s390/crypto/vfio_ap_matrix_drv.c
>   create mode 100644 drivers/s390/crypto/vfio_ap_matrix_ops.c
>   create mode 100644 drivers/s390/crypto/vfio_ap_matrix_private.h
>
