lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZXwi+UN0OP9+ht99@gondor.apana.org.au>
Date: Fri, 15 Dec 2023 17:57:13 +0800
From: Herbert Xu <herbert@...dor.apana.org.au>
To: Tom Zanussi <tom.zanussi@...ux.intel.com>
Cc: davem@...emloft.net, fenghua.yu@...el.com, vkoul@...nel.org,
	dave.jiang@...el.com, tony.luck@...el.com,
	wajdi.k.feghali@...el.com, james.guilford@...el.com,
	kanchana.p.sridhar@...el.com, vinodh.gopal@...el.com,
	giovanni.cabiddu@...el.com, pavel@....cz,
	linux-kernel@...r.kernel.org, linux-crypto@...r.kernel.org,
	dmaengine@...r.kernel.org
Subject: Re: [PATCH v12 00/14] crypto: Add Intel Analytics Accelerator (IAA)
 crypto compression driver

On Tue, Dec 05, 2023 at 03:25:16PM -0600, Tom Zanussi wrote:
> Hi, this is v12 of the IAA crypto driver, incorporating feedback from
> v11.
> 
> v12 changes:
> 
>   - Moved __free_iaa_wq out of spinlock section.
> 
> 
> v11 changes:
> 
>   - Changed the wq size default from 16 to wq->max_wq_Size in
>     'dmaengine: idxd: Add support for device/wq defaults and updated
>     the Documentation to match.
> 
>   - Added new Reviewed-by and Acked-by tags.
>   
> 
> v10 changes:
> 
>   - Fixed the warnings pointed out in v9 when using 'make W=1 C=1'.
>   
>   - Added a new patch, 'dmaengine: idxd: Add support for device/wq
>     defaults' which auto-configures a default set of devices and wqs
>     so users don't have to do any IAA configuration/setup if they
>     don't want to.  Also updated the Documentation accordingly.
> 
>   - Removed the 'dmaengine: idxd: add wq driver name support for
>     accel-config user tool' patch since it's now upstream.
> 
>   - Added a section (in the Appendix) to
>     Documentation/driver-api/crypto/iaa/iaa-crypto.rst describing the
>     sysfs interface.
> 
>   - While converting my testing scripts to accel-config, realized
>     accel-config won't allow disabling the wqs if client_count is > 0,
>     so to address it moved the idxd_wq_get() into iaa_wq_get().
> 
>   - Fixed a bug in the pr_debug() in wq_table_next_wq(), noticed while
>     testing.
> 
> 
> v9 changes:
> 
>   - Renamed drv_enable/disable_wq() to idxd_drv_enable/disable_wq()
>     and exported it, changing all existing callers as well as the
>     iaa_crypto driver.
> 
>   - While testing, ran into a use-after-free bug in the irq support
>     flagged by KASAN so fixed that up in iaa_compress() (added missing
>     disable_async check).
> 
>   - Also, while fixing the use-after-free bug, rearranged the out:
>     part of iaa_desc_complete() to make it cleaner.
> 
>   - Also for the verify cases, reversed the dma mapping by adding and
>     calling a new iaa_remap_for_verify() function, since verify
>     basically does a decompress after reversing the src and dst
>     buffers.
> 
>   - Added new Acked-by and Reviewed-by tags.
> 
> 
> v8 changes:
> 
>   - Rebased to current cryptodev tree.
> 
>   - Added generic-deflate software fallback for decompression
>     in cases where there's a hardware failure that would
>     otherwise prevent the data being recovered.
>   
>   - Changed the driver_name code to use strim().
> 
>   - Changed the null-destination cases to use sgl_alloc_order() rather
>     than sgl_alloc().
> 
>   - Added more Reviewed-by tags.
> 
> 
> v7 changes:
> 
>   - Rebased to current cryptodev tree.
> 
>   - Removed 'canned' compression mode (deflate-iaa-canned) and fixed
>     up dependencies in other patches and Documentation.
>   
>   - Removed op_block checks.
> 
>   - Removed a stray debugging #ifdef.
> 
>   - Changed sysfs-driver-dma-idxd driver_name version to 6.6.0.
> 
> 
> v6 changes:
> 
>   - Rebased to current cryptodev tree.
> 
>   - Changed code to create/register separate algorithms for each
>     compression mode - one for 'fixed' (deflate-iaa) and one for
>     'canned' (deflate-iaa-canned).
>   
>   - Got rid of 'compression_mode' attribute and all the compression
>     mode code that deals with 'active' compression modes, since
>     there's no longer a single active compression mode.
> 
>   - Use crypto_ctx to allow common compress/decompress code to
>     distinguish between the different compression modes.  Also use it
>     to capture settings such as verify_compress, use_irq, etc.  In
>     addition to being cleaner, this will allow for easier performance
>     comparisons using different modes/settings.
> 
>   - Update Documentation and comments to reflect the changes.
> 
>   - Fixed a bug found by Rex Zhang in decompress_header() which
>     unmapped src2 rather than src as it should have.  Thanks, Rex!
> 
> 
> v5 changes:
> 
>   - Rebased to current cryptodev tree.
> 
>   - Changed sysfs-driver-dma-idxd driver_name version to 6.5.0.
> 
>   - Renamed wq private accessor functions to idxd_wq_set/get_private().
> 
> v4 changes:
> 
>   - Added and used DRIVER_NAME_SIZE for wq driver_name.
> 
>   - Changed all spaces to tabs in CRYPTO_DEV_IAA_CRYPTO_STATS config
>     menu.
> 
>   - Removed the private_data void * from wq and replaced with
>     wq_confdev() instead, as suggested by Dave Jiang.
> 
>   - Added  more Reviewed-by tags.
> 
> v3 changes:
> 
>   - Reworked the code to only allow the registered crypto alg to be
>     unregistered by removing the module.  Also added an iaa_wq_get()
>     and iaa_wq_put() to take/give up a reference to the work queue
>     while there are compresses/decompresses in flight.  This is
>     synchronized with the wq remove function, so that the
>     iaa_wq/iaa_devices can't go away beneath active operations.  This
>     was tested by removing/disabling the iaa wqs/devices while
>     operations were in flight.
> 
>   - Simplified the rebalance code and removed cpu_to_iaa() function
>     since it was overly complicated and wasn't actually working as
>     advertised.
> 
>   - As a result of reworking the above code, fixed several bugs such
>     as possibly unregistering an unregistered crypto alg, a memory
>     leak where iaa_wqs weren't being freed, and making sure the
>     compression schemes were registered before registering the driver.
> 
>   - Added set_/idxd_wq_private() accessors for wq private data.
> 
>   - Added missing XPORT_SYMBOL_NS_GPL() to [PATCH 04/15] dmaengine:
>     idxd: Export descriptor management functions
> 
>   - Added Dave's Reviewed-by: tags from v2.
> 
>   - Updated Documentation and commit messages to reflect the changes
>     above.
>   
>   - Rebased to to cryptodev tree, since that has the earlier changes
>     that moved the intel drivers to crypto/intel.
> 
> v2 changes:
> 
>   - Removed legacy interface and all related code; merged async
>     interface into main deflate patch.
> 
>   - Added support for the null destination case.  Thanks to Giovanni
>     Cabiddu for making me aware of this as well as the selftests for
>     it.
> 
>   - Had to do some rearrangement of the code in order to pass all the
>     selftests.  Also added a testcase for 'canned'.
> 
>   - Moved the iaa crypto driver to drivers/crypto/intel, and moved all
>     the other intel drivers there as well (which will be posted as a
>     separate series immediately following this one).
> 
>   - Added an iaa crypto section to MAINTAINERS.
> 
>   - Updated the documenation and commit messages to reflect the removal
>     of the legacy interface.
> 
>   - Changed kernel version from 6.3.0 to 6.4.0 in patch 01/15 (wq
>     driver name support)
> 
> v1:
> 
> This series adds Linux crypto algorithm support for Intel® In-memory
> Analytics Accelerator (Intel IAA) [1] hardware compression and
> decompression, which is available on Sapphire Rapids systems.
> 
> The IAA crypto support is implemented as an IDXD sub-driver.  The IDXD
> driver already present in the kernel provides discovery and management
> of the IAA devices on a system, as well as all the functionality
> needed to manage, submit, and wait for completion of work executed on
> them.  The first 7 patches (patches starting with dmaengine:) add
> small bits of underlying IDXD plumbing needed to allow external
> sub-drivers to take advantage of this support and claim ownership of
> specific IAA devices and workqueues.
> 
> The remaining patches add the main support for this feature via the
> crypto API, making it transparently accessible to kernel features that
> can make use of it such as zswap and zram (patches starting with
> crypto – iaa:).
> 
> These include both sync/async support for the deflate algorithm
> implemented by the IAA hardware, as well as an additional option for
> driver statistics and Documentation.
> 
> Patch 8 ('[PATCH 08/15] crypto: iaa - Add IAA Compression Accelerator
> Documentation') describes the IAA crypto driver in detail; the
> following is just a high-level synopsis meant to aid the following
> discussion.
> 
> The IAA hardware is fairly complex and generally requires a
> knowledgeable administrator with sufficiently detailed understanding
> of the hardware to set it up before it can be used.  As mentioned in
> the Documentation, this typically requires using a special tool called
> accel-config to enumerate and configure IAA workqueues, engines, etc,
> although this can also be done using only sysfs files.
> 
> The operation of the driver mirrors this requirement and only allows
> the hardware to be accessed via the crypto layer once the hardware has
> been configured and bound to the the IAA crypto driver.  As an IDXD
> sub-driver, the IAA crypto driver essentially takes ownership of the
> hardware until it is given up explicitly by the administrator.  This
> occurs automatically when the administrator enables the first IAA
> workqueue or disables the last one; the iaa_crypto (sync and async)
> algorithms are registered when the first workqueue is enabled, and
> deregistered when the last one is disabled.
> 
> The normal sequence of operations would normally be: 
> 
>   < configure the hardware using accel-config or sysfs > 
> 
>   < configure the iaa crypto driver (see below) > 
> 
>   < configure the subsystem e.g. zswap/zram to use the iaa_crypto algo >  
> 
>   < run the workload > 
> 
> There are a small number of iaa_crypto driver attributes that the
> administrator can configure, and which also need to be configured
> before the algorithm is enabled:
> 
> compression_mode: 
> 
>   The IAA crypto driver supports an extensible interface supporting
>   any number of different compression modes that can be tailored to
>   specific types of workloads.  These are implemented as tables and
>   given arbitrary names describing their intent.
> 
>   There are currently only 2 compression modes, “canned” and “fixed”.
>   In order to set a compression mode, echo the mode’s name to the
>   compression_mode driver attribute:
>  
>     echo "canned" > /sys/bus/dsa/drivers/crypto/compression_mode
> 
> There are a few other available iaa_crypto driver attributes (see
> Documentation for details) but the main one we want to consider in
> detail for now is the ‘sync_mode’ attribute.
> 
> The ‘sync_mode’ attribute has 3 possible settings: ‘sync’, ‘async’,
> and ‘async_irq’.
> 
> The context for these different modes is that although the iaa_crypto
> driver implements the asynchronous crypto interface, the async
> interface is currently only used in a synchronous way by facilities
> like zswap that make use of it.
> 
> This is fine for software compress/decompress algorithms, since
> there’s no real benefit in being able to use a truly asynchronous
> interface with them.  This isn’t the case, though, for hardware
> compress/decompress engines such as IAA, where truly asynchronous
> behavior is beneficial if not completely necessary to make optimal use
> of the hardware.
> 
> The IAA crypto driver ‘sync_mode’ support should allow facilities such
> as zswap to ‘support async (de)compression in some way [2]’ once
> they are modified to actually make use of it.
> 
> When the ‘async_irq’ sync_mode is specified, the driver sets the bits
> in the IAA work descriptor to generate an irq when the work completes.
> So for every compression or decompression, the IAA acomp_alg
> implementations called by crypto_acomp_compress/decompress() simply
> set up the descriptor, turn on the 'request irq' bit and return
> immediately with -EINPROGRESS.  When the work completes, the irq fires
> and the IDXD driver’s irq thread for that irq invokes the callback the
> iaa_crypto module registered with IDXD.  When the irq thread gets
> scheduled, it wakes up the caller, which could be for instance zswap,
> waiting synchronously via crypto_wait_req().
> 
> Using the simple madvise test program in '[PATCH 08/15] crypto: iaa -
> Add IAA Compression Accelerator Documentation' along with a set of
> pages from the spec17 benchmark and tracepoint instrumentation
> measuring the time taken between the start and end of each compress
> and decompress, this case, async_irq, takes on average 6,847 ns for
> compression and 5,840 ns for decompression. (See Table 1 below for a
> summary of all the tests.)
> 
> When sync_mode is set to ‘sync’, the interrupt bit is not set and the
> work descriptor is submitted in the same way it was for the previous
> case.  In this case the call doesn’t return but rather loops around
> waiting in the iaa_crypto driver’s check_completion() function which
> continually checks the descriptor’s completion bit until it finds it
> set to ‘completed’.  It then returns to the caller, again for example
> zswap waiting in crypto_wait_req().  From the standpoint of zswap,
> this case is exactly the same as the previous case, the difference
> seen only in the crypto layer and the iaa_crypto driver internally;
> from its standpoint they’re both synchronous calls.  There is however
> a large performance difference: an average of 3,177 ns for compress
> and 2,235 ns for decompress.
> 
> The final sync_mode is ‘async’.  In this case also the interrupt bit
> is not set and the work descriptor is submitted, returning immediately
> to the caller with -EINPROGRESS.  Because there’s no interrupt set to
> notify anyone when the work completes, the caller needs to somehow
> check for work completion.  Because core code like zswap can’t do this
> directly by for example calling iaa_crypto’s check_completion(), there
> would need to be some changes made to code like zswap and the crypto
> layer in order to take advantage of this mode.  As such, there are no
> numbers to share for this mode.
> 
> Finally, just a quick discussion of the remaining numbers in Table 1,
> those comparing the iaa_crypto sync and async irq cases to software
> deflate.  Software deflate took average of 108,978 ns for compress and
> 14,485 ns for decompress.
> 
> As can be seen from Table 1, the numbers using the iaa_crypto driver
> for deflate as compared to software are so much better that merging it
> would seem to make sense on its own merits.  The 'async' sync_mode
> described above, however, offers the possibility of even greater gains
> to be had against higher-performing algorithms such as lzo, via
> parallelization, once the calling facilities are modified to take
> advantage of it.  Follow-up patchsets to this one will demonstrate
> concretely how that might be accomplished.
> 
> Thanks, 
> 
> Tom  
> 
> 
>   Table 1. Zswap latency and compression numbers (in ns): 
> 
>   Algorithm                    compress      decompress
>   ----------------------------------------------------------
>   iaa sync			3,177		2,235
>   iaa async irq   		6,847		5,840
>   software deflate	      108,978	       14,485
> 
> [1] https://cdrdv2.intel.com/v1/dl/getContent/721858
> 
> [2] https://lore.kernel.org/lkml/20201107065332.26992-1-song.bao.hua@hisilicon.com/
> 
> 
> Dave Jiang (1):
>   dmaengine: idxd: add external module driver support for dsa_bus_type
> 
> Tom Zanussi (13):
>   dmaengine: idxd: Rename drv_enable/disable_wq to
>     idxd_drv_enable/disable_wq, and export
>   dmaengine: idxd: Export descriptor management functions
>   dmaengine: idxd: Export wq resource management functions
>   dmaengine: idxd: Add wq private data accessors
>   dmaengine: idxd: add callback support for iaa crypto
>   crypto: iaa - Add IAA Compression Accelerator Documentation
>   crypto: iaa - Add Intel IAA Compression Accelerator crypto driver core
>   crypto: iaa - Add per-cpu workqueue table with rebalancing
>   crypto: iaa - Add compression mode management along with fixed mode
>   crypto: iaa - Add support for deflate-iaa compression algorithm
>   crypto: iaa - Add irq support for the crypto async interface
>   crypto: iaa - Add IAA Compression Accelerator stats
>   dmaengine: idxd: Add support for device/wq defaults
> 
>  .../driver-api/crypto/iaa/iaa-crypto.rst      |  824 +++++++
>  Documentation/driver-api/crypto/iaa/index.rst |   20 +
>  Documentation/driver-api/crypto/index.rst     |   20 +
>  Documentation/driver-api/index.rst            |    1 +
>  MAINTAINERS                                   |    7 +
>  crypto/testmgr.c                              |   10 +
>  drivers/crypto/intel/Kconfig                  |    1 +
>  drivers/crypto/intel/Makefile                 |    1 +
>  drivers/crypto/intel/iaa/Kconfig              |   19 +
>  drivers/crypto/intel/iaa/Makefile             |   12 +
>  drivers/crypto/intel/iaa/iaa_crypto.h         |  173 ++
>  .../crypto/intel/iaa/iaa_crypto_comp_fixed.c  |   92 +
>  drivers/crypto/intel/iaa/iaa_crypto_main.c    | 2182 +++++++++++++++++
>  drivers/crypto/intel/iaa/iaa_crypto_stats.c   |  313 +++
>  drivers/crypto/intel/iaa/iaa_crypto_stats.h   |   53 +
>  drivers/dma/idxd/Makefile                     |    2 +-
>  drivers/dma/idxd/bus.c                        |    6 +
>  drivers/dma/idxd/cdev.c                       |    6 +-
>  drivers/dma/idxd/defaults.c                   |   53 +
>  drivers/dma/idxd/device.c                     |   13 +-
>  drivers/dma/idxd/dma.c                        |    9 +-
>  drivers/dma/idxd/idxd.h                       |   83 +-
>  drivers/dma/idxd/init.c                       |    7 +
>  drivers/dma/idxd/irq.c                        |   12 +-
>  drivers/dma/idxd/submit.c                     |    9 +-
>  25 files changed, 3897 insertions(+), 31 deletions(-)
>  create mode 100644 Documentation/driver-api/crypto/iaa/iaa-crypto.rst
>  create mode 100644 Documentation/driver-api/crypto/iaa/index.rst
>  create mode 100644 Documentation/driver-api/crypto/index.rst
>  create mode 100644 drivers/crypto/intel/iaa/Kconfig
>  create mode 100644 drivers/crypto/intel/iaa/Makefile
>  create mode 100644 drivers/crypto/intel/iaa/iaa_crypto.h
>  create mode 100644 drivers/crypto/intel/iaa/iaa_crypto_comp_fixed.c
>  create mode 100644 drivers/crypto/intel/iaa/iaa_crypto_main.c
>  create mode 100644 drivers/crypto/intel/iaa/iaa_crypto_stats.c
>  create mode 100644 drivers/crypto/intel/iaa/iaa_crypto_stats.h
>  create mode 100644 drivers/dma/idxd/defaults.c
> 
> -- 
> 2.34.1

All applied.  Thanks.
-- 
Email: Herbert Xu <herbert@...dor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ