linux-kernel - Re: [PATCH v2 RESEND 4/7] swiotlb: Dynamically allocated bounce buffers


Message-ID: <20230516083942.0303b5fb@meshulam.tesarici.cz>
Date:   Tue, 16 May 2023 08:39:42 +0200
From:   Petr Tesařík <petr@...arici.cz>
To:     Christoph Hellwig <hch@....de>
Cc:     "Michael Kelley (LINUX)" <mikelley@...rosoft.com>,
        Petr Tesarik <petrtesarik@...weicloud.com>,
        Jonathan Corbet <corbet@....net>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
        Maxime Ripard <mripard@...nel.org>,
        Thomas Zimmermann <tzimmermann@...e.de>,
        David Airlie <airlied@...il.com>,
        Daniel Vetter <daniel@...ll.ch>,
        Marek Szyprowski <m.szyprowski@...sung.com>,
        Robin Murphy <robin.murphy@....com>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Borislav Petkov <bp@...e.de>,
        Randy Dunlap <rdunlap@...radead.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Damien Le Moal <damien.lemoal@...nsource.wdc.com>,
        Kim Phillips <kim.phillips@....com>,
        "Steven Rostedt (Google)" <rostedt@...dmis.org>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Hans de Goede <hdegoede@...hat.com>,
        Jason Gunthorpe <jgg@...pe.ca>,
        Kees Cook <keescook@...omium.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
        open list <linux-kernel@...r.kernel.org>,
        "open list:DRM DRIVERS" <dri-devel@...ts.freedesktop.org>,
        "open list:DMA MAPPING HELPERS" <iommu@...ts.linux.dev>,
        Roberto Sassu <roberto.sassu@...wei.com>,
        Kefeng Wang <wangkefeng.wang@...wei.com>
Subject: Re: [PATCH v2 RESEND 4/7] swiotlb: Dynamically allocated bounce
 buffers

Hi Christoph,

On Tue, 16 May 2023 08:13:09 +0200
Christoph Hellwig <hch@....de> wrote:

> On Mon, May 15, 2023 at 07:43:52PM +0000, Michael Kelley (LINUX) wrote:
> > FWIW, I don't think the approach you have implemented here will be
> > practical to use for CoCo VMs (SEV, TDX, whatever else).  The problem
> > is that dma_direct_alloc_pages() and dma_direct_free_pages() must
> > call dma_set_decrypted() and dma_set_encrypted(), respectively.  In CoCo
> > VMs, these calls are expensive because they require a hypercall to the host,
> > and the operation on the host isn't trivial either.  I haven't measured the
> > overhead, but doing a hypercall on every DMA map operation and on
> > every unmap operation has long been something we thought we must
> > avoid.  The fixed swiotlb bounce buffer space solves this problem by
> > doing set_decrypted() in batch at boot time, and never
> > doing set_encrypted().  
> 
> I also suspect it doesn't really scale too well due to the number of
> allocations.  I suspect a better way to implement things would be to
> add more large chunks that are used just like the main swiotlb buffers.
> 
> That is when we run out of space try to allocate another chunk of the
> same size in the background, similar to what we do with the pool in
> dma-pool.c.  This means we'll do a fairly large allocation, so we'll
> need compaction or even CMA to back it up, but the other big upside
> is that it also reduces the number of buffers that need to be checked
> in is_swiotlb_buffer or the free / sync side.

I have considered this approach. The two main issues I ran into were:

1. MAX_ORDER allocations were too small (at least with 4K pages), and
   even then they would often fail.

2. Allocating from CMA did work but only from process context.
   I made a stab at modifying the CMA allocator to work from interrupt
   context, but there are non-trivial interactions with the buddy
   allocator. Making them safe from interrupt context looked like a
   major task.
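
To put issue 1 in numbers, here is a rough sketch of the arithmetic. The
values are assumptions, not figures from this message: 4 KiB pages, a
highest buddy order of 10, and a 64 MiB default swiotlb pool. All names
prefixed with sketch_ are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Assumed values (not from the message): 4 KiB pages and a highest
 * buddy allocator order of 10, a common default configuration. */
#define SKETCH_PAGE_SHIFT  12
#define SKETCH_TOP_ORDER   10

/* Assumed default swiotlb pool size of 64 MiB. */
#define SKETCH_POOL_BYTES  ((size_t)64 << 20)

/* Largest physically contiguous block a single buddy allocation can
 * return: one page shifted left by the highest order. */
static size_t sketch_max_chunk_bytes(unsigned int page_shift,
                                     unsigned int top_order)
{
    return (size_t)1 << (page_shift + top_order);
}

/* Number of such chunks needed to cover a pool of pool_bytes,
 * rounding up. */
static size_t sketch_chunks_for_pool(size_t pool_bytes, size_t chunk_bytes)
{
    return (pool_bytes + chunk_bytes - 1) / chunk_bytes;
}
```

Under these assumptions a single allocation tops out at 4 MiB, so
matching one 64 MiB pool would take 16 separate top-order allocations,
each of which may fail on a fragmented system.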

I also had some fears about the length of the dynamic buffer list. The
longest lists I observed were for block devices, where the length
roughly followed the queue depth. Walking a few hundred buffers was
still fast enough. I admit the list length may become an issue with
high-end NVMe and I/O-intensive applications.
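
The cost in question is that of a linear lookup. A simplified user-space
sketch follows; it is not the code from the patch, and the structure and
function names are hypothetical. It shows the kind of walk an
address check has to do when every bounce buffer is tracked separately.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one dynamically allocated bounce buffer. */
struct sketch_dyn_buf {
    uintptr_t start;              /* first address of the buffer */
    size_t size;                  /* length in bytes */
    struct sketch_dyn_buf *next;  /* next tracked buffer, or NULL */
};

/* Return true if addr falls inside any tracked buffer. The walk is
 * O(n) in the number of buffers, so the cost grows with the number of
 * mappings in flight (roughly the queue depth for block devices). */
static bool sketch_is_dyn_buffer(const struct sketch_dyn_buf *head,
                                 uintptr_t addr)
{
    for (const struct sketch_dyn_buf *b = head; b; b = b->next) {
        if (addr >= b->start && addr - b->start < b->size)
            return true;
    }
    return false;
}
```

With fewer, larger chunks the same check visits far fewer entries, which
is the upside Christoph points out above.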

Last but not least, when many smaller swiotlb chunks are allocated, they
must be kept in a list (or another data structure), somewhat reducing the
performance win. OK, one thing I did *not* consider back then was
allocating these additional swiotlb chunks per device. It looks a bit
too wasteful.

Petr T