linux-kernel - Re: [PATCH v2 RESEND 4/7] swiotlb: Dynamically allocated bounce buffers

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20230517115821.4bf63bf5@meshulam.tesarici.cz>
Date:   Wed, 17 May 2023 11:58:21 +0200
From:   Petr Tesařík <petr@...arici.cz>
To:     Catalin Marinas <catalin.marinas@....com>
Cc:     Christoph Hellwig <hch@....de>,
        "Michael Kelley (LINUX)" <mikelley@...rosoft.com>,
        Petr Tesarik <petrtesarik@...weicloud.com>,
        Jonathan Corbet <corbet@....net>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
        Maxime Ripard <mripard@...nel.org>,
        Thomas Zimmermann <tzimmermann@...e.de>,
        David Airlie <airlied@...il.com>,
        Daniel Vetter <daniel@...ll.ch>,
        Marek Szyprowski <m.szyprowski@...sung.com>,
        Robin Murphy <robin.murphy@....com>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Borislav Petkov <bp@...e.de>,
        Randy Dunlap <rdunlap@...radead.org>,
        Damien Le Moal <damien.lemoal@...nsource.wdc.com>,
        Kim Phillips <kim.phillips@....com>,
        "Steven Rostedt (Google)" <rostedt@...dmis.org>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Hans de Goede <hdegoede@...hat.com>,
        Jason Gunthorpe <jgg@...pe.ca>,
        Kees Cook <keescook@...omium.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
        open list <linux-kernel@...r.kernel.org>,
        "open list:DRM DRIVERS" <dri-devel@...ts.freedesktop.org>,
        "open list:DMA MAPPING HELPERS" <iommu@...ts.linux.dev>,
        Roberto Sassu <roberto.sassu@...wei.com>,
        Kefeng Wang <wangkefeng.wang@...wei.com>
Subject: Re: [PATCH v2 RESEND 4/7] swiotlb: Dynamically allocated bounce
 buffers

On Wed, 17 May 2023 10:41:19 +0100
Catalin Marinas <catalin.marinas@....com> wrote:

> On Wed, May 17, 2023 at 08:56:53AM +0200, Christoph Hellwig wrote:
> > Just thinking out loud:
> > 
> >  - what if we always way overallocate the swiotlb buffer
> >  - and then mark the second half / two thirds / <pull some number out
> >    of the thin air> slots as used, and make that region available
> >    through a special CMA mechanism as ZONE_MOVABLE (but not allowing
> >    other CMA allocations to dip into it).
> > 
> > This allows us to have a single slot management for the entire
> > area, but allow reclaiming from it.  We'd probably also need to make
> > this CMA variant irq safe.  
> 
> I think this could work. It doesn't need to be ZONE_MOVABLE (and we
> actually need this buffer in ZONE_DMA). But we can introduce a new
> migrate type, MIGRATE_SWIOTLB, and movable page allocations can use this
> range. The CMA allocations go to free_list[MIGRATE_CMA], so they won't
> overlap.
> 
> One of the downsides is that migrating movable pages still needs a
> sleep-able context.

Pages can be migrated by a separate worker thread when the number of
free slots reaches a low watermark.

> Another potential confusion is is_swiotlb_buffer() for pages in this
> range allocated through the normal page allocator. We may need to check
> the slots as well rather than just the buffer boundaries.

Ah, yes, I forgot about this part; thanks for the reminder.

Indeed, movable pages can be used for the page cache, and drivers do
DMA to/from buffers in the page cache.

Let me recap:

- Allocated chunks must still be tracked with this approach.
- The pool of available slots cannot be grown from interrupt context.

So, what exactly is the advantage compared to allocating additional
swiotlb chunks from CMA?

> (we are actually looking at a MIGRATE_METADATA type for the arm64 memory
> tagging extension which uses a 3% carveout of the RAM for storing the
> tags and people want that reused somehow; we have some WIP patches but
> we'll post them later this summer)
> 
> > This could still be combined with more aggressive use of per-device
> > swiotlb area, which is probably a good idea based on some hints.
> > E.g. device could hint an amount of inflight DMA to the DMA layer,
> > and if there are addressing limitations and the amout is large enough
> > that could cause the allocation of a per-device swiotlb area.  
> 
> If we go for one large-ish per-device buffer for specific cases, maybe
> something similar to the rmem_swiotlb_setup() but which can be
> dynamically allocated at run-time and may live alongside the default
> swiotlb. The advantage is that it uses a similar slot tracking to the
> default swiotlb, no need to invent another. This per-device buffer could
> also be allocated from the MIGRATE_SWIOTLB range if we make it large
> enough at boot. It would be seen just a local accelerator for devices
> that use bouncing frequently or from irq context.

A per-device pool could also be used for small buffers. IIRC somebody
was interested in that.

Petr T