linux-kernel - Re: [PATCH v2 RESEND 4/7] swiotlb: Dynamically allocated bounce buffers


Message-ID: <20230517093226.77ab1d2a@meshulam.tesarici.cz>
Date:   Wed, 17 May 2023 09:32:26 +0200
From:   Petr Tesařík <petr@...arici.cz>
To:     Christoph Hellwig <hch@....de>
Cc:     Catalin Marinas <catalin.marinas@....com>,
        "Michael Kelley (LINUX)" <mikelley@...rosoft.com>,
        Petr Tesarik <petrtesarik@...weicloud.com>,
        Jonathan Corbet <corbet@....net>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
        Maxime Ripard <mripard@...nel.org>,
        Thomas Zimmermann <tzimmermann@...e.de>,
        David Airlie <airlied@...il.com>,
        Daniel Vetter <daniel@...ll.ch>,
        Marek Szyprowski <m.szyprowski@...sung.com>,
        Robin Murphy <robin.murphy@....com>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Borislav Petkov <bp@...e.de>,
        Randy Dunlap <rdunlap@...radead.org>,
        Damien Le Moal <damien.lemoal@...nsource.wdc.com>,
        Kim Phillips <kim.phillips@....com>,
        "Steven Rostedt (Google)" <rostedt@...dmis.org>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Hans de Goede <hdegoede@...hat.com>,
        Jason Gunthorpe <jgg@...pe.ca>,
        Kees Cook <keescook@...omium.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
        open list <linux-kernel@...r.kernel.org>,
        "open list:DRM DRIVERS" <dri-devel@...ts.freedesktop.org>,
        "open list:DMA MAPPING HELPERS" <iommu@...ts.linux.dev>,
        Roberto Sassu <roberto.sassu@...wei.com>,
        Kefeng Wang <wangkefeng.wang@...wei.com>
Subject: Re: [PATCH v2 RESEND 4/7] swiotlb: Dynamically allocated bounce
 buffers

Hi Christoph,

On Wed, 17 May 2023 08:56:53 +0200
Christoph Hellwig <hch@....de> wrote:

> Just thinking out loud:
> 
>  - what if we always way overallocate the swiotlb buffer
>  - and then mark the second half / two thirds / <pull some number out
>    of the thin air> slots as used, and make that region available
>    through a special CMA mechanism as ZONE_MOVABLE (but not allowing
>    other CMA allocations to dip into it).

This approach has also been considered internally at Huawei, and it
looked like a viable option, just more complex. We decided to send the
simple approach first to get some feedback and find out who else might
be interested in the dynamic sizing of swiotlb (if anyone).
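
To make sure we mean the same thing, here is the idea as I understand
it, as a toy userspace sketch. This is not swiotlb code: all names are
hypothetical, the pool is tiny, and the hard part (migrating movable
pages out of the way, IRQ safety) is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NSLOTS 8	/* toy pool size, hypothetical */

struct toy_pool {
	bool used[NSLOTS];	/* one "in use" flag per slot */
	size_t lent;		/* tail slots currently lent out as movable memory */
};

/*
 * Over-allocate the pool, then mark the last nlend slots as used so
 * that the ordinary slot search never hands them out.
 */
void toy_pool_init(struct toy_pool *p, size_t nlend)
{
	assert(nlend <= NSLOTS);
	for (size_t i = 0; i < NSLOTS; i++)
		p->used[i] = i >= NSLOTS - nlend;
	p->lent = nlend;
}

/* Number of slots the bounce buffer allocator could hand out right now. */
size_t toy_pool_free_slots(const struct toy_pool *p)
{
	size_t n = 0;

	for (size_t i = 0; i < NSLOTS; i++)
		n += !p->used[i];
	return n;
}

/*
 * Take back up to n lent slots, lowest addresses first.
 * Returns the number of slots actually reclaimed.
 */
size_t toy_pool_reclaim(struct toy_pool *p, size_t n)
{
	if (n > p->lent)
		n = p->lent;
	for (size_t i = 0; i < n; i++)
		p->used[NSLOTS - p->lent + i] = false;
	p->lent -= n;
	return n;
}
```

The point of the sketch is only that a single used/free map covers both
the active and the lent-out part, so reclaiming is just flipping flags
once the pages have been vacated.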

> This allows us to have a single slot management for the entire
> area, but allow reclaiming from it.  We'd probably also need to make
> this CMA variant irq safe.

Let me recap my internal analysis.

On the pro side:

- no performance penalty for devices that do not use swiotlb
- all alignment and boundary constraints can be met
- efficient use of memory for buffers smaller than 1 page

On the con side:

- ZONE_MOVABLE cannot be used for most kernel allocations
- competition with CMA over precious physical address space
  (How much should be reserved for CMA and how much for SWIOTLB?)

To quote from the memory hotplug documentation:

  "Usually, MOVABLE:KERNEL ratios of up to 3:1 or even 4:1 are fine. [...]
  Actual safe zone ratios depend on the workload. Extreme cases, like
  excessive long-term pinning of pages, might not be able to deal with
  ZONE_MOVABLE at all."

This should be no big issue on bare metal (where the motivation is
addressing limitations), but the size of SWIOTLB in CoCo VMs probably
needs some consideration.
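
For a rough feel of what those ratios mean for sizing, here is a
back-of-the-envelope helper. It is plain userspace C, and the function
name is made up for illustration; nothing like it exists in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest ZONE_MOVABLE size (in MiB) that keeps MOVABLE:KERNEL at or
 * below ratio:1 on a machine with total_mib of RAM.
 *
 * From movable = ratio * kernel and movable + kernel = total:
 *   movable = total * ratio / (ratio + 1)
 * The division rounds down, which errs on the safe side.
 */
uint64_t max_movable_mib(uint64_t total_mib, unsigned int ratio)
{
	assert(ratio > 0);
	return total_mib * ratio / (ratio + 1);
}
```

For example, max_movable_mib(16384, 3) gives 12288, so a 16 GiB guest
at 3:1 has at most 12 GiB of movable memory, and whatever part of
SWIOTLB is exposed as ZONE_MOVABLE would have to fit into that budget.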

> This could still be combined with more aggressive use of per-device
> swiotlb area, which is probably a good idea based on some hints.
> E.g. device could hint an amount of inflight DMA to the DMA layer,
> and if there are addressing limitations and the amount is large enough
> that could cause the allocation of a per-device swiotlb area.

I would not rely on device hints, because the amount of in-flight DMA
probably depends on the workload rather than on the type of device. I'd
rather implement some logic based on the actual runtime usage pattern.
I have some ideas already.

Petr T