Message-ID: <BYAPR21MB168802F691D3041C9B2F9F2DD72CA@BYAPR21MB1688.namprd21.prod.outlook.com>
Date: Thu, 6 Jul 2023 14:22:50 +0000
From: "Michael Kelley (LINUX)" <mikelley@...rosoft.com>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
CC: Petr Tesarik <petrtesarik@...weicloud.com>,
Stefano Stabellini <sstabellini@...nel.org>,
Thomas Bogendoerfer <tsbogend@...ha.franken.de>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>,
"H. Peter Anvin" <hpa@...or.com>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Juergen Gross <jgross@...e.com>,
Oleksandr Tyshchenko <oleksandr_tyshchenko@...m.com>,
Christoph Hellwig <hch@....de>,
Marek Szyprowski <m.szyprowski@...sung.com>,
Robin Murphy <robin.murphy@....com>,
Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
Hans de Goede <hdegoede@...hat.com>,
Jason Gunthorpe <jgg@...pe.ca>,
Kees Cook <keescook@...omium.org>,
Saravana Kannan <saravanak@...gle.com>,
"moderated list:XEN HYPERVISOR ARM" <xen-devel@...ts.xenproject.org>,
"moderated list:ARM PORT" <linux-arm-kernel@...ts.infradead.org>,
open list <linux-kernel@...r.kernel.org>,
"open list:MIPS" <linux-mips@...r.kernel.org>,
"open list:XEN SWIOTLB SUBSYSTEM" <iommu@...ts.linux.dev>,
Roberto Sassu <roberto.sassu@...weicloud.com>,
Kefeng Wang <wangkefeng.wang@...wei.com>,
"petr@...arici.cz" <petr@...arici.cz>
Subject: RE: [PATCH v3 4/7] swiotlb: if swiotlb is full, fall back to a
transient memory pool
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org> Sent: Thursday, July 6, 2023 1:07 AM
>
> On Thu, Jul 06, 2023 at 03:50:55AM +0000, Michael Kelley (LINUX) wrote:
> > From: Petr Tesarik <petrtesarik@...weicloud.com> Sent: Tuesday, June 27, 2023 2:54 AM
> > >
> > > Try to allocate a transient memory pool if no suitable slots can be found,
> > > except when allocating from a restricted pool. The transient pool is just
> > > big enough for this one bounce buffer. It is inserted into a per-device
> > > list of transient memory pools, and it is freed again when the bounce
> > > buffer is unmapped.
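Roughly, the "just big enough" sizing described here could look like the
sketch below; IO_TLB_SIZE and IO_TLB_SHIFT are the existing swiotlb slot
constants, while the helper name is invented for the example and is not
taken from the patch:

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/swiotlb.h>

/* Illustrative only: back a single bounce buffer with an allocation
 * rounded up to whole swiotlb slots (IO_TLB_SIZE bytes each) and then
 * to whole pages.
 */
static struct page *alloc_transient_backing(size_t mapping_size, gfp_t gfp)
{
        unsigned long nslabs = ALIGN(mapping_size, IO_TLB_SIZE) >> IO_TLB_SHIFT;
        unsigned long bytes = nslabs << IO_TLB_SHIFT;

        return alloc_pages(gfp, get_order(bytes));
}
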
> > >
> > > Transient memory pools are kept in an RCU list. A memory barrier is
> > > required after adding a new entry, because any address within a transient
> > > buffer must be immediately recognized as belonging to the SWIOTLB, even if
> > > it is passed to another CPU.
> > >
> > > Deletion does not require any synchronization beyond RCU ordering
> > > guarantees. After a buffer is unmapped, its physical addresses may no
> > > longer be passed to the DMA API, so the memory range of the corresponding
> > > stale entry in the RCU list never matches. If the memory range gets
> > > allocated again, then it happens only after an RCU quiescent state.
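Concretely, the list handling described in these two paragraphs boils down
to something like the sketch below; the structure, list, lock and helper
names are invented for illustration and will not match the patch exactly:

#include <linux/rculist.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/types.h>

/* Illustrative descriptor for one transient pool on an RCU-protected list. */
struct transient_pool {
        struct list_head node;          /* linked into the per-device list */
        struct rcu_head rcu;            /* for deferred freeing */
        phys_addr_t start, end;         /* physical range covered by the pool */
};

static LIST_HEAD(transient_pools);              /* stand-in for the per-device list */
static DEFINE_SPINLOCK(transient_pools_lock);   /* serializes writers only */

static void publish_pool(struct transient_pool *pool)
{
        unsigned long flags;

        spin_lock_irqsave(&transient_pools_lock, flags);
        list_add_rcu(&pool->node, &transient_pools);
        spin_unlock_irqrestore(&transient_pools_lock, flags);

        /*
         * Make the new entry visible before the bounce buffer address is
         * returned and possibly handed to another CPU; pairs with the
         * barrier in the lookup path.
         */
        smp_mb();
}

static void retire_pool(struct transient_pool *pool)
{
        unsigned long flags;

        spin_lock_irqsave(&transient_pools_lock, flags);
        list_del_rcu(&pool->node);
        spin_unlock_irqrestore(&transient_pools_lock, flags);

        /*
         * A reader may still see the stale entry, but the buffer has
         * already been unmapped, so no valid DMA address can match it.
         * The descriptor is freed only after a grace period.
         */
        kfree_rcu(pool, rcu);
}
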
> > >
> > > Since bounce buffers can now be allocated from different pools, add a
> > > parameter to swiotlb_alloc_pool() to let the caller know which memory pool
> > > is used. Add swiotlb_find_pool() to find the memory pool corresponding to
> > > an address. This function is now also used by is_swiotlb_buffer(), because
> > > a simple boundary check is no longer sufficient.
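Continuing the sketch above (same invented names), the lookup behind a
swiotlb_find_pool()-style helper, and therefore behind is_swiotlb_buffer(),
would be roughly:

/* Return the pool covering @paddr, or NULL if it is not a swiotlb address.
 * Returning the pointer after rcu_read_unlock() is safe here only because
 * a pool cannot be retired while one of its buffers is still mapped.
 */
static struct transient_pool *find_pool_sketch(phys_addr_t paddr)
{
        struct transient_pool *pool, *found = NULL;

        smp_rmb();      /* pairs with the barrier after publish_pool() */

        rcu_read_lock();
        list_for_each_entry_rcu(pool, &transient_pools, node) {
                if (paddr >= pool->start && paddr < pool->end) {
                        found = pool;
                        break;
                }
        }
        rcu_read_unlock();

        return found;
}

is_swiotlb_buffer() then reduces to checking whether such a lookup (plus the
check against the static pool) finds anything, instead of the old single
boundary test.
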
> > >
> > > The logic in swiotlb_alloc_tlb() is taken from __dma_direct_alloc_pages(),
> > > simplified and enhanced to use coherent memory pools if needed.
> > >
> > > Note that this is not the most efficient way to provide a bounce buffer,
> > > but when a DMA buffer can't be mapped, something may (and will) actually
> > > break. At that point it is better to make an allocation, even if it may be
> > > an expensive operation.
> >
> > I continue to think about swiotlb memory management from the standpoint
> > of CoCo VMs that may be quite large with high network and storage loads.
> > These VMs are often running mission-critical workloads that can't tolerate
> > a bounce buffer allocation failure. To prevent such failures, the swiotlb
> > memory size must be overly large, which wastes memory.
>
> If "mission critical workloads" are in a vm that allows overcommit and
> no control over other vms in that same system, then you have worse
> problems, sorry.
>
> Just don't do that.
>
No, the cases I'm concerned about don't involve memory overcommit.
CoCo VMs must use swiotlb bounce buffers to do DMA I/O. Current swiotlb
code in the Linux guest allocates a configurable, but fixed, amount of guest
memory at boot time for this purpose. But it's hard to know how much
swiotlb bounce buffer memory will be needed to handle peak I/O loads.
This patch set does dynamic allocation of swiotlb bounce buffer memory,
which can help avoid needing to configure an overly large fixed size at boot.
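For a sense of scale: that fixed size is typically set with the swiotlb=
boot parameter, which takes a number of 2 KiB slots, so covering a worst
case might mean something like

        swiotlb=262144    (262144 slots x 2 KiB = 512 MiB reserved at boot)

and all of that memory stays tied up whether or not the peak load ever
materializes.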
Michael