Message-ID: <20231108101347.77cab795@meshulam.tesarici.cz>
Date:   Wed, 8 Nov 2023 10:13:47 +0100
From:   Petr Tesařík <petr@...arici.cz>
To:     Halil Pasic <pasic@...ux.ibm.com>
Cc:     Niklas Schnelle <schnelle@...ux.ibm.com>,
        Christoph Hellwig <hch@....de>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Marek Szyprowski <m.szyprowski@...sung.com>,
        Robin Murphy <robin.murphy@....com>,
        Petr Tesarik <petr.tesarik1@...wei-partners.com>,
        Ross Lagerwall <ross.lagerwall@...rix.com>,
        linux-pci <linux-pci@...r.kernel.org>,
        linux-kernel@...r.kernel.org, iommu@...ts.linux.dev,
        Matthew Rosato <mjrosato@...ux.ibm.com>,
        Jianxiong Gao <jxgao@...gle.com>
Subject: Re: Memory corruption with CONFIG_SWIOTLB_DYNAMIC=y

On Tue, 7 Nov 2023 18:24:20 +0100
Halil Pasic <pasic@...ux.ibm.com> wrote:

> On Fri, 3 Nov 2023 21:50:53 +0100
> Petr Tesařík <petr@...arici.cz> wrote:
> 
> > > In our opinion the first step towards getting this right is to figure out what
> > > the different kinds of alignments are really supposed to mean. For each of the
> > > mechanisms we need to understand and document whether it is about making sure
> > > that the bounce buffer does not stretch over certain units of memory (pages,
> > > the iova granule size, etc.), or about preserving the offset within a certain
> > > unit of memory, and if so to what extent (the least significant n bits of
> > > orig_addr dictated by the respective mask, or something different).
> > 
> > 
> > Seconded. I have also been struggling with the various alignment
> > constraints. I have even written (but not yet submitted) a patch to
> > calculate the combined alignment mask in swiotlb_tbl_map_single() and
> > pass it down to all other functions, just to make it clear what
> > alignment mask is used.  
> 
> Can you cc me when posting that rework?

Absolutely, if it still makes sense after we clarify the intended
goals of the various alignment parameters. I believe you have indeed
found something!

> > My understanding is that buffer alignment may be required by:
> > 
> > 1. hardware which cannot handle an unaligned base address (presumably
> >    because the chip performs a simple OR operation to get the addresses
> >    of individual fields);  
> 
> I'm not sure I understood this properly. What is "base address" in this
> context? For swiotlb, is "base address" basically the return value
> of swiotlb_tbl_map_single() -- what I referred to as tlb_addr previously?
>
> If yes, I understand that satisfying 1 means satisfying
> tlb_addr & combined_mask == 0, where combined_mask describes the
> combined alignment requirement (i.e. combined_mask == min_align_mask |
> alloc_align_mask | (alloc_size < PAGE_SIZE ? 0 : (PAGE_SIZE-1))). Does
> that sound about right?
> 
> Can we assume that, in case 1, the start address of the mapping --
> that is, orig_addr -- needs to be aligned already?


No, not really. A very nice diagram can be found in commit 5f89468e2f06
("swiotlb: manipulate orig_addr when tlb_addr has offset"):

"""
  1. Get dma_addr_t from dma_map_single()

  dma_addr_t tlb_addr = dma_map_single(dev, vaddr, vsize, DMA_TO_DEVICE);

      |<---------------vsize------------->|
      +-----------------------------------+
      |                                   | original buffer
      +-----------------------------------+
    vaddr

   swiotlb_align_offset
       |<----->|<---------------vsize------------->|
       +-------+-----------------------------------+
       |       |                                   | swiotlb buffer
       +-------+-----------------------------------+
	    tlb_addr
"""

Here, the aligned address is outside the original buffer at
[vaddr, vaddr+vsize). This is what I referred to as "base
address". The N lowest bits of this address are zero. It may not
even be mapped in the SWIOTLB if N is greater than IO_TLB_SHIFT.
However, the exact values of the N lowest bits of the original
buffer's physical start address are preserved in tlb_addr.
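
To make this concrete, here is a minimal user-space sketch (an
illustration with made-up values and names, not the actual swiotlb
code) of how the low bits of orig_addr end up preserved in the
returned tlb_addr:

#include <stdint.h>
#include <stdio.h>

/* Illustration only: given a slot base whose min_align_mask bits are
 * zero, the address handed back to the driver keeps the low bits of
 * the original buffer's physical address. */
static uint64_t map_single_sketch(uint64_t slot_base, uint64_t orig_addr,
                                  uint64_t min_align_mask)
{
        /* assumes slot_base & min_align_mask == 0 */
        return slot_base | (orig_addr & min_align_mask);
}

int main(void)
{
        uint64_t min_align_mask = 0xfff;       /* hypothetical device mask */
        uint64_t orig_addr      = 0x12345678;  /* low 12 bits are 0x678 */
        uint64_t slot_base      = 0x80000000;  /* hypothetical aligned slot */

        /* prints 0x80000678: the low 12 bits of orig_addr survive */
        printf("tlb_addr = 0x%llx\n", (unsigned long long)
               map_single_sketch(slot_base, orig_addr, min_align_mask));
        return 0;
}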

> > 2. isolation of untrusted devices, where no two bounce buffers should
> >    end up in the same iova granule;
> > 
> > 3. allocation size; I could not find an explanation, so this might be
> >    merely an attempt at reducing SWIOTLB internal fragmentation.  
> 
> 
> Assuming I understood 1 correctly, I think we are missing something.
> 
> 4. preserve the n (0 <= n <= 31) lowest bits of all addresses within the
> mapping.
> 
> Were it just 1, 2 and 3, we wouldn't need the whole offsetting
> business introduced by commit 1f221a0d0dbf ("swiotlb: respect
> min_align_mask"). Let me cite from its commit message:
> 
> """
>     Respect the min_align_mask in struct device_dma_parameters in swiotlb.
>     
>     There are two parts to it:
>      1) for the lower bits of the alignment inside the io tlb slot, just
>         extent the size of the allocation and leave the start of the slot
>          empty
>      2) for the high bits ensure we find a slot that matches the high bits
>         of the alignment to avoid wasting too much memory
>     
>     Based on an earlier patch from Jianxiong Gao <jxgao@...gle.com>.
> """
> 
> Do we agree that 4 needs to be added to the list? Or was it supposed
> to be covered by 1?

That's it; your 4 is what case 1 was supposed to mean. However, IIUC
cases 2 and 3 don't need to preserve any low bits.

At least for case 3, I'm now quite confident that the intention
was to start big buffers on a page-aligned slot, leaving the gaps
for buffers smaller than a page.

Case 2 is not clear to me. Some comments suggest that it should
prevent exposing a single iova granule to multiple untrusted
devices. What the code really does now is prevent crossing an iova
granule boundary if the original buffer did not cross one. I'm not
sure whether it achieves the goal, because commit e81e99bacc9f
("swiotlb: Support aligned swiotlb buffers") also references
PAGE_SIZE, but AFAICS it should use the same logic as case 3
(page-aligned allocations).
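
For what it's worth, the boundary-crossing condition described above
can be written as a small helper (purely illustrative, not the kernel
code; the granule size is assumed to be a power of two):

#include <stdbool.h>
#include <stdint.h>

/* Illustration only: does [addr, addr + size) cross a boundary of the
 * given granule size? */
bool crosses_granule(uint64_t addr, uint64_t size, uint64_t granule)
{
        if (size == 0)
                return false;
        return (addr & ~(granule - 1)) !=
               ((addr + size - 1) & ~(granule - 1));
}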

To sum it up, there are two types of alignment:

1. specified by a device's min_align_mask; this says how many low
   bits of a buffer's physical address must be preserved,

2. specified by allocation size and/or the alignment parameter;
   this says how many low bits in the first IO TLB slot's physical
   address must be zero.
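
Expressed as checks (again only a sketch using the names from this
thread, not actual kernel code), that would be:

#include <stdbool.h>
#include <stdint.h>

/* Type 1: the device's min_align_mask bits of the original physical
 * address are preserved in the bounce buffer address. */
bool preserves_low_bits(uint64_t tlb_addr, uint64_t orig_addr,
                        uint64_t min_align_mask)
{
        return (tlb_addr & min_align_mask) == (orig_addr & min_align_mask);
}

/* Type 2: the low bits of the first IO TLB slot's physical address,
 * as selected by the allocation-size/alignment mask, are zero. */
bool slot_start_is_aligned(uint64_t slot_phys, uint64_t alloc_mask)
{
        return (slot_phys & alloc_mask) == 0;
}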

I hope somebody can confirm or correct this summary before I go
and break something. You know, it's not like cleanups in SWIOTLB
have never broken anything.  ;-)

Petr T
