Message-ID: <49b99d5a87f349bf9ede2a5f737e4f1981a12441.camel@suse.de>
Date: Tue, 08 Sep 2020 14:15:57 +0200
From: Nicolas Saenz Julienne <nsaenzjulienne@...e.de>
To: Catalin Marinas <catalin.marinas@....com>
Cc: linux-kernel@...r.kernel.org, f.fainelli@...il.com, hch@....de,
linux-rpi-kernel@...ts.infradead.org,
Will Deacon <will@...nel.org>,
linux-arm-kernel@...ts.infradead.org, robh@...nel.org,
Robin Murphy <robin.murphy@....com>
Subject: Re: [RFC] arm64: mm: Do not use both DMA zones when 30-bit address
space unavailable
Hi Catalin, thanks for taking the time.
On Tue, 2020-09-08 at 12:14 +0100, Catalin Marinas wrote:
> > Also note the usage of 'zone_dma_bits' in the DMA code, which assumes that
> > ZONE_DMA's physical address space is always smaller than (1 << zone_dma_bits) -
> > 1.
>
> I think part of those uses are broken. dma_direct_supported() does the
> right thing and uses the DMA address instead of the physical one. Here
> __phys_to_dma() subtracts the dma_pfn_offset, which in my above example
> would be (0b10 << (30 - PAGE_SHIFT)).
>
> dma_direct_optimal_gfp_mask(), OTOH, seems to start ok with a
> __dma_to_phys() on the dma_limit but it ends up comparing the physical
> address with the DMA mask. This gives the wrong result on arm64
> platforms where RAM starts above 4GB and still expect a ZONE_DMA32. It
> should compare *phys_limit with __dma_to_phys(DMA_BIT_MASK(...)). I
> guess it ends up bouncing via swiotlb more often.
I'll look into this.
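For reference, here is a minimal user-space sketch of the comparison problem described above. It is not the kernel code: the sketch_* helpers, SKETCH_PAGE_SHIFT and the two needs_zone_* functions are made-up names, and the platform (RAM starting at 2 GiB, seen by the device at bus address 0, 4 KiB pages) is only an assumption modelled on the (0b10 << (30 - PAGE_SHIFT)) example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same shape as the kernel macro of this name, redefined here for the sketch. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Assumption: 4 KiB pages and a PFN offset of 0b10 << (30 - PAGE_SHIFT). */
#define SKETCH_PAGE_SHIFT     12
#define SKETCH_DMA_PFN_OFFSET (0x2ULL << (30 - SKETCH_PAGE_SHIFT))
#define SKETCH_DMA_OFFSET     (SKETCH_DMA_PFN_OFFSET << SKETCH_PAGE_SHIFT)

/* Hypothetical stand-in for __dma_to_phys(): add the offset back. */
static inline uint64_t sketch_dma_to_phys(uint64_t dma)
{
	return dma + SKETCH_DMA_OFFSET;
}

/* Compares a physical limit against a raw DMA mask (the mismatch). */
static inline bool needs_zone_naive(uint64_t dma_limit, unsigned int zone_bits)
{
	uint64_t phys_limit = sketch_dma_to_phys(dma_limit);

	return phys_limit <= DMA_BIT_MASK(zone_bits);
}

/* Compares physical against physical by translating the mask first. */
static inline bool needs_zone_translated(uint64_t dma_limit,
					 unsigned int zone_bits)
{
	uint64_t phys_limit = sketch_dma_to_phys(dma_limit);

	return phys_limit <= sketch_dma_to_phys(DMA_BIT_MASK(zone_bits));
}
```

With these assumptions, a device limited to 30-bit DMA has a physical limit of 0xBFFFFFFF. The naive check compares that against 0x3FFFFFFF and concludes the restricted zone is not needed, while the translated check compares it against 0xBFFFFFFF and selects the zone.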
> We assumed such offsets on arm64 since commit d50314a6b070 ("arm64:
> Create non-empty ZONE_DMA when DRAM starts above 4GB").
>
> > > An alternative (and I think we had a patch at some point) is to make it
> > > generic and parse the dma-range in the DT to identify the minimum mask
> > > and set ZONE_DMA accordingly. But this doesn't solve ACPI, so if Linux
> > > can boot with ACPI on RPi4 it would still be broken.
> >
> > ACPI is being worked on by, among others, Jeremy Linton (one of your colleagues
> > I believe).
> >
> > We could always use sane defaults for ACPI and be smarter with DT. Yet,
> > implementing this entails translating nested dma-ranges with the only help of
> > libfdt, which isn't trivial (see drivers/of/address.c). IIRC RobH said that it
> > wasn't worth the effort just for a board.
>
> That would have been the ideal, more generic solution. But I agree that
> it's not worth the effort if the only SoC affected is RPi4.
>
> To summarise, I'd like ZONE_DMA to overlap with ZONE_DMA32 (i.e. expand
> zone_dma_bits to 32 and drop ZONE_DMA32) for all SoCs other than RPi4.
> The solutions so far:
>
> 1. Assume that, if RAM starts at 0, we need a zone_dma_bits == 30. This
> also assumes that it's only RPi4 in this category or that any such
> future SoC has a need for 30-bit DMA.
>
> 2. Adjust zone_dma_bits at boot-time only if the SoC is RPi4.
>
> 3. Use the more complex dma-ranges approach to calculate the correct
> zone_dma_bits as a minimum of all dma masks in the DT.
>
> We can discount (3) as not worth the effort. I'd go with (1) (this
> patch) if we can guarantee that no current or future SoC has RAM
> starting at 0 while not needing 30-bit DMA masks. If not, we can go with
> (2) unless others have a better suggestion.
After a quick check of the devices we have for testing at SUSE, it's clear that
(1) is impossible. So I'll push for solution (2).
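As a rough illustration of (2), the selection could be as small as the user-space sketch below. Everything in it is an assumption made for the example: sketch_zone_dma_bits() and the two constants are hypothetical names, and matching on the "brcm,bcm2711" compatible string is just one possible way of detecting the RPi4 SoC, not something settled in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical constants: 32 bits by default, 30 bits for RPi4 only. */
#define SKETCH_ZONE_DMA_BITS_DEFAULT 32
#define SKETCH_ZONE_DMA_BITS_RPI4    30

/*
 * Pick the ZONE_DMA width from the machine's root compatible string.
 * Assumption: the caller got this string from the device tree; a NULL
 * or unknown string falls back to the default.
 */
static inline unsigned int sketch_zone_dma_bits(const char *root_compatible)
{
	if (root_compatible != NULL &&
	    strcmp(root_compatible, "brcm,bcm2711") == 0)
		return SKETCH_ZONE_DMA_BITS_RPI4;

	return SKETCH_ZONE_DMA_BITS_DEFAULT;
}
```

Every other SoC keeps the 32-bit default here, which matches the goal of having ZONE_DMA cover what ZONE_DMA32 covers today everywhere except on RPi4.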
Regards,
Nicolas