Message-Id: <1f08bd12-0ac4-43ea-b058-7836521eec12@app.fastmail.com>
Date: Thu, 28 Sep 2023 11:16:47 -0400
From: "Arnd Bergmann" <arnd@...db.de>
To: "Jim Quinlan" <james.quinlan@...adcom.com>
Cc: "Linus Walleij" <linus.walleij@...aro.org>,
"Christoph Hellwig" <hch@....de>,
bcm-kernel-feedback-list@...adcom.com, jim2101024@...il.com,
"Russell King" <linux@...linux.org.uk>,
"Geert Uytterhoeven" <geert+renesas@...der.be>,
"Russell King" <rmk+kernel@...linux.org.uk>,
"Andrew Morton" <akpm@...ux-foundation.org>,
"Jonathan Corbet" <corbet@....net>,
"Thomas Gleixner" <tglx@...utronix.de>,
"Sebastian Reichel" <sebastian.reichel@...labora.com>,
"Mike Rapoport" <rppt@...nel.org>,
"Eric DeVolder" <eric.devolder@...cle.com>,
"Nathan Chancellor" <nathan@...nel.org>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
"Christophe Leroy" <christophe.leroy@...roup.eu>,
"moderated list:ARM PORT" <linux-arm-kernel@...ts.infradead.org>,
"open list" <linux-kernel@...r.kernel.org>,
"Claire Chang" <tientzu@...omium.org>
Subject: Re: [PATCH v1 1/1] ARM: Select DMA_DIRECT_REMAP to fix restricted DMA

On Thu, Sep 28, 2023, at 10:00, Jim Quinlan wrote:
> On Thu, Sep 28, 2023 at 9:32 AM Arnd Bergmann <arnd@...db.de> wrote:
>>
>> On Thu, Sep 28, 2023, at 08:07, Jim Quinlan wrote:
>> > On Wed, Sep 27, 2023 at 7:10 PM Linus Walleij <linus.walleij@...aro.org> wrote:
>> >>
>> >> Clearly if you want to do this, surely the ARM-specific
>> >> arch/arm/mm/dma-mapping.c and arch/arm/mm/dma-mapping-nommu.c
>> >> needs to be removed at the same time?
>> >
>> >
>> > Yes, this is the reason I used "RFC" as the fix looked too easy to be viable :-)
>> > I debugged it enough to see that the host driver's writes to the
>> > dma_alloc_coherent() region were not appearing in memory, and that
>> > led me to DMA_DIRECT_REMAP.
>>
>> Usually when you see a mismatch between the data observed by the
>> device and the CPU, the problem is an incorrect "dma-coherent"
>> property in the DT: either the device is coherent and accesses
>> the cache but the CPU tries to bypass it because the property
>> is missing, or there is an extraneous property and the CPU
>> goes through the cache but the device bypasses it.
>
> I just searched, there are no "dt-coherent" properties in our device tree.
> Also, even if we did have them, wouldn't things also fail when not using
> restricted DMA?

Correct, it should be independent of restricted DMA, but it might
work by chance that way even if it's still wrong. If your DT
is marked as non-coherent (note: the property to look for
is "dma-coherent", not "dt-coherent"), can you check the
datasheet of the SoC to see if that is actually correct?
If the chip is designed to support high-speed devices on
PCIe, it's likely that the PCIe root complex is either coherent
with the caches, or can (and should) be configured that way
for performance reasons.

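As a rough sketch of how to check what the running kernel was actually
given: the device tree is exposed under /sys/firmware/devicetree/base
with one directory per node and one file per property, so walking that
tree for files named "dma-coherent" lists the nodes that carry the
property. The function name find_dma_coherent is made up for
illustration, and Python is used only for brevity.

```python
import os


def find_dma_coherent(root="/sys/firmware/devicetree/base"):
    """Return DT node paths (relative to root) that have a dma-coherent property."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Each DT property shows up as a regular file inside the node directory.
        if "dma-coherent" in filenames:
            rel = os.path.relpath(dirpath, root)
            hits.append("/" if rel == "." else "/" + rel)
    return sorted(hits)


if __name__ == "__main__":
    for node in find_dma_coherent():
        print(node)
```

An empty result means no node is marked coherent. Keep in mind that
the property may sit on a parent bus node rather than on the PCIe
node itself, so look at the whole path and not just the leaf.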
>> It could also be a driver bug if the device mixes up the
>> address spaces, e.g. passing virt_to_phys(pointer) rather
>> than the DMA address returned by dma_alloc_coherent().
>
> This is an Intel 7260 part using the iwlwifi driver, I doubt it has
> errors of that kind.

It's unlikely but not impossible, as the driver has some
unusual constructs, using a lot of coherent mappings that
might otherwise be streaming mappings, and relying on
dma_sync_single_for_device(..., DMA_BIDIRECTIONAL) for other
data, but without the corresponding dma_sync_single_for_cpu().
If all the testing happens on x86, this might easily lead
to a bug that only shows up on non-coherent systems but
is never seen during testing.

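To illustrate why a missing dma_sync_single_for_cpu() can go unnoticed
on a coherent machine, here is a toy model, not kernel code: a buffer
that the CPU sees through a private cached copy while the device
accesses memory directly. The class and method names are invented for
this sketch, and it assumes the simplest possible behavior, where
syncing for the device writes the cache back and syncing for the CPU
discards the cached copy.

```python
class NonCoherentBuffer:
    """Toy model of one DMA buffer on a system without cache coherency."""

    def __init__(self, size):
        self.dram = bytearray(size)  # what the device reads and writes
        self.cache = None            # CPU's cached copy, None means not cached

    def _fill(self):
        # Cache miss: pull the current DRAM contents into the cache.
        if self.cache is None:
            self.cache = bytearray(self.dram)

    def cpu_read(self):
        self._fill()
        return bytes(self.cache)

    def cpu_write(self, data):
        self._fill()
        self.cache[:len(data)] = data  # stays in the cache until written back

    def device_read(self):
        return bytes(self.dram)        # the device bypasses the CPU cache

    def device_write(self, data):
        self.dram[:len(data)] = data

    def sync_for_device(self):
        # Models a cache writeback so the device sees the CPU's data.
        if self.cache is not None:
            self.dram[:] = self.cache

    def sync_for_cpu(self):
        # Models a cache invalidate so the next CPU read comes from DRAM.
        self.cache = None


if __name__ == "__main__":
    buf = NonCoherentBuffer(3)
    buf.cpu_write(b"cmd")
    buf.sync_for_device()
    buf.device_write(b"rsp")
    print(buf.cpu_read())  # stale: still the CPU's own data
    buf.sync_for_cpu()
    print(buf.cpu_read())  # now the device's data
```

On a coherent system both reads would return the device's data, which
is how a missing sync can survive testing there.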
If the problem is not the "dma-coherent" property, can you
double-check whether a different PCIe device works, or narrow
down which specific buffer you saw get corrupted?

      Arnd