Message-ID: <892f9d14-e6fd-7b1b-d07b-af0be6e623fa@arm.com>
Date:   Tue, 31 Jul 2018 17:29:17 +0100
From:   Robin Murphy <robin.murphy@....com>
To:     Stefan Agner <stefan@...er.ch>
Cc:     Guenter Roeck <linux@...ck-us.net>, Christoph Hellwig <hch@....de>,
        Krzysztof Kozlowski <krzk@...nel.org>,
        Ard Biesheuvel <ard.biesheuvel@...aro.org>,
        Rob Herring <robh+dt@...nel.org>,
        Frank Rowand <frowand.list@...il.com>,
        devicetree@...r.kernel.org, linux-kernel@...r.kernel.org,
        Fugang Duan <fugang.duan@....com>
Subject: Re: [BUG BISECT] Ethernet fail on VF50 (OF: Don't set default
 coherent DMA mask)

On 31/07/18 16:53, Stefan Agner wrote:
> On 31.07.2018 14:32, Robin Murphy wrote:
>> On 31/07/18 09:19, Stefan Agner wrote:
>>> On 30.07.2018 16:38, Robin Murphy wrote:
>>>> On 28/07/18 17:58, Guenter Roeck wrote:
>>>>> On Fri, Jul 27, 2018 at 04:04:48PM +0200, Christoph Hellwig wrote:
>>>>>> On Fri, Jul 27, 2018 at 03:18:14PM +0200, Krzysztof Kozlowski wrote:
>>>>>>> On 27 July 2018 at 15:11, Krzysztof Kozlowski <krzk@...nel.org> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On today's next, the bisect pointed commit
>>>>>>>> ff33d1030a6ca87cea9a41e1a2ea7750a781ab3d as fault for my boot failures
>>>>>>>> with NFSv4 root on Toradex Colibri VF50 (Iris carrier board).
>>>>>>>>
>>>>>>>> Author: Robin Murphy <robin.murphy@....com>
>>>>>>>> Date:   Mon Jul 23 23:16:12 2018 +0100
>>>>>>>>        OF: Don't set default coherent DMA mask
>>>>>>>>
>>>>>>>> Board: Toradex Colibri VF50 (NXP VF500, Cortex A5, serial configured
>>>>>>>> with DMA) on Iris Carrier.
>>>>>>>>
>>>>>>>> It looks like problem with Freescale Ethernet driver:
>>>>>>>> [   15.458477] fsl-edma 40018000.dma-controller: coherent DMA mask is unset
>>>>>>>> [   15.465284] fsl-lpuart 40027000.serial: Cannot prepare cyclic DMA
>>>>>>>> [   15.472086] Root-NFS: no NFS server address
>>>>>>>> [   15.476359] VFS: Unable to mount root fs via NFS, trying floppy.
>>>>>>>> [   15.484228] VFS: Cannot open root device "nfs" or
>>>>>>>> unknown-block(2,0): error -6
>>>>>>>> [   15.491664] Please append a correct "root=" boot option; here are
>>>>>>>> the available partitions:
>>>>>>>> [   15.500188] 0100           16384 ram0
>>>>>>>> [   15.500200]  (driver?)
>>>>>>>> [   15.506406] Kernel panic - not syncing: VFS: Unable to mount root
>>>>>>>> fs on unknown-block(2,0)
>>>>>>>> [   15.514747] ---[ end Kernel panic - not syncing: VFS: Unable to
>>>>>>>> mount root fs on unknown-block(2,0) ]---
>>>>>>>>
>>>>>>>> Attached - defconfig and full boot log.
>>>>>>>>
>>>>>>>> Any hints?
>>>>>>>> Let me know if you need any more information.
>>>>>>>
>>>>>>> My Exynos boards also fail to boot on missing network:
>>>>>>> https://krzk.eu/#/builders/21/builds/799/steps/10/logs/serial0
>>>>>>>
>>>>>>> As expected there are plenty of "DMA mask not set" warnings... and
>>>>>>> later dwc3 driver fails with:
>>>>>>>        dwc3: probe of 12400000.dwc3 failed with error -12
>>>>>>> which is probably the answer why LAN attached to USB is not present.
>>>>>>
>>>>>> Looks like all the drivers failed to set a dma mask and were lucky.
>>>>>
>>>>> I would call it a serious regression. Also, no longer setting a default
>>>>> coherent DMA mask is a quite substantial behavioral change, especially
>>>>> if and since the code worked just fine up to now.
>>>>
>>>> To reiterate, that particular side-effect was an unintentional
>>>> oversight, and I was simply (un)lucky enough that none of the drivers
>>>> I did test depended on that default mask. Sorry for the blip; please
>>>> check whether it's now fixed in next-20180730 as it should be.
>>>>
>>>
>>> Just for my understanding:
>>>
>>> Your first patch ("OF: Don't set default coherent DMA mask") sounded
>>> like that *not* setting default coherent DMA mask was intentional.
>>> Since the commit message reads: "...the bus code has not initialised any
>>> default value" that was assuming that all bus code sets a default DMA
>>> mask which wasn't the case for "simple-bus".
>>
>> Yes, reading the patches in the order they were written is perhaps a
>> little unclear, but hopefully the order in which they are now applied
>> makes more sense.
>>
>>> So I guess that is what ("of/platform: Initialise default DMA masks")
>>> makes up for in the typical device tree case ("simple-bus")?
>>
>> Indeed, I'd missed the fact that the now-out-of-place-looking
>> initialisation in of_dma_configure() still actually belonged to
>> of_platform_device_create_pdata() - that patch should make the
>> assumptions of "OF: Don't set default coherent DMA mask" true again,
>> even for OF-platform devices.
>>
>>> Now, since almost all drivers are inside a soc "simple-bus" and DMA mask
>>> is set again, can/should we rely on the coherent DMA mask set?
>>>
>>> Or is the expectation still that this is set on driver level too?
>>
>> Ideally, we'd like all drivers to explicitly request their masks as
>> the documentation in DMA-API-HOWTO.txt recommends, if only to ensure
>> DMA is actually possible - there can be systems where even the default
>> 32-bit mask is no good - but clearly we're a little way off trying to
>> enforce that just yet.
> 
> In the FEC driver case, there is an integrated DMA (uDMA). It has
> alignment restrictions, but can otherwise address the full 32-bit range.
> 
> So something like this should do it right?
> 
> 	if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))) {
> 		dev_warn(dev, "No suitable DMA available\n");
> 		return -ENODEV;
> 	}
> 

Yup, precisely.

> However, that, as far as I understand, still requires that the bus set
> up dma_mask properly.
> 
> Should I be using dma_coerce_mask_and_coherent?

AFAICS for FEC, the ColdFire instances have statically-set masks, the
i.MX boardfiles get them set via platform_device_register_full(), and
now that the bug-which-never-should-have-been is fixed the DT-based
instances should be fine too, so you should be good to go. In general
I'd say that the dma_coerce_mask*() routines are only really for generic 
interface drivers like *HCI where they don't really know what the 
underlying device is and it may be on any old random bus. Drivers for 
specific IP blocks normally only have one or two known buses to deal 
with, so in most cases it's more reasonable to make the bus code 
well-behaved if it isn't already.
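
To illustrate the difference between the two calls, here is a rough
userspace model, not kernel code. The model_* names and the struct are
hypothetical stand-ins; the behaviour is my understanding of the helpers
in include/linux/dma-mapping.h, reduced to the dma_mask pointer handling
and leaving out the dma_supported() check entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK(): the n low bits set. */
#define MODEL_DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical stand-in for the two DMA mask fields of struct device. */
struct model_device {
	uint64_t *dma_mask;		/* pointer: bus code must give it storage */
	uint64_t coherent_dma_mask;	/* plain value */
};

/*
 * Models dma_set_mask_and_coherent(): fails if the bus code never
 * pointed dma_mask at anything, otherwise sets both masks.
 */
static int model_set_mask_and_coherent(struct model_device *dev, uint64_t mask)
{
	assert(dev != NULL);
	if (!dev->dma_mask)
		return -EIO;
	*dev->dma_mask = mask;
	dev->coherent_dma_mask = mask;
	return 0;
}

/*
 * Models dma_coerce_mask_and_coherent(): makes dma_mask share storage
 * with coherent_dma_mask first, so the set cannot fail on a NULL pointer.
 */
static int model_coerce_mask_and_coherent(struct model_device *dev, uint64_t mask)
{
	assert(dev != NULL);
	dev->dma_mask = &dev->coherent_dma_mask;
	return model_set_mask_and_coherent(dev, mask);
}
```

In this model, a device whose bus left dma_mask NULL gets -EIO from the
plain setter, while the coerce variant papers over it. That is why the
coerce variant only suits drivers that cannot know what bus they sit on.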

Robin.
