lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 18 May 2020 23:48:07 -0700
From:   Saravana Kannan <saravanak@...gle.com>
To:     Marek Szyprowski <m.szyprowski@...sung.com>
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Rob Herring <robh+dt@...nel.org>,
        Frank Rowand <frowand.list@...il.com>,
        Len Brown <lenb@...nel.org>,
        Android Kernel Team <kernel-team@...roid.com>,
        LKML <linux-kernel@...r.kernel.org>,
        "open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS" 
        <devicetree@...r.kernel.org>,
        ACPI Devel Maling List <linux-acpi@...r.kernel.org>,
        Ji Luo <ji.luo@....com>,
        Linux Samsung SOC <linux-samsung-soc@...r.kernel.org>
Subject: Re: [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all
 top level devices

On Mon, May 18, 2020 at 11:25 PM Marek Szyprowski
<m.szyprowski@...sung.com> wrote:
>
> Hi Saravana,
>
> On 15.05.2020 07:35, Saravana Kannan wrote:
> > The fw_devlink_pause() and fw_devlink_resume() APIs allow batching the
> > parsing of the device tree nodes when a lot of devices are added. This
> > will significantly cut down parsing time (as much a 1 second on some
> > systems). So, use them when adding devices for all the top level device
> > tree nodes in a system.
> >
> > Signed-off-by: Saravana Kannan <saravanak@...gle.com>
>
> This patch recently landed in linux-next 20200518. Sadly, it causes
> regression on Samsung Exynos5433-based TM2e board:
>
> s3c64xx-spi 14d30000.spi: Failed to get RX DMA channel
> s3c64xx-spi 14d50000.spi: Failed to get RX DMA channel
> s3c64xx-spi 14d30000.spi: Failed to get RX DMA channel
> s3c64xx-spi 14d50000.spi: Failed to get RX DMA channel
> s3c64xx-spi 14d30000.spi: Failed to get RX DMA channel
>
> Internal error: synchronous external abort: 96000210 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 0 PID: 50 Comm: kworker/0:1 Not tainted 5.7.0-rc5+ #701
> Hardware name: Samsung TM2E board (DT)
> Workqueue: events deferred_probe_work_func
> pstate: 60000005 (nZCv daif -PAN -UAO)
> pc : samsung_i2s_probe+0x768/0x8f0
> lr : samsung_i2s_probe+0x688/0x8f0
> ...
> Call trace:
>   samsung_i2s_probe+0x768/0x8f0
>   platform_drv_probe+0x50/0xa8
>   really_probe+0x108/0x370
>   driver_probe_device+0x54/0xb8
>   __device_attach_driver+0x90/0xc0
>   bus_for_each_drv+0x70/0xc8
>   __device_attach+0xdc/0x140
>   device_initial_probe+0x10/0x18
>   bus_probe_device+0x94/0xa0
>   deferred_probe_work_func+0x70/0xa8
>   process_one_work+0x2a8/0x718
>   worker_thread+0x48/0x470
>   kthread+0x134/0x160
>   ret_from_fork+0x10/0x1c
> Code: 17ffffaf d503201f f94086c0 91003000 (88dffc00)
> ---[ end trace ccf721c9400ddbd6 ]---
> Kernel panic - not syncing: Fatal exception
> SMP: stopping secondary CPUs
> Kernel Offset: disabled
> CPU features: 0x090002,24006087
> Memory Limit: none
>
> ---[ end Kernel panic - not syncing: Fatal exception ]---
>
> Both issues, the lack of DMA for SPI device and Synchronous abort in I2S
> probe are new after applying this patch. I'm trying to investigate which
> resources are missing and why. The latter issue means typically that the
> registers for the given device has been accessed without enabling the
> needed clocks or power domains.

Did you try this copy-pasta fix that I sent later?
https://lore.kernel.org/lkml/20200517173453.157703-1-saravanak@google.com/

Not every system would need it (my test setup didn't), but it helps some cases.

If that fix doesn't help, then some tips for debugging the failing drivers.
What this pause/resume patch effectively (not explicitly) does is:
1. Doesn't immediately probe the devices as they are added in
of_platform_default_populate_init()
2. Adds them in order to the deferred probe list.
3. Then kicks off deferred probe on them in the order they were added.

These drivers are just not handling -EPROBE_DEFER correctly or
assuming probe order and that's causing these issues.

So, we can either fix that or you can try adding some code to flush
the deferred probe workqueue at the end of fw_devlink_resume().

Let me know how it goes.

Thanks,
Saravana

Powered by blists - more mailing lists