lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAD=FV=VB-ML0=kmN-yAH7sD-sMiX9Cq1_9B9QwVL=Ed5ZrGnbg@mail.gmail.com>
Date:   Sat, 3 Oct 2020 09:27:56 -0700
From:   Doug Anderson <dianders@...omium.org>
To:     Michael Walle <michael@...le.cc>
Cc:     LKML <linux-kernel@...r.kernel.org>, linux-mtd@...ts.infradead.org,
        miquel.raynal@...tlin.com, richard@....at,
        tudor.ambarus@...rochip.com, vigneshr@...com
Subject: Re: [PATCH] mtd: spi-nor: Prefer asynchronous probe

Hi,

On Sat, Oct 3, 2020 at 8:22 AM Michael Walle <michael@...le.cc> wrote:
>
> Hi Douglas,
>
> > On my system the spi_nor_probe() took ~6 ms at bootup.  That's not a
> > lot, but every little bit adds up to a slow bootup.  While we can get
> > this out of the boot path by making it a module, there are times where
> > it is convenient (or even required) for this to be builtin the kernel.
> > Let's set that we prefer async probe so that we don't block other
> > drivers from probing while we are probing.
> >
> > This is a tiny little change that is almost guaranteed to be safe for
> > anything that is able to run as a module, which SPI_NOR is.
> > Specifically modules are already probed asynchronously.  Also: since
> > other things in the system may have enabled asynchronous probe the
> > system may already be doing other things during our probe.
> >
> > There is a small possibility that some other driver that was a client
> > of SPI_NOR didn't handle -EPROBE_DEFER and was relying on probe
> > ordering and only worked when the SPI_NOR and the SPI bus were
> > builtin.  In that case the other driver has a bug that's waiting to
> > hit and the other driver should be fixed.
>
> linux-next now triggers the following warning in kernel/kmod.c:136 on my
> board. I've bisected this to this patch.
>
> kmod.c:
>         /*
>          * We don't allow synchronous module loading from async.  Module
>          * init may invoke async_synchronize_full() which will end up
>          * waiting for this task which already is waiting for the module
>          * loading to complete, leading to a deadlock.
>          */
>         WARN_ON_ONCE(wait && current_is_async());
>
> [    1.849801] ------------[ cut here ]------------
> [    1.854271] mscc_felix 0000:00:00.5: device is disabled, skipping
> [    1.858753] WARNING: CPU: 1 PID: 7 at kernel/kmod.c:136 __request_module+0x3a4/0x568
> [    1.858755] Modules linked in:
> [    1.865028] fsl_enetc 0000:00:00.0: Adding to iommu group 1
> [    1.872640] CPU: 1 PID: 7 Comm: kworker/u4:0 Not tainted 5.9.0-rc6-00001-g03edda0e1eda #113
> [    1.872642] Hardware name: Kontron SMARC-sAL28 (Single PHY) on SMARC Eval 2.0 carrier (DT)
> [    1.872647] Workqueue: events_unbound async_run_entry_fn
> [    1.876013] spi-nor spi0.0: w25q32dw (4096 Kbytes)
> [    1.881294] pstate: 00000005 (nzcv daif -PAN -UAO BTYPE=--)
> [    1.881297] pc : __request_module+0x3a4/0x568
> [    1.881299] lr : __request_module+0x39c/0x568
> [    1.881302] sp : ffff8000113a3920
> [    1.925739] x29: ffff8000113a3920 x28: ffff800010c7b000
> [    1.931068] x27: ffff00207ae05648 x26: ffff800010a41a88
> [    1.936397] x25: 0000000000000000 x24: 0000000000000000
> [    1.941727] x23: ffff800010c35140 x22: 0000000000000001
> [    1.947055] x21: ffff800011149948 x20: ffff800010615bdc
> [    1.952383] x19: 00000000ffffffff x18: 0000000000000000
> [    1.957447] fsl_enetc 0000:00:00.0: enabling device (0400 -> 0402)
> [    1.957711] x17: ffff800010a3e618 x16: ffff800010a3e5f8
> [    1.964175] libphy: Freescale ENETC MDIO Bus: probed
> [    1.969238] x15: ffffffffffffffff x14: ffff800011149948
> [    1.969241] x13: ffff8000113a3918 x12: 0000000000000018
> [    1.969245] x11: 0000000000000005 x10: 0101010101010101
> [    1.975241] 10 fixed-partitions partitions found on MTD device 20c0000.spi
> [    1.979550] x9 : ffff80001005f6a4 x8 : 0000000000000000
> [    1.979553] x7 : 606f2c6364776865 x6 : 05041c090d431511
> [    1.979556] x5 : 1115430d091c0405 x4 : 0000000000000000
> [    1.979558] x3 : 6dac8d8d2dccae00 x2 : ffff800010c956e8
> [    1.979561] x1 : ffff80001005fa58 x0 : 0000000000000001
> [    1.979564] Call trace:
> [    1.979571]  __request_module+0x3a4/0x568
> [    1.984914] Creating 10 MTD partitions on "20c0000.spi":
> [    1.990227]  parse_mtd_partitions+0x2ec/0x3c0
> [    1.990232]  mtd_device_parse_register+0xdc/0x1c8
> [    1.997133] 0x000000000000-0x000000010000 : "rcw"
> [    2.002454]  spi_nor_probe+0x29c/0x2f0
> [    2.002458]  spi_mem_probe+0x74/0xb0
> [    2.017759] 0x000000010000-0x000000100000 : "failsafe bootloader"
> [    2.018433]  spi_drv_probe+0x88/0xe8
> [    2.018439]  really_probe+0xec/0x3c0
> [    2.033744] 0x000000100000-0x000000140000 : "failsafe DP firmware"
> [    2.035555]  driver_probe_device+0x60/0xc0
> [    2.035559]  __device_attach_driver+0x8c/0xd0
> [    2.040455] 0x000000140000-0x0000001e0000 : "failsafe trusted firmware"
> [    2.044642]  bus_for_each_drv+0x84/0xd8
> [    2.044645]  __device_attach_async_helper+0xc4/0xe8
> [    2.044648]  async_run_entry_fn+0x4c/0x150
> [    2.044653]  process_one_work+0x1f4/0x4b8
> [    2.057751] 0x0000001e0000-0x000000200000 : "reserved"
> [    2.062814]  worker_thread+0x50/0x480
> [    2.062817]  kthread+0x160/0x168
> [    2.062821]  ret_from_fork+0x10/0x34
> [    2.073748] 0x000000200000-0x000000210000 : "configuration store"
> [    2.076185] ---[ end trace 44224cc02e4e53d2 ]---
>
> -michael

Thanks for your report!  My vote would be to revert my patch and then
this would need to be resolved before it could be added back in.
Without doing tons of research, maybe the right answer here is that
mtd_device_parse_register() should be moved into a separate task so
it's not blocking probe?  I probably won't try to tackle this
immediately, but the eventual goal is that async is default, so I
think this would need to be resolved before then.

-Doug

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ