[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <87jz73v1th.fsf@bootlin.com>
Date: Tue, 29 Apr 2025 09:55:54 +0200
From: Miquel Raynal <miquel.raynal@...tlin.com>
To: Liwei Song <liwei.song.lsong@...il.com>
Cc: RichardWeinberger <richard@....at>, VigneshRaghavendra
<vigneshr@...com>, TudorAmbarus <tudor.ambarus@...aro.org>,
PratyushYadav <pratyush@...nel.org>, MichaelWalle <mwalle@...nel.org>,
linux-mtd@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mtd: core: add sync between read/write and unbind
device
Hello Liwei,
On 01/04/2025 at 00:15:20 +08, Liwei Song <liwei.song.lsong@...il.com> wrote:
> When unbind mtd device or qspi controller with a high frequency
> reading to /dev/mtd0 device, there will be Calltrace as below:
>
> $ while true; do cat /dev/mtd0 >/dev/null; done &
> $ echo ff8d2000.spi > /sys/bus/platform/drivers/cadence-qspi/unbind
>
> Internal error: synchronous external abort: 0000000096000210 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 3 UID: 0 PID: 466 Comm: cat Not tainted 6.14.0-rc7-yocto-standard+ #1
> Hardware name: SoCFPGA Stratix 10 SoCDK (DT)
> pc : cqspi_indirect_read_execute.isra.0+0x188/0x330
> lr : cqspi_indirect_read_execute.isra.0+0x21c/0x330
> Call trace:
> cqspi_indirect_read_execute.isra.0+0x188/0x330 (P)
> cqspi_exec_mem_op+0x8bc/0xe40
> spi_mem_exec_op+0x3e0/0x478
> spi_mem_no_dirmap_read+0xa8/0xc8
> spi_mem_dirmap_read+0xdc/0x150
> spi_nor_read_data+0x120/0x198
> spi_nor_read+0xf0/0x280
> mtd_read_oob_std+0x80/0x98
> mtd_read_oob+0x9c/0x168
> mtd_read+0x6c/0xd8
> mtdchar_read+0xdc/0x288
> vfs_read+0xc8/0x2f8
> ksys_read+0x70/0x110
> __arm64_sys_read+0x24/0x38
> invoke_syscall+0x5c/0x130
> el0_svc_common.constprop.0+0x48/0xf8
> do_el0_svc+0x28/0x40
> el0_svc+0x30/0xd0
> el0t_64_sync_handler+0x144/0x168
> el0t_64_sync+0x198/0x1a0
> Code: 927e7442 aa1a03e0 8b020342 d503201f (b9400321)
> ---[ end trace 0000000000000000 ]---
>
> Or:
> $ while true; do cat /dev/mtd0 >/dev/null; done &
> $ echo spi0.0 > /sys/class/mtd/mtd0/device/driver/unbind
>
> Unable to handle kernel paging request at virtual address 00000000000012e8
> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 2 UID: 0 PID: 459 Comm: cat Not tainted 6.14.0-rc7-yocto-standard+ #1
> Hardware name: SoCFPGA Stratix 10 SoCDK (DT)
> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : spi_mem_exec_op+0x3e8/0x478
> lr : spi_mem_exec_op+0x3e0/0x478
> Call trace:
> spi_mem_exec_op+0x3e8/0x478 (P)
> spi_mem_no_dirmap_read+0xa8/0xc8
> spi_mem_dirmap_read+0xdc/0x150
> spi_nor_read_data+0x120/0x198
> spi_nor_read+0xf0/0x280
> mtd_read_oob_std+0x80/0x98
> mtd_read_oob+0x9c/0x168
> mtd_read+0x6c/0xd8
> mtdchar_read+0xdc/0x288
> vfs_read+0xc8/0x2f8
> ksys_read+0x70/0x110
> __arm64_sys_read+0x24/0x38
> invoke_syscall+0x5c/0x130
> el0_svc_common.constprop.0+0x48/0xf8
> do_el0_svc+0x28/0x40
> el0_svc+0x30/0xd0
> el0t_64_sync_handler+0x144/0x168
> el0t_64_sync+0x198/0x1a0
> Code: f9400842 d63f0040 2a0003f4 f94002a1 (f9417437)
> ---[ end trace 0000000000000000 ]---
>
> when unbind is running, the memory allocated to qspi controller and
> mtd device is freed during unbinding, but open/close and reading device
> are still running, if the reading process get read lock and start
> excuting, there will be above illegal memory access. This issue also
> can be repruduced on many other platforms like ls1046 and nxpimx8 which
> have qspi flash.
>
> In this patch, register a spi bus notifier which will be called before
> unbind process freeing device memory, add a new member mtd_event_remove
> to block mtd open/read, then waiting for the running task to be finished,
> after that, memory is safe to be free.
>
> Signed-off-by: Liwei Song <liwei.song.lsong@...il.com>
> ---
>
> Hi Maintainer,
>
> This is an improved patch compared with the original one:
> (https://patchwork.ozlabs.org/project/linux-mtd/patch/20250325133954.3699535-1-liwei.song.lsong@gmail.com/),
> This v2 patch move notifier to spi-nor to avoid crash other types of flash.
> now this patch only aim at fixing nor-flash "bind/unbind while reading" calltrace,
> but for other types of flash like nand also have this issue.
While I agree with the observation and also the conclusion of adding
some kind of notifier, I'd like to understand the rationale behind
choosing to fix only spi-nor in v2? If any spi memory registered in the
mtd subsystem is subject to this failure, we should find a generic
approach (or if it's too difficult, at least have the fix in both
spi nor and spi nand). Looking at your implementation, maybe it could
fit in spi-mem (I'm not sure).
...
> +static int spi_nor_remove_notifier_call(struct notifier_block *nb,
> + unsigned long event, void
> *data);
I believe spi nor maitainers would prefer to avoid forward declarations.
> +
> /**
> * spi_nor_get_cmd_ext() - Get the command opcode extension based on the
> * extension type.
> @@ -1191,6 +1195,9 @@ static int spi_nor_prep(struct spi_nor *nor)
> if (nor->controller_ops && nor->controller_ops->prepare)
> ret = nor->controller_ops->prepare(nor);
>
> + if (nor->mtd.mtd_event_remove)
> + return -ENODEV;
> +
> return ret;
> }
...
> +static int spi_nor_remove_notifier_call(struct notifier_block *nb,
> + unsigned long event, void *data)
> +{
> + struct device *dev = data;
> + struct spi_device *spi;
> + struct spi_mem *mem;
> + struct spi_nor *nor;
> +
> + if (!of_match_device(spi_nor_of_table, dev))
> + return 0;
> +
> + switch (event) {
> + case BUS_NOTIFY_DEL_DEVICE:
> + case BUS_NOTIFY_UNBIND_DRIVER:
> + spi = to_spi_device(dev);
> + mem = spi_get_drvdata(spi);
> + if (!mem)
> + return NOTIFY_DONE;
> + nor = spi_mem_get_drvdata(mem);
> +
> + mutex_lock(&nor->lock);
> + nor->mtd.mtd_event_remove = true;
> + mutex_unlock(&nor->lock);
> + msleep(300);
What is this sleep for?
> +
> + break;
> + }
> +
> + return NOTIFY_DONE;
> +}
> +
> /*
> * REVISIT: many of these chips have deep power-down modes, which
> * should clearly be entered on suspend() to minimize power use.
> diff --git a/include/linux/mtd/mtd.h b/include/linux/mtd/mtd.h
> index 8d10d9d2e830..134bfa6fcf76 100644
> --- a/include/linux/mtd/mtd.h
> +++ b/include/linux/mtd/mtd.h
> @@ -290,6 +290,7 @@ struct mtd_info {
> /* Kernel-only stuff starts here. */
> const char *name;
> int index;
> + bool mtd_event_remove;
No need to repeat 'mtd' here, you are already in the mtd_info structure,
so mtd->mtd_event_remove would be redundant.
> /* OOB layout description */
> const struct mtd_ooblayout_ops *ooblayout;
Thanks,
Miquèl
Powered by blists - more mailing lists