lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <87jz73v1th.fsf@bootlin.com>
Date: Tue, 29 Apr 2025 09:55:54 +0200
From: Miquel Raynal <miquel.raynal@...tlin.com>
To: Liwei Song <liwei.song.lsong@...il.com>
Cc: RichardWeinberger <richard@....at>,  VigneshRaghavendra
 <vigneshr@...com>,  TudorAmbarus <tudor.ambarus@...aro.org>,
  PratyushYadav <pratyush@...nel.org>,  MichaelWalle <mwalle@...nel.org>,
  linux-mtd@...ts.infradead.org,  linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mtd: core: add sync between read/write and unbind
 device

Hello Liwei,

On 01/04/2025 at 00:15:20 +08, Liwei Song <liwei.song.lsong@...il.com> wrote:

> When unbind mtd device or qspi controller with a high frequency
> reading to /dev/mtd0 device, there will be Calltrace as below:
>
> $ while true; do cat /dev/mtd0 >/dev/null; done &
> $ echo ff8d2000.spi  > /sys/bus/platform/drivers/cadence-qspi/unbind
>
> Internal error: synchronous external abort: 0000000096000210 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 3 UID: 0 PID: 466 Comm: cat Not tainted 6.14.0-rc7-yocto-standard+ #1
> Hardware name: SoCFPGA Stratix 10 SoCDK (DT)
> pc : cqspi_indirect_read_execute.isra.0+0x188/0x330
> lr : cqspi_indirect_read_execute.isra.0+0x21c/0x330
> Call trace:
>  cqspi_indirect_read_execute.isra.0+0x188/0x330 (P)
>  cqspi_exec_mem_op+0x8bc/0xe40
>  spi_mem_exec_op+0x3e0/0x478
>  spi_mem_no_dirmap_read+0xa8/0xc8
>  spi_mem_dirmap_read+0xdc/0x150
>  spi_nor_read_data+0x120/0x198
>  spi_nor_read+0xf0/0x280
>  mtd_read_oob_std+0x80/0x98
>  mtd_read_oob+0x9c/0x168
>  mtd_read+0x6c/0xd8
>  mtdchar_read+0xdc/0x288
>  vfs_read+0xc8/0x2f8
>  ksys_read+0x70/0x110
>  __arm64_sys_read+0x24/0x38
>  invoke_syscall+0x5c/0x130
>  el0_svc_common.constprop.0+0x48/0xf8
>  do_el0_svc+0x28/0x40
>  el0_svc+0x30/0xd0
>  el0t_64_sync_handler+0x144/0x168
>  el0t_64_sync+0x198/0x1a0
> Code: 927e7442 aa1a03e0 8b020342 d503201f (b9400321)
> ---[ end trace 0000000000000000 ]---
>
> Or:
> $ while true; do cat /dev/mtd0 >/dev/null; done &
> $ echo spi0.0 > /sys/class/mtd/mtd0/device/driver/unbind
>
> Unable to handle kernel paging request at virtual address 00000000000012e8
> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 2 UID: 0 PID: 459 Comm: cat Not tainted 6.14.0-rc7-yocto-standard+ #1
> Hardware name: SoCFPGA Stratix 10 SoCDK (DT)
> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : spi_mem_exec_op+0x3e8/0x478
> lr : spi_mem_exec_op+0x3e0/0x478
> Call trace:
>  spi_mem_exec_op+0x3e8/0x478 (P)
>  spi_mem_no_dirmap_read+0xa8/0xc8
>  spi_mem_dirmap_read+0xdc/0x150
>  spi_nor_read_data+0x120/0x198
>  spi_nor_read+0xf0/0x280
>  mtd_read_oob_std+0x80/0x98
>  mtd_read_oob+0x9c/0x168
>  mtd_read+0x6c/0xd8
>  mtdchar_read+0xdc/0x288
>  vfs_read+0xc8/0x2f8
>  ksys_read+0x70/0x110
>  __arm64_sys_read+0x24/0x38
>  invoke_syscall+0x5c/0x130
>  el0_svc_common.constprop.0+0x48/0xf8
>  do_el0_svc+0x28/0x40
>  el0_svc+0x30/0xd0
>  el0t_64_sync_handler+0x144/0x168
>  el0t_64_sync+0x198/0x1a0
> Code: f9400842 d63f0040 2a0003f4 f94002a1 (f9417437)
> ---[ end trace 0000000000000000 ]---
>
> when unbind is running, the memory allocated to qspi controller and
> mtd device is freed during unbinding, but open/close and reading device
> are still running, if the reading process get read lock and start
> excuting, there will be above illegal memory access. This issue also
> can be repruduced on many other platforms like ls1046 and nxpimx8 which
> have qspi flash.
>
> In this patch, register a spi bus notifier which will be called before
> unbind process freeing device memory, add a new member mtd_event_remove
> to block mtd open/read, then waiting for the running task to be finished,
> after that, memory is safe to be free.
>
> Signed-off-by: Liwei Song <liwei.song.lsong@...il.com>
> ---
>
> Hi Maintainer,
>
> This is an improved patch compared with the original one:
> (https://patchwork.ozlabs.org/project/linux-mtd/patch/20250325133954.3699535-1-liwei.song.lsong@gmail.com/),
> This v2 patch move notifier to spi-nor to avoid crash other types of flash.
> now this patch only aim at fixing nor-flash "bind/unbind while reading" calltrace,
> but for other types of flash like nand also have this issue.

While I agree with the observation and also the conclusion of adding
some kind of notifier, I'd like to understand the rationale behind
choosing to fix only spi-nor in v2? If any spi memory registered in the
mtd subsystem is subject to this failure, we should find a generic
approach (or if it's too difficult, at least have the fix in both
spi nor and spi nand). Looking at your implementation, maybe it could
fit in spi-mem (I'm not sure).

...

> +static int spi_nor_remove_notifier_call(struct notifier_block *nb,
> +					unsigned long event, void
> *data);

I believe spi nor maitainers would prefer to avoid forward declarations.

> +
>  /**
>   * spi_nor_get_cmd_ext() - Get the command opcode extension based on the
>   *			   extension type.
> @@ -1191,6 +1195,9 @@ static int spi_nor_prep(struct spi_nor *nor)
>  	if (nor->controller_ops && nor->controller_ops->prepare)
>  		ret = nor->controller_ops->prepare(nor);
>  
> +	if (nor->mtd.mtd_event_remove)
> +		return -ENODEV;
> +
>  	return ret;
>  }

...

> +static int spi_nor_remove_notifier_call(struct notifier_block *nb,
> +				    unsigned long event, void *data)
> +{
> +	struct device *dev = data;
> +	struct spi_device *spi;
> +	struct spi_mem *mem;
> +	struct spi_nor *nor;
> +
> +	if (!of_match_device(spi_nor_of_table, dev))
> +		return 0;
> +
> +	switch (event) {
> +	case BUS_NOTIFY_DEL_DEVICE:
> +	case BUS_NOTIFY_UNBIND_DRIVER:
> +		spi = to_spi_device(dev);
> +		mem = spi_get_drvdata(spi);
> +		if (!mem)
> +			return NOTIFY_DONE;
> +		nor = spi_mem_get_drvdata(mem);
> +
> +		mutex_lock(&nor->lock);
> +		nor->mtd.mtd_event_remove = true;
> +		mutex_unlock(&nor->lock);
> +		msleep(300);

What is this sleep for?

> +
> +		break;
> +	}
> +
> +	return NOTIFY_DONE;
> +}
> +
>  /*
>   * REVISIT: many of these chips have deep power-down modes, which
>   * should clearly be entered on suspend() to minimize power use.
> diff --git a/include/linux/mtd/mtd.h b/include/linux/mtd/mtd.h
> index 8d10d9d2e830..134bfa6fcf76 100644
> --- a/include/linux/mtd/mtd.h
> +++ b/include/linux/mtd/mtd.h
> @@ -290,6 +290,7 @@ struct mtd_info {
>  	/* Kernel-only stuff starts here. */
>  	const char *name;
>  	int index;
> +	bool mtd_event_remove;

No need to repeat 'mtd' here, you are already in the mtd_info structure,
so mtd->mtd_event_remove would be redundant.

>  	/* OOB layout description */
>  	const struct mtd_ooblayout_ops *ooblayout;

Thanks,
Miquèl

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ