[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ff3dc53b-bca0-f8d3-f313-e7cc22e1bfc3@quicinc.com>
Date: Mon, 26 May 2025 11:26:31 +0530
From: Md Sadre Alam <quic_mdalam@...cinc.com>
To: Gabor Juhos <j4g8y7@...il.com>, Mark Brown <broonie@...nel.org>,
Varadarajan Narayanan <quic_varada@...cinc.com>,
Sricharan Ramabadhran
<quic_srichara@...cinc.com>,
Miquel Raynal <miquel.raynal@...tlin.com>,
Richard Weinberger <richard@....at>,
Vignesh Raghavendra <vigneshr@...com>
CC: <linux-spi@...r.kernel.org>, <linux-mtd@...ts.infradead.org>,
<linux-arm-msm@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/2] spi: spi-qpic-snand: reallocate BAM transactions
Hi,
On 5/25/2025 10:35 PM, Gabor Juhos wrote:
> Using the mtd_nandbiterrs module for testing the driver occasionally
> results in weird things like below.
>
> 1. swiotlb mapping fails with the following message:
>
> [ 85.926216] qcom_snand 79b0000.spi: swiotlb buffer is full (sz: 4294967294 bytes), total 512 (slots), used 0 (slots)
> [ 85.932937] qcom_snand 79b0000.spi: failure in mapping desc
> [ 87.999314] qcom_snand 79b0000.spi: failure to write raw page
> [ 87.999352] mtd_nandbiterrs: error: write_oob failed (-110)
>
> Rebooting the board after that causes a panic:
>
> # reboot
> [ 877.178844] Unable to handle kernel read from unreadable memory at virtual address 0000000000000078
> [ 877.178913] Mem abort info:
> [ 877.186767] ESR = 0x0000000096000005
> [ 877.189508] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 877.193312] SET = 0, FnV = 0
> [ 877.198780] EA = 0, S1PTW = 0
> [ 877.201676] FSC = 0x05: level 1 translation fault
> [ 877.204684] Data abort info:
> [ 877.209559] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
> [ 877.212669] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [ 877.217964] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [ 877.223098] user pgtable: 4k pages, 39-bit VAs, pgdp=00000000446ae000
> [ 877.228470] [0000000000000078] pgd=080000004492d403, p4d=080000004492d403, pud=080000004492d403, pmd=0000000000000000
> [ 877.234944] Internal error: Oops: 0000000096000005 [#1] SMP
> [ 877.245395] Modules linked in: mtd_test pppoe ppp_async nft_fib_inet nf_flow_table_inet pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack slhc sfp qrtr_smd nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 mdio_i2c crc_ccitt evdev gpio_fan crypto_user algif_skcipher algif_rng algif_hash algif_aead af_alg sha512_generic seqiv geniv drbg hmac arc4 libarc4 gpio_keys usb_storage input_core leds_gpio xhci_plat_hcd xhci_pci xhci_hcd dwc3 dwc3_qcom phy_qcom_qusb2 phy_qcom_qmp_usb phy_qcom_m31 aquantia hwmon crc_itu_t crc32c_generic
> [ 877.297039] CPU: 1 UID: 0 PID: 2060 Comm: reboot Not tainted 6.15.0-rc4+ #0 NONE
> [ 877.319267] Hardware name: Qualcomm Technologies, Inc. IPQ9574/AP-AL02-C7 (DT)
> [ 877.326820] pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 877.333936] pc : wakeup_flusher_threads+0x98/0x1d0
> [ 877.340792] lr : wakeup_flusher_threads+0x128/0x1d0
> [ 877.345653] sp : ffffffc082b23db0
> [ 877.350426] x29: ffffffc082b23db0 x28: ffffff8003dd4000 x27: 0000000000000000
> [ 877.353902] x26: 0000000000000000 x25: 0000000000000000 x24: ffffffc081c89b08
> [ 877.361021] x23: ffffffc081cafc78 x22: 0000000000000020 x21: 0000000000000002
> [ 877.368138] x20: 0000000000000001 x19: ffffffc08032c000 x18: 0000000000000000
> [ 877.375257] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> [ 877.382374] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
> [ 877.389492] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffffc08032c36c
> [ 877.396610] x8 : 0000000000000001 x7 : ffffffc080327318 x6 : 0000000000000000
> [ 877.403728] x5 : 0000000000000000 x4 : 000000000002000f x3 : 0000000000000000
> [ 877.410847] x2 : 0000000000000001 x1 : ffffff8003dd4000 x0 : 0000000000000040
> [ 877.417965] Call trace:
> [ 877.425078] wakeup_flusher_threads+0x98/0x1d0 (P)
> [ 877.427337] ksys_sync+0x2c/0xa0
> [ 877.432197] __arm64_sys_sync+0x18/0x30
> [ 877.435582] invoke_syscall.constprop.0+0x64/0x110
> [ 877.439143] do_el0_svc+0x48/0xd8
> [ 877.444003] el0_svc+0x3c/0xf8
> [ 877.447388] el0t_64_sync_handler+0x10c/0x138
> [ 877.450341] el0t_64_sync+0x180/0x188
> [ 877.454770] Code: d1008016 eb17001f 540002c0 a90153f3 (f9402ec0)
> [ 877.458416] ---[ end trace 0000000000000000 ]---
> [ 877.464492] Kernel panic - not syncing: Oops: Fatal exception
> [ 877.469180] SMP: stopping secondary CPUs
> [ 877.474825] Kernel Offset: disabled
> [ 877.478812] CPU features: 0x0000,00000068,00000000,0200400b
> [ 877.482025] Memory Limit: none
> [ 877.487581] Rebooting in 3 seconds..
>
> 2. If the swiotlb mapping does not fail, rebooting the board may result
> in a different panic:
>
> # reboot
> [ 256.104459] BUG: spinlock bad magic on CPU#3, procd/2241
> [ 256.104488] Unable to handle kernel paging request at virtual address ffffffff0000049b
> [ 256.108827] Mem abort info:
> [ 256.116548] ESR = 0x0000000096000005
> [ 256.119240] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 256.123060] SET = 0, FnV = 0
> [ 256.128528] EA = 0, S1PTW = 0
> [ 256.131391] FSC = 0x05: level 1 translation fault
> [ 256.134431] Data abort info:
> [ 256.139291] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
> [ 256.142419] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [ 256.147713] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [ 256.152836] swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000041e5b000
> [ 256.158218] [ffffffff0000049b] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
> [ 256.164906] Internal error: Oops: 0000000096000005 [#1] SMP
> [ 256.173323] Modules linked in: mtd_test pppoe ppp_async nft_fib_inet nf_flow_table_inet pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack slhc sfp qrtr_smd nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 mdio_i2c crc_ccitt evdev gpio_fan crypto_user algif_skcipher algif_rng algif_hash algif_aead af_alg sha512_generic seqiv geniv drbg hmac arc4 libarc4 gpio_keys usb_storage input_core leds_gpio xhci_plat_hcd xhci_pci xhci_hcd dwc3 dwc3_qcom phy_qcom_qusb2 phy_qcom_qmp_usb phy_qcom_m31 aquantia hwmon crc_itu_t crc32c_generic
> [ 256.225139] CPU: 3 UID: 0 PID: 2241 Comm: procd Not tainted 6.15.0-rc4+ #0 NONE
> [ 256.247369] Hardware name: Qualcomm Technologies, Inc. IPQ9574/AP-AL02-C7 (DT)
> [ 256.254921] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 256.261951] pc : spin_bug+0x70/0xd8
> [ 256.268806] lr : spin_bug+0x68/0xd8
> [ 256.272278] sp : ffffffc083763b00
> [ 256.275750] x29: ffffffc083763b00 x28: ffffff80005b8000 x27: ffffffc0827f4000
> [ 256.279225] x26: ffffffc080dab000 x25: ffffffc083763c18 x24: ffffff80005b8000
> [ 256.286344] x23: 0000000000000002 x22: ffffffc081c00c00 x21: ffffffff0000001b
> [ 256.293462] x20: ffffffc080d44c80 x19: ffffff8004d52888 x18: 0000000000000005
> [ 256.300580] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000006
> [ 256.307698] x14: 0000000000000000 x13: 313432322f64636f x12: ffffffc081c707b8
> [ 256.314816] x11: 0000000000000001 x10: 0000000000000001 x9 : ffffffc0800cdfd0
> [ 256.321935] x8 : c0000000ffffefff x7 : ffffffc081c186a8 x6 : 0000000000057fa8
> [ 256.329052] x5 : ffffffc081c70760 x4 : 0000000000000000 x3 : ffffffc0837638e0
> [ 256.336170] x2 : 000000000000001b x1 : ffffff80005b8000 x0 : 000000000000002c
> [ 256.343289] Call trace:
> [ 256.350402] spin_bug+0x70/0xd8 (P)
> [ 256.352660] do_raw_spin_lock+0xa4/0x128
> [ 256.356132] _raw_spin_lock_irqsave+0x70/0x98
> [ 256.360299] __mutex_lock+0x268/0xdf0
> [ 256.364552] mutex_lock_nested+0x2c/0x40
> [ 256.368199] device_shutdown+0xfc/0x260
> [ 256.372191] kernel_restart+0x4c/0xb8
> [ 256.375750] __do_sys_reboot+0x108/0x230
> [ 256.379570] __arm64_sys_reboot+0x2c/0x40
> [ 256.383563] invoke_syscall.constprop.0+0x64/0x110
> [ 256.387470] do_el0_svc+0x48/0xd8
> [ 256.392156] el0_svc+0x3c/0xf8
> [ 256.395541] el0t_64_sync_handler+0x10c/0x138
> [ 256.398494] el0t_64_sync+0x180/0x188
> [ 256.402923] Code: 54000220 940009a5 b9400662 b4000295 (b94482a4)
> [ 256.406570] ---[ end trace 0000000000000000 ]---
> [ 256.412645] Kernel panic - not syncing: Oops: Fatal exception
> [ 256.417340] Kernel Offset: disabled
> [ 256.422972] CPU features: 0x0000,00000068,00000000,0200400b
> [ 256.426273] Memory Limit: none
> [ 256.431828] Rebooting in 3 seconds..
>
> Investigating the issue revealed that these symptoms are results of
> memory corruption which is caused by out of bounds access within the
> driver.
>
> The driver uses a dynamically allocated structure for BAM transactions,
> which structure must have enough space for all possible variations of
> different flash operations initiated by the driver. The required space
> heavily depends on the actual number of 'codewords' which is calculated
> from the pagesize of the actual NAND chip.
>
> Although the qcom_nandc_alloc() function allocates memory for the BAM
> transactions during probe, but since the actual number of 'codewords'
> is not yet know the allocation is done for one 'codeword' only.
>
> Because of this, whenever the driver does a flash operation, and the
> number of the required transactions exceeds the size of the allocated
> arrays the driver accesses memory out of the allocated range.
>
> To avoid this, change the code to free the initially allocated BAM
> transactions memory, and allocate a new one once the actual number of
> 'codewords' required for a given NAND chip is known.
>
> Fixes: 7304d1909080 ("spi: spi-qpic: add driver for QCOM SPI NAND flash Interface")
> Signed-off-by: Gabor Juhos <j4g8y7@...il.com>
> ---
> drivers/spi/spi-qpic-snand.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/drivers/spi/spi-qpic-snand.c b/drivers/spi/spi-qpic-snand.c
> index fd129650434f0129e24d3bdac7e7c4d5542627e6..c552cb7aa80e368e193d71e1826b2cc005571a9c 100644
> --- a/drivers/spi/spi-qpic-snand.c
> +++ b/drivers/spi/spi-qpic-snand.c
> @@ -315,6 +315,22 @@ static int qcom_spi_ecc_init_ctx_pipelined(struct nand_device *nand)
>
> mtd_set_ooblayout(mtd, &qcom_spi_ooblayout);
>
> + /*
> + * Free the temporary BAM transaction allocated initially by
> + * qcom_nandc_alloc(), and allocate a new one based on the
> + * updated max_cwperpage value.
> + */
> + qcom_free_bam_transaction(snandc);
> +
> + snandc->max_cwperpage = cwperpage;
> +
> + snandc->bam_txn = qcom_alloc_bam_transaction(snandc);
> + if (!snandc->bam_txn) {
> + dev_err(snandc->dev, "failed to allocate BAM transaction\n");
> + ret = -ENOMEM;
> + goto err_free_ecc_cfg;
> + }
> +
> ecc_cfg->cfg0 = FIELD_PREP(CW_PER_PAGE_MASK, (cwperpage - 1)) |
> FIELD_PREP(UD_SIZE_BYTES_MASK, ecc_cfg->cw_data) |
> FIELD_PREP(DISABLE_STATUS_AFTER_WRITE, 1) |
>
Reviewed-by: Md Sadre Alam <quic_mdalam@...cinc.com>
Thanks,
Alam.
Powered by blists - more mailing lists