[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d3679496-2e4e-4a7c-97ed-f193bd53af1d@notapiano>
Date: Wed, 15 May 2024 17:05:53 -0400
From: Nícolas F. R. A. Prado <nfraprado@...labora.com>
To: regressions@...ts.linux.dev
Cc: linux-kernel@...r.kernel.org, kernel@...labora.com
Subject: Re: [REGRESSION] boot regression on linux-next on sc7180 platforms -
null pointer dereference on iommu_dma_sync_sg_for_device
On Tue, May 14, 2024 at 12:41:29PM -0400, Nícolas F. R. A. Prado wrote:
> Hi,
>
> KernelCI has identified a new boot regression on linux-next. It affects the
> following platforms:
> * sc7180-trogdor-kingoftown
> * sc7180-trogdor-lazor-limozeen
>
> The regression was introduced in next-20240509, and still affects today's
> (next-20240514) release.
>
> The config used was the upstream arm64 defconfig with a config fragment on top
> [1].
>
> The following stack traces are produced during boot and a usable shell is never
> reached:
>
> [ 0.381981] Unable to handle kernel NULL pointer dereference at virtual address 000000000000001c
> [ 0.381989] Mem abort info:
> [ 0.381991] ESR = 0x0000000096000004
> [ 0.381994] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 0.381997] SET = 0, FnV = 0
> [ 0.382000] EA = 0, S1PTW = 0
> [ 0.382003] FSC = 0x04: level 0 translation fault
> [ 0.382006] Data abort info:
> [ 0.382008] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> [ 0.382011] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [ 0.382014] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [ 0.382017] [000000000000001c] user address but active_mm is swapper
> [ 0.382021] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> [ 0.382025] Modules linked in:
> [ 0.382032] CPU: 4 PID: 68 Comm: kworker/u32:2 Not tainted 6.9.0-next-20240514-dirty #380
> [ 0.382038] Hardware name: Google Kingoftown (DT)
> [ 0.382042] Workqueue: async async_run_entry_fn
> [ 0.382055] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 0.382061] pc : iommu_dma_sync_sg_for_device+0x28/0x100
> [ 0.382070] lr : __dma_sync_sg_for_device+0x28/0x4c
> [ 0.382080] sp : ffff800080943740
> [ 0.382082] x29: ffff800080943740 x28: ffff36ee44280000 x27: ffff36ee40bd7810
> [ 0.382092] x26: ffff800080943998 x25: ffff36ee44280480 x24: ffffb54600bcf0e8
> [ 0.382101] x23: ffff36ee40bd7810 x22: 0000000000000001 x21: 0000000000000000
> [ 0.382110] x20: ffffb54600f3d098 x19: 0000000000000000 x18: ffffb54601c1a210
> [ 0.382118] x17: 000000040044ffff x16: 0000000000000000 x15: ffff36efb6d95580
> [ 0.382126] x14: ffff36ee409156c0 x13: 0000000000001797 x12: 0000000000000002
> [ 0.382134] x11: 0000000000000004 x10: ffff36ee4308b3d8 x9 : ffff36ee44280469
> [ 0.382143] x8 : ffff36ee4308b304 x7 : 00000000ffffffff x6 : 0000000000000001
> [ 0.382151] x5 : ffffb5460033a740 x4 : ffffb545ff50375c x3 : 0000000000000001
> [ 0.382159] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff36ee40bd7810
> [ 0.382167] Call trace:
> [ 0.382170] iommu_dma_sync_sg_for_device+0x28/0x100
> [ 0.382176] __dma_sync_sg_for_device+0x28/0x4c
> [ 0.382183] spi_transfer_one_message+0x378/0x6e4
> [ 0.382193] __spi_pump_transfer_message+0x190/0x4a4
> [ 0.382199] __spi_sync+0x2a0/0x3c4
> [ 0.382205] spi_sync_locked+0x10/0x1c
> [ 0.382211] tpm_tis_spi_transfer_full+0x160/0x2fc
> [ 0.382217] tpm_tis_spi_transfer+0x34/0x40
> [ 0.382221] tpm_tis_spi_cr50_read_bytes+0x5c/0x90
> [ 0.382226] tpm_tis_core_init+0xfc/0x7e0
> [ 0.382231] tpm_tis_spi_init+0x54/0x70
> [ 0.382236] cr50_spi_probe+0xf4/0x27c
> [ 0.382241] tpm_tis_spi_driver_probe+0x34/0x64
> [ 0.382245] spi_probe+0x84/0xe4
> [ 0.382251] really_probe+0xbc/0x2a0
> [ 0.382258] __driver_probe_device+0x78/0x12c
> [ 0.382264] driver_probe_device+0x40/0x160
> [ 0.382269] __device_attach_driver+0xb8/0x134
> [ 0.382275] bus_for_each_drv+0x84/0xe0
> [ 0.382280] __device_attach_async_helper+0xac/0xd0
> [ 0.382286] async_run_entry_fn+0x34/0xe0
> [ 0.382291] process_one_work+0x154/0x298
> [ 0.382300] worker_thread+0x304/0x408
> [ 0.382307] kthread+0x118/0x11c
> [ 0.382313] ret_from_fork+0x10/0x20
> [ 0.382324] Code: 2a0203f5 2a0303f6 a90363f7 aa0003f7 (b9401c20)
> [ 0.382328] ---[ end trace 0000000000000000 ]---
Tracked down the issue to commit 8cc3bad9d9d6 ("spi: Remove unneded check for
orig_nents").
#regzbot introduced: 8cc3bad9d9d6
The issue happens because in spi_dma_sync_for_device(), the line
dma_sync_sgtable_for_device(tx_dev, &xfer->tx_sg, DMA_TO_DEVICE);
is passing a scatterlist table (xfer->tx_sg) that hasn't been initialized, so
the sgl pointer inside it is null. Before the patch, the check would prevent it
from being called since orig_nents is 0.
This initialization of the scatterlist table should happen in
spi_map_buf_attrs(), which should be called in __spi_map_msg(), however some
debugging revealed that the previous check
if (!ctlr->can_dma(ctlr, msg->spi, xfer))
is failing, which is why the scatterlist table isn't initialized. Despite that,
ctlr->cur_msg_mapped is set to true, so spi_dma_sync_for_device() doesn't return
early, which would completely avoid the issue. At this point I'm confused why
that flag tracks DMA mapping per message, if the mapping is done per transfer
(and a message can contain multiple transfers). Maybe that's what needs to
change, though I'd like the input from someone who is familiar with this code.
Thanks,
Nícolas
Powered by blists - more mailing lists