Date: Fri, 28 Jun 2024 10:27:28 -0500
From: David Lechner <dlechner@...libre.com>
To: Marc Kleine-Budde <mkl@...gutronix.de>,
 Oleksij Rempel <o.rempel@...gutronix.de>
Cc: Mark Brown <broonie@...nel.org>, Martin Sperl <kernel@...tin.sperl.org>,
 David Jander <david@...tonic.nl>, Jonathan Cameron <jic23@...nel.org>,
 Michael Hennerich <michael.hennerich@...log.com>,
 Nuno Sá <nuno.sa@...log.com>,
 Alain Volmat <alain.volmat@...s.st.com>,
 Maxime Coquelin <mcoquelin.stm32@...il.com>,
 Alexandre Torgue <alexandre.torgue@...s.st.com>, linux-spi@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-stm32@...md-mailman.stormreply.com,
 linux-arm-kernel@...ts.infradead.org, linux-iio@...r.kernel.org,
 Julien Stephan <jstephan@...libre.com>,
 Jonathan Cameron <Jonathan.Cameron@...wei.com>, kernel@...gutronix.de,
 T.Scherer@...elmann.de
Subject: Re: [PATCH v2 0/5] spi: add support for pre-cooking messages

On 6/28/24 5:16 AM, Marc Kleine-Budde wrote:
> On 28.06.2024 11:49:38, Oleksij Rempel wrote:
>> It seems to be spi_mux specific. We have seen similar trace on other system
>> with spi_mux.
> 
> Here is the other backtrace from another imx8mp system with a completely
> different workload. Both have in common that they use a spi_mux on the
> spi-imx driver.
> 
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000dd0
> Mem abort info:
>   ESR = 0x0000000096000004
>   EC = 0x25: DABT (current EL), IL = 32 bits
>   SET = 0, FnV = 0
>   EA = 0, S1PTW = 0
>   FSC = 0x04: level 0 translation fault
> Data abort info:
>   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> user pgtable: 4k pages, 48-bit VAs, pgdp=0000000046760000
> [0000000000000dd0] pgd=0000000000000000, p4d=0000000000000000
> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> Modules linked in: can_raw can ti_ads7950 industrialio_triggered_buffer kfifo_buf spi_mux fsl_imx8_ddr_perf at24 flexcan caam can_dev error rtc_snvs imx8mm_thermal spi_imx capture_events_irq cfg80211 iio_trig_hrtimer industrialio_sw_trigger ind>
> CPU: 3 PID: 177 Comm: spi5 Not tainted 6.9.0 #1
> Hardware name: xxxxxxxxxxxxxxxx (xxxxxxxxx) (DT)
> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : spi_res_release+0x24/0xb8
> lr : spi_async+0xac/0x118
> sp : ffff8000823fbcc0
> x29: ffff8000823fbcc0 x28: 0000000000000000 x27: 0000000000000000
> x26: ffff8000807bef88 x25: ffff80008115c008 x24: 0000000000000000
> x23: ffff8000826c3938 x22: 0000000000000000 x21: ffff0000076a9800
> x20: 0000000000000000 x19: 0000000000000dc8 x18: 0000000000000000
> x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffff88c0e760
> x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
> x11: ffff8000815a1f98 x10: ffff8000823fbb40 x9 : ffff8000807b8420
> x8 : ffff800081508000 x7 : 0000000000000004 x6 : 0000000003ce4c66
> x5 : 0000000001000000 x4 : 0000000000000000 x3 : 0000000001000000
> x2 : 0000000000000000 x1 : ffff8000826c38e0 x0 : ffff0000076a9800
> Call trace:
>  spi_res_release+0x24/0xb8
>  spi_async+0xac/0x118
>  spi_mux_transfer_one_message+0xb8/0xf0 [spi_mux]
>  __spi_pump_transfer_message+0x260/0x5d8
>  __spi_pump_messages+0xdc/0x320
>  spi_pump_messages+0x20/0x38
>  kthread_worker_fn+0xdc/0x220
>  kthread+0x118/0x128
>  ret_from_fork+0x10/0x20
> Code: a90153f3 a90363f7 91016037 f9403033 (f9400674) 
> ---[ end trace 0000000000000000 ]---
> 
> regards,
> Marc
> 


Hi Oleksij and Marc,

I'm supposed to be on vacation, so I haven't looked into this deeply
yet, but I can see what is happening here.

spi_mux_transfer_one_message() is calling spi_async() which is calling
__spi_optimize_message() on an already optimized message.

spi_async() then also calls __spi_unoptimize_message(), which tries to
release resources. But this fails because the spi-mux driver has swapped
out the pointer to the device in the SPI message. This causes the
wrong ctlr to be passed to spi_res_release(), causing the crash.
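
To make the sequence above concrete, here is a small userspace toy model
of it. This is only a sketch, not kernel code: every toy_* name is
hypothetical, and the "already optimized" and "wrong controller" checks
are simplifying assumptions standing in for what the real SPI core does.
It only shows why swapping the device pointer before a nested submit
goes wrong, and why marking the message as pre-optimized for the
duration of that submit avoids it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins (hypothetical), NOT the kernel's spi_* structures. */
struct toy_ctlr { const char *name; };
struct toy_dev  { struct toy_ctlr *ctlr; };

struct toy_msg {
	struct toy_dev  *spi;            /* device the message targets */
	bool             pre_optimized;  /* caller owns optimization */
	struct toy_ctlr *res_owner;      /* NULL means "not optimized" */
	int              double_optimize;
	int              wrong_ctlr_release;
};

static void toy_optimize(struct toy_msg *m)
{
	if (m->res_owner) {              /* message was already optimized */
		m->double_optimize++;
		return;
	}
	m->res_owner = m->spi->ctlr;
}

static void toy_unoptimize(struct toy_msg *m)
{
	/* The controller is looked up through the (possibly swapped) device. */
	if (m->spi->ctlr != m->res_owner) {
		m->wrong_ctlr_release++;     /* the real code crashes here */
		return;
	}
	m->res_owner = NULL;
}

/* Nested submit, as issued from the mux's transfer hook. */
static void toy_async(struct toy_msg *m)
{
	if (!m->pre_optimized)
		toy_optimize(m);
	/* ... the transfer would be queued here ... */
	if (!m->pre_optimized)
		toy_unoptimize(m);
}

/* Swap the device, forward the message, then restore the saved fields. */
static void toy_mux_forward(struct toy_msg *m, struct toy_dev *parent,
			    bool apply_fix)
{
	struct toy_dev *saved_dev = m->spi;
	bool saved_flag = m->pre_optimized;

	m->spi = parent;
	if (apply_fix)
		m->pre_optimized = true; /* skip nested optimize/unoptimize */
	toy_async(m);

	m->spi = saved_dev;              /* what the completion callback does */
	m->pre_optimized = saved_flag;
}

/* Returns the number of faults observed for one forwarded message. */
int toy_run(bool apply_fix)
{
	struct toy_ctlr mux_ctlr = { "mux" }, parent_ctlr = { "parent" };
	struct toy_dev child = { &mux_ctlr }, mux_dev = { &parent_ctlr };
	struct toy_msg m = { .spi = &child };

	toy_optimize(&m);                /* outer layer already optimized it */
	toy_mux_forward(&m, &mux_dev, apply_fix);
	return m.double_optimize + m.wrong_ctlr_release;
}
```

In this model, toy_run(false) reports two faults (one repeated optimize
and one release against the wrong controller), while toy_run(true)
reports none because the nested submit leaves the message alone.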

I don't know if a proper fix could be quite so simple, but here is
something you could try (untested):

---

diff --git a/drivers/spi/spi-mux.c b/drivers/spi/spi-mux.c
index 5d72e3d59df8..ec837e28183d 100644
--- a/drivers/spi/spi-mux.c
+++ b/drivers/spi/spi-mux.c
@@ -42,6 +42,7 @@ struct spi_mux_priv {
 	void			(*child_msg_complete)(void *context);
 	void			*child_msg_context;
 	struct spi_device	*child_msg_dev;
+	bool			child_msg_pre_optimized;
 	struct mux_control	*mux;
 };
 
@@ -94,6 +95,7 @@ static void spi_mux_complete_cb(void *context)
 	m->complete = priv->child_msg_complete;
 	m->context = priv->child_msg_context;
 	m->spi = priv->child_msg_dev;
+	m->pre_optimized = priv->child_msg_pre_optimized;
 	spi_finalize_current_message(ctlr);
 	mux_control_deselect(priv->mux);
 }
@@ -116,10 +118,12 @@ static int spi_mux_transfer_one_message(struct spi_controller *ctlr,
 	priv->child_msg_complete = m->complete;
 	priv->child_msg_context = m->context;
 	priv->child_msg_dev = m->spi;
+	priv->child_msg_pre_optimized = m->pre_optimized;
 
 	m->complete = spi_mux_complete_cb;
 	m->context = priv;
 	m->spi = priv->spi;
+	m->pre_optimized = true;
 
 	/* do the transfer */
 	return spi_async(priv->spi, m);

