Message-ID: <9d845244-bad0-4e7c-8f7c-bd3224cabac4@gmail.com>
Date: Fri, 16 Jan 2026 11:48:44 +0000
From: Usama Arif <usamaarif642@...il.com>
To: Breno Leitao <leitao@...ian.org>,
 Thierry Reding <thierry.reding@...il.com>,
 Jonathan Hunter <jonathanh@...dia.com>,
 Sowjanya Komatineni <skomatineni@...dia.com>,
 Laxman Dewangan <ldewangan@...dia.com>, Mark Brown <broonie@...nel.org>,
 Vishwaroop A <va@...dia.com>
Cc: Thierry Reding <treding@...dia.com>, linux-tegra@...r.kernel.org,
 linux-spi@...r.kernel.org, linux-kernel@...r.kernel.org,
 kernel-team@...a.com, puranjay@...nel.org
Subject: Re: [PATCH 1/6] spi: tegra210-quad: Return IRQ_HANDLED when timeout
 already processed transfer



On 16/01/2026 10:41, Breno Leitao wrote:
> When the ISR thread wakes up late and finds that the timeout handler
> has already processed the transfer (curr_xfer is NULL), return
> IRQ_HANDLED instead of IRQ_NONE.
> 
> Use a similar approach to tegra_qspi_handle_timeout() by reading
> QSPI_TRANS_STATUS and checking the QSPI_RDY bit to determine if the
> hardware actually completed the transfer. If QSPI_RDY is set, the
> interrupt was legitimate and triggered by real hardware activity.
> The fact that the timeout path handled it first doesn't make it
> spurious. Returning IRQ_NONE incorrectly suggests the interrupt
> wasn't for this device, which can cause issues with shared interrupt
> lines and interrupt accounting.
> 
> Fixes: b4e002d8a7ce ("spi: tegra210-quad: Fix timeout handling")
> Signed-off-by: Breno Leitao <leitao@...ian.org>
> ---
>  drivers/spi/spi-tegra210-quad.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)

Hi,

I am not familiar with tegra SPI specifically, so I might be missing something, but
I was wondering if it's actually better not to handle the transfer in the timeout path at all.

We have 1000 ms before the timeout fires, and I feel that should have been enough.
Looking at other SPI drivers (imx, fsl, ...): they don't handle transfers in the timeout
path and are therefore lockless. tegra_qspi_handle_timeout also checks at the end
whether the transfer failed for some reason (although it looks like handle_cpu/dma_based_xfer
can't really fail?), and would just return failure.

Removing the attempt to complete the transfer in the timeout path gets rid of the spinlock
and all the bugs around it, and makes ISR handling quicker since you don't need to acquire
a spinlock (which might in turn lead to fewer timeouts?). It also makes the whole driver
much simpler. Or maybe I am missing something?

I have a potential (untested) patch below showing how it would look. We can work on
testing it if this makes sense to the maintainers?



commit 09af307fbb33717e8de9489cef2abd743e3f6a93
Author: Usama Arif <usamaarif642@...il.com>
Date:   Wed Jan 14 16:47:06 2026 +0000

    spi: tegra210-quad: remove tqspi->lock and xfer handling in timeout
    
    The tqspi->lock spinlock existed to protect against races between the
    IRQ handler and the timeout recovery path, which could both call
    handle_cpu_based_xfer() or handle_dma_based_xfer(). However, this
    locking is not implemented correctly and causes a race condition.
    Looking at other SPI drivers that have an IRQ (e.g. spi-fsl-dspi.c),
    it is simpler to just not handle transfers in the timeout path.
    
    Remove xfer handling in the timeout path, eliminating the need for
    the lock completely, simplifying the code and removing the race
    condition. The existing code already handles xfer failure by
    returning -EIO. This brings tegra210-quad in line with other drivers
    such as spi-fsl-dspi, spi-imx and spi-fsl-lpspi. Making the IRQ
    handling lockless might also make it faster, which could lead to
    fewer timeouts.
    
    Signed-off-by: Usama Arif <usamaarif642@...il.com>

diff --git a/drivers/spi/spi-tegra210-quad.c b/drivers/spi/spi-tegra210-quad.c
index cdc3cb7c01f9b..ba67e628bd9ef 100644
--- a/drivers/spi/spi-tegra210-quad.c
+++ b/drivers/spi/spi-tegra210-quad.c
@@ -184,8 +184,6 @@ struct tegra_qspi_client_data {
 struct tegra_qspi {
 	struct device				*dev;
 	struct spi_controller			*host;
-	/* lock to protect data accessed by irq */
-	spinlock_t				lock;
 
 	struct clk				*clk;
 	void __iomem				*base;
@@ -968,7 +966,6 @@ static int tegra_qspi_setup(struct spi_device *spi)
 {
 	struct tegra_qspi *tqspi = spi_controller_get_devdata(spi->controller);
 	struct tegra_qspi_client_data *cdata = spi->controller_data;
-	unsigned long flags;
 	u32 val;
 	int ret;
 
@@ -982,8 +979,6 @@ static int tegra_qspi_setup(struct spi_device *spi)
 		cdata = tegra_qspi_parse_cdata_dt(spi);
 		spi->controller_data = cdata;
 	}
-	spin_lock_irqsave(&tqspi->lock, flags);
-
 	/* keep default cs state to inactive */
 	val = tqspi->def_command1_reg;
 	val |= QSPI_CS_SEL(spi_get_chipselect(spi, 0));
@@ -995,8 +990,6 @@ static int tegra_qspi_setup(struct spi_device *spi)
 	tqspi->def_command1_reg = val;
 	tegra_qspi_writel(tqspi, tqspi->def_command1_reg, QSPI_COMMAND1);
 
-	spin_unlock_irqrestore(&tqspi->lock, flags);
-
 	pm_runtime_put(tqspi->dev);
 
 	return 0;
@@ -1048,48 +1041,6 @@ static void tegra_qspi_transfer_end(struct spi_device *spi)
 	tegra_qspi_writel(tqspi, tqspi->def_command1_reg, QSPI_COMMAND1);
 }
 
-static irqreturn_t handle_cpu_based_xfer(struct tegra_qspi *tqspi);
-static irqreturn_t handle_dma_based_xfer(struct tegra_qspi *tqspi);
-
-/**
- * tegra_qspi_handle_timeout - Handle transfer timeout with hardware check
- * @tqspi: QSPI controller instance
- *
- * When a timeout occurs but hardware has completed the transfer (interrupt
- * was lost or delayed), manually trigger transfer completion processing.
- * This avoids failing transfers that actually succeeded.
- *
- * Returns: 0 if transfer was completed, -ETIMEDOUT if real timeout
- */
-static int tegra_qspi_handle_timeout(struct tegra_qspi *tqspi)
-{
-	irqreturn_t ret;
-	u32 status;
-
-	/* Check if hardware actually completed the transfer */
-	status = tegra_qspi_readl(tqspi, QSPI_TRANS_STATUS);
-	if (!(status & QSPI_RDY))
-		return -ETIMEDOUT;
-
-	/*
-	 * Hardware completed but interrupt was lost/delayed. Manually
-	 * process the completion by calling the appropriate handler.
-	 */
-	dev_warn_ratelimited(tqspi->dev,
-			     "QSPI interrupt timeout, but transfer complete\n");
-
-	/* Clear the transfer status */
-	status = tegra_qspi_readl(tqspi, QSPI_TRANS_STATUS);
-	tegra_qspi_writel(tqspi, status, QSPI_TRANS_STATUS);
-
-	/* Manually trigger completion handler */
-	if (!tqspi->is_curr_dma_xfer)
-		ret = handle_cpu_based_xfer(tqspi);
-	else
-		ret = handle_dma_based_xfer(tqspi);
-
-	return (ret == IRQ_HANDLED) ? 0 : -EIO;
-}
 
 static u32 tegra_qspi_cmd_config(bool is_ddr, u8 bus_width, u8 len)
 {
@@ -1220,28 +1171,19 @@ static int tegra_qspi_combined_seq_xfer(struct tegra_qspi *tqspi,
 					QSPI_DMA_TIMEOUT);
 
 			if (WARN_ON_ONCE(ret == 0)) {
-				/*
-				 * Check if hardware completed the transfer
-				 * even though interrupt was lost or delayed.
-				 * If so, process the completion and continue.
-				 */
-				ret = tegra_qspi_handle_timeout(tqspi);
-				if (ret < 0) {
-					/* Real timeout - clean up and fail */
-					dev_err(tqspi->dev, "transfer timeout\n");
-
-					/* Abort transfer by resetting pio/dma bit */
-					if (tqspi->is_curr_dma_xfer)
-						tegra_qspi_dma_stop(tqspi);
-					else
-						tegra_qspi_pio_stop(tqspi);
-
-					/* Reset controller if timeout happens */
-					tegra_qspi_reset(tqspi);
-
-					ret = -EIO;
-					goto exit;
-				}
+				dev_err(tqspi->dev, "transfer timeout\n");
+
+				/* Abort transfer by resetting pio/dma bit */
+				if (tqspi->is_curr_dma_xfer)
+					tegra_qspi_dma_stop(tqspi);
+				else
+					tegra_qspi_pio_stop(tqspi);
+
+				/* Reset controller if timeout happens */
+				tegra_qspi_reset(tqspi);
+
+				ret = -EIO;
+				goto exit;
 			}
 
 			if (tqspi->tx_status ||  tqspi->rx_status) {
@@ -1332,23 +1274,14 @@ static int tegra_qspi_non_combined_seq_xfer(struct tegra_qspi *tqspi,
 		ret = wait_for_completion_timeout(&tqspi->xfer_completion,
 						  QSPI_DMA_TIMEOUT);
 		if (WARN_ON(ret == 0)) {
-			/*
-			 * Check if hardware completed the transfer even though
-			 * interrupt was lost or delayed. If so, process the
-			 * completion and continue.
-			 */
-			ret = tegra_qspi_handle_timeout(tqspi);
-			if (ret < 0) {
-				/* Real timeout - clean up and fail */
-				dev_err(tqspi->dev, "transfer timeout\n");
+			dev_err(tqspi->dev, "transfer timeout\n");
 
-				if (tqspi->is_curr_dma_xfer)
-					tegra_qspi_dma_stop(tqspi);
+			if (tqspi->is_curr_dma_xfer)
+				tegra_qspi_dma_stop(tqspi);
 
-				tegra_qspi_handle_error(tqspi);
-				ret = -EIO;
-				goto complete_xfer;
-			}
+			tegra_qspi_handle_error(tqspi);
+			ret = -EIO;
+			goto complete_xfer;
 		}
 
 		if (tqspi->tx_status ||  tqspi->rx_status) {
@@ -1441,9 +1374,6 @@ static int tegra_qspi_transfer_one_message(struct spi_controller *host,
 static irqreturn_t handle_cpu_based_xfer(struct tegra_qspi *tqspi)
 {
 	struct spi_transfer *t = tqspi->curr_xfer;
-	unsigned long flags;
-
-	spin_lock_irqsave(&tqspi->lock, flags);
 
 	if (tqspi->tx_status ||  tqspi->rx_status) {
 		tegra_qspi_handle_error(tqspi);
@@ -1468,7 +1398,6 @@ static irqreturn_t handle_cpu_based_xfer(struct tegra_qspi *tqspi)
 	tegra_qspi_start_cpu_based_transfer(tqspi, t);
 exit:
 	tqspi->curr_xfer = NULL;
-	spin_unlock_irqrestore(&tqspi->lock, flags);
 	return IRQ_HANDLED;
 }
 
@@ -1476,7 +1405,6 @@ static irqreturn_t handle_dma_based_xfer(struct tegra_qspi *tqspi)
 {
 	struct spi_transfer *t = tqspi->curr_xfer;
 	unsigned int total_fifo_words;
-	unsigned long flags;
 	long wait_status;
 	int num_errors = 0;
 
@@ -1512,13 +1440,11 @@ static irqreturn_t handle_dma_based_xfer(struct tegra_qspi *tqspi)
 		}
 	}
 
-	spin_lock_irqsave(&tqspi->lock, flags);
-
 	if (num_errors) {
 		tegra_qspi_dma_unmap_xfer(tqspi, t);
 		tegra_qspi_handle_error(tqspi);
 		complete(&tqspi->xfer_completion);
-		goto exit;
+		return IRQ_HANDLED;
 	}
 
 	if (tqspi->cur_direction & DATA_DIR_RX)
@@ -1532,7 +1458,7 @@ static irqreturn_t handle_dma_based_xfer(struct tegra_qspi *tqspi)
 	if (tqspi->cur_pos == t->len) {
 		tegra_qspi_dma_unmap_xfer(tqspi, t);
 		complete(&tqspi->xfer_completion);
-		goto exit;
+		return IRQ_HANDLED;
 	}
 
 	tegra_qspi_dma_unmap_xfer(tqspi, t);
@@ -1544,8 +1470,6 @@ static irqreturn_t handle_dma_based_xfer(struct tegra_qspi *tqspi)
 	else
 		num_errors = tegra_qspi_start_cpu_based_transfer(tqspi, t);
 
-exit:
-	spin_unlock_irqrestore(&tqspi->lock, flags);
 	return IRQ_HANDLED;
 }
 
@@ -1679,7 +1603,6 @@ static int tegra_qspi_probe(struct platform_device *pdev)
 
 	tqspi->host = host;
 	tqspi->dev = &pdev->dev;
-	spin_lock_init(&tqspi->lock);
 
 	tqspi->soc_data = device_get_match_data(&pdev->dev);
 	host->num_chipselect = tqspi->soc_data->cs_count;

