linux-kernel - Re: [PATCH 0/6] spi: tegra-qspi: Fix race condition causing NULL pointer dereference and spurious IRQ

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20260121000000.0000000-1-va@nvidia.com>
Date: Wed, 21 Jan 2026 17:56:17 +0000
From: Vishwaroop A <va@...dia.com>
To: <leitao@...ian.org>
CC: <thierry.reding@...il.com>, <treding@...dia.com>, <jonathanh@...dia.com>,
	<skomatineni@...dia.com>, <ldewangan@...dia.com>, <broonie@...nel.org>,
	<linux-spi@...r.kernel.org>, <linux-tegra@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <kernel-team@...a.com>,
	<puranjay@...nel.org>, <usamaarif642@...il.com>, Vishwaroop A <va@...dia.com>
Subject: Re: [PATCH 0/6] spi: tegra-qspi: Fix race condition causing NULL pointer dereference and spurious IRQ

Hi Breno,

After reviewing Mark Brown's feedback and the code carefully, let me clarify the 
correct logic. This is important to get right.

**IRQ Handler Semantics (per Mark Brown):**
- IRQ_NONE = interrupt was NOT from this device
- IRQ_HANDLED = interrupt WAS from this device (regardless of whether we fully processed it)

**The QSPI_RDY Bit:**
This bit in QSPI_TRANS_STATUS is set by hardware when a transfer completes and triggers 
the interrupt. Software clears it by writing 1.

**Why Your Original v1 Logic is Correct:**

Your "[PATCH 1/6] spi: tegra210-quad: Return IRQ_HANDLED when timeout already processed 
transfer" reads QSPI_TRANS_STATUS at the start of tegra_qspi_isr_thread():

    if (!tqspi->curr_xfer) {
        if (!(status & QSPI_RDY))
            return IRQ_NONE;        // HW never set RDY → spurious interrupt
        return IRQ_HANDLED;         // HW did set RDY → real interrupt, timeout processed it
    }

**Scenario 1 - Delayed ISR (the race you're fixing):**
1. HW completes transfer, sets QSPI_RDY, interrupt fires
2. ISR thread delayed (CPU busy)
3. Timeout handler runs, processes transfer, clears curr_xfer
4. Delayed ISR finally wakes up
5. Reads QSPI_RDY (may still be set)
6. curr_xfer is NULL
7. Return IRQ_HANDLED → this WAS our interrupt, just processed by timeout

**Scenario 2 - Truly Spurious:**
1. Spurious interrupt fires
2. QSPI_RDY = 0 (no transfer completed)
3. curr_xfer is NULL
4. Return IRQ_NONE → not our interrupt

**Your Latest Proposal Has It Backwards:**

The version in your last email returns IRQ_HANDLED when QSPI_RDY is NOT set, which is 
incorrect per Mark's feedback.

**For v2:**

Patches 1-5: Keep as-is from v1 (all correct)

Patch 6 ("[PATCH 6/6] spi: tegra210-quad: Protect curr_xfer check in IRQ handler"): 
The TODO comment you added asks about keeping the lock held across the handler call. I'd 
suggest removing the TODO and replacing it with a comment explaining why the current 
approach is safe:

    spin_unlock_irqrestore(&tqspi->lock, flags);

    /*
     * Lock is released here but handlers safely re-check curr_xfer under lock
     * before dereferencing. DMA handler also needs to sleep in 
     * wait_for_completion_*(), which cannot be done while holding spinlock.
     */
    if (!tqspi->is_curr_dma_xfer)
        return handle_cpu_based_xfer(tqspi);

This documents the design decision and closes the TODO.

The device_reset_optional() was from your March 2025 series - keep that separate.

**Testing:**

Carol Soto will validate v2 with your test methodology and provide feedback.

**Follow-on:**

I'll implement hard IRQ handler support separately after your fix merges.

Best,
Vishwaroop