Message-ID: <20251016132923.3577429-1-va@nvidia.com>
Date: Thu, 16 Oct 2025 13:29:21 +0000
From: Vishwaroop A <va@...dia.com>
To: Mark Brown <broonie@...nel.org>, Thierry Reding
	<thierry.reding@...il.com>, Jonathan Hunter <jonathanh@...dia.com>, "Sowjanya
 Komatineni" <skomatineni@...dia.com>, Laxman Dewangan <ldewangan@...dia.com>,
	<smangipudi@...dia.com>, <kyarlagadda@...dia.com>
CC: Vishwaroop A <va@...dia.com>, <linux-spi@...r.kernel.org>,
	<linux-tegra@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: [PATCH v2 0/2] spi: tegra210-quad: Improve timeout handling under high system load

Hi,

This patch series addresses timeout handling issues in the Tegra QSPI driver
that occur under high system load conditions. We've observed that when CPUs
are saturated (due to error injection, RAS firmware activity, or general CPU
contention), QSPI interrupt handlers can be delayed, causing spurious transfer
failures even though the hardware completed the operation successfully.

Patch 1 fixes a stale pointer issue by ensuring curr_xfer is cleared on timeout
and checked when the IRQ thread finally runs. It also ensures interrupts are
properly cleared on failure paths.
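
To illustrate the idea behind patch 1, here is a minimal user-space sketch of
the stale-pointer handling. It is not the driver code: struct model_qspi,
struct model_xfer, model_handle_timeout() and model_irq_thread() are
hypothetical names, only curr_xfer is taken from the description above, and the
synchronization a real driver needs between the two paths is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver state; not the real structures. */
struct model_xfer {
	int len;
};

struct model_qspi {
	struct model_xfer *curr_xfer;	/* transfer in flight, or NULL */
	unsigned int irq_status;	/* modelled pending interrupt bits */
	int completions;		/* transfers completed by the IRQ thread */
};

/* Timeout path: forget the in-flight transfer and clear pending interrupts. */
static void model_handle_timeout(struct model_qspi *q)
{
	assert(q != NULL);
	q->curr_xfer = NULL;
	q->irq_status = 0;
}

/*
 * IRQ thread: returns true if it completed a transfer, false if it ran
 * after the transfer had already been abandoned by the timeout path.
 */
static bool model_irq_thread(struct model_qspi *q)
{
	assert(q != NULL);

	if (q->curr_xfer == NULL) {
		/* Late interrupt: nothing to complete, just acknowledge it. */
		q->irq_status = 0;
		return false;
	}

	q->completions++;
	q->curr_xfer = NULL;
	q->irq_status = 0;
	return true;
}
```

In this model a late IRQ thread that runs after model_handle_timeout() sees a
NULL curr_xfer and returns without touching the abandoned transfer.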

Patch 2 adds hardware status checking on timeout. Before failing a transfer,
the driver now reads QSPI_TRANS_STATUS to check whether the hardware actually
completed the operation. If it did, the driver invokes the completion handler
manually instead of failing the transfer. This distinguishes genuine hardware
timeouts from delayed or lost interrupts.

These changes have been tested in production environments under various high
load scenarios including RAS testing and CPU saturation workloads.

Changes in v2:
- Fixed indentation in patch 1/2: The "Reset controller if timeout happens"
  block now has correct indentation (inside the WARN_ON_ONCE block)
- No functional changes

Testing:
- Verified normal operation under light load
- Tested under heavy CPU load with concurrent workloads
- Validated with RAS firmware activity and error injection
- Confirmed no regressions in existing timeout behavior

Thierry Reding (1):
  spi: tegra210-quad: Fix timeout handling

Vishwaroop A (1):
  spi: tegra210-quad: Check hardware status on timeout

 drivers/spi/spi-tegra210-quad.c | 195 ++++++++++++++++++++++++++------
 1 file changed, 138 insertions(+), 57 deletions(-)

-- 
2.34.1

