linux-kernel - Re: [PATCH 1/3] spi: tegra210-quad: use device_reset_optional() instead of device

Message-ID: <20250318-boisterous-adorable-chowchow-cea03b@leitao>
Date: Tue, 18 Mar 2025 11:29:26 -0700
From: Breno Leitao <leitao@...ian.org>
To: Mark Brown <broonie@...ian.org>
Cc: Thierry Reding <thierry.reding@...il.com>,
	Jonathan Hunter <jonathanh@...dia.com>,
	Sowjanya Komatineni <skomatineni@...dia.com>,
	Laxman Dewangan <ldewangan@...dia.com>, linux-tegra@...r.kernel.org,
	linux-spi@...r.kernel.org, linux-kernel@...r.kernel.org,
	rmikey@...a.com, kernel-team@...a.com
Subject: Re: [PATCH 1/3] spi: tegra210-quad: use device_reset_optional()
 instead of device_reset()

On Tue, Mar 18, 2025 at 05:34:55PM +0000, Mark Brown wrote:
> On Tue, Mar 18, 2025 at 10:02:47AM -0700, Breno Leitao wrote:
> 
> > Makes sense. Another question: for platforms like this one that don't
> > have the device reset methods, what can we do to stop the bleed?
> 
> > Basically every message that is sent to the SPI controller will fail,
> > which will trigger device_reset(), which is a no-op, but the device
> > will continue to be online. Should we disable the device after some
> > point?
> 
> The SPI controller is only going to be doing something because some
> driver for an attached SPI device is trying to do something.  Presumably
> whatever driver that is won't be having a good time and can hopefully
> figure something out, though given that SPI is simple and not
> hotpluggable this isn't really something that comes up a lot in
> production so I'd be unsurprised to see things just keep on retrying.
> I'd expect to see any substantial error handling in the driver for the
> device rather than in the controller.

Good point. In my specific case, this is coming from tpm_tis,
which is not aware that the device is totally dead, and continues to ask
for random numbers:

            tegra_qspi_transfer_one_message
            __spi_pump_transfer_message
            __spi_sync
            spi_sync
            tpm_tis_spi_transfer
            tpm_tis_spi_read_bytes
            tpm_tis_request_locality
            tpm_chip_start
            tpm_try_get_ops
            tpm_find_get_ops
            tpm_get_random
            tpm_hwrng_read
            hwrng_fillfn
            kthread
            ret_from_fork

Looking at tpm_tis, it seems it doesn't care whether the SPI controller
is dead, and just forwards the requests, which never complete. Adding
Arnd to see if he has any idea about this.

Arnd,

Summary of the problem: tpm_tis is trying to read random numbers
through a dead SPI controller. That causes an endless stream of warnings
in the kernel log, given that the controller is WARNing on timeouts
(which is being fixed in one of the patches in this patchset).

Question: Should tpm_tis be aware that the underlying SPI controller is
dead, and eventually get unplugged?
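
To make the question concrete, here is a rough userspace sketch of the
kind of bookkeeping I have in mind. It is not tpm_tis or SPI core code:
struct xfer_guard, guarded_transfer(), XFER_FAIL_LIMIT and the
always_timeout() stub are all hypothetical names made up for this
illustration, and the threshold of 3 is an arbitrary assumption. The idea
is just to count consecutive failed transfers and stop forwarding
requests once the limit is hit:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold: consecutive failed transfers we tolerate. */
#define XFER_FAIL_LIMIT 3

/* Hypothetical per-device state; not a real kernel structure. */
struct xfer_guard {
	unsigned int consecutive_failures;
	bool dead;
};

/* Stand-in for whatever actually performs the bus transfer. */
typedef int (*xfer_fn)(void *ctx);

/*
 * Forward one transfer unless the device was already declared dead.
 * A success resets the counter; XFER_FAIL_LIMIT failures in a row
 * mark the device dead, and later calls return -ENODEV without
 * touching the bus again.
 */
static int guarded_transfer(struct xfer_guard *g, xfer_fn xfer, void *ctx)
{
	int ret;

	assert(g != NULL && xfer != NULL);

	if (g->dead)
		return -ENODEV;

	ret = xfer(ctx);
	if (ret == 0) {
		g->consecutive_failures = 0;
		return 0;
	}

	if (++g->consecutive_failures >= XFER_FAIL_LIMIT)
		g->dead = true;

	return ret;
}

/* Stub that simulates a dead controller: counts calls, always times out. */
static int always_timeout(void *ctx)
{
	unsigned int *calls = ctx;

	(*calls)++;
	return -ETIMEDOUT;
}
```

With the stub above, the first three calls return -ETIMEDOUT and every
later call returns -ENODEV without reaching the bus. Whether something
like this belongs in tpm_tis, in the hwrng layer, or nowhere at all is
exactly what I am asking.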

> Obviously there's something wrong with the device description here which
> is upsetting the controller driver.
> 
> > Regarding this patchset, I understand that patch #1 is not ideal as
> > discussed above, what about patch 2 and 3?
> 
> If I didn't say anything they're probably fine.

Do you want me to resend those two separately, or is this thread
enough?

Thanks again,
--breno