linux-kernel - Re: Regression: spi: core: avoid waking pump thread from spi

Message-ID: <aabd916e-005e-6cda-25d7-8ab875afa7a0@nvidia.com>
Date:   Tue, 15 Jan 2019 14:26:02 +0000
From:   Jon Hunter <jonathanh@...dia.com>
To:     Martin Sperl <kernel@...tin.sperl.org>
CC:     Mark Brown <broonie@...nel.org>,
        linux-tegra <linux-tegra@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        <linux-spi@...r.kernel.org>
Subject: Re: Regression: spi: core: avoid waking pump thread from spi_sync
 instead run teardown delayed

Hi Martin,

On 14/01/2019 22:01, Martin Sperl wrote:
> Hi Jon,
> 
> On 14.01.2019, at 16:35, Jon Hunter <jonathanh@...dia.com> wrote:
> 
>> Hi Martin, Mark,
>>
>> [   58.222033] spi_master spi1: could not stop message queue
>> [   58.222038] spi_master spi1: queue stop failed
>> [   58.222048] dpm_run_callback(): platform_pm_suspend+0x0/0x54 returns -16
>> [   58.222052] PM: Device 7000da00.spi failed to suspend: error -16
>> [   58.222057] PM: Some devices failed to suspend, or early wake event detected
> 
> Unfortunately I have not been able to reproduce this in 
> my test cases with the hw available to me.

Looking at both boards that fail, tegra30-cardhu-a04 and
tegra124-jetson-tk1, they both have a spi-flash. The compatible strings
for the spi flashes are "winbond,w25q32" and "winbond,w25q32dw",
respectively, which interestingly are not documented/used anywhere in the
kernel. It appears that there was a patch to fix this a few years back,
but it never got applied [0]. However, applying this patch does not fix
the issue. Furthermore, without this patch applied I see that the spi
flash is detected fine ...

[    2.540395] m25p80 spi1.0: w25q32dw (4096 Kbytes)

So this is not related, but the main point is that the issue occurs with
a spi flash device.

> Looks as if there is something missing in spi_stop_queue that 
> would wake the worker thread one last time without any delays
> and finish the hw shutdown immediately - it runs as a delayed
> task...
> 
> One question: do you run any spi transfers in
> your test case before suspend?

No, and before suspending I dumped some of the spi stats and I see no
transfers/messages at all ...

Stats for spi1 ...
Bytes: 0
Errors: 0
Messages: 0
Transfers: 0
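
For anyone wanting to collect the same counters, here is a minimal
sketch of one way to script the dump. It assumes the counters are plain
integer files named bytes, errors, messages and transfers under
/sys/class/spi_master/<ctlr>/statistics/. The helper names are
hypothetical and this is not necessarily how the numbers above were
gathered.

```python
from pathlib import Path


def read_spi_stats(stats_dir, names=("bytes", "errors", "messages", "transfers")):
    """Return {counter_name: int} for each counter file found in stats_dir."""
    stats = {}
    for name in names:
        path = Path(stats_dir) / name
        if path.is_file():  # silently skip counters this kernel does not expose
            stats[name] = int(path.read_text().strip())
    return stats


def format_spi_stats(controller, stats):
    """Format the counters in the same layout as the dump above."""
    lines = [f"Stats for {controller} ..."]
    lines += [f"{name.capitalize()}: {value}" for name, value in stats.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    ctlr = "spi1"  # controller name taken from the log messages in this thread
    stats_dir = f"/sys/class/spi_master/{ctlr}/statistics"
    print(format_spi_stats(ctlr, read_spi_stats(stats_dir)))
```

Running it once right after boot and once just before suspend should
show whether any messages were processed in between.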

> /sys/class/spi_master/spi1/statistics/messages gives some
> counters on the number of spi messages processed which
> would give you an indication if that is happening.
> 
> It could be as easy as adding right after the first lock 
> in spi_stop_queue:
> kthread_mod_delayed_work(&ctlr->kworker,
>  &ctlr->pump_idle_teardown, 0);
> (plus maybe a yield or similar to allow the worker to 
> quickly/reliably run on a single core machine)
> 
> I hope that this initial guess helps.

Unfortunately, the above did not help and the issue persists.

Digging a bit deeper I see that 'ctlr->queue' is now empty but the
'ctlr->busy' flag is still set, and this is what causes the 'could not
stop message queue' error.

It seems that __spi_pump_messages() is getting called several times
during boot when registering the spi-flash. Then, about 1 sec after the
spi-flash has been registered, spi_pump_idle_teardown() is called (as
expected), but it exits because 'ctlr->running' is true. However,
spi_pump_idle_teardown() is never called again, so when we suspend we
are stuck in the busy/running state. In this case should something be
scheduling spi_pump_idle_teardown() again? Although even if it does, I
don't see where the busy flag would be cleared in this path?
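
To make the sequence concrete, here is a toy userspace model of the
state described above. It is only a sketch: the class and method names
are hypothetical, and the retry loop in stop_queue() reflects my reading
of spi_stop_queue() (poll while the queue is non-empty or busy is set,
then give up with -EBUSY), which is an assumption rather than a copy of
the kernel code.

```python
EBUSY = 16  # matches the "returns -16" seen in the suspend log


class ToyController:
    """Toy stand-in for the queue/busy/running state of an SPI controller."""

    def __init__(self):
        self.queue = []
        self.busy = False
        self.running = True

    def pump_messages(self):
        # Handling a message marks the controller busy; in this model
        # nothing on this path clears the flag again.
        if self.queue:
            self.queue.pop(0)
            self.busy = True

    def idle_teardown(self):
        # As observed: the delayed teardown bails out while 'running' is
        # true, so 'busy' stays set. Returns True only if it did anything.
        if self.running:
            return False
        self.busy = False  # assumption: a completed teardown clears busy
        return True

    def stop_queue(self, limit=500):
        # Assumed shape of the stop path: retry a bounded number of times
        # (the real code would sleep between attempts), then give up.
        while (self.queue or self.busy) and limit > 0:
            limit -= 1
        if self.queue or self.busy:
            return -EBUSY  # "could not stop message queue"
        self.running = False
        return 0


if __name__ == "__main__":
    ctlr = ToyController()
    ctlr.queue.append("flash probe message")
    ctlr.pump_messages()      # boot: message handled, busy is now set
    ctlr.idle_teardown()      # ~1 sec later: exits early, running is true
    print(ctlr.stop_queue())  # suspend: queue empty, busy set
```

In this model the queue is empty at suspend time, yet stop_queue() still
fails because nothing ever clears busy while running is true, which is
the state I am seeing.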

Cheers
Jon

[0] https://patchwork.kernel.org/patch/7021961/
-- 
nvpublic