Message-ID: <1904336c-3349-b22b-18ac-e82e4afebc51@suse.de>
Date: Wed, 2 Jan 2019 01:44:40 +0100
From: Andreas Färber <afaerber@...e.de>
To: Mark Brown <broonie@...nel.org>,
"linux-spi@...r.kernel.org" <linux-spi@...r.kernel.org>
Cc: Ben Whitten <ben.whitten@...il.com>,
devicetree <devicetree@...r.kernel.org>,
linux-clk <linux-clk@...r.kernel.org>,
Maxime Ripard <maxime.ripard@...tlin.com>,
netdev@...r.kernel.org,
Michael Turquette <mturquette@...libre.com>,
Stephen Boyd <sboyd@...eaurora.org>,
"linux-lpwan@...ts.infradead.org" <linux-lpwan@...ts.infradead.org>,
linux-kernel@...r.kernel.org, Russell King <linux@...linux.org.uk>,
starnight@...cu.edu.tw, "David S. Miller" <davem@...emloft.net>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH v3 lora-next 5/5] net: lora: sx125x sx1301: allow radio to
register as a clk provider

On 31.12.18 at 23:56, Andreas Färber wrote:
> On 31.12.18 at 18:50, Mark Brown wrote:
>> On Sun, Dec 30, 2018 at 11:55:46AM +0100, Andreas Färber wrote:
>>> Given that observed symptoms were CPU stalls, workqueue hangs and RCU
>>> problems, requiring a power-cycle to recover, I wonder whether we are
>>> running into some atomic/locking issue with clk_enable()? Is it valid at
>>> all to use SPI/regmap for clk_enable()? If it is, is there a known issue
>>> specific to spi-sun6i (A64) in 4.20.0?
>>> I already tried setting .disable_locking = true in both regmap_configs.
>>> Any suggestions how to further debug?
>>
>> You can't use SPI for clk_enable(), clk_enable() needs to be doable in
>> atomic context since we need to wait for the bus operations to complete
>> (you can start SPI transfers in atomic context but you still need to
>> wait for them to complete). Any clocks that are only accessible via a
>> slow bus like I2C or SPI need to do the enable/disable in the
>> prepare/unprepare operations which aren't done in atomic context.
>>
>> regmap can be used in atomic contexts, though you need to configure it
>> to use spinlocks instead of mutexes and ensure that no register cache
>> allocations happen during I/O (eg, by providing defaults for all
>> registers or by not using a cache).
>
> We have .cache_type = REGCACHE_NONE on both bus and spi regmap_configs.
>
> I moved the regmap_field_write() from .enable to .prepare and set
> .fast_io = true on both regmap_configs to force using spinlocks, but
> same hang as in .enable before...
>
> And same if I set .disable_locking = true on both.
>
> Given that it works with one SPI driver and not with the other,
> independent of the locking options applied, I assume my symptoms are not
> a regmap-layer issue.
>
> Is it allowed during a .prepare operation to call the mentioned
> clk_get_rate(), which ends up calling clk_prepare_lock()?
>
> According to my debug output in spi-sun6i.c our hanging
> regmap_field_write() ends up calling sun6i_transfer_one() three times,
> the first two look okay, but the third one doesn't make it past the
> clk_get_rate() [...].

SysRq still works in that state! Attached is the SysRq-w output
(still with .disable_locking = true in both regmap_configs).

At the very bottom you see the "ip" task, at wait_for_completion() from
__spi_sync(). I trigger this issue with `ip link set lora2 up`, so that
looks okay.

Then there's a "spi1" task at clk_prepare_lock()'s mutex_lock(), coming
from spi_pump_messages().

The reason for that is presumably that clk_prepare_lock()'s mutex_trylock()
failed (because we're holding the prepare_lock from clk_prepare_enable()
in the "ip" task) and that the prepare_owner == current check fails, too,
since this is a separate task_struct.

So, the third invocation of sun6i_transfer_one() calling clk_get_rate()
hangs at the prepare_lock instead of reference-counting, because it runs
from a separate kthread, unlike the two previous calls?
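
To make the suspected sequence easier to follow, here is a minimal
user-space model of the ownership check described above. It is only a
sketch and not the kernel code: the struct, the enum, the function names
and the integer task ids are all hypothetical stand-ins, and where the
real clk_prepare_lock() would sleep in mutex_lock() the model simply
reports that it would block.

```c
#include <assert.h>

/* Hypothetical stand-in for a task_struct pointer; only identity matters. */
typedef int task_id;
#define NO_TASK (-1)

enum lock_result {
	LOCK_ACQUIRED,    /* lock was free and is now held by the caller */
	LOCK_NESTED,      /* caller already owned it, refcount bumped */
	LOCK_WOULD_BLOCK  /* another task owns it, real code would sleep */
};

/* Models prepare_lock together with prepare_owner and prepare_refcnt. */
struct model_lock {
	task_id owner;
	int refcnt;
};

static void model_lock_init(struct model_lock *l)
{
	l->owner = NO_TASK;
	l->refcnt = 0;
}

/* Models the decision made on entry to clk_prepare_lock(). */
static enum lock_result model_lock_acquire(struct model_lock *l, task_id current)
{
	if (l->owner == NO_TASK) {        /* trylock succeeds */
		l->owner = current;
		l->refcnt = 1;
		return LOCK_ACQUIRED;
	}
	if (l->owner == current) {        /* trylock fails, same task: nest */
		l->refcnt++;
		return LOCK_NESTED;
	}
	return LOCK_WOULD_BLOCK;          /* trylock fails, other task: wait */
}

static void model_lock_release(struct model_lock *l, task_id current)
{
	assert(l->owner == current && l->refcnt > 0);
	if (--l->refcnt == 0)
		l->owner = NO_TASK;
}

/* clk_get_rate() reached from the task that already holds the lock. */
static enum lock_result scenario_same_task(void)
{
	struct model_lock l;
	const task_id ip_task = 1;
	enum lock_result r;

	model_lock_init(&l);
	(void)model_lock_acquire(&l, ip_task);  /* clk_prepare_enable() */
	r = model_lock_acquire(&l, ip_task);    /* nested clk_get_rate() */
	model_lock_release(&l, ip_task);
	model_lock_release(&l, ip_task);
	return r;
}

/* clk_get_rate() reached from the SPI pump kthread instead. */
static enum lock_result scenario_pump_thread(void)
{
	struct model_lock l;
	const task_id ip_task = 1, spi1_kthread = 2;

	model_lock_init(&l);
	(void)model_lock_acquire(&l, ip_task);      /* clk_prepare_enable() */
	return model_lock_acquire(&l, spi1_kthread); /* from spi_pump_messages() */
}
```

In this model the same-task case nests, while the pump-thread case would
block. Since the "ip" task is itself waiting in wait_for_completion() for
that transfer, neither side could make progress, which would match the
SysRq-w output if the assumption about the separate kthread holds.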

Besides, there's also an mmc_rescan workqueue task at clk_prepare_lock(),
coming from sunxi_mmc_enable() due to pm_generic_runtime_resume().
My rootfs is on a microSD card.

I did not find any *regmap_init_spi() based example in drivers/clk/, and
all other "spi" mentions in drivers/clk/ appeared to be clock names.
The closest was the devm_regmap_init_i2c() based clk-cdce706.c, which uses
the prepare/unprepare ops, as suggested by Mark, and does
regmap_update_bits() from there.

A quick grep in drivers/i2c/ does not find any mention of "kthread", so
probably that's the breaking difference?

Regards,
Andreas
--
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
View attachment "pinie-sysrq-w.txt" of type "text/plain" (12646 bytes)