Message-ID: <1904336c-3349-b22b-18ac-e82e4afebc51@suse.de>
Date:   Wed, 2 Jan 2019 01:44:40 +0100
From:   Andreas Färber <afaerber@...e.de>
To:     Mark Brown <broonie@...nel.org>,
        "linux-spi@...r.kernel.org" <linux-spi@...r.kernel.org>
Cc:     Ben Whitten <ben.whitten@...il.com>,
        devicetree <devicetree@...r.kernel.org>,
        linux-clk <linux-clk@...r.kernel.org>,
        Maxime Ripard <maxime.ripard@...tlin.com>,
        netdev@...r.kernel.org,
        Michael Turquette <mturquette@...libre.com>,
        Stephen Boyd <sboyd@...eaurora.org>,
        "linux-lpwan@...ts.infradead.org" <linux-lpwan@...ts.infradead.org>,
        linux-kernel@...r.kernel.org, Russell King <linux@...linux.org.uk>,
        starnight@...cu.edu.tw, "David S. Miller" <davem@...emloft.net>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH v3 lora-next 5/5] net: lora: sx125x sx1301: allow radio to
 register as a clk provider

Am 31.12.18 um 23:56 schrieb Andreas Färber:
> Am 31.12.18 um 18:50 schrieb Mark Brown:
>> On Sun, Dec 30, 2018 at 11:55:46AM +0100, Andreas Färber wrote:
>>> Given that observed symptoms were CPU stalls, workqueue hangs and RCU
>>> problems, requiring a power-cycle to recover, I wonder whether we are
>>> running into some atomic/locking issue with clk_enable()? Is it valid at
>>> all to use SPI/regmap for clk_enable()? If it is, is there a known issue
>>> specific to spi-sun6i (A64) in 4.20.0?
>>> I already tried setting .disable_locking = true in both regmap_configs.
>>> Any suggestions how to further debug?
>>
>> You can't use SPI for clk_enable(), clk_enable() needs to be doable in
>> atomic context since we need to wait for the bus operations to complete
>> (you can start SPI transfers in atomic context but you still need to
>> wait for them to complete).  Any clocks that are only accessible via a
>> slow bus like I2C or SPI need to do the enable/disable in the
>> prepare/unprepare operations which aren't done in atomic context.
>>
>> regmap can be used in atomic contexts, though you need to configure it
>> to use spinlocks instead of mutexes and ensure that no register cache
>> allocations happen during I/O (eg, by providing defaults for all
>> registers or by not using a cache).
> 
> We have .cache_type = REGCACHE_NONE on both bus and spi regmap_configs.
> 
> I moved the regmap_field_write() from .enable to .prepare and set
> .fast_io = true on both regmap_configs to force using spinlocks, but
> same hang as in .enable before...
> 
> And same if I set .disable_locking = true on both.
> 
> Given that it works with one SPI driver and not with the other,
> independent of the locking options applied, I assume my symptoms are not
> a regmap-layer issue.
> 
> Is it allowed during a .prepare operation to call the mentioned
> clk_get_rate(), which ends up calling clk_prepare_lock()?
> 
> According to my debug output in spi-sun6i.c our hanging
> regmap_field_write() ends up calling sun6i_transfer_one() three times,
> the first two look okay, but the third one doesn't make it past the
> clk_get_rate() [...].

SysRq still works in that state! Attached is SysRq-w output.
(still with .disable_locking = true in both regmap_configs)

At the very bottom you see the "ip" task, at wait_for_completion() from
__spi_sync().
I trigger this issue with `ip link set lora2 up`, so that looks okay.

Then there's a "spi1" task at clk_prepare_lock()'s mutex_lock() coming
from spi_pump_messages().
The reason is presumably that clk_prepare_lock()'s mutex_trylock()
failed (because we're holding the prepare_lock from clk_prepare_enable()
in the "ip" task) and that the prepare_owner == current check fails for
this separate task_struct, too.

So, the third invocation of sun6i_transfer_one() calling clk_get_rate()
hangs at the prepare_lock instead of reference-counting, because it runs
from a separate kthread, unlike the two previous calls?
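
To illustrate the suspected mechanism, here is a minimal userspace sketch
of the owner check described above. It is a model built on pthreads, not
the kernel code: all names (model_prepare_lock(), pump_thread(), the enum)
are made up for illustration, and where the kernel would sleep in
mutex_lock() the model returns LOCK_WOULD_BLOCK so that it can terminate.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical result codes; the kernel function returns void. */
enum lock_result { LOCK_ACQUIRED, LOCK_NESTED, LOCK_WOULD_BLOCK };

static pthread_mutex_t prepare_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_t prepare_owner;   /* stands in for the owning task */
static bool prepare_owned;
static int prepare_refcnt;

/* Model of the trylock + owner check; assumed to mirror clk_prepare_lock(). */
static enum lock_result model_prepare_lock(void)
{
	if (pthread_mutex_trylock(&prepare_lock) != 0) {
		/* Lock is busy: nesting is only allowed for the owner itself. */
		if (prepare_owned &&
		    pthread_equal(prepare_owner, pthread_self())) {
			prepare_refcnt++;
			return LOCK_NESTED;
		}
		/* The kernel would sleep in mutex_lock() here. */
		return LOCK_WOULD_BLOCK;
	}
	prepare_owner = pthread_self();
	prepare_owned = true;
	prepare_refcnt = 1;
	return LOCK_ACQUIRED;
}

static void model_prepare_unlock(void)
{
	assert(prepare_refcnt > 0);
	if (--prepare_refcnt)
		return;               /* still held by an outer caller */
	prepare_owned = false;
	pthread_mutex_unlock(&prepare_lock);
}

/* Stands in for clk_get_rate() called from the SPI message pump kthread. */
static void *pump_thread(void *arg)
{
	enum lock_result *res = arg;

	*res = model_prepare_lock();
	if (*res != LOCK_WOULD_BLOCK)
		model_prepare_unlock();
	return NULL;
}

/* Nested call from the thread that already holds the lock. */
enum lock_result nested_from_same_thread(void)
{
	enum lock_result res;

	model_prepare_lock();         /* like clk_prepare_enable() in "ip" */
	res = model_prepare_lock();   /* nested call in the same thread */
	model_prepare_unlock();
	model_prepare_unlock();
	return res;
}

/* Nested call from a second thread while the first waits for it. */
enum lock_result nested_from_other_thread(void)
{
	enum lock_result res = LOCK_ACQUIRED;
	pthread_t t;

	model_prepare_lock();         /* "ip" holds the lock ... */
	if (pthread_create(&t, NULL, pump_thread, &res) == 0)
		pthread_join(t, NULL);    /* ... and waits for the transfer */
	model_prepare_unlock();
	return res;
}
```

Built with -pthread and called from a small main(), the same-thread case
nests via the refcount, whereas the second thread hits the path where the
kernel would sleep while the lock holder is itself waiting for that
thread. If the real code behaves like this model, that would match the
"ip" and "spi1" tasks in the SysRq-w output.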

Besides, there's also an mmc_rescan workqueue task at clk_prepare_lock()
coming from sunxi_mmc_enable() due to pm_generic_runtime_resume().
My rootfs is on a microSD card.

I did not find any *regmap_init_spi() based example in drivers/clk/, and
all other "spi" mentions in drivers/clk/ appeared to be clock names.
The closest was devm_regmap_init_i2c() based clk-cdce706.c, which uses
the prepare/unprepare ops, as suggested by Mark, and does
regmap_update_bits() from there.

A quick grep in drivers/i2c/ does not find any mention of "kthread", so
that is probably the crucial difference?

Regards,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

View attachment "pinie-sysrq-w.txt" of type "text/plain" (12646 bytes)
