Message-ID: <9cb322ba-4c08-474b-bdc2-d21cc1904ecf@collabora.com>
Date: Fri, 16 Aug 2024 23:11:27 +0300
From: Cristian Ciocaltea <cristian.ciocaltea@...labora.com>
To: Mark Brown <broonie@...nel.org>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 "Rafael J. Wysocki" <rafael@...nel.org>, Heiko Stübner
 <heiko@...ech.de>, Andy Yan <andy.yan@...k-chips.com>, kernel@...labora.com,
 linux-kernel@...r.kernel.org, linux-rockchip@...ts.infradead.org
Subject: Re: [PATCH RFC] regmap: maple: Switch to use irq-safe locking

On 8/14/24 10:04 PM, Mark Brown wrote:
> On Wed, Aug 14, 2024 at 01:20:21AM +0300, Cristian Ciocaltea wrote:

[...]

> I'd have a bigger question here which is why the driver is using a
> dynamically allocated register cache in a hardirq context, especially
> with no defaults provided?  Anything except the flat cache might do
> allocations at runtime which might include in interrupt context unless
> the caller is very careful and since the lockdep warning triggered it's
> clear that this driver isn't.  The core will be doing atomic allocations
> for MMIO but that's not something we want to be doing as a matter of
> course...  I would generally expect drivers to try to ensure that any
> registers are cached outside of the interrupt handler, usually by
> specifying defaults or touching all registers during setup.
> 
> Without having done a full analysis it also looks like the marking of
> volatile registers isn't right, it's not immediately clear that the
> interrupt status and clear registers are volatile and they ought to be.
> None of the registers accessed in interrupt context look like they
> should be cached at all unless there's something triggered via the DRM
> vblank calls.

AFAICT, all registers accessed in IRQ context are volatile, hence the
register cache should not be involved at that point.
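For reference, the usual way to guarantee that is to report those registers as
volatile via the regmap_config ->volatile_reg() callback, so regmap bypasses
the cache for them entirely. A minimal sketch follows; the register offsets
and names here are hypothetical, not taken from the actual vop2 driver:

```c
/* Hypothetical offsets for illustration only */
#define FOO_INT_STATUS	0x04	/* interrupt status, read in hardirq */
#define FOO_INT_CLEAR	0x08	/* interrupt clear, written in hardirq */

static bool foo_volatile_reg(struct device *dev, unsigned int reg)
{
	switch (reg) {
	case FOO_INT_STATUS:
	case FOO_INT_CLEAR:
		return true;	/* never cached, always hits the hardware */
	default:
		return false;
	}
}

static const struct regmap_config foo_regmap_config = {
	.reg_bits	= 32,
	.val_bits	= 32,
	.reg_stride	= 4,
	.cache_type	= REGCACHE_MAPLE,
	.volatile_reg	= foo_volatile_reg,
};
```

With that in place, reads and writes of the IRQ-path registers never touch
the maple tree, so no cache allocation or cache lock is taken in hardirq
context for them.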

The deadlock scenario indicated by lockdep actually points to the lock
acquired by regcache_maple_exit(), which was triggered during module
unload, and the lock acquired by regcache_maple_write() in the context
of vop2_plane_atomic_update(), called from within the DRM stack.

[   48.466666] -> (&mt->ma_lock){+...}-{2:2} {
[   48.467066]    HARDIRQ-ON-W at:
[   48.467360]                     lock_acquire+0x1d4/0x320
[   48.467849]                     _raw_spin_lock+0x50/0x70
[   48.468337]                     regcache_maple_exit+0x6c/0xe0
[   48.468864]                     regcache_exit+0x8c/0xa8
[   48.469344]                     regmap_exit+0x24/0x160
[   48.469815]                     devm_regmap_release+0x1c/0x28
[   48.470339]                     release_nodes+0x68/0xa8
[   48.470818]                     devres_release_group+0x120/0x180
[   48.471364]                     component_unbind+0x54/0x70
[   48.471867]                     component_unbind_all+0xb0/0xe8
[   48.472400]                     rockchip_drm_unbind+0x44/0x80 [rockchipdrm]
[   48.473059]                     component_del+0xc8/0x158
[   48.473545]                     dw_hdmi_rockchip_remove+0x28/0x40 [rockchipdrm]

[...]

[   48.482058]    INITIAL USE at:
[   48.482344]                    lock_acquire+0x1d4/0x320
[   48.482824]                    _raw_spin_lock+0x50/0x70
[   48.483304]                    regcache_maple_write+0x27c/0x330
[   48.483844]                    regcache_write+0x6c/0x88
[   48.484323]                    _regmap_read+0x198/0x1c8
[   48.484801]                    _regmap_update_bits+0xc0/0x148
[   48.485327]                    regmap_field_update_bits_base+0x74/0xb0
[   48.485919]                    vop2_plane_atomic_update+0x9e8/0x1490 [rockchipdrm]
[   48.486631]                    drm_atomic_helper_commit_planes+0x190/0x2f8 [drm_kms_helper]

I experimented with a reduced scope of this patch, limiting the use of
the irq-safe lock to regcache_maple_exit() only, and I can confirm this
was enough to make lockdep happy.
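For clarity, the reduced-scope variant amounts to taking the maple tree's
ma_lock with interrupts disabled in the exit path only, so that lock class
is no longer recorded as HARDIRQ-unsafe against the write path. A sketch of
the idea, loosely following drivers/base/regmap/regcache-maple.c (treat this
as an illustration of the change, not the literal patch; upstream may prefer
a mas_lock_irqsave()-style helper over touching ma_lock directly):

```c
static int regcache_maple_exit(struct regmap *map)
{
	struct maple_tree *mt = map->cache;
	MA_STATE(mas, mt, 0, UINT_MAX);
	unsigned int *entry;
	unsigned long flags;

	/* if we've already been called then just return */
	if (!mt)
		return 0;

	/* was: mas_lock(&mas), i.e. a plain spin_lock() on ma_lock */
	spin_lock_irqsave(&mt->ma_lock, flags);
	mas_for_each(&mas, entry, UINT_MAX)
		kfree(entry);
	__mt_destroy(mt);
	spin_unlock_irqrestore(&mt->ma_lock, flags);

	kfree(mt);
	map->cache = NULL;

	return 0;
}
```

The write path keeps its existing locking; only the unload-time teardown
changes its interrupt discipline, which matches the lockdep report above.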

> It might be safer to fall back to the rbtree cache for this device since
> rbtree doesn't force an extra level of locking on us, though like I say
> I'm not convinced that what the driver is doing with caching is a super
> good idea.  Though probably what the driver is doing should work.

I actually gave the flat cache a try on a Rock 3A board and didn't
encounter any (obvious) issues, but my testing capabilities are rather
limited at the moment.

@Andy: Could you please shed some light on this, i.e. the rationale
behind going for an rbtree cache over a flat one, since the latter would
be better suited for MMIO devices?
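For context, the flat cache allocates its whole backing array up front at
regmap_init() time (one slot per register up to max_register), so it never
allocates at runtime and adds no extra lock of its own, which is exactly what
the hardirq path here needs. A hypothetical config fragment, with a made-up
max_register value purely for illustration:

```c
static const struct regmap_config foo_regmap_config = {
	.reg_bits	= 32,
	.val_bits	= 32,
	.reg_stride	= 4,
	.max_register	= 0x3000,	/* hypothetical; bounds the flat array */
	.cache_type	= REGCACHE_FLAT,
};
```

The trade-off is memory: the flat cache reserves max_register / reg_stride + 1
entries regardless of how sparse the register map is, which is usually fine
for MMIO blocks but wasteful for large sparse maps.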
 
> My first thought here is that if we've got a regmap using spinlocks for
> the regmap lock and a maple tree cache we should arrange things so that
> the maple tree lock is used for the regmap's lock.  That would however
> involve some unpleasant abstraction violation, and possibly some macro
> fun since we'd need to elide the locking from the cache itself when
> using the same lock at the regmap level.  I think that's going to be a
> case of choosing the least unpleasant option.

Thanks, Mark, for the detailed feedback on this!

Regards,
Cristian
