Message-ID: <ff2bc13c-f66f-03f3-fc01-c4f962f7b694@metafoo.de>
Date: Sat, 20 Aug 2022 13:08:28 +0200
From: Lars-Peter Clausen <lars@...afoo.de>
To: Jonathan Cameron <jic23@...nel.org>,
Vincent Whitchurch <vincent.whitchurch@...s.com>
Cc: kernel@...s.com, linux-iio@...r.kernel.org,
linux-kernel@...r.kernel.org, Peter Rosin <peda@...ntia.se>
Subject: Re: [PATCH] iio: buffer: Silence lock nesting splat
On 8/20/22 13:06, Jonathan Cameron wrote:
> On Tue, 16 Aug 2022 10:08:28 +0200
> Vincent Whitchurch <vincent.whitchurch@...s.com> wrote:
>
>> If an IIO driver uses callbacks from another IIO driver and calls
>> iio_channel_start_all_cb() from one of its buffer setup ops, then
>> lockdep complains due to the lock nesting, as in the below example with
>> lmp91000. Since the locks are being taken on different IIO devices,
>> there is no actual deadlock, so add lock nesting annotation to silence
>> the spurious warning.
>>
>> ============================================
>> WARNING: possible recursive locking detected
>> 6.0.0-rc1+ #10 Not tainted
>> --------------------------------------------
>> python3/23 is trying to acquire lock:
>> 0000000064c944c0 (&indio_dev->mlock){+.+.}-{3:3}, at: iio_update_buffers+0x62/0x180
>>
>> but task is already holding lock:
>> 00000000636b64c0 (&indio_dev->mlock){+.+.}-{3:3}, at: enable_store+0x4d/0x100
>>
>> other info that might help us debug this:
>> Possible unsafe locking scenario:
>>
>> CPU0
>> ----
>> lock(&indio_dev->mlock);
>> lock(&indio_dev->mlock);
>>
>> *** DEADLOCK ***
>>
>> May be due to missing lock nesting notation
>>
>> 5 locks held by python3/23:
>> #0: 00000000636b5420 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x67/0x100
>> #1: 0000000064c19280 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x13a/0x270
>> #2: 0000000064c3d9e0 (kn->active#14){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x149/0x270
>> #3: 00000000636b64c0 (&indio_dev->mlock){+.+.}-{3:3}, at: enable_store+0x4d/0x100
>> #4: 0000000064c945c8 (&iio_dev_opaque->info_exist_lock){+.+.}-{3:3}, at: iio_update_buffers+0x4f/0x180
>>
>> stack backtrace:
>> CPU: 0 PID: 23 Comm: python3 Not tainted 6.0.0-rc1+ #10
>> Call Trace:
>> dump_stack+0x1a/0x1c
>> __lock_acquire.cold+0x407/0x42d
>> lock_acquire+0x1ed/0x310
>> __mutex_lock+0x72/0xde0
>> mutex_lock_nested+0x1d/0x20
>> iio_update_buffers+0x62/0x180
>> iio_channel_start_all_cb+0x1c/0x20 [industrialio_buffer_cb]
>> lmp91000_buffer_postenable+0x1b/0x20 [lmp91000]
>> __iio_update_buffers+0x50b/0xd80
>> enable_store+0x81/0x100
>> dev_attr_store+0xf/0x20
>> sysfs_kf_write+0x4c/0x70
>> kernfs_fop_write_iter+0x179/0x270
>> new_sync_write+0x99/0x120
>> vfs_write+0x2c1/0x470
>> ksys_write+0x67/0x100
>> sys_write+0x10/0x20
>>
>> Signed-off-by: Vincent Whitchurch <vincent.whitchurch@...s.com>
> I'm wondering if this is sufficient.
> At first glance there are a whole bunch of other possible cases of this.
> Any consumer driver that calls iio_device_claim_direct_mode() would be a
> problem - though I'm not sure any do?
>
> I'm not sure I properly understand lockdep notations, but I thought the
> point was we needed to define a hierarchy? To do that here we need
> an IIO driver that is a consumer to somehow let the IIO core know that
> and mark all calls to the locks appropriately. This gets trickier
> as we allow 3+ levels of IIO drivers calling into each other.
>
> We should also think about how to prevent recursion if there are 3
> IIO drivers involved.
There are two different approaches for this kind of nested locking. One
is to use mutex_lock_nested(). This works if there is a strict
hierarchy. The I2C framework for example has a function to determine the
position of an I2C mux in the hierarchy and uses that for locking. See
https://elixir.bootlin.com/linux/latest/source/drivers/i2c/i2c-core-base.c#L1151.
I'm not sure this directly translates to IIO since the
consumers/producers don't have to be in a strict hierarchy. And if it
is a complex graph it can be difficult to figure out the right level for
mutex_lock_nested().
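To make the first approach concrete, here is a toy user-space model (not
kernel code) of why a subclass silences the splat. It assumes a simplified
rule: an acquisition is reported when a lock with the same class and the
same subclass is already held. The names toy_lock, toy_lock_nested() and
toy_unlock_all() are made up for illustration; in the kernel the
corresponding call would be mutex_lock_nested(&lock, subclass).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model, NOT kernel code. All names are hypothetical. */
#define MAX_HELD 8

struct toy_lock {
	int class_id;	/* by default, every lock from one init site shares a class */
};

struct held_entry {
	int class_id;
	unsigned int subclass;
};

static struct held_entry held[MAX_HELD];
static size_t nr_held;

/*
 * Simplified model of the "possible recursive locking" check: refuse the
 * acquisition if a lock with the same class AND the same subclass is
 * already held. Returns true if accepted, false if it would be reported.
 */
bool toy_lock_nested(const struct toy_lock *lock, unsigned int subclass)
{
	for (size_t i = 0; i < nr_held; i++) {
		if (held[i].class_id == lock->class_id &&
		    held[i].subclass == subclass)
			return false;
	}

	assert(nr_held < MAX_HELD);
	held[nr_held].class_id = lock->class_id;
	held[nr_held].subclass = subclass;
	nr_held++;
	return true;
}

/* Drop everything the toy "task" currently holds. */
void toy_unlock_all(void)
{
	nr_held = 0;
}
```

In this model a consumer holding its lock at subclass 0 can take a
producer's lock at subclass 1, but a third device locked at either of
those levels is reported again. That is the difficulty with a graph that
is not a strict hierarchy: every lock site has to know its own level.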
The other method is to mark each mutex as its own class. lockdep does
its checking per lock class, and by default the same mutex in different
instances of a structure shares one class, which keeps the resource
requirements for the checker lower.
Regmap for example does this. See
https://elixir.bootlin.com/linux/latest/source/drivers/base/regmap/regmap.c#L795.
This could be a solution for IIO, with the downside of the additional
work for the checker. But as long as there are only a few IIO devices
per system that should be OK. We could also only set the per-device lock
class if in-kernel consumers are enabled.
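As a rough illustration of the per-device class idea, here is a toy
user-space model (not kernel code). It assumes the class of a mutex is
identified purely by the address of its key. All toy_* names are
hypothetical. In the kernel I would expect this to be done with a
struct lock_class_key embedded in the device, registered with
lockdep_register_key() and attached with lockdep_set_class(), but that
is only a sketch of the direction, not a tested patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model, NOT kernel code. All names are hypothetical. */
struct toy_key {
	char dummy;			/* stands in for struct lock_class_key */
};

struct toy_mutex {
	const struct toy_key *class_key; /* class identity = address of the key */
};

struct toy_iio_dev {
	struct toy_key mlock_key;	/* per-device key */
	struct toy_mutex mlock;
};

/* Default behaviour: every device initialised here shares one static key. */
void toy_dev_init_shared(struct toy_iio_dev *dev)
{
	static struct toy_key shared_key;

	assert(dev != NULL);
	dev->mlock.class_key = &shared_key;
}

/* Per-device behaviour: each device's mlock gets a class of its own. */
void toy_dev_init_per_device(struct toy_iio_dev *dev)
{
	assert(dev != NULL);
	dev->mlock.class_key = &dev->mlock_key;
}

/* Would taking 'inner' while holding 'outer' look like recursive locking? */
bool toy_looks_recursive(const struct toy_mutex *outer,
			 const struct toy_mutex *inner)
{
	return outer->class_key == inner->class_key;
}
```

With the shared key, nesting the mlock of two different devices looks
recursive, which matches the splat above. With per-device keys only
taking the same device's mlock twice still looks recursive, and that is
the case we actually want reported.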