linux-kernel - Re: [PATCH] iio: buffer: Silence lock nesting splat

Message-ID: <ff2bc13c-f66f-03f3-fc01-c4f962f7b694@metafoo.de>
Date:   Sat, 20 Aug 2022 13:08:28 +0200
From:   Lars-Peter Clausen <lars@...afoo.de>
To:     Jonathan Cameron <jic23@...nel.org>,
        Vincent Whitchurch <vincent.whitchurch@...s.com>
Cc:     kernel@...s.com, linux-iio@...r.kernel.org,
        linux-kernel@...r.kernel.org, Peter Rosin <peda@...ntia.se>
Subject: Re: [PATCH] iio: buffer: Silence lock nesting splat

On 8/20/22 13:06, Jonathan Cameron wrote:
> On Tue, 16 Aug 2022 10:08:28 +0200
> Vincent Whitchurch <vincent.whitchurch@...s.com> wrote:
>
>> If an IIO driver uses callbacks from another IIO driver and calls
>> iio_channel_start_all_cb() from one of its buffer setup ops, then
>> lockdep complains due to the lock nesting, as in the below example with
>> lmp91000.  Since the locks are being taken on different IIO devices,
>> there is no actual deadlock, so add lock nesting annotation to silence
>> the spurious warning.
>>
>>   ============================================
>>   WARNING: possible recursive locking detected
>>   6.0.0-rc1+ #10 Not tainted
>>   --------------------------------------------
>>   python3/23 is trying to acquire lock:
>>   0000000064c944c0 (&indio_dev->mlock){+.+.}-{3:3}, at: iio_update_buffers+0x62/0x180
>>
>>   but task is already holding lock:
>>   00000000636b64c0 (&indio_dev->mlock){+.+.}-{3:3}, at: enable_store+0x4d/0x100
>>
>>   other info that might help us debug this:
>>    Possible unsafe locking scenario:
>>
>>          CPU0
>>          ----
>>     lock(&indio_dev->mlock);
>>     lock(&indio_dev->mlock);
>>
>>    *** DEADLOCK ***
>>
>>    May be due to missing lock nesting notation
>>
>>   5 locks held by python3/23:
>>    #0: 00000000636b5420 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x67/0x100
>>    #1: 0000000064c19280 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x13a/0x270
>>    #2: 0000000064c3d9e0 (kn->active#14){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x149/0x270
>>    #3: 00000000636b64c0 (&indio_dev->mlock){+.+.}-{3:3}, at: enable_store+0x4d/0x100
>>    #4: 0000000064c945c8 (&iio_dev_opaque->info_exist_lock){+.+.}-{3:3}, at: iio_update_buffers+0x4f/0x180
>>
>>   stack backtrace:
>>   CPU: 0 PID: 23 Comm: python3 Not tainted 6.0.0-rc1+ #10
>>   Call Trace:
>>    dump_stack+0x1a/0x1c
>>    __lock_acquire.cold+0x407/0x42d
>>    lock_acquire+0x1ed/0x310
>>    __mutex_lock+0x72/0xde0
>>    mutex_lock_nested+0x1d/0x20
>>    iio_update_buffers+0x62/0x180
>>    iio_channel_start_all_cb+0x1c/0x20 [industrialio_buffer_cb]
>>    lmp91000_buffer_postenable+0x1b/0x20 [lmp91000]
>>    __iio_update_buffers+0x50b/0xd80
>>    enable_store+0x81/0x100
>>    dev_attr_store+0xf/0x20
>>    sysfs_kf_write+0x4c/0x70
>>    kernfs_fop_write_iter+0x179/0x270
>>    new_sync_write+0x99/0x120
>>    vfs_write+0x2c1/0x470
>>    ksys_write+0x67/0x100
>>    sys_write+0x10/0x20
>>
>> Signed-off-by: Vincent Whitchurch <vincent.whitchurch@...s.com>
> I'm wondering if this is sufficient.
> At first glance there are a whole bunch of other possible cases of this.
> Any consumer driver that calls iio_device_claim_direct_mode() would be a
> problem - though I'm not sure any do?
>
> I'm not sure I properly understand lockdep notations, but I thought the
> point was we needed to define a hierarchy?  To do that here we need
> an IIO driver that is a consumer to somehow let the IIO core know that
> and mark all calls to the locks appropriately.  This gets trickier
> as we allow 3+ levels of IIO drivers calling into each other.
>
> We should also think about how to prevent recursion if there are 3
> IIO drivers involved.

There are two different approaches for this kind of nested locking. One
is to use mutex_lock_nested(). This works if there is a strict
hierarchy. The I2C framework, for example, has a function to determine the
position of an I2C mux in the hierarchy and uses that for locking. See
https://elixir.bootlin.com/linux/latest/source/drivers/i2c/i2c-core-base.c#L1151.

I'm not sure this directly translates to IIO, since the
consumers/producers don't have to be in a strict hierarchy. And if it
is a complex graph, it can be difficult to figure out the right level for
mutex_lock_nested().

The other method is to mark each mutex as its own class. Lockdep does
the lock checking based on the lock class, and by default the same mutex
in different instances (for example the mlock of every IIO device) is
considered to be of the same class, to keep the resource requirements
for the checker lower.

Regmap for example does this. See 
https://elixir.bootlin.com/linux/latest/source/drivers/base/regmap/regmap.c#L795.

This could be a solution for IIO, with the downside of the additional
work for the checker. But as long as there are only a few IIO devices
per system, that should be OK. We could also set the per-device lock
class only if in-kernel consumers are enabled.