[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z2P1KVLFig0Kh1h2@opensource.cirrus.com>
Date: Thu, 19 Dec 2024 10:27:53 +0000
From: Charles Keepax <ckeepax@...nsource.cirrus.com>
To: Pierre-Louis Bossart <pierre-louis.bossart@...ux.dev>
CC: Richard Fitzgerald <rf@...nsource.cirrus.com>, <vkoul@...nel.org>,
<peter.ujfalusi@...ux.intel.com>, <yung-chuan.liao@...ux.intel.com>,
<sanyog.r.kale@...el.com>, <linux-sound@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <patches@...nsource.cirrus.com>
Subject: Re: [PATCH] soundwire: intel_auxdevice: Don't disable IRQs before
removing children
On Wed, Dec 18, 2024 at 04:40:22PM -0500, Pierre-Louis Bossart wrote:
> I agree that the device should be reachable during the remove(),
> but I believe the scope of expected interaction should be limited
> to a strict minimum. To be clearer, so far not a single device had
> a requirement for any sort of interaction on remove. You would need
> to clarify which codec driver needs this.
Ok I can add that to the commit message.
> I don't see how the 'link_lock' and 'bus_lock' are at different
> levels of the stack, the 'master' device and the 'auxiliary' device
> are both quite thin and I don't quite see what's different between
> the two.
Not sure I follow that, like for example I assume I could just make
the code use the bus_lock instead of adding a lock but to me that
feels like a layer violation, you have a driver directly taking
framework locks. But I guess I don't totally object if everyone
else prefers that.
In terms of other solutions, I could look at redoing more of the
IRQ handling. If rather than using a hard coded linked list, the
IRQs were properly registered as IRQs with the IRQ framework then
we could simply add and remove them using the normal APIs, which
would be a lot cleaner. Of course, fundamentally that is pretty
equivalent but it will then just use a lock from the IRQ framework.
> Having looked at the code in more details, I think there are other issues,
> see e.g. this part of the code called from snd_bus_master_delete().
>
> static int sdw_delete_slave(struct device *dev, void *data)
> {
> struct sdw_slave *slave = dev_to_sdw_dev(dev);
> struct sdw_bus *bus = slave->bus;
>
> pm_runtime_disable(dev);
>
> sdw_slave_debugfs_exit(slave);
>
> mutex_lock(&bus->bus_lock);
>
> if (slave->dev_num) { /* clear dev_num if assigned */
> clear_bit(slave->dev_num, bus->assigned);
> if (bus->ops && bus->ops->put_device_num)
> bus->ops->put_device_num(bus, slave);
> }
>
> So at this point an interaction with the device is not longer possible, even
> if the Cadence interrupts are kept active, since there's no valid device
> number to use...
>
> list_del_init(&slave->node);
> mutex_unlock(&bus->bus_lock);
>
> ... but this is where the .remove() will take place.
>
> device_unregister(dev);
> return 0;
> }
>
> What am I missing?
Hmm... yes that is a good spot, I will investigate that further.
Certainly I do see these operations happening in the wrong order
but it doesn't seem to cause the register transactions to fail.
Most likely it is best to reverse these two, it makes sense
to not clear the device number until we are finished with the
device, but would be good to understand what is going on first.
Thanks,
Charles
Powered by blists - more mailing lists