[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <65e16f628245a78da5c9d870d6c5c5a9@gmail.com>
Date: Tue, 19 Nov 2024 12:25:18 +0100
From: Matteo Martelli <matteomartelli3@...il.com>
To: Andy Shevchenko <andriy.shevchenko@...el.com>
Cc: Jonathan Cameron <Jonathan.Cameron@...wei.com>, Jonathan Cameron
<jic23@...nel.org>, Lars-Peter Clausen <lars@...afoo.de>, Michael Hennerich
<Michael.Hennerich@...log.com>, Alisa-Dariana Roman <alisa.roman@...log.com>,
Christian Eggers <ceggers@...i.de>, Peter Rosin <peda@...ntia.se>,
Paul Cercueil <paul@...pouillou.net>, Sebastian Reichel <sre@...nel.org>,
linux-iio@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-mips@...r.kernel.org, linux-pm@...r.kernel.org
Subject: Re: [PATCH v5 2/5] iio: consumers: copy/release available info from
producer to fix race
On Mon, 18 Nov 2024 18:05:35 +0200, Andy Shevchenko <andriy.shevchenko@...el.com> wrote:
> On Mon, Nov 18, 2024 at 03:45:25PM +0100, Matteo Martelli wrote:
> > On Mon, 18 Nov 2024 12:21:44 +0200, Andy Shevchenko <andriy.shevchenko@...el.com> wrote:
> > > On Fri, Nov 15, 2024 at 03:25:06PM +0100, Matteo Martelli wrote:
> > > > On Thu, 31 Oct 2024 19:06:32 +0100, Matteo Martelli <matteomartelli3@...il.com> wrote:
> > > > > Quoting Jonathan Cameron (2024-10-31 15:31:29)
> > > > > > On Thu, 31 Oct 2024 12:26:24 +0100
> > > > > > Matteo Martelli <matteomartelli3@...il.com> wrote:
> > > > > > > Quoting Jonathan Cameron (2024-10-30 21:30:50)
> > > > > > > > On Wed, 30 Oct 2024 19:23:21 +0100
> > > > > > > > Matteo Martelli <matteomartelli3@...il.com> wrote:
> > > > > > > > > Quoting Andy Shevchenko (2024-10-30 15:47:50)
> > > > > > > > > > On Mon, Oct 21, 2024 at 02:54:15PM +0200, Matteo Martelli wrote:
> > >
> > > > > > > > > > > Consumers need to call the producer's read_avail_release_resource()
> > > > > > > > > > > callback after reading producer's available info. To avoid a race
> > > > > > > > > > > condition with the producer unregistration, change inkern
> > > > > > > > > > > iio_channel_read_avail() so that it copies the available info from the
> > > > > > > > > > > producer and immediately calls its release callback with info_exists
> > > > > > > > > > > locked.
> > > > > > > > > > >
> > > > > > > > > > > Also, modify the users of iio_read_avail_channel_raw() and
> > > > > > > > > > > iio_read_avail_channel_attribute() to free the copied available buffers
> > > > > > > > > > > after calling these functions. To let users free the copied buffer with
> > > > > > > > > > > a cleanup pattern, also add a iio_read_avail_channel_attr_retvals()
> > > > > > > > > > > consumer helper that is equivalent to iio_read_avail_channel_attribute()
> > > > > > > > > > > but stores the available values in the returned variable.
>
> ...
>
> > > > > > > > > > > +static void dpot_dac_read_avail_release_res(struct iio_dev *indio_dev,
> > > > > > > > > > > + struct iio_chan_spec const *chan,
> > > > > > > > > > > + const int *vals, long mask)
> > > > > > > > > > > +{
> > > > > > > > > > > + kfree(vals);
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > static int dpot_dac_write_raw(struct iio_dev *indio_dev,
> > > > > > > > > > > struct iio_chan_spec const *chan,
> > > > > > > > > > > int val, int val2, long mask)
> > > > > > > > > > > @@ -125,6 +132,7 @@ static int dpot_dac_write_raw(struct iio_dev *indio_dev,
> > > > > > > > > > > static const struct iio_info dpot_dac_info = {
> > > > > > > > > > > .read_raw = dpot_dac_read_raw,
> > > > > > > > > > > .read_avail = dpot_dac_read_avail,
> > > > > > > > > > > + .read_avail_release_resource = dpot_dac_read_avail_release_res,
> > > > > > > > > > > .write_raw = dpot_dac_write_raw,
> > > > > > > > > > > };
> > > > > > > > > >
> > > > > > > > > > I have a problem with this approach. The issue is that we allocate
> > > > > > > > > > memory in one place and must clear it in another. This is not well
> > > > > > > > > > designed thingy in my opinion. I was thinking a bit of the solution and
> > > > > > > > > > at least these two comes to my mind:
> > > > > > > > > >
> > > > > > > > > > 1) having a special callback for .read_avail_with_copy (choose better
> > > > > > > > > > name) that will dump the data to the intermediate buffer and clean it
> > > > > > > > > > after all;
> > > > > > > > > >
> > > > > > > > > > 2) introduce a new type (or bit there), like IIO_AVAIL_LIST_ALLOC.
> > > > > > > > >
> > > > > > > > > Could you elaborate more about these potential solutions? Maybe with some
> > > > > > > > > usage examples?
> > > > > > > > >
> > > > > > > > > If I get it correctly, in both cases you are suggesting to pass ownership
> > > > > > > > > of the vals buffer to the caller, iio_read_channel_info_avail() in this
> > > > > > > > > case, so that it would take care of freeing the buffer after calling
> > > > > > > > > iio_format_after_*(). We considered this approach during an initial
> > > > > > > > > discussion with Jonathan (see read_avail_ext() in [1]), where he suggested
> > > > > > > > > to let the driver keep the release control through a callback for two
> > > > > > > > > reasons:
> > > > > > > > >
> > > > > > > > > 1) Apparently it's a bad pattern to pass the buffer ownership to the core,
> > > > > > > > > maybe Jonathan can elaborate why? The risk I can think of is that the driver
> > > > > > > > > could still keep the buffer copy in its private data after giving it away,
> > > > > > > > > resulting in fact in a double ownership. However I think it would be clear
> > > > > > > > > enough in this case that the copy should be handled by the caller, or maybe
> > > > > > > > > not?
> > > > > > > > Mostly the lack of desire to have to copy for the 95% of cases where it's
> > > > > > > > not needed and that it prevents any optimization like you mention.
> > > > > > >
> > > > > > > I think the suggestion here is to add an additional .read_avail_with_copy()
> > > > > > > without replacing the original .read_avail(), so all the current drivers that
> > > > > > > use a constant avail list would not be affected.
> > >
> > > Yes.
> > >
> > > > > > > And I think this was the same
> > > > > > > idea for the additional read_avail_ext() or the additional argument for the
> > > > > > > read_avail() we were considering in [1]. So I would think that
> > > > > > > iio_read_channel_info_avail() would do something like the following:
> > > > > > >
> > > > > > > if (indio_dev->info->read_avail_with_copy)
> > > > > > > indio_dev->info->read_avail_with_copy(vals);
> > > > > > > else
> > > > > > > indio_dev->info->read_avail(vals);
> > > > > > >
> > > > > > > ...
> > > > > > > iio_format_avail_list(vals);
> > > > > > > ...
> > > > > > >
> > > > > > > if (indio_dev->info->read_avail_with_copy)
> > > > > > > kfree(vals);
> > >
> > > Right. At least that's what I see can be done with the existing users.
> > >
> > > > > > Ok, sure that would work, but...
> > > > > >
> > > > > > I don't really see this as being much less fragile than
> > > > > > the existing solution + in cases that we do have where
> > > > > > only some available are not const we will have to copy them
> > > > > > all.
> > > > > >
> > > > > > If anything it's more complex than making it a driver problem
> > > > > > to provide the release call however it wants to do it.
> > >
> > > ...but make a driver to allocate what's needed as well then.
> > >
> > > > > > > And the drivers would choose whether to define the read_avail or the
> > > > > > > read_avail_with_copy.
> > >
> > > Either way drivers should know what to do with a data supplied to read_aval().
> > > In one case we assume the [simple] workflow in the core, in the other we all
> > > rely on the driver. Current approach makes a mix of these two. And that's what
> > > I don't like.
> >
> > If I understand your concern correctly, you are referring to the inkern
> > iio_channel_read_avail() that makes the allocation for the consumer's
> > buffer copy and you are suggesting that such copy should be done by the
> > consumer driver code itself, this to be consistent with the producer
> > drivers which directly handle the allocation of the copy.
>
> One of the options, yes.
>
> > One thing to notice is that the inkern iio_channel_read_avail() does
> > together producer->read_avail() + copy + producer->read_avail_release()
> > with info_exists locked. Also, the consumer driver would need to know
> > the avail buffer size to allocate the buffer copy prior the
> > iio_channel_read_avail() call, but such size is unknown before calling
> > the actual producer's read_avail(). This would mean calling the
> > producer's read_avail() and read_avail_release() callbacks separately
> > without the lock held, with the risk of a memleak if the producer is
> > unregistered between those calls.
>
> Thanks for explaining this, but it even more makes me think that the design
> is broken and your approach is rather a hack. So, what's the problem to
> make IIO core to take care of the allocating and cleaning then without driver
> being involved? Yes, this might require a hint from the driver on what to copy
> if we want to avoid copying everything.
>
I am not particularly against it, other than the concerns that have
emerged during this (and previous) discussion. Let me summarize them:
fix-1) the current one. Your concerns are:
* for consumers the copy allocation is taken care by the inkern API
but the release is handled by the consumer driver code, making it
a fragile design.
* consumers and producers manage the allocation differently, the
first handles it via the inkern API, the second one in the
producer driver code, making it inconsistent.
fix-2) adding a read_avail_with_copy(): a driver with both const avail
lists and mutable avail lists would always return a copy for all of
them, including the const ones. Example above.
fix-3) adding a release_avail return param to read_avail(): this would
require a change to all the drivers using it. Also it
looks to me an unusual pattern, are there other similar patterns around
the codebase? Example below.
fix-4) adding a new enum variant to the avail type like
IIO_AVAIL_LIST_ALLOC: to me this looks hacky as it mixes the logic type
of the data structure and how it is handled in memory. I think the
latter should better fit in a different field, however this modification
would have little impact in the current code. Example below.
So far these alternatives only consider moving the release of the copy
buffer in the IIO core but not its allocation. You also suggest to make
the IIO core take care of the copy allocation. The problem I see with
this is that if the copy is handled outside the driver it could take
place concurrently with the modification of the original buffer since it
would not be locked by driver private mutex, thus making the copy
useless. This might be worked around by adding an additional optional
callback (e.g. read_avail_will_copy/read_avail_is_mutable) to just take
the size and check if a copy will be provided, so maybe something like:
fix-5) iio_read_channel_info_avail():
{
...
int *vals;
bool copy = false;
if (indio_dev->info->read_avail_will_copy) {
copy = indio_dev->info->read_avail_will_copy(..., &length, ...);
if (copy) {
vals = kcalloc(length, sizeof(int), GFP_KERNEL);
}
}
indio_dev->info->read_avail(&vals, ...);
if (ret < 0)
return ret;
switch (ret) {
case IIO_AVAIL_LIST:
ret = iio_format_avail_list(buf, vals, type, length);
case IIO_AVAIL_RANGE:
ret = iio_format_avail_range(buf, vals, type);
default:
ret = -EINVAL;
}
if (copy)
kfree(vals);
}
If I am not missing anything this could work and maybe it could also
avoid the double copy on the consumers but would require all of them to
wrap the read_avail_will_copy(). Also, I find it quite a weird pattern
that in some cases vals would be an input buffer to be filled and in
other cases it would be a return argument pointing to the const buffer
stored inside the driver. At least I wouldn't say it's more robust than
the current fix-1.
All these alternatives also prevents some potential optimization already
mentioned before. Reporting it again as it is now lost in the mess below:
Some driver might want to avoid allocating a new copy of a big table if
the race does not occur (e.g. with additional checks on buffer access
code) and thus wouldn't call a free() in the release callback.
In the end I don't find any of the above alternatives to provide an
obvious better solution.
> > > > > > > What I was referring to is that, back then, you mentioned you would have
> > > > > > > preferred to avoid passing ownership of the buffer around:
> > > > > > >
> > > > > > > > That's a corner case we should think about closing. Would require an indicator
> > > > > > > > to read_avail that the buffer it has been passed is a snapshot that it should
> > > > > > > > free on completion of the string building. I don't like passing ownership
> > > > > > > > of data around like that, but it is fiddly to do anything else given
> > > > > > > > any simple double buffering is subject to race conditions.
> > > > > > >
> > > > > > > I guess there is some other reason other than avoiding the copy when not
> > > > > > > necessary, since by introducing an additional function or argument or return
> > > > > > > type, most of the unnecessary copies would already be avoided right?
> > > > > >
> > > > > > It's not a strong reason beyond limiting scope of clever design +
> > > > > > the key bit my mind is that the above is not substantially simpler and
> > > > > > reduces our flexibility.
> > > > > >
> > > > > > > Anyway any of this solutions would still prevent the potential optimizations of
> > > > > > > point 2). It's worth mentioning that those kind of optimizations are currently
> > > > > > > not adopted by any driver.
> > > > > >
> > > > > > That one indeed not, but mixing dynamic and non dynamic is something
> > > > > > you do in your pac1921 patch.
> > > > >
> > > > > Good point! I didn't think about it, or more likely I forgot, that with an
> > > > > additional read_avail_with_copy() used as in the example you cannot mix dynamic
> > > > > and non dynamic available lists, thus those drivers that need at least one
> > > > > dynamic available list would always copy all of them as they need to rely to
> > > > > the read_avail_with_copy(). I guess this could be worked around with an
> > > > > additional return argument for the read_avail() or an additional type like the
> > > > > IIO_AVAIL_LIST_ALLOC suggested by Andy to signal the caller it needs to free
> > > > > the list after use. However, I think they would introduce a more invasive
> > > > > change in the current API compared to an additional optional callback,
> > >
> > > It even sounds originally that it should be more invasive, so I don't think it's
> > > a problem here.
> >
> > In the hope it helps the discussion let me provide examples for the
> > additional two options we have other than the current
> > read_avail_release_resource() (fix-1) and the read_avail_with_copy()
> > (fix-2) already shown above:
>
> Thanks!
>
> > fix-3) iio_read_channel_info_avail():
> > {
> > ...
> > bool release_avail = false;
> >
> > ret = indio_dev->info->read_avail(vals, ..., &release_avail);
> >
> > ...
> > ret = iio_format_avail_list(vals, ...);
> > ...
> >
> > if (release_avail)
> > kfree(vals);
> >
> > return ret;
> > }
> >
> >
> > fix-4) iio_read_channel_info_avail():
> > {
> > ...
> > indio_dev->info->read_avail(vals, ...);
> >
> > if (ret < 0)
> > return ret;
> > switch (ret) {
> > case IIO_AVAIL_LIST_ALLOC:
> > ret = iio_format_avail_list(buf, vals, type, length);
> > kfree(vals);
> > return ret;
> > case IIO_AVAIL_LIST:
> > return iio_format_avail_list(buf, vals, type, length);
> > case IIO_AVAIL_RANGE:
> > return iio_format_avail_range(buf, vals, type);
> > default:
> > return -EINVAL;
> > }
> >
> > }
> >
> > > > > so I agree that the current release callback is still a better option.
> > >
> > > I disagree on this as I pointed above why.
> > >
> > > > > > > > > 2) Some driver might want to avoid allocating a new copy of a big table if
> > > > > > > > > the race does not occur (e.g. with additional checks on buffer access
> > > > > > > > > code) and thus wouldn't call a free() in the release callback.
> > > > > > > > >
> > > > > > > > > > In any case it looks fragile and not scalable. I propose to drop this
> > > > > > > > > > and think again.
> > > > > > > > >
> > > > > > > > > I see your concerns, I am open to reconsider this in case we come up with
> > > > > > > > > better solution after addressing the points above.
> > > > > > > > >
> > > > > > > > > > Yes, yes, I'm fully aware about the problem you are trying to solve and
> > > > > > > > > > agree on the report, I think this solution is not good enough.
> > > > > > > > >
> > > > > > > > > [1]: https://lore.kernel.org/linux-iio/20240729211100.0d602d6e@jic23-huawei/
> > > > > > >
> > > > > > > I hope I've brought a little more clarity to the discussion by providing some
> > > > > > > history instead of making it more confusing.
> > > > > >
> > > > > > Sure, the code example in particular is useful.
> > > >
> > > > Just a friendly reminder this has been sitting for a while, any news or
> > > > additional considerations?
> > >
> > > Moving the allocation control to the drivers will satisfy me as well, however
> > > it makes even more duplication of the code, but at least it will be cleaner
> > > design-wise in my opinion.
> >
> > Would it work with the constraints on the info_exists lock mentioned
> > above?
>
> None of the given examples (fix-N) provides a lock, so I have no clue how it's
> involved here. May be you can elaborate more?
>
I thought that with "Moving the allocation control to the drivers" you
were referring to the option (not included among fix-N) to move the
allocation of the consumer copy from the inkern iio_channel_read_avail()
to the consumer drivers themselves. You elaborated this point above
where I answered with the concerns about the info_exists lock that
should be addressed.
> > > In any case the last word is on Jonathan.
>
> --
> With Best Regards,
> Andy Shevchenko
>
>
Best regards,
Matteo Martelli
Powered by blists - more mailing lists