[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHp75VdBMXCBYiDX8ocyOZBm_dhWJSU_NXJN6PmOwsZaJt=AHw@mail.gmail.com>
Date: Thu, 29 May 2025 07:42:46 +0200
From: Andy Shevchenko <andy.shevchenko@...il.com>
To: Jonathan Cameron <Jonathan.Cameron@...wei.com>
Cc: Gyeyoung Baek <gye976@...il.com>, Jonathan Cameron <jic23@...nel.org>,
David Lechner <dlechner@...libre.com>, Nuno Sá <nuno.sa@...log.com>,
Andy Shevchenko <andy@...nel.org>, linux-iio@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] iio: trigger: Avoid data race
On Wed, May 28, 2025 at 7:02 PM Jonathan Cameron
<Jonathan.Cameron@...wei.com> wrote:
> On Wed, 28 May 2025 16:17:06 +0900
> Gyeyoung Baek <gye976@...il.com> wrote:
> > On Wed, May 28, 2025 at 6:19 AM Andy Shevchenko
> > <andy.shevchenko@...il.com> wrote:
> > > On Tue, May 27, 2025 at 11:10 PM Gyeyoung Baek <gye976@...il.com> wrote:
> > > > On Wed, May 28, 2025 at 5:25 AM Andy Shevchenko
> > > > <andy.shevchenko@...il.com> wrote:
> > > > > On Tue, May 27, 2025 at 10:05 PM Gyeyoung Baek <gye976@...il.com> wrote:
...
> > > > > At bare minimum they are not relevant to the patch change and haven't
> > > > > been described in the commit messages.
> > > >
> > > > I initially skipped this part as I thought it was minor.
> > > > But on a second look, it seems better to separate the declaration from
> > > > the logic.
> > > >
> > > > What do you think about the data race logic? Would it make sense?
> > >
> > > The point is valid, the atomic_read() + atomic_set() is 101 thingy,
> > > whoever did that doesn't really have a clue what atomic(ity) is.
>
> :)
>
> I'm trying to recall what this protection is actually for so this might
> be a little vague as descriptions go...
>
> The key here is what can happen in that 'race' and hence why I'm still fairly
> sure it isn't a real race. Firstly this is called in an irq handler
> so we can only have one call of this particular function at a time
> for a given trigger. So no race against itself.
>
> The atomic part is about decrements that can happen elsewhere, but there
> can never be 'more' decrements than the value we set the counter to in this
> function. That is it never goes negative.
>
> Those decrements ultimately happen in calls that can't happen until after
> the set (via various convoluted paths ultimately getting to
> iio_trigger_notify_done()). In many cases the trigger is masked until it
> is reenabled on the counter == 0 (elsewhere) - but not always...
>
> IIRC correctly aim is to not double trigger in cases where we can't mask
> the trigger (a particularly rubbish trigger) - so if any of the downstream
> devices still hasn't called iio_trigger_notify_done() then we quietly
> drop this particular irq on the floor. We don't mind dropping a few
> too many, just dropping too few a then we end up loosing count of who
> has to be 'done' with the trigger.
>
> Hence the counter won't change between atomic_get and the atomic_set
> as it's always 0 which means no one else is left to decrement it.
>
> Atomics don't always need to be atomic all the time, they just are in
> some states.
Yes, but this is confusing and/or buggy code to begin with (and
independently on the background of its usage). Technically this patch
(perhaps with a better commit message) is valid. Main two points here:
1) avoiding potential (even theoretical) race; 2) do not spread a
really wrong pattern to avoid people learning from it (in case
somebody takes this code as an example for a new driver which might
reside outside of IIO and hence might not be caught by the reviewers
of _that_ subsystem).
Alternatively we need to get rid of atomic operations, but with the
same effect as 2) above.
> So, is this something that has caused observed problems, or based
> on code inspection? My remembering of what was going on here might well
> be flawed.
>
> There are some 'fun' corners for what happens after that set though
> where a handler can run fast enough in race conditions we end up
> hitting 0 in iio_trigger_notify_done_atomic() and have to schedule
> restarting of the trigger because that might involve a bus write over
> a sleeping bus. That one was a painful bug report some years ago...
--
With Best Regards,
Andy Shevchenko
Powered by blists - more mailing lists