lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAKbEzntM7CvEtNT-SP_CFY0AhTabiDg5JGFdsM=BUd6K=UvCUw@mail.gmail.com>
Date: Thu, 29 May 2025 22:34:48 +0900
From: Gyeyoung Baek <gye976@...il.com>
To: Jonathan Cameron <Jonathan.Cameron@...wei.com>
Cc: Andy Shevchenko <andy.shevchenko@...il.com>, Jonathan Cameron <jic23@...nel.org>, 
	David Lechner <dlechner@...libre.com>, Nuno Sá <nuno.sa@...log.com>, 
	Andy Shevchenko <andy@...nel.org>, linux-iio@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] iio: trigger: Avoid data race

On Thu, May 29, 2025 at 2:02 AM Jonathan Cameron
<Jonathan.Cameron@...wei.com> wrote:
>
> On Wed, 28 May 2025 16:17:06 +0900
> Gyeyoung Baek <gye976@...il.com> wrote:
>
> > On Wed, May 28, 2025 at 6:19 AM Andy Shevchenko
> > <andy.shevchenko@...il.com> wrote:
> > >
> > > On Tue, May 27, 2025 at 11:10 PM Gyeyoung Baek <gye976@...il.com> wrote:
> > > > On Wed, May 28, 2025 at 5:25 AM Andy Shevchenko
> > > > <andy.shevchenko@...il.com> wrote:
> > > > > On Tue, May 27, 2025 at 10:05 PM Gyeyoung Baek <gye976@...il.com> wrote:
> > >
> > > ...
> > >
> > > > > At bare minimum they are not relevant to the patch change and haven't
> > > > > been described in the commit messages.
> > > >
> > > > Hi Andy, thanks for your review.
> > > > I initially skipped this part as I thought it was minor.
> > > > But on a second look, it seems better to separate the declaration from
> > > > the logic.
> > > >
> > > > What do you think about the data race logic? Would it make sense?
> > >
> > > The point is valid, the atomic_read() + atomic_set() is 101 thingy,
> > > whoever did that doesn't really have a clue what atomic(ity) is.
>
> :)
>
> I'm trying to recall what this protection is actually for so this might
> be a little vague as descriptions go...
>
> The key here is what can happen in that 'race' and hence why I'm still fairly
> sure it isn't a real race.  Firstly this is called in an irq handler
> so we can only have one call of this particular function at a time
> for a given trigger.  So no race against itself.
>
> The atomic part is about decrements that can happen elsewhere, but there
> can never be 'more' decrements than the value we set the counter to in this
> function.  That is it never goes negative.
>
> Those decrements ultimately happen in calls that can't happen until after
> the set (via various convoluted paths ultimately getting to
> iio_trigger_notify_done()).  In many cases the trigger is masked until it
> is reenabled on the counter == 0 (elsewhere) - but not always...
>
> IIRC correctly aim is to not double trigger in cases where we can't mask
> the trigger (a particularly rubbish trigger) - so if any of the downstream
> devices still hasn't called iio_trigger_notify_done() then we quietly
> drop this particular irq on the floor. We don't mind dropping a few
> too many, just dropping too few a then we end up loosing count of who
> has to be 'done' with the trigger.
>
> Hence the counter won't change between atomic_get and the atomic_set
> as it's always 0 which means no one else is left to decrement it.
>
> Atomics don't always need to be atomic all the time, they just are in
> some states.
>
> So, is this something that has caused observed problems, or based
> on code inspection? My remembering of what was going on here might well
> be flawed.

based on code inspection.
initially, I simply assumed a data race because `atomic_read()` and
`atomic_set()` were used.
I didn’t question it further, sorry for that...
However, after reading Jonathan's review, I take a look at the previous commits.
It now seems that there is no data race.

(I wonder why there isn't a wrapper API which does something like
`atomic->counter = val;` for situations like this,
    where only consumers need atomic API but not producers?)

Since synchronization isn't needed here, I think `cmpxchg()` may not
be appropriate.
I considered a few possible ways to improve clarity:

1. Add comments only.
2. Add comments and access directly like `use_count->counter =
CONFIG_IIO_CONSUMERS_PER_TRIGGER;`.
    (Wouldn't it make sense to have an official API for such direct
access in `linux/atomic/~~.h`?)
3. Add a separate bool flag to represent trigger's on/off state.
`iio_trigger_notify_done()` still uses an atomic API, and it sets a
boolean flag when count is 0.
Then `poll()` and `poll_nested()` would simply check the flag without
using atomic API.

I think `3.` seems the best.
I would appreciate your reviews on this.

> There are some 'fun' corners for what happens after that set though
> where a handler can run fast enough in race conditions we end up
> hitting 0 in iio_trigger_notify_done_atomic() and have to schedule
> restarting of the trigger because that might involve a bus write over
> a sleeping bus.  That one was a painful bug report some years ago...
>
> Jonathan
>
> >
> > Thanks for your explanation.
> > Then I’ll send a v2 patch with only the `int i` change, following the
> > review feedback.
> >
> > --
> > Best regards,
> > Gyeyoung
> >
> >
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ