[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20241214180331.1a2600e3@jic23-huawei>
Date: Sat, 14 Dec 2024 18:03:31 +0000
From: Jonathan Cameron <jic23@...nel.org>
To: Per-Daniel Olsson <perdaniel.olsson@...s.com>
Cc: Lars-Peter Clausen <lars@...afoo.de>, Rob Herring <robh@...nel.org>,
Krzysztof Kozlowski <krzk+dt@...nel.org>, Conor Dooley
<conor+dt@...nel.org>, Javier Carrasco <javier.carrasco.cruz@...il.com>,
linux-iio@...r.kernel.org, linux-kernel@...r.kernel.org,
devicetree@...r.kernel.org, rickard.andersson@...s.com, kernel@...s.com
Subject: Re: [PATCH v8 2/2] iio: light: Add support for TI OPT4060 color
sensor
On Wed, 11 Dec 2024 15:02:23 +0100
Per-Daniel Olsson <perdaniel.olsson@...s.com> wrote:
> Hi Jonathan
>
> I'm sorry for the slow response, I've been a little busy lately.
No problem! I know that feeling all too well.
> Good point
> about the races. I have now been trying to figure out a way to handle this
> issue. Basically the driver is running in three potentially concurrent use
> cases, sysfs read, triggered buffer and threshold events. The hardware has two
> different main parameters to play with, sampling and irq. Sampling can be either
> single shot or continuous. The irq can be set to trigger on each new sample or
> just on thresholds.
> These are the requirements for the different use cases:
>
> Sysfs read:
> - Requires irq on each sample.
> - Requires single shot sampling but works in continuous mode.
>
> Threshold events:
> - Requires irq on thresholds but works with irq on each sample.
> - Requires continuous sampling.
>
> Triggered buffer:
> - Requires irq on each sample.
> - Requires continuous sampling.
>
> Basically this means that all use cases work in high irq and high sampling. The
> potential races occur when lowering irq or sampling without synchronization. My
> plan is to claim direct mode from the sysfs read code and the event code. If the
> call to iio_device_claim_direct_mode() returns EBUSY, the code will continue
> without lowering irq or sampling since it knows that the triggered buffer is
> enabled. The triggered buffer will lower both irq and sampling when being
> disabled in a synchronized way. I will also have to introduce a local mutex to
> protect the potential race between enabling events and sysfs read. That problem
> could happen if the triggered buffer is disabled or events are being changed at
> the same time as a sysfs read is waiting for its sample irq. The other potential
> race is between events and the triggered buffer. That race will be avoided by
> setting the thresh_event_lo_active and thresh_event_hi_active before attempting
> to change irq settings. This is implemented in v9 of the driver.
I think the scheme you've gone for more or less works, but some corners that I'll
call out in review of v9
>
> >> + return 0;
> >> +
> >> + ctrl_reg &= ~OPT4060_CTRL_OPER_MODE_MASK;
> >> + ctrl_reg |= FIELD_PREP(OPT4060_CTRL_OPER_MODE_MASK,
> >> + OPT4060_CTRL_OPER_MODE_ONE_SHOT);
> >> +
> >> + /* Trigger a new conversion by writing to CRTL register. */
> >> + ret = regmap_write(chip->regmap, OPT4060_CTRL, ctrl_reg);
> >> + if (ret)
> >> + dev_err(chip->dev, "Failed to set ctrl register\n");
> >> + return ret;
> >> +}
> >>
> >> +
> >> +static int opt4060_trigger_new_samples(struct iio_dev *indio_dev)
> >> +{
> >> + struct opt4060_chip *chip = iio_priv(indio_dev);
> >> + int ret;
> >> +
> >> + /*
> >> + * The conversion time should be 500us startup time plus the integration time
> >> + * times the number of channels. An exact timeout isn't critical, it's better
> >> + * not to get incorrect errors in the log. Setting the timeout to double the
> >> + * theoretical time plus and extra 100ms margin.
> >> + */
> >> + unsigned int timeout_us = (500 + OPT4060_NUM_CHANS *
> >> + opt4060_int_time_reg[chip->int_time][0]) * 2 + 100000;
> >> +
> >> + if (chip->irq) {
> >> + reinit_completion(&chip->completion);
> >> + ret = opt4060_trigger_one_shot(chip);
> >> + if (ret)
> >> + return ret;
> >> + if (wait_for_completion_timeout(&chip->completion,
> >> + usecs_to_jiffies(timeout_us)) == 0) {
> >> + dev_err(chip->dev, "Completion timed out.\n");
> >> + return -ETIME;
> >> + }
> >> + /*
> >> + * The opt4060_trigger_one_shot() function will enable irq on
> >> + * every conversion. If the buffer isn't enabled, irq should
> >> + * only be enabled for thresholds.
> >> + */
> >> + if (!iio_buffer_enabled(indio_dev)) {
> >
> > This is the race with buffer enable / disable in read_raw() call stack. That transition
> > can be avoided by iio_device_claim_direct_mode(). For this one I don't think we care
> > about that potentially blocking reads via sysfs. That's a common thing anyway
> > for drivers to not support when the buffer is in use because it greatly simplifies the
> > code and normally mixing buffered interface and sysfs reads isn't something anyone does.
>
> With the implementation in v9, I can keep allowing sysfs reads also when the buffer is enabled.
Ok. I can see that it doesn't simplify much to do otherwise.
> >>
> >> +static int opt4060_write_event_config(struct iio_dev *indio_dev,
> >> + const struct iio_chan_spec *chan,
> >> + enum iio_event_type type,
> >> + enum iio_event_direction dir, bool state)
> >> +{
> >> + int ch_sel, ch_idx = chan->scan_index;
> >> + struct opt4060_chip *chip = iio_priv(indio_dev);
> >> + int ret;
> >> +
> >> + if (chan->type != IIO_INTENSITY)
> >> + return -EINVAL;
> >> + if (type != IIO_EV_TYPE_THRESH)
> >> + return -EINVAL;
> >> +
> >> + ret = opt4060_get_channel_sel(chip, &ch_sel);
> >> + if (ret)
> >> + return ret;
> >> +
> >> + if (state) {
> >> + /* Only one channel can be active at the same time */
> >> + if ((chip->thresh_event_lo_active ||
> >> + chip->thresh_event_hi_active) && (ch_idx != ch_sel))
> >> + return -EBUSY;
> >> + if (dir == IIO_EV_DIR_FALLING)
> >> + chip->thresh_event_lo_active = true;
> >> + else if (dir == IIO_EV_DIR_RISING)
> >> + chip->thresh_event_hi_active = true;
> >> + ret = opt4060_set_channel_sel(chip, ch_idx);
> >> + if (ret)
> >> + return ret;
> >> + } else {
> >> + if (ch_idx == ch_sel) {
> >> + if (dir == IIO_EV_DIR_FALLING)
> >> + chip->thresh_event_lo_active = false;
> >> + else if (dir == IIO_EV_DIR_RISING)
> >> + chip->thresh_event_hi_active = false;
> >> + }
> >> + }
> >> +
> > This makes me a little nervous because we are allowing the control of
> > continous mode from here and the buffer setup and not preventing them
> > running at the same time. Maybe there are no races, but I'm not convinced.
> >
> > It is possible to avoid this, but fiddly as we shouldn't directly
> > take the iio_dev->mlock from a driver (which protects the buffer state
> > transitions. Basically we need to successfully pin the device in
> > either direct or buffered mode and if we miss in both (due to a transition
> > in flight) retry.
> >
> > There is one example doing this in the max30102.c
> > https://elixir.bootlin.com/linux/v6.12.1/source/drivers/iio/health/max30102.c#L479
> > (that one is weird because we read the channel in an entirely different way depending
> > on the device mode).
I think you still need this complex dance in at least one place. If nothing
else it makes the code that is protected easier to reason about.
> >
> > I suspect we have other cases missed during review however.
> >
> > The tricky bit is that most races around buffer enable are harmless
> > as they tend to mean we get one extra or one less sample pushed to the
> > buffer, or an read that perturbs the buffered capture timing. So it
> > is fairly hard to spot a real one :(
> >
> > This one is vanishingly unlikely though so I'm fine taking the driver
> > on the basis you'll take another look at close the race if you agree
> > with my analysis. The side effect of this one is that we either
> > burn some power when no one is interested, or fail to enable data capture
> > if we hit the race. Neither is great, but not that bad either.
> >
> > Jonathan
> >
>
> I think I have managed to fix this race in v9.
Review underway.
Thanks,
Jonathan
>
> Best regards / Per-Daniel
>
> >
> >> + return opt4060_event_set_state(indio_dev, chip->thresh_event_hi_active |
> >> + chip->thresh_event_lo_active);
> >> +}
> >
> >
>
Powered by blists - more mailing lists