Message-ID: <20240413173409.63d33a0a@jic23-huawei>
Date: Sat, 13 Apr 2024 17:34:09 +0100
From: Jonathan Cameron <jic23@...nel.org>
To: David Lechner <dlechner@...libre.com>
Cc: Marcelo Schmitt <marcelo.schmitt1@...il.com>, Marcelo Schmitt
 <marcelo.schmitt@...log.com>, lars@...afoo.de,
 Michael.Hennerich@...log.com, robh+dt@...nel.org,
 krzysztof.kozlowski+dt@...aro.org, conor+dt@...nel.org,
 linux-iio@...r.kernel.org, devicetree@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 2/2] iio: adc: Add support for AD4000

On Tue, 9 Apr 2024 11:44:26 -0500
David Lechner <dlechner@...libre.com> wrote:

> On Tue, Apr 9, 2024 at 11:09 AM Marcelo Schmitt
> <marcelo.schmitt1@...il.com> wrote:
> >
> > On 04/08, David Lechner wrote:  
> > > On Mon, Apr 8, 2024 at 9:32 AM Marcelo Schmitt
> > > <marcelo.schmitt@...log.com> wrote:  
> > > >  
> 
> ...
> 
> > >
> > > I also still have doubts about using IIO_BE and 8-bit xfers when it
> > > comes to adding support later to achieve max sample rate with a SPI
> > > offload. For example to get 2MSPS with an 18-bit chip, it will require
> > > an approx 33% faster SPI clock than the actual slowest clock possible
> > > because it will have to read 6 extra bits per sample. I didn't check
> > > the specs, but this may not even be physically possible without
> > > exceeding the datasheet max SPI clock rate. Also errors could be
> > > reduced if we could actually use the slowest allowable SPI clock rate.
> > > Furthermore, the offload hardware would have to be capable of adding
> > > an extra byte per sample for 18 and 20-bit chips when piping the data
> > > to DMA in order to get the 32-bit alignment in the buffer required by
> > > IIO scan_type and the natural alignment requirements of IIO buffers in
> > > general.  
> >
> > Maybe I should already implement support for reading with SPI offload
> > rather than doing it after this set is merged?
> > So we can test what happens at faster sample rates before we commit to a solution.
> >  
> 
> Yes, that sounds like a wise thing to do.
> 
> >  
> > >  
> > > > +               } data;
> > > > +               s64 timestamp __aligned(8);
> > > > +       } scan;
> > > > +       __be16 tx_buf __aligned(IIO_DMA_MINALIGN);
> > > > +       __be16 rx_buf;
> > > > +};  
> > >
> > > scan.data is used as SPI rx_buf so __aligned(IIO_DMA_MINALIGN); needs
> > > to be moved to the scan field.  
> >
> > I have already tried it. Maybe I did something wrong besides buffer alignment
> > at that time. Will give it another try.  
> 
> This is the alignment for DMA cache coherency. So it should not have
> any effect on the actual data read, only performance.

Nope. It's a correctness issue, not a performance one (though you may get
advantages there as well).  You can get corruption of other fields that
end up in the same cacheline, so the aim is to make sure that nothing we
might access concurrently shares a cacheline with the DMA buffer.
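
Concretely, that means the alignment marker wants to sit on the first
field the SPI core may DMA into, so that everything above it stays out
of that cacheline.  Something along these lines (the struct name and any
fields not quoted above are only illustrative, I haven't checked them
against the patch):

struct ad4000_state {
        struct spi_device *spi;
        /* ... other driver state the CPU may touch at any time ... */

        /*
         * DMA-safe zone: everything from here down may be handed to the
         * SPI core for DMA, so start it on its own cacheline.  Nothing
         * above this marker can then be lost to a stale cacheline
         * write-back from the DMA controller.
         */
        struct {
                union {
                        __be16 sample16;
                        __be32 sample32;
                } data;
                s64 timestamp __aligned(8);
        } scan __aligned(IIO_DMA_MINALIGN);
        __be16 tx_buf;
        __be16 rx_buf;
};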
 
There was a good description of what is going on here in a talk Wolfram
gave a few years back when he was exploring how to avoid bounce buffers
in I2C: https://www.youtube.com/watch?v=JDwaMClvV-s which includes links
to descriptions of the fun that can happen.  The short version is that a
DMA controller is allowed to grab the whole of a cacheline (typically 64
bytes, can be bigger) incoherently from the host (basically it takes a
copy).  It can then merrily do its operations before finally copying it
back to the actual memory.  The problem is that there may be other data
in that cacheline that is accessed whilst the DMA controller is working
on its own private copy.  Those changes will be wiped out when the
controller's copy is written back.
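
To make the failure mode concrete, imagine a layout like this (purely
illustrative, nothing to do with the patch):

struct bad_layout {
        __be16 rx_buf;         /* the DMA controller writes here */
        bool conversion_done;  /* the CPU writes here - same 64-byte line */
};

1. The DMA controller takes a private copy of the whole cacheline.
2. The CPU sets conversion_done = true (in its own cached copy).
3. The transfer completes and the controller writes its copy back.
4. The CPU's update to conversion_done is silently gone.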

Now you probably didn't see it because:
a) Many controllers don't do this - either they don't cache stale data,
   or they are sufficiently coherent with the CPU caches etc. that any
   changes to this 'nearby' data are handled correctly.

b) It's really hard to hit the races anyway. I've only seen it once when
   debugging real systems, but people run into this occasionally in IIO
   drivers and it is really nasty to debug.

c) On arm64 at least, in many cases the DMA core will bounce buffer
   anyway after changes made a couple of years back.  Unfortunately that
   isn't yet true on all architectures (I think), so we still need to be
   careful around this.

Note some architectures (x86 I think) never allowed this cacheline
corruption path in the first place, so the DMA_MINALIGN value there is
8 bytes (IIRC).
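
If I remember the generic fallback correctly, architectures that don't
opt in with their own larger value just get something like:

#ifndef ARCH_DMA_MINALIGN
#define ARCH_DMA_MINALIGN	__alignof__(unsigned long long)	/* 8 bytes */
#endif

so only the architectures where DMA really can clobber a shared cacheline
end up paying for the large alignment.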

Coherency is hard (but fun if you have time, particularly the places where
it is allowed to break and how they are avoided :)

Jonathan
