Message-ID: <20250701135747.mns6emamtmxwgpyu@skbuf>
Date: Tue, 1 Jul 2025 16:57:47 +0300
From: Vladimir Oltean <vladimir.oltean@....com>
To: James Clark <james.clark@...aro.org>
Cc: Vladimir Oltean <olteanv@...il.com>, Mark Brown <broonie@...nel.org>,
	Arnd Bergmann <arnd@...db.de>,
	Larisa Grigore <larisa.grigore@....com>,
	Frank Li <Frank.li@....com>, Christoph Hellwig <hch@....de>,
	linux-spi@...r.kernel.org, imx@...ts.linux.dev,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 0/6] spi: spi-fsl-dspi: Target mode improvements

On Tue, Jul 01, 2025 at 01:42:46PM +0100, James Clark wrote:
> I wonder if latency could be higher despite increased throughput? It
> probably wouldn't be a big enough increase that anyone would care. And based
> on the structure of the driver if throughput is higher the latency might
> even be lower.

Actually, I do have a metric for that, sort of. I have a SPI-controlled
Ethernet switch with support for IEEE 1588, and synchronizing its
hardware clock over SPI benefits greatly from having a high-precision
software timestamping point for the SPI transfers themselves.

Essentially, in XSPI FIFO mode we are able to provide a timestamping
granularity of $(FIFO size) words; see the spi_take_timestamp_pre() and
spi_take_timestamp_post() calls. With DMA, by contrast, we let the core
take a message-level software timestamp, which is much coarser, because
at the driver level we can't guarantee a much more precise transmission
time interval for a particular requested byte. See
__spi_pump_transfer_message().
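As a rough model of why the FIFO-mode timestamp is finer, here is a
Python sketch comparing the width of the window in which the byte of
interest must have been transmitted. It assumes, hypothetically, that in
XSPI FIFO mode the pre/post timestamps bracket exactly the FIFO burst
containing that byte, while the message-level timestamp brackets the
whole message, and it ignores inter-word delays, interrupt latency and
DMA setup. The SPI clock, word size, FIFO depth and message length are
made-up illustrative values, not figures from the DSPI documentation.

```python
def word_time_ns(bits_per_word: int, sclk_hz: int) -> float:
    """Time to shift one word out on the wire, ignoring inter-word gaps."""
    return bits_per_word * 1e9 / sclk_hz


def fifo_window_ns(fifo_words: int, bits_per_word: int, sclk_hz: int) -> float:
    """FIFO mode (assumed): pre/post timestamps bracket one FIFO burst,
    so the byte of interest lies somewhere inside a window this long."""
    return fifo_words * word_time_ns(bits_per_word, sclk_hz)


def message_window_ns(msg_words: int, bits_per_word: int, sclk_hz: int) -> float:
    """Message-level timestamp (assumed): the window spans the whole message."""
    return msg_words * word_time_ns(bits_per_word, sclk_hz)


# Hypothetical parameters, chosen only to illustrate the ratio.
SCLK_HZ = 8_000_000   # 8 MHz SPI clock
BITS = 8              # 8-bit words
FIFO_WORDS = 4        # words per FIFO burst
MSG_WORDS = 64        # words in the whole SPI message

fifo = fifo_window_ns(FIFO_WORDS, BITS, SCLK_HZ)
msg = message_window_ns(MSG_WORDS, BITS, SCLK_HZ)
print(f"FIFO burst window: {fifo:.0f} ns")
print(f"message window:    {msg:.0f} ns ({msg / fifo:.0f}x coarser)")
```

Under these assumptions the window shrinks by the ratio of message length
to FIFO depth; the measured delays in the logs below also include
software overhead that this model leaves out.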

If you're not familiar with phc2sys, here is how to interpret the logs
below.

phc2sys synchronizes the sw2p0 (target) clock to CLOCK_REALTIME (the
source clock). "offset" and "delay" are in nanoseconds, "freq" is the
frequency adjustment in ppb, and s0/s1/s2 are the servo states
(unlocked, clock step, locked). "delay" is the time it took for the
kernel to read the target clock once and the system clock twice (before
and after). Software-timestamping, in this way, the SPI transfer that
reads the hardware time produces what is called a "cross timestamp".
The smaller and less jittery this delay, the more stable the cross
timestamp, and the better software will be able to discipline the
target clock (i.e. the smaller the offsets will be).
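A minimal sketch of how offset and delay relate, assuming the common
midpoint estimate: each sample is a (system time before, target clock,
system time after) triple, delay is "after" minus "before", and the
system time at which the target clock was read is taken to be the
midpoint. The names and sample values are hypothetical, and linuxptp's
exact sign convention and sample selection may differ; the point is that
the midpoint is off by at most delay / 2, so a smaller, steadier delay
gives a tighter cross timestamp.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    sys_before_ns: int   # system clock read just before the SPI transfer
    target_ns: int       # hardware (target) clock value read over SPI
    sys_after_ns: int    # system clock read just after the SPI transfer


def estimate(samples: list[Sample]) -> tuple[int, int]:
    """Return (offset_ns, delay_ns) from the sample with the smallest delay.

    The offset is the target reading minus the midpoint of the two system
    readings. Since the target clock was read somewhere inside
    [before, after], the midpoint is off by at most delay / 2.
    """
    best = min(samples, key=lambda s: s.sys_after_ns - s.sys_before_ns)
    delay = best.sys_after_ns - best.sys_before_ns
    midpoint = best.sys_before_ns + delay // 2
    return best.target_ns - midpoint, delay


# Hypothetical readings, not taken from the logs.
samples = [
    Sample(1_000_000, 1_003_500, 1_006_400),   # delay 6400 ns
    Sample(2_000_000, 2_004_000, 2_009_000),   # delay 9000 ns
]
offset, delay = estimate(samples)
print(f"offset {offset} ns, delay {delay} ns, max error {delay // 2} ns")
```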

Before (XSPI FIFO mode):

$ phc2sys -s CLOCK_REALTIME -c sw2p0 -O 0 -m
phc2sys[38.432]: sw2p0 sys offset -1741272972548124929 s0 freq      +0 delay   6720
phc2sys[39.434]: sw2p0 sys offset -1741272972548179141 s1 freq  -54094 delay   5960
phc2sys[40.436]: sw2p0 sys offset       190 s2 freq  -53904 delay   6001
phc2sys[41.437]: sw2p0 sys offset       306 s2 freq  -53731 delay   6520
phc2sys[42.438]: sw2p0 sys offset       275 s2 freq  -53670 delay   6401
phc2sys[43.441]: sw2p0 sys offset       350 s2 freq  -53513 delay   6881
phc2sys[44.442]: sw2p0 sys offset      -302 s2 freq  -54060 delay   6321
phc2sys[45.444]: sw2p0 sys offset        35 s2 freq  -53814 delay   6761
phc2sys[46.446]: sw2p0 sys offset      -103 s2 freq  -53941 delay   6481
phc2sys[47.447]: sw2p0 sys offset       -43 s2 freq  -53912 delay   6361
phc2sys[48.450]: sw2p0 sys offset       314 s2 freq  -53568 delay   6960
phc2sys[49.451]: sw2p0 sys offset      -310 s2 freq  -54098 delay   6441
phc2sys[50.453]: sw2p0 sys offset       -86 s2 freq  -53967 delay   6321
phc2sys[51.455]: sw2p0 sys offset        -5 s2 freq  -53911 delay   6401
phc2sys[52.457]: sw2p0 sys offset        -2 s2 freq  -53910 delay   6320
phc2sys[53.458]: sw2p0 sys offset        77 s2 freq  -53832 delay   6400
phc2sys[54.459]: sw2p0 sys offset      -112 s2 freq  -53997 delay   6240
phc2sys[55.461]: sw2p0 sys offset        66 s2 freq  -53853 delay   6480
phc2sys[56.463]: sw2p0 sys offset       -33 s2 freq  -53932 delay   6441
phc2sys[57.465]: sw2p0 sys offset       -33 s2 freq  -53942 delay   6441
phc2sys[58.467]: sw2p0 sys offset        17 s2 freq  -53902 delay   6440
phc2sys[59.468]: sw2p0 sys offset       -14 s2 freq  -53928 delay   6520
phc2sys[60.470]: sw2p0 sys offset      -133 s2 freq  -54051 delay   6281
phc2sys[61.472]: sw2p0 sys offset         8 s2 freq  -53950 delay   6400
phc2sys[62.473]: sw2p0 sys offset        25 s2 freq  -53931 delay   6400
phc2sys[63.474]: sw2p0 sys offset      -113 s2 freq  -54061 delay   6040
phc2sys[64.476]: sw2p0 sys offset        44 s2 freq  -53938 delay   6281
phc2sys[65.477]: sw2p0 sys offset       -17 s2 freq  -53986 delay   6320
phc2sys[66.479]: sw2p0 sys offset       -86 s2 freq  -54060 delay   5841
phc2sys[67.480]: sw2p0 sys offset       141 s2 freq  -53859 delay   6361
phc2sys[68.481]: sw2p0 sys offset       -11 s2 freq  -53968 delay   6320
phc2sys[69.483]: sw2p0 sys offset       -15 s2 freq  -53976 delay   6321
phc2sys[70.484]: sw2p0 sys offset      -109 s2 freq  -54074 delay   5960
phc2sys[71.486]: sw2p0 sys offset       115 s2 freq  -53883 delay   6520
phc2sys[72.488]: sw2p0 sys offset       -86 s2 freq  -54049 delay   6280
phc2sys[73.489]: sw2p0 sys offset       234 s2 freq  -53755 delay   6801
phc2sys[74.491]: sw2p0 sys offset      -219 s2 freq  -54138 delay   6361
^Cphc2sys[74.923]: sw2p0 sys offset      -174 s2 freq  -54159 delay   6360

After (DMA mode):

$ phc2sys -s CLOCK_REALTIME -c sw2p0 -O 0 -m
phc2sys[753.479]: sw2p0 sys offset 1882248595 s0 freq +32000000 delay 150440
phc2sys[754.482]: sw2p0 sys offset 1850232103 s1 freq  +46787 delay 141960
phc2sys[755.483]: sw2p0 sys offset    -33278 s2 freq  +13509 delay 143160
phc2sys[756.485]: sw2p0 sys offset     -5074 s2 freq  +31729 delay 150040
phc2sys[757.486]: sw2p0 sys offset     11060 s2 freq  +46341 delay 140240
phc2sys[758.488]: sw2p0 sys offset      4804 s2 freq  +43403 delay 151320
phc2sys[759.489]: sw2p0 sys offset     10358 s2 freq  +50398 delay 141879
phc2sys[760.491]: sw2p0 sys offset       409 s2 freq  +43557 delay 148840
phc2sys[761.493]: sw2p0 sys offset      3863 s2 freq  +47133 delay 143360
phc2sys[762.494]: sw2p0 sys offset       259 s2 freq  +44688 delay 145840
phc2sys[763.496]: sw2p0 sys offset      1849 s2 freq  +46356 delay 141000
phc2sys[764.497]: sw2p0 sys offset     -1800 s2 freq  +43262 delay 144160
phc2sys[765.499]: sw2p0 sys offset      -184 s2 freq  +44338 delay 139880
phc2sys[766.501]: sw2p0 sys offset     -1677 s2 freq  +42790 delay 146120
phc2sys[767.502]: sw2p0 sys offset      2529 s2 freq  +46492 delay 141040
phc2sys[768.504]: sw2p0 sys offset     -4368 s2 freq  +40354 delay 151240
phc2sys[769.505]: sw2p0 sys offset      1112 s2 freq  +44524 delay 147680
phc2sys[770.507]: sw2p0 sys offset      3002 s2 freq  +46747 delay 142960
phc2sys[771.509]: sw2p0 sys offset      -899 s2 freq  +43747 delay 145440
phc2sys[772.510]: sw2p0 sys offset     -2003 s2 freq  +42373 delay 148360
phc2sys[773.512]: sw2p0 sys offset      3675 s2 freq  +47450 delay 141440
phc2sys[774.514]: sw2p0 sys offset     -1417 s2 freq  +43461 delay 144960
phc2sys[775.515]: sw2p0 sys offset       802 s2 freq  +45255 delay 142559
phc2sys[776.517]: sw2p0 sys offset      1368 s2 freq  +46061 delay 140040
phc2sys[777.518]: sw2p0 sys offset     -1897 s2 freq  +43207 delay 141840
phc2sys[778.520]: sw2p0 sys offset      -774 s2 freq  +43761 delay 141680
phc2sys[779.522]: sw2p0 sys offset     -1715 s2 freq  +42587 delay 145199
phc2sys[780.523]: sw2p0 sys offset      4045 s2 freq  +47833 delay 134839
phc2sys[781.525]: sw2p0 sys offset     -4809 s2 freq  +40192 delay 146840
phc2sys[782.526]: sw2p0 sys offset       363 s2 freq  +43922 delay 144759
phc2sys[783.528]: sw2p0 sys offset      3328 s2 freq  +46996 delay 140240
phc2sys[784.530]: sw2p0 sys offset      -293 s2 freq  +44373 delay 142480
phc2sys[785.531]: sw2p0 sys offset        46 s2 freq  +44624 delay 142000
phc2sys[786.533]: sw2p0 sys offset     -3422 s2 freq  +41170 delay 148080
phc2sys[787.534]: sw2p0 sys offset      2932 s2 freq  +46497 delay 140720
phc2sys[788.536]: sw2p0 sys offset     -1961 s2 freq  +42484 delay 147040
phc2sys[789.537]: sw2p0 sys offset      -945 s2 freq  +42912 delay 149160
phc2sys[790.539]: sw2p0 sys offset      3221 s2 freq  +46794 delay 143040
phc2sys[791.541]: sw2p0 sys offset        41 s2 freq  +44580 delay 144160
phc2sys[792.542]: sw2p0 sys offset      -748 s2 freq  +43804 delay 145120

Here, the synchronization offsets in DMA mode are an order of magnitude
worse (a few microseconds instead of a few hundred nanoseconds once
locked), so yeah, my initial enthusiasm is definitely curbed now.
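To put numbers on "an order of magnitude", here is a small Python sketch
that parses phc2sys -m lines and reports the largest absolute offset
among locked (s2) samples. The regular expression is inferred from the
log lines in this message, not from any documented linuxptp output
format, and only the last five lines of each log are included.

```python
import re

# Matches the tail of a phc2sys -m line as it appears in the logs above.
# The pattern is inferred from those logs, not from a linuxptp spec.
LINE_RE = re.compile(
    r"sys offset\s+(?P<offset>-?\d+)\s+s(?P<state>\d)"
    r"\s+freq\s+(?P<freq>[+-]?\d+)\s+delay\s+(?P<delay>\d+)"
)


def parse(lines):
    """Yield (state, offset_ns, freq_ppb, delay_ns) for each matching line."""
    for line in lines:
        m = LINE_RE.search(line)  # search(), so a leading "^C" is tolerated
        if m:
            yield (int(m["state"]), int(m["offset"]),
                   int(m["freq"]), int(m["delay"]))


def max_abs_offset(lines):
    """Largest |offset| in ns among locked (s2) samples."""
    return max(abs(off) for state, off, _, _ in parse(lines) if state == 2)


# Last five lines of the XSPI FIFO ("Before") and DMA ("After") logs.
BEFORE = """\
phc2sys[71.486]: sw2p0 sys offset       115 s2 freq  -53883 delay   6520
phc2sys[72.488]: sw2p0 sys offset       -86 s2 freq  -54049 delay   6280
phc2sys[73.489]: sw2p0 sys offset       234 s2 freq  -53755 delay   6801
phc2sys[74.491]: sw2p0 sys offset      -219 s2 freq  -54138 delay   6361
^Cphc2sys[74.923]: sw2p0 sys offset      -174 s2 freq  -54159 delay   6360
""".splitlines()

AFTER = """\
phc2sys[788.536]: sw2p0 sys offset     -1961 s2 freq  +42484 delay 147040
phc2sys[789.537]: sw2p0 sys offset      -945 s2 freq  +42912 delay 149160
phc2sys[790.539]: sw2p0 sys offset      3221 s2 freq  +46794 delay 143040
phc2sys[791.541]: sw2p0 sys offset        41 s2 freq  +44580 delay 144160
phc2sys[792.542]: sw2p0 sys offset      -748 s2 freq  +43804 delay 145120
""".splitlines()

print(f"max |offset| before: {max_abs_offset(BEFORE)} ns, "
      f"after: {max_abs_offset(AFTER)} ns")
```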

For me, what matters is not the latency itself, but the ability to
cross-timestamp one byte within the SPI transfer with high granularity,
and for the uncertainty of that timestamp to be as small and constant as
possible.

To look at latency on its own, here is a third output log, taken in
XSPI FIFO mode but with "ctlr->ptp_sts_supported = true" removed. That
causes the core to take message-level software timestamps, which are a
better indicator of latency.

You can see that in FIFO mode, the minimum is smaller (about 107 us,
versus about 135 us in DMA mode) but the spread is larger (the maximum
is about 209 us, versus about 151 us in DMA mode). In DMA mode, the
latencies are much more stable. But despite this, XSPI FIFO mode is
still better, because it lets us zoom in on the particular byte of
interest.

$ phc2sys -s CLOCK_REALTIME -c sw2p0 -O 0 -m
phc2sys[246.568]: sw2p0 sys offset   2872475 s0 freq  -88840 delay 131332
phc2sys[247.571]: sw2p0 sys offset   2874267 s1 freq  -87052 delay 194739
phc2sys[248.572]: sw2p0 sys offset     71966 s2 freq  -15086 delay 114971
phc2sys[249.573]: sw2p0 sys offset     34792 s2 freq  -30670 delay 108331
phc2sys[250.575]: sw2p0 sys offset    -39553 s2 freq  -94578 delay 208580
phc2sys[251.577]: sw2p0 sys offset     50369 s2 freq  -16521 delay 107410
phc2sys[252.578]: sw2p0 sys offset      1597 s2 freq  -50183 delay 128292
phc2sys[253.579]: sw2p0 sys offset      6685 s2 freq  -44616 delay 107810
phc2sys[254.581]: sw2p0 sys offset     -4102 s2 freq  -53397 delay 108530
phc2sys[255.582]: sw2p0 sys offset     -7256 s2 freq  -57782 delay 112051
phc2sys[256.584]: sw2p0 sys offset     -2910 s2 freq  -55613 delay 108610
phc2sys[257.586]: sw2p0 sys offset    -52981 s2 freq -106557 delay 209460
phc2sys[258.587]: sw2p0 sys offset     49914 s2 freq  -19556 delay 107130
phc2sys[259.589]: sw2p0 sys offset    -29913 s2 freq  -84409 delay 195699
phc2sys[260.591]: sw2p0 sys offset     42439 s2 freq  -21031 delay 110411
phc2sys[261.592]: sw2p0 sys offset      3048 s2 freq  -47690 delay 120571
phc2sys[262.594]: sw2p0 sys offset      -853 s2 freq  -50676 delay 113291
phc2sys[263.596]: sw2p0 sys offset    -35260 s2 freq  -85339 delay 173937
phc2sys[264.597]: sw2p0 sys offset     26479 s2 freq  -34178 delay 110570
phc2sys[265.599]: sw2p0 sys offset    -36802 s2 freq  -89516 delay 195699
phc2sys[266.601]: sw2p0 sys offset     39945 s2 freq  -23809 delay 110571
phc2sys[267.603]: sw2p0 sys offset    -32036 s2 freq  -83807 delay 194858
phc2sys[268.604]: sw2p0 sys offset     37721 s2 freq  -23661 delay 110570
phc2sys[269.606]: sw2p0 sys offset      5110 s2 freq  -44955 delay 112571
phc2sys[270.607]: sw2p0 sys offset     -3526 s2 freq  -52058 delay 109570
phc2sys[271.608]: sw2p0 sys offset     -7856 s2 freq  -57446 delay 112491
phc2sys[272.610]: sw2p0 sys offset     -5259 s2 freq  -57206 delay 112051
phc2sys[273.612]: sw2p0 sys offset    -43272 s2 freq  -96797 delay 194178
phc2sys[274.613]: sw2p0 sys offset     40708 s2 freq  -25798 delay 108291
phc2sys[275.615]: sw2p0 sys offset    -38753 s2 freq  -93047 delay 208900
phc2sys[276.616]: sw2p0 sys offset     47948 s2 freq  -17972 delay 111050
phc2sys[277.618]: sw2p0 sys offset     10692 s2 freq  -40843 delay 111131
phc2sys[278.619]: sw2p0 sys offset     -2179 s2 freq  -50507 delay 108530
phc2sys[279.620]: sw2p0 sys offset     -8143 s2 freq  -57124 delay 111571
phc2sys[280.623]: sw2p0 sys offset    -49486 s2 freq -100910 delay 199179
phc2sys[281.625]: sw2p0 sys offset     -3684 s2 freq  -69954 delay 199419
phc2sys[282.626]: sw2p0 sys offset     54475 s2 freq  -12900 delay 111651
phc2sys[283.628]: sw2p0 sys offset    -36562 s2 freq  -87595 delay 209420
^Cphc2sys[284.181]: sw2p0 sys offset    -11239 s2 freq  -73240 delay 194499
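The min/max delay comparison between DMA mode and this last FIFO-mode
log can be checked with a few lines of Python over the locked (s2)
"delay" values copied from both logs. Using max - min as the jitter
metric is an illustrative choice, not something phc2sys reports.

```python
# s2 (locked) "delay" values in ns, copied from the DMA ("After") log.
DMA_DELAYS = [
    143160, 150040, 140240, 151320, 141879, 148840, 143360, 145840,
    141000, 144160, 139880, 146120, 141040, 151240, 147680, 142960,
    145440, 148360, 141440, 144960, 142559, 140040, 141840, 141680,
    145199, 134839, 146840, 144759, 140240, 142480, 142000, 148080,
    140720, 147040, 149160, 143040, 144160, 145120,
]

# s2 "delay" values in ns from the XSPI FIFO log taken without
# ctlr->ptp_sts_supported, i.e. with message-level timestamps.
FIFO_MSG_DELAYS = [
    114971, 108331, 208580, 107410, 128292, 107810, 108530, 112051,
    108610, 209460, 107130, 195699, 110411, 120571, 113291, 173937,
    110570, 195699, 110571, 194858, 110570, 112571, 109570, 112491,
    112051, 194178, 108291, 208900, 111050, 111131, 108530, 111571,
    199179, 199419, 111651, 209420, 194499,
]


def summarize(name, delays_ns):
    """Print and return (min, max, spread) of a list of delays in ns."""
    lo, hi = min(delays_ns), max(delays_ns)
    print(f"{name}: min {lo / 1000:.1f} us, max {hi / 1000:.1f} us, "
          f"spread {(hi - lo) / 1000:.1f} us")
    return lo, hi, hi - lo


summarize("DMA", DMA_DELAYS)
summarize("FIFO, message-level", FIFO_MSG_DELAYS)
```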
