lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200709115958.GB776@hoboy>
Date:   Thu, 9 Jul 2020 04:59:59 -0700
From:   Richard Cochran <richardcochran@...il.com>
To:     Vladimir Oltean <olteanv@...il.com>
Cc:     netdev@...r.kernel.org, davem@...emloft.net, sorganov@...il.com,
        andrew@...n.ch
Subject: Re: [PATCH] docs: networking: timestamping: add section for stacked
 PHC devices

On Wed, Jul 08, 2020 at 11:56:21PM +0300, Vladimir Oltean wrote:
> The concept of timestamping DSA switches / Ethernet PHYs is becoming
> more and more popular, however the Linux kernel timestamping code has
> evolved quite organically and there's layers upon layers of new and old
> code that need to work together for things to behave as expected.
> 
> Add this chapter to explain what the overall goals are.

This is great.  I have one correction and a few formatting and style
suggestions, below.  With the correction in place,

Reviewed-by: Richard Cochran <richardcochran@...il.com>

> Loosely based upon this email discussion plus some more info:
> https://lkml.org/lkml/2020/7/6/481
> 
> Signed-off-by: Vladimir Oltean <olteanv@...il.com>
> ---
>  Documentation/networking/timestamping.rst | 149 ++++++++++++++++++++++
>  1 file changed, 149 insertions(+)
> 
> diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
> index 1adead6a4527..14df58c24e8c 100644
> --- a/Documentation/networking/timestamping.rst
> +++ b/Documentation/networking/timestamping.rst
> @@ -589,3 +589,152 @@ Time stamps for outgoing packets are to be generated as follows:
>    this would occur at a later time in the processing pipeline than other
>    software time stamping and therefore could lead to unexpected deltas
>    between time stamps.
> +
> +3.2 Special considerations for stacked PTP Hardware Clocks
> +----------------------------------------------------------
> +
> +There are situations when there may be more than one PHC (PTP Hardware Clock)
> +in the data path of a packet. The kernel has no explicit mechanism to allow the
> +user to select which PHC to use for timestamping Ethernet frames. Instead, the
> +assumption is that the outermost PHC is always the most preferable, and that
> +kernel drivers collaborate towards achieving that goal. Currently there are 3
> +cases of stacked PHCs, detailed below:
> +
> +- DSA (Distributed Switch Architecture) switches.

For readability, please start a new paragraph here.

     These are Ethernet switches
> +  which have one of their ports connected to an (otherwise completely unaware)
> +  host Ethernet interface, and perform the role of a port multiplier with
> +  optional forwarding acceleration features.  Each DSA switch port is visible
> +  to the user as a standalone (virtual) network interface, however network I/O
> +  is performed under the hood indirectly through the host interface.

new paragraph.

> +  When a DSA switch is attached to a host port, PTP synchronization has to
> +  suffer, since the switch's variable queuing delay introduces a path delay
> +  jitter between the host port and its PTP partner. For this reason, some DSA
> +  switches include a timestamping clock of their own, and have the ability to
> +  perform network timestamping on their own MAC, such that path delays only
> +  measure wire and PHY propagation latencies. Timestamping DSA switches are
> +  supported in Linux and expose the same ABI as any other network interface
> +  (save for the fact that the DSA interfaces are in fact virtual in terms of
> +  network I/O, they do have their own PHC).  It is typical, but not mandatory,
> +  for all interfaces of a DSA switch to share the same PHC.

new paragraph.

     By design, PTP
> +  timestamping with a DSA switch does not need any special handling in the
> +  driver for the host port it is attached to.  However, when the host port also
> +  supports PTP timestamping, DSA will take care of intercepting the
> +  ``.ndo_do_ioctl`` calls towards the host port, and block attempts to enable
> +  hardware timestamping on it. This is because the SO_TIMESTAMPING API does not
> +  allow the delivery of multiple hardware timestamps for the same packet, so
> +  anybody else except for the DSA switch port must be prevented from doing so.

new paragraph.

> +  In code, DSA provides for most of the infrastructure for timestamping
> +  already, in generic code: a BPF classifier (``ptp_classify_raw``) is used to
> +  identify PTP event messages (any other packets, including PTP general
> +  messages, are not timestamped), and provides two hooks to drivers:
> +
> +    - ``.port_txtstamp()``: A clone of the timestampable skb to be transmitted,
> +      before actually transmitting it. Typically, a switch will have a PTP TX

"A clone of the ..."  is not a sentence.

> +      timestamp register (or sometimes a FIFO) where the timestamp becomes
> +      available. There may be an IRQ that is raised upon this timestamp's
> +      availability, or the driver might have to poll after invoking
> +      ``dev_queue_xmit()`` towards the host interface. Either way, in the
> +      ``.port_txtstamp()`` method, the driver only needs to save the clone for
> +      later use (when the timestamp becomes available). Each skb is annotated
> +      with a pointer to its clone, in ``DSA_SKB_CB(skb)->clone``, to ease the
> +      driver's job of keeping track of which clone belongs to which skb.
> +
> +   - ``.port_rxtstamp()``: The original (and only) timestampable skb is
> +     provided to the driver, for it to annotate it with a timestamp, if that is
> +     immediately available, or defer to later. On reception, timestamps might
> +     either be available in-band (through metadata in the DSA header, or
> +     attached in other ways to the packet), or out-of-band (through another RX
> +     timestamping FIFO). Deferral on RX is typically necessary when retrieving
> +     the timestamp needs a sleepable context. In that case, it is the
> +     responsibility of the driver to call ``netif_rx_ni()`` on the freshly

responsibility of the *DSA* driver

> +     timestamped skb.
> +
> +
> +- Ethernet PHYs.

new paragraph.

     These are devices that typically fulfill a Layer 1 role in the
> +  network stack, hence they do not have a representation in terms of a network
> +  interface as DSA switches do. However, PHYs may be able to detect and
> +  timestamp PTP packets, for performance reasons: timestamps taken as close as
> +  possible to the wire have the potential to yield a more stable and precise
> +  synchronization.
> +  A PHY driver that supports PTP timestamping must create a ``struct
> +  mii_timestamper`` and add a pointer to it in ``phydev->mii_ts``. The presence
> +  of this pointer will be checked by the networking stack.

new paragraph.

> +  Since PHYs do not have network interface representations, the timestamping
> +  and ethtool ioctl operations for them need to be mediated by their respective
> +  MAC driver. Therefore, as opposed to DSA switches, modifications need to be
> +  done to each individual MAC driver for PHY timestamping support. This
> +  entails:
> +
> +    - Checking, in ``.ndo_do_ioctl``, whether
> +      ``phy_has_hwtstamp(netdev->phydev)`` is true or not. If it is, then the MAC
> +      driver should not process this request but instead pass it on to the PHY
> +      using ``phy_mii_ioctl()``.
> +
> +    - Checking, before delivering received skb's up the network stack (using
> +      ``napi_gro_receive`` or similar), whether ``skb_defer_rx_timestamp(skb)``
> +      is necessary or not - and if it is, don't call ``napi_gro_receive`` at all.

Actually, only drivers that use the traditional netif_rx() will need
to call skb_defer_rx_timestamp().

In the specific case of napi_gro_receive, we have:

napi_gro_receive
 napi_skb_finish
  gro_normal_one
   gro_normal_list
    netif_receive_skb_list_internal
     if (!skb_defer_rx_timestamp(skb))
      ...

The stack automatically also checks skb_defer_rx_timestamp() in other
cases, for example:

netif_receive_skb
 ret = netif_receive_skb_internal(skb);
  if (skb_defer_rx_timestamp(skb))
   return NET_RX_SUCCESS;

> +      If ``CONFIG_NETWORK_PHY_TIMESTAMPING`` is enabled, and
> +      ``skb->dev->phydev->mii_ts`` exists, its ``.rxtstamp()`` hook will be
> +      called now, to determine, using logic very similar to DSA, whether deferral
> +      for RX timestamping is necessary.  Again like DSA, it becomes the
> +      responsibility of the PHY driver to send the packet up the stack when the
> +      timestamp is available.
> +
> +    - On TX, special intervention from the MAC driver might or might not be
> +      needed. The function that calls the ``mii_ts->txtstamp()`` hook is named
> +      ``skb_clone_tx_timestamp()``. This function can either be called directly
> +      (case in which explicit MAC driver support is indeed needed), but the
> +      function also piggybacks from the ``skb_tx_timestamp()`` call, which many
> +      MAC drivers already perform for software timestamping purposes. Therefore,
> +      if a MAC supports software timestamping, it does not need to do anything
> +      further at this stage.
> +
> +
> +- MII bus snooping devices.

new paragraph.

     These perform the same role as timestamping
> +  Ethernet PHYs, save for the fact that they are discrete devices and can
> +  therefore be used in conjunction with any PHY even if it doesn't support
> +  timestamping. In Linux, they are discoverable and attachable to a ``struct
> +  phy_device`` through Device Tree, and for the rest, they use the same mii_ts
> +  infrastructure as those. See
> +  Documentation/devicetree/bindings/ptp/timestamper.txt for more details.
> +
> +One caveat with stacked PHCs, especially with DSA (but not only) - since that
> +doesn't require any modification to MAC drivers, so it is more difficult to
> +ensure correctness of all possible code paths - is that they uncover bugs which
> +were impossible to trigger before the existence of stacked PTP clocks.
> +One example has to do with this line of code, already presented earlier::
> +
> +      skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
> +
> +Any TX timestamping logic, be it a plain MAC driver, a DSA switch driver, a PHY
> +driver or a MII bus snooping device driver, should set this flag.
> +But a MAC driver that is unaware of PHC stacking might get tripped up by
> +somebody other than itself setting this flag, and deliver a duplicate
> +timestamp.
> +For example, a typical driver design for TX timestamping might be to split the
> +transmission part into 2 portions:
> +
> +1. "TX": checks whether PTP timestamping has been previously enabled through
> +   the ``.ndo_do_ioctl`` ("``priv->hwtstamp_tx_enabled == true``") and the
> +   current skb requires a TX timestamp ("``skb_shinfo(skb)->tx_flags &
> +   SKBTX_HW_TSTAMP``"). If this is true, it sets the
> +   "``skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS``" flag. Note: as
> +   described above, in the case of a stacked PHC system, this condition should
> +   never trigger, as this MAC is certainly not the outermost PHC. But this is
> +   not where the typical issue is.  Transmission proceeds with this packet.
> +
> +2. "TX confirmation": Transmission has finished. The driver checks whether it
> +   is necessary to collect any TX timestamp for it. Here is where the typical
> +   issues are: the MAC driver takes a shortcut and only checks whether
> +   "``skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS``" was set. With a stacked
> +   PHC system, this is incorrect because this MAC driver is not the only entity
> +   in the TX data path who could have enabled SKBTX_IN_PROGRESS in the first
> +   place.
> +
> +The correct solution for this problem is for MAC drivers to have a compound
> +check in their "TX confirmation" portion, not only for
> +"``skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS``", but also for
> +"``priv->hwtstamp_tx_enabled == true``". Because the rest of the system ensures
> +that PTP timestamping is not enabled for anything other than the outermost PHC,
> +this enhanced check will avoid delivering a duplicated TX timestamp to user
> +space.
> -- 
> 2.25.1
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ