netdev - Re: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE interfaces

Message-ID: <YYlQfm3eW/jRS4Ra@shredder>
Date:   Mon, 8 Nov 2021 18:29:50 +0200
From:   Ido Schimmel <idosch@...sch.org>
To:     "Machnikowski, Maciej" <maciej.machnikowski@...el.com>
Cc:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>,
        "richardcochran@...il.com" <richardcochran@...il.com>,
        "abyagowi@...com" <abyagowi@...com>,
        "Nguyen, Anthony L" <anthony.l.nguyen@...el.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "kuba@...nel.org" <kuba@...nel.org>,
        "linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>,
        "mkubecek@...e.cz" <mkubecek@...e.cz>,
        "saeed@...nel.org" <saeed@...nel.org>,
        "michael.chan@...adcom.com" <michael.chan@...adcom.com>
Subject: Re: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE
 interfaces

On Mon, Nov 08, 2021 at 08:35:17AM +0000, Machnikowski, Maciej wrote:
> > -----Original Message-----
> > From: Ido Schimmel <idosch@...sch.org>
> > Sent: Sunday, November 7, 2021 3:09 PM
> > To: Machnikowski, Maciej <maciej.machnikowski@...el.com>
> > Subject: Re: [PATCH v2 net-next 6/6] docs: net: Add description of SyncE
> > interfaces
> > 
> > On Fri, Nov 05, 2021 at 09:53:31PM +0100, Maciej Machnikowski wrote:
> > > +Interface
> > > +=========
> > > +
> > > +The following RTNL messages are used to read/configure SyncE recovered
> > > +clocks.
> > > +
> > > +RTM_GETRCLKRANGE
> > > +-----------------
> > > +Reads the allowed pin index range for the recovered clock outputs.
> > > +This can be aligned to PHY outputs or to EEC inputs, whichever is
> > > +better for a given application.
> > 
> > Can you explain the difference between PHY outputs and EEC inputs? It is
> > not clear to me from the diagram.
> 
> The PHY is the source of frequency for the EEC, so the PHY produces the
> reference and the EEC synchronizes to it.
> 
> Both PHY outputs and EEC inputs are configurable. PHY outputs are usually
> configured using PHY registers, and EEC inputs in the DPLL references
> block.
>  
> > How would the diagram look in a multi-port adapter where you have a
> > single EEC?
> 
> That depends. It can be a multiport PHY, in which case it will look
> exactly like the one I drew. If we have multiple PHYs, their recovered
> clock outputs will go to different recovered clock inputs, and each PHY's
> TX clock inputs will be driven from different EEC synchronized outputs
> or from a single one through a clock fan-out.
> 
> > > +Will call the ndo_get_rclk_range function to read the allowed range
> > > +of output pin indexes.
> > > +Will call ndo_get_rclk_range to determine the allowed recovered clock
> > > +range and return them in the IFLA_RCLK_RANGE_MIN_PIN and the
> > > +IFLA_RCLK_RANGE_MAX_PIN attributes
> > 
> > The first sentence seems to be redundant
> > 
> > > +
> > > +RTM_GETRCLKSTATE
> > > +-----------------
> > > +Read the state of recovered pins that output recovered clock from
> > > +a given port. The message will contain the number of assigned clocks
> > > +(IFLA_RCLK_STATE_COUNT) and an N pin indexes in IFLA_RCLK_STATE_OUT_IDX
> > > +To support multiple recovered clock outputs from the same port, this message
> > > +will return the IFLA_RCLK_STATE_COUNT attribute containing the number of
> > > +active recovered clock outputs (N) and N IFLA_RCLK_STATE_OUT_IDX attributes
> > > +listing the active output indexes.
> > > +This message will call the ndo_get_rclk_range to determine the allowed
> > > +recovered clock indexes and then will loop through them, calling
> > > +the ndo_get_rclk_state for each of them.
> > 
> > Why do you need both RTM_GETRCLKRANGE and RTM_GETRCLKSTATE? Isn't
> > RTM_GETRCLKSTATE enough? Instead of skipping over "disabled" pins in the
> > range IFLA_RCLK_RANGE_MIN_PIN..IFLA_RCLK_RANGE_MAX_PIN, just report the
> > state (enabled / disabled) for all of them.
> 
> Great idea! Will implement it.
>  
> > > +
> > > +RTM_SETRCLKSTATE
> > > +-----------------
> > > +Sets the redirection of the recovered clock for a given pin. This message
> > > +expects one attribute:
> > > +struct if_set_rclk_msg {
> > > +	__u32 ifindex; /* interface index */
> > > +	__u32 out_idx; /* output index (from a valid range) */
> > > +	__u32 flags; /* configuration flags */
> > > +};
> > > +
> > > +Supported flags are:
> > > +SET_RCLK_FLAGS_ENA - if set in flags - the given output will be enabled,
> > > +		     if clear - the output will be disabled.
> > 
> > In the diagram you have two recovered clock outputs going into the EEC.
> > Which of them is the EEC synchronized to?
> 
> That will depend on the future DPLL configuration. For now it'll be based
> on the DPLL's auto select ability and its default configuration.
>  
> > How does user space know which pins to enable?
> 
> That's why the RTM_GETRCLKRANGE was invented but I like the suggestion
> you made above so will rework the code to remove the range one and
> just return the indexes with enable/disable bit for each of them. In this
> case userspace will just send the RTM_GETRCLKSTATE to learn what
> can be enabled.

In the diagram there are multiple Rx lanes, all of which might be used
by the same port. How does user space know to differentiate between the
quality levels of the clock signal recovered from each lane / pin when
the information is transmitted on a per-port basis via ESMC messages?

The uAPI seems to be too low-level and is not compatible with Nvidia's
devices and potentially those of other vendors. We really just need a logical
interface that says "Synchronize the frequency of the EEC to the clock
recovered from port X". The kernel / drivers should abstract the inner
workings of the device from user space. Any reason this can't work for
ice?
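
To make the "logical interface" idea concrete, here is a minimal user-space
model of it. This is only a sketch: struct eec, struct port, EEC_SRC_NONE and
eec_set_source_port() are hypothetical names invented for illustration and
exist neither in the kernel nor in this patch set (only EEC_SRC_PORT is a name
used in this thread). The point is that the caller names a port, and pin
selection stays hidden behind the function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; these types are illustration only. */
enum eec_src_type { EEC_SRC_NONE, EEC_SRC_PORT };

struct eec {
	int id;
	enum eec_src_type src_type;
	int src_ifindex;	/* valid only when src_type == EEC_SRC_PORT */
};

struct port {
	int ifindex;
	int eec_id;		/* EEC that this port's recovered clock can feed */
};

/*
 * "Synchronize the frequency of the EEC to the clock recovered from port X."
 * Returns 0 on success, -1 if the port is not connected to this EEC.
 */
int eec_set_source_port(struct eec *eec, const struct port *port)
{
	assert(eec != NULL && port != NULL);

	if (port->eec_id != eec->id)
		return -1;

	/* A real driver would pick the PHY outputs / EEC inputs here. */
	eec->src_type = EEC_SRC_PORT;
	eec->src_ifindex = port->ifindex;
	return 0;
}
```

In this model user space never sees a pin index; whether that abstraction is
workable for a given device is exactly the question above.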

I also want to reiterate my dissatisfaction with the interface being
netdev-centric. By modelling the EEC as a standalone object we will be
able to extend it to set the source of the EEC to something other than a
netdev in the future. If we don't do it now, we will end up with two
ways to report the source of the EEC (i.e., EEC_SRC_PORT and something
else).

Other advantages of modelling the EEC as a separate object include the
ability for user space to determine the mapping between netdevs and EECs
(currently impossible) and reporting additional EEC attributes such as
SyncE clockIdentity and default SSM code. There is really no reason to
report all of this identical information via multiple netdevs.

With regard to rtnetlink vs. something else, in my suggestion the only
thing that should be reported per-netdev is the mapping between the
netdev and the EEC, similar to the way user space determines the mapping
from netdev to PHC via ETHTOOL_GET_TS_INFO. If we go with rtnetlink,
this can be reported as a new attribute in RTM_NEWLINK; there is no need
to add new messages.