Date:   Thu, 21 Jun 2018 13:28:09 -0700
From:   Don Bollinger <don@...bollingers.org>
To:     Andrew Lunn <andrew@...n.ch>
Cc:     netdev <netdev@...r.kernel.org>,
        Florian Fainelli <f.fainelli@...il.com>, don@...bollingers.org
Subject: Re: [PATCH] optoe: driver to read/write SFP/QSFP EEPROMs

On Thu, Jun 21, 2018 at 10:11:27AM +0200, Andrew Lunn wrote:
> > I'm trying to figure out how the netdev environment works on large
> > switches.  I can't imagine that the kernel network stack is involved in
> > any part of the data plane. 
> 
> Hi Don
> 
> It is involved in the slow path. I.e. packets from the host out
> network ports. BPDU, IGMP, ARP, ND, etc. It can also be involved when
> the hardware is missing features. Also, for communication with the
> host itself.
> 
> What is more important is it is the control plane. Want to bridge two
> ports together?  You create a software bridge, add the two ports, and
> then offload it to the hardware. The kernel STP code in the software
> bridge then does the state tracking, blocked, learning, forwarding
> etc. Need RSTP? Just run the standard RSTP daemon on the software
> bridge interface.
> 
> Basically, use the Linux network stack as is, and offload what you can
> to the hardware. That means you keep all your existing user space
> network tools, daemons, SNMP agents, etc. They all just work, because
> the kernel APIs remain the same, independent of if you have a switch,
> or just a bunch of networks cards.
> 
> > Can you point me to any conference slides,
> > or design docs or other documentation that describes a netdev
> > implementation on Trident II switch silicon?  Or any other switch that
> > supports 128 x 10G (or 32 x 40G) ports?
> 
> Look at past netdev conferences. In particular, talks given by
> Mellanox, Cumulus, and Netronome. You can also see their drivers in

Thanks.  I found a slide deck from Cumulus at
www.slideshare.net/CumulusNetworks/webinarlinux-networking-is-awesome

I think this connects the dots between our worlds.  It turns out that
optoe is derived from the Cumulus sff_8436 driver, which they use to
access QSFP devices.  It was the best available, but its SFP support was
experimental and didn't work yet.  They use the at24 driver for SFP.
Optoe is architecturally identical to the Cumulus implementation.  It
does not use the SFP framework, but it does interface with their Linux
network stack via ethtool, etc.  In slide 10 of the deck they explicitly
call out device drivers, saying "Innovation and change here is good."

> drivers/net/ethernet/{mellanox|netronome}. These devices however tend to
> go for firmware to control the PHYs, not the Linux network stack.
> drivers/net/dsa covers SOHO switches, so up to 10 ports, mostly 1G,
> some 10G ports. There is a lot of industry involved in this segment,
> trains, planes, busses, plant automation, etc, and some WiFi and STP.
> Switches with DSA drivers make use of Linux to control the PHYs, not
> firmware.

Again, optoe does not control the PHYs.  It only accesses the EEPROMs (on
the PHYs).  It does not touch any of the electrical pins.  It can provide
the EEPROM access to any component that wants that access, including sfp.c.

> SFP are also slowly starting to enter the consumer market, with
> products like the Clearfog, MACCHIATObin, and there are a few
> industrial boards with SOHO switches with SFP sockets or SFF
> modules. These are what are driving in kernel SFP support.

Got it.  I'm targeting a different market, with a different
architecture.  In this architecture it makes more sense to separate the
EEPROM access from the IO pins control.

> > Also, I see the sfp.c code.  Is there any QSFP code?  I'd like to see
> > how that code compares to the sfp.c code.
> 
> Currently, none that i know of. SFP+ is the limit so far. Mainly
> because SoC/SOHO switches currently top out at 10G, so SFP+ is
> sufficient.

SFP+ is not sufficient for another market, which is using Linux to
manage larger switches.  These switches all have some QSFP ports, and
many of them have exclusively QSFP ports.  I have some useful code for
those environments.

> > optoe can provide access, through the SFP code, to the rest of the EEPROM
> > address space.  It can also provide access to the QSFP EEPROM.  I would
> > like to collaborate on the integration, that would fit my objective of
> > making more EEPROM space accessible on more platforms and distros.
> > 
> > However, you don't want me to make the changes to SFP myself.  I don't
> > have any hardware or OS environment that currently supports that code.
> > The cost and learning curve exceed my resources.  I *think* the changes
> > to the SFP code would be small, but I would need someone who understands
> > the code and can test it to actually make and submit the changes.
> 
> So i have been thinking about this some more. But given your lack of
> resources, i'm guessing this won't actually work for you. But it is
> probably the correct way forwards.
> 
> The basic problem with the systems you are talking about is that they don't
> have a network interface per port. So they cannot use ethtool
> --module-info, which IS the defined API to get access to SFP
> data. Adding another API is probably not going to get accepted.

Got it.  I don't think I'm adding another API.  Note that Cumulus is
using the same architecture as optoe, and providing all the expected
Linux services, including ethtool --module-info.  They are accessing the
module-info data through ioctl, which opens the device file provided by
their driver and reads/writes the appropriate location.  Optoe works the
same way.
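
To make "reads/writes the appropriate location" concrete, here is a
minimal userspace sketch.  It assumes the driver exposes the module
EEPROM as a flat file that can be read at a byte offset.  The function
name eeprom_read and any path passed to it are hypothetical; they are
not taken from the optoe or Cumulus code.

```c
#define _XOPEN_SOURCE 700   /* for pread() under strict -std=c11 */
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to `len` bytes starting at byte `offset` from an EEPROM that
 * is exposed as a file.  Returns the number of bytes actually read
 * (short if the file ends early), or -1 on error.
 * `path` is whatever file the platform driver provides (assumption).
 */
ssize_t eeprom_read(const char *path, off_t offset, uint8_t *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    size_t done = 0;
    while (done < len) {
        /* pread() does not move the file position, so no lseek needed */
        ssize_t n = pread(fd, buf + done, len - done, offset + (off_t)done);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)     /* end of the EEPROM file */
            break;
        done += (size_t)n;
    }

    close(fd);
    return (ssize_t)done;
}
```

A caller would pass the device file for one port and the offset of the
field it wants, so the only per-platform knowledge is the path.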

> However, the current ethtool API is ioctl based. The network stack is
> moving away from that, replacing it with netlink sockets. All IP
> configuration is now via netlink. All bridge configuration is by
> netlink, etc. So there is a desire to move ethtool to netlink.
> 
> This move makes the API more flexible. By default, you would expect
> the replacement implementation for --module-info to pass an ifindex,
> or maybe an interface name. However, it could be made to optionally
> take an i2c bus number. That could then go direct to the SFP code,
> rather than go via the MAC driver. That would give evil user space,
> proprietary, binary blob drivers access to SFP via the standard
> supported kernel API, using the standard supported kernel SFP driver.

Here's what I have in mind...  struct sfp in sfp.c has a read and write:

   int (*read)(struct sfp *, bool, u8, void *, size_t);
   int (*write)(struct sfp *, bool, u8, void *, size_t);

These are instantiated with:

   sfp->read = sfp_i2c_read;
   sfp->write = sfp_i2c_write;

So to insert optoe into this stack, we would need to add an i2c_client
to struct sfp:

   struct i2c_client *i2c_client;

We would need to initialize that i2c_client in sfp_i2c_configure:

   board_info = alloc_board_info(sfp);
   sfp->i2c_client = i2c_new_device(i2c, board_info);

We need to write the brief routine that allocates a struct i2c_board_info,
and stuffs the necessary data into it.  I'm assuming that data can come
from sfp.  The data required includes an appropriate name for this device,
and whether it is an SFP or QSFP device.  (When QSFP is added to sfp.c,
we can add a flag to struct sfp.  Your stack will have to know which it
is anyway to know where the necessary data is in the EEPROM.)

Finally, we replace the body of sfp_i2c_{read, write} with a callback to
optoe.  All of the necessary data is already in the parameters to
sfp_i2c_{read, write}.
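
The steps above can be sketched as a userspace mock, just to show the
shape of the change.  Everything prefixed mock_ is a hypothetical
stand-in (for struct i2c_board_info, struct i2c_client and the optoe
entry points), the is_qsfp flag is the proposed addition, and the device
names "optoe1"/"optoe2" are illustrative.  It also assumes the bool
argument selects the second (A2h) address, and that optoe presents the
two banks as one flat address space.  This is not kernel code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint8_t u8;
#define EEPROM_BANK_SIZE 256

/* Hypothetical stand-ins for struct i2c_board_info / struct i2c_client */
struct mock_board_info { char type[16]; uint16_t addr; };
struct mock_i2c_client {
    struct mock_board_info info;
    u8 mem[2 * EEPROM_BANK_SIZE];   /* A0h bank, then A2h bank */
};

struct sfp {
    struct mock_i2c_client *i2c_client;   /* proposed new member */
    bool is_qsfp;                         /* proposed new flag */
    int (*read)(struct sfp *, bool, u8, void *, size_t);
    int (*write)(struct sfp *, bool, u8, void *, size_t);
};

/* Fill in the device name and address from what sfp already knows */
struct mock_board_info alloc_board_info(const struct sfp *sfp)
{
    struct mock_board_info info;
    memset(&info, 0, sizeof(info));
    strncpy(info.type, sfp->is_qsfp ? "optoe1" : "optoe2",
            sizeof(info.type) - 1);
    info.addr = 0x50;
    return info;
}

/* Mock optoe entry points: bounds-checked access to the flat space */
int mock_optoe_read(struct mock_i2c_client *c, size_t off, void *buf, size_t len)
{
    if (off > sizeof(c->mem) || len > sizeof(c->mem) - off)
        return -1;
    memcpy(buf, c->mem + off, len);
    return (int)len;
}

int mock_optoe_write(struct mock_i2c_client *c, size_t off, void *buf, size_t len)
{
    if (off > sizeof(c->mem) || len > sizeof(c->mem) - off)
        return -1;
    memcpy(c->mem + off, buf, len);
    return (int)len;
}

/* The bodies of sfp_i2c_{read, write} become one-line calls into optoe */
int sfp_i2c_read(struct sfp *sfp, bool a2, u8 dev_addr, void *buf, size_t len)
{
    size_t off = (a2 ? EEPROM_BANK_SIZE : 0) + dev_addr;
    return mock_optoe_read(sfp->i2c_client, off, buf, len);
}

int sfp_i2c_write(struct sfp *sfp, bool a2, u8 dev_addr, void *buf, size_t len)
{
    size_t off = (a2 ? EEPROM_BANK_SIZE : 0) + dev_addr;
    return mock_optoe_write(sfp->i2c_client, off, buf, len);
}

/* Stands in for the i2c_new_device() step in sfp_i2c_configure */
void sfp_i2c_configure(struct sfp *sfp, struct mock_i2c_client *client)
{
    client->info = alloc_board_info(sfp);
    sfp->i2c_client = client;
    sfp->read = sfp_i2c_read;
    sfp->write = sfp_i2c_write;
}
```

Callers of sfp->read and sfp->write would not change; only the layer
underneath them does.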

> 
> But that requires you roll up your sleeves and get stuck in to this
> conversion work.

I'm offering an improvement to sfp.c.  The improvement is access to more
pages of SFP EEPROM, and access to QSFP EEPROMs.  It comes in the form of
a specialized EEPROM driver custom built for {Q}SFP devices.  I'm also
offering to help integrate that driver into sfp.c.  I can modify optoe
to accommodate sfp.c, and I can recommend how to instantiate and call it.
I am not going to be able to spend the time and money required to modify
and test sfp.c.  I'm pretty sure you can do it MUCH faster, and MUCH
better than I can.

> 
> But you say you work for a fibre module company. Do they produce
> SFP/SFP+ modules? You could get one of the supported consumer boards
> with an SFP/SFP+ socket and test your modules work properly. Build out

Unfortunately, that isn't going to happen on their dime.  Their dimes
are running out for this kind of work.

> the SFP code. It has been on my TODO list to add HWMON support for the
> temperature sensors, etc.

Huh.  Just read Documentation/hwmon/sysfs-interface.  Looks like a good
way to deliver that EEPROM data.  Wish I'd found that two years ago when
there were a few more dimes available.
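
As a small illustration of what delivering that data through hwmon
could involve, here is a sketch of the unit conversion for the module
temperature.  It assumes an SFF-8472 style internally calibrated
reading (a signed 16-bit value in units of 1/256 degree C, MSB first)
and the hwmon convention that temperatures are reported in millidegrees
Celsius.  The function name is hypothetical.

```c
#include <stdint.h>

/*
 * Convert the two raw temperature bytes from the module diagnostics
 * area into millidegrees Celsius, the unit hwmon uses for tempN_input.
 * Assumes an internally calibrated module, so no slope/offset is applied.
 */
long sfp_temp_to_millicelsius(uint8_t msb, uint8_t lsb)
{
    /* Reassemble the big-endian value and reinterpret it as signed */
    int16_t raw = (int16_t)(((uint16_t)msb << 8) | lsb);

    /* raw is in 1/256 degree C; scale to 1/1000 degree C */
    return ((long)raw * 1000) / 256;
}
```

For example, the raw bytes 0x19 0x80 correspond to 25.5 degrees C, which
this reports as 25500.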

Don
