Message-ID: <5461354A.3020906@gmail.com>
Date: Mon, 10 Nov 2014 13:59:38 -0800
From: John Fastabend <john.fastabend@...il.com>
To: Jiri Pirko <jiri@...nulli.us>
CC: netdev@...r.kernel.org, davem@...emloft.net, nhorman@...driver.com,
andy@...yhouse.net, tgraf@...g.ch, dborkman@...hat.com,
ogerlitz@...lanox.com, jesse@...ira.com, pshelar@...ira.com,
azhou@...ira.com, ben@...adent.org.uk, stephen@...workplumber.org,
jeffrey.t.kirsher@...el.com, vyasevic@...hat.com,
xiyou.wangcong@...il.com, john.r.fastabend@...el.com,
edumazet@...gle.com, jhs@...atatu.com, sfeldma@...il.com,
f.fainelli@...il.com, roopa@...ulusnetworks.com,
linville@...driver.com, jasowang@...hat.com, ebiederm@...ssion.com,
nicolas.dichtel@...nd.com, ryazanov.s.a@...il.com,
buytenh@...tstofly.org, aviadr@...lanox.com, nbd@...nwrt.org,
alexei.starovoitov@...il.com, Neil.Jerram@...aswitch.com,
ronye@...lanox.com, simon.horman@...ronome.com,
alexander.h.duyck@...hat.com, john.ronciak@...el.com,
mleitner@...hat.com, shrijeet@...il.com, gospo@...ulusnetworks.com,
bcrl@...ck.org
Subject: Re: [patch net-next v2 02/10] net: introduce generic switch devices
support
On 11/09/2014 02:51 AM, Jiri Pirko wrote:
> The goal of this is to provide a possibility to support various switch
> chips. Drivers should implement relevant ndos to do so. Now there is
> only one ndo defined:
> - for getting physical switch id is in place.
>
> Note that user can use random port netdevice to access the switch.
>
> Signed-off-by: Jiri Pirko <jiri@...nulli.us>
> ---
> Documentation/networking/switchdev.txt | 59 ++++++++++++++++++++++++++++++++++
> MAINTAINERS | 7 ++++
> include/linux/netdevice.h | 10 ++++++
> include/net/switchdev.h | 30 +++++++++++++++++
> net/Kconfig | 1 +
> net/Makefile | 3 ++
> net/switchdev/Kconfig | 13 ++++++++
> net/switchdev/Makefile | 5 +++
> net/switchdev/switchdev.c | 33 +++++++++++++++++++
> 9 files changed, 161 insertions(+)
> create mode 100644 Documentation/networking/switchdev.txt
> create mode 100644 include/net/switchdev.h
> create mode 100644 net/switchdev/Kconfig
> create mode 100644 net/switchdev/Makefile
> create mode 100644 net/switchdev/switchdev.c
>
> diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
> new file mode 100644
> index 0000000..98be76c
> --- /dev/null
> +++ b/Documentation/networking/switchdev.txt
> @@ -0,0 +1,59 @@
> +Switch (and switch-ish) device drivers HOWTO
> +===========================
> +
> +Please note that the word "switch" is here used in very generic meaning.
> +This include devices supporting L2/L3 but also various flow offloading chips,
> +including switches embedded into SR-IOV NICs.
> +
> +Lets describe a topology a bit. Imagine the following example:
> +
> + +----------------------------+ +---------------+
> + | SOME switch chip | | CPU |
> + +----------------------------+ +---------------+
> + port1 port2 port3 port4 MNGMNT | PCI-E |
> + | | | | | +---------------+
> + PHY PHY | | | | NIC0 NIC1
> + | | | | | |
> + | | +- PCI-E -+ | |
> + | +------- MII -------+ |
> + +------------- MII ------------+
> +
> +In this example, there are two independent lines between the switch silicon
> +and CPU. NIC0 and NIC1 drivers are not aware of a switch presence. They are
> +separate from the switch driver. SOME switch chip is by managed by a driver
> +via PCI-E device MNGMNT. Note that MNGMNT device, NIC0 and NIC1 may be
> +connected to some other type of bus.
> +
> +Now, for the previous example show the representation in kernel:
> +
> + +----------------------------+ +---------------+
> + | SOME switch chip | | CPU |
> + +----------------------------+ +---------------+
> + sw0p0 sw0p1 sw0p2 sw0p3 MNGMNT | PCI-E |
> + | | | | | +---------------+
> + PHY PHY | | | | eth0 eth1
> + | | | | | |
> + | | +- PCI-E -+ | |
> + | +------- MII -------+ |
> + +------------- MII ------------+
> +
> +Lets call the example switch driver for SOME switch chip "SOMEswitch". This
> +driver takes care of PCI-E device MNGMNT. There is a netdevice instance sw0pX
> +created for each port of a switch. These netdevices are instances
> +of "SOMEswitch" driver. sw0pX netdevices serve as a "representation"
> +of the switch chip. eth0 and eth1 are instances of some other existing driver.
> +
> +The only difference of the switch-port netdevice from the ordinary netdevice
> +is that is implements couple more NDOs:
> +
> + ndo_sw_parent_get_id - This returns the same ID for two port netdevices
> + of the same physical switch chip. This is
> + mandatory to be implemented by all switch drivers
> + and serves the caller for recognition of a port
> + netdevice.
What is the connection between ndo_sw_parent_get_id() and
ndo_get_phys_port_id()? I'm having a bit of trouble teasing
this out.
For example, here is my ASCII art for an SR-IOV NIC,

    eth0      eth1     eth2
     |         |        |
     |         |        |
     PF        VF       VF
+----+---------+--------+----+
|      embedded bridge       |
+-------------+--------------+
              |
             port

that can do switching between the various uplinks and downlinks.
In IEEE 802.1Q language the embedded bridge acts like an edge
relay. At least that seems to be the current state of the art
for SR-IOV. Edge relay just means it has a single uplink port
to the network and multiple downlinks, and it isn't required
to do learning or run loop detection protocols such as STP.
Also there are multi-function devices that look the same except
that the VFs are replaced with PFs. It seems to be a common mode
for NICs that do iSCSI offloads with storage functions.
Where the line falls between an embedded bridge and a "SOME
switch chip" is not entirely clear.
My understanding is: use ndo_sw_parent_get_id() when you have
multiple physical ports all connected to a single switch object.
When you have a single port connected to multiple PCIe functions
or queues representing a netdev (e.g. macvlan offload), use
ndo_get_phys_port_id(). Just want to be sure we are on the
same page here.
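To make the distinction concrete, here is a rough sketch of how I picture
a consumer using the two IDs. Everything in it is an assumption made for
illustration: struct dev_id, struct netdev_view, and the helper names are
hypothetical and are not part of this patch or of any kernel API. The
only idea taken from the discussion is that two netdevs reporting the
same parent ID sit on the same switch, and two netdevs reporting the
same phys port ID share one physical port.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ID_MAX_LEN 32 /* illustrative size, not taken from the patch */

/* Hypothetical opaque ID, modeled as a byte string plus a length. */
struct dev_id {
	unsigned char id[ID_MAX_LEN];
	size_t len; /* 0 means the driver does not implement the ndo */
};

/* Hypothetical consumer-side view of a netdev; not a kernel structure. */
struct netdev_view {
	const char *name;
	struct dev_id sw_parent_id; /* as reported via ndo_sw_parent_get_id() */
	struct dev_id phys_port_id; /* as reported via ndo_get_phys_port_id() */
};

/* Two IDs match only if both are present and byte-for-byte identical. */
static bool dev_id_equal(const struct dev_id *a, const struct dev_id *b)
{
	return a->len != 0 && a->len == b->len &&
	       memcmp(a->id, b->id, a->len) == 0;
}

/* Multiple physical ports behind one switch object. */
bool same_switch(const struct netdev_view *a, const struct netdev_view *b)
{
	return dev_id_equal(&a->sw_parent_id, &b->sw_parent_id);
}

/* Multiple functions or queues (PF/VF, macvlan offload) behind one port. */
bool same_phys_port(const struct netdev_view *a, const struct netdev_view *b)
{
	return dev_id_equal(&a->phys_port_id, &b->phys_port_id);
}
```

With this model, sw0p0 and sw0p1 from the documentation example would
match under same_switch(), while the PF and VF netdevs in my SR-IOV
picture would match under same_phys_port() and report no parent ID.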
Otherwise the patch looks good. I think we can clear up the above
with an addition to the documentation. That could go in after the
initial set and be OK with me.
IMO this patch is needed; otherwise user space is at a complete
loss trying to figure out how netdevs map to switch silicon.
You could perhaps have reused ndo_get_phys_port_id(), but then
I think user space may get confused by SR-IOV/VMDQ/etc. ports
attached to switch silicon. For my $.02, having a new distinct
identifier is cleaner.
> + ndo_sw_parent_* - Functions that serve for a manipulation of the switch
> + chip itself (it can be though of as a "parent" of the
> + port, therefore the name). They are not port-specific.
> + Caller might use arbitrary port netdevice of the same
> + switch and it will make no difference.
> + ndo_sw_port_* - Functions that serve for a port-specific manipulation.
[...]
Thanks,
John
--
John Fastabend Intel Corporation