Date: Tue, 11 Nov 2014 16:11:26 +0100
From: Jiri Pirko <jiri@...nulli.us>
To: John Fastabend <john.fastabend@...il.com>
Cc: netdev@...r.kernel.org, davem@...emloft.net, nhorman@...driver.com,
andy@...yhouse.net, tgraf@...g.ch, dborkman@...hat.com,
ogerlitz@...lanox.com, jesse@...ira.com, pshelar@...ira.com,
azhou@...ira.com, ben@...adent.org.uk, stephen@...workplumber.org,
jeffrey.t.kirsher@...el.com, vyasevic@...hat.com,
xiyou.wangcong@...il.com, john.r.fastabend@...el.com,
edumazet@...gle.com, jhs@...atatu.com, sfeldma@...il.com,
f.fainelli@...il.com, roopa@...ulusnetworks.com,
linville@...driver.com, jasowang@...hat.com, ebiederm@...ssion.com,
nicolas.dichtel@...nd.com, ryazanov.s.a@...il.com,
buytenh@...tstofly.org, aviadr@...lanox.com, nbd@...nwrt.org,
alexei.starovoitov@...il.com, Neil.Jerram@...aswitch.com,
ronye@...lanox.com, simon.horman@...ronome.com,
alexander.h.duyck@...hat.com, john.ronciak@...el.com,
mleitner@...hat.com, shrijeet@...il.com, gospo@...ulusnetworks.com,
bcrl@...ck.org
Subject: Re: [patch net-next v2 02/10] net: introduce generic switch devices
support
Mon, Nov 10, 2014 at 10:59:38PM CET, john.fastabend@...il.com wrote:
>On 11/09/2014 02:51 AM, Jiri Pirko wrote:
>>The goal of this is to make it possible to support various switch
>>chips. Drivers should implement the relevant ndos to do so. For now
>>only one ndo is defined:
>>- one for getting the physical switch id.
>>
>>Note that the user can use any port netdevice to access the switch.
>>
>>Signed-off-by: Jiri Pirko <jiri@...nulli.us>
>>---
>> Documentation/networking/switchdev.txt | 59 ++++++++++++++++++++++++++++++++++
>> MAINTAINERS | 7 ++++
>> include/linux/netdevice.h | 10 ++++++
>> include/net/switchdev.h | 30 +++++++++++++++++
>> net/Kconfig | 1 +
>> net/Makefile | 3 ++
>> net/switchdev/Kconfig | 13 ++++++++
>> net/switchdev/Makefile | 5 +++
>> net/switchdev/switchdev.c | 33 +++++++++++++++++++
>> 9 files changed, 161 insertions(+)
>> create mode 100644 Documentation/networking/switchdev.txt
>> create mode 100644 include/net/switchdev.h
>> create mode 100644 net/switchdev/Kconfig
>> create mode 100644 net/switchdev/Makefile
>> create mode 100644 net/switchdev/switchdev.c
>>
>>diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
>>new file mode 100644
>>index 0000000..98be76c
>>--- /dev/null
>>+++ b/Documentation/networking/switchdev.txt
>>@@ -0,0 +1,59 @@
>>+Switch (and switch-ish) device drivers HOWTO
>>+============================================
>>+
>>+Please note that the word "switch" is used here in a very generic sense.
>>+This includes devices supporting L2/L3 but also various flow offloading chips,
>>+including switches embedded into SR-IOV NICs.
>>+
>>+Let's describe a topology a bit. Imagine the following example:
>>+
>>+ +----------------------------+ +---------------+
>>+ | SOME switch chip | | CPU |
>>+ +----------------------------+ +---------------+
>>+ port1 port2 port3 port4 MNGMNT | PCI-E |
>>+ | | | | | +---------------+
>>+ PHY PHY | | | | NIC0 NIC1
>>+ | | | | | |
>>+ | | +- PCI-E -+ | |
>>+ | +------- MII -------+ |
>>+ +------------- MII ------------+
>>+
>>+In this example, there are two independent lines between the switch silicon
>>+and CPU. NIC0 and NIC1 drivers are not aware of a switch presence. They are
>>+separate from the switch driver. SOME switch chip is managed by a driver
>>+via PCI-E device MNGMNT. Note that the MNGMNT device, NIC0 and NIC1 may be
>>+connected to some other type of bus.
>>+
>>+Now, for the previous example, here is the representation in the kernel:
>>+
>>+ +----------------------------+ +---------------+
>>+ | SOME switch chip | | CPU |
>>+ +----------------------------+ +---------------+
>>+ sw0p0 sw0p1 sw0p2 sw0p3 MNGMNT | PCI-E |
>>+ | | | | | +---------------+
>>+ PHY PHY | | | | eth0 eth1
>>+ | | | | | |
>>+ | | +- PCI-E -+ | |
>>+ | +------- MII -------+ |
>>+ +------------- MII ------------+
>>+
>>+Let's call the example switch driver for SOME switch chip "SOMEswitch". This
>>+driver takes care of PCI-E device MNGMNT. There is a netdevice instance sw0pX
>>+created for each port of a switch. These netdevices are instances
>>+of the "SOMEswitch" driver. sw0pX netdevices serve as a "representation"
>>+of the switch chip. eth0 and eth1 are instances of some other existing driver.
>>+
>>+The only difference between a switch-port netdevice and an ordinary netdevice
>>+is that it implements a couple more NDOs:
>>+
>>+ ndo_sw_parent_get_id - This returns the same ID for two port netdevices
>>+ of the same physical switch chip. Implementing it
>>+ is mandatory for all switch drivers. Callers use
>>+ the returned ID to recognize a port netdevice and
>>+ the switch it belongs to.
>
>What is the connection between ndo_sw_parent_get_id and
>ndo_get_phys_port_id()? I'm having a bit of trouble teasing
>this out.
>
>For example here is my ascii art for a SR-IOV NIC,
>
> eth0 eth1 eth2
> | | |
> | | |
> PF VF VF
> +----+---------+--------+----+
> | embedded bridge |
> +-------------+--------------+
> |
> port
>
>that can do switching between the various uplinks and downlinks.
>In IEEE 802.1Q language the embedded bridge acts like an edge
>relay. At least that seems to be the current state of the art
>for SR-IOV. Edge relay just means it has a single uplink port
>to the network and multiple downlinks and also isn't required
>to do learning and run loop detection protocols such as STP.
>
>Also there are multi-function devices that look the same except
>replace the VFs with PFs. It seems to be a common mode for NICs
>that do the iSCSI offloads with storage functions.
>
>When something is an embedded bridge vs a SOME switch chip is
>not entirely clear.
>
>My understanding is: use ndo_sw_parent_get_id() when you have
>multiple physical ports all connected to a single switch object.
>When you have a single port connected to multiple PCIE functions
>or queues representing a netdev (e.g. macvlan offload) use the
>ndo_get_phys_port_id(). Just want to be sure we are on the
>same page here.
Nod. You described that right.
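
To illustrate the split described above, here is a minimal user-space
sketch. It is not kernel code and not taken from the patch: struct phys_id,
struct netdev_view and the helper names are hypothetical, and the two id
fields only model what ndo_sw_parent_get_id and ndo_get_phys_port_id might
report. Ports of one switch chip share a parent id; functions behind one
physical port share a phys port id.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PHYS_ID_MAX 32 /* hypothetical size limit for this sketch */

/* Hypothetical opaque identifier as a driver might report it. */
struct phys_id {
	unsigned char bytes[PHYS_ID_MAX];
	size_t len; /* 0 means "driver does not report this id" */
};

/* Hypothetical user-space view of one netdevice. */
struct netdev_view {
	const char *name;
	struct phys_id sw_parent_id; /* models ndo_sw_parent_get_id */
	struct phys_id phys_port_id; /* models ndo_get_phys_port_id */
};

/* Two ids match only if both are present and byte-for-byte equal. */
static bool phys_id_equal(const struct phys_id *a, const struct phys_id *b)
{
	assert(a->len <= PHYS_ID_MAX && b->len <= PHYS_ID_MAX);
	return a->len != 0 && a->len == b->len &&
	       memcmp(a->bytes, b->bytes, a->len) == 0;
}

/* Multiple physical ports attached to a single switch object. */
static bool same_switch(const struct netdev_view *a,
			const struct netdev_view *b)
{
	return phys_id_equal(&a->sw_parent_id, &b->sw_parent_id);
}

/* Multiple functions or queues sharing a single physical port. */
static bool same_phys_port(const struct netdev_view *a,
			   const struct netdev_view *b)
{
	return phys_id_equal(&a->phys_port_id, &b->phys_port_id);
}
```

In this model sw0p0 and sw0p1 would match under same_switch(), while the
PF and VF netdevs of an SR-IOV NIC would match under same_phys_port().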
>
>Otherwise patch looks good. I think we can clear the above up
>with an addition to the documentation. Could go in after the
>initial set and be OK with me.
>
>IMO this patch is needed otherwise user space is at a complete
>loss on trying to figure out how netdevs map to switch silicon.
>You could have reused ndo_get_phys_port_id() perhaps but then
>I think user space may get confused by SR-IOV/VMDQ/etc ports
>attached to a switch silicon. For my $.02, having a new distinct
>identifier is cleaner.
It most definitely is. Therefore I went that way.
>
>
>>+ ndo_sw_parent_* - Functions for manipulating the switch chip itself
>>+ (it can be thought of as a "parent" of the port,
>>+ hence the name). They are not port-specific.
>>+ The caller may use an arbitrary port netdevice of the
>>+ same switch and it will make no difference.
>>+ ndo_sw_port_* - Functions for port-specific manipulation.
>
>[...]
>
>Thanks,
>John
>
>
>--
>John Fastabend Intel Corporation