Message-ID: <20140328062827.GB2805@minipsycho.orion>
Date: Fri, 28 Mar 2014 07:28:27 +0100
From: Jiri Pirko <jiri@...nulli.us>
To: Florian Fainelli <f.fainelli@...il.com>
Cc: Sergey Ryazanov <ryazanov.s.a@...il.com>,
Jamal Hadi Salim <jhs@...atatu.com>,
Roopa Prabhu <roopa@...ulusnetworks.com>,
Neil Horman <nhorman@...driver.com>,
Thomas Graf <tgraf@...g.ch>, netdev <netdev@...r.kernel.org>,
David Miller <davem@...emloft.net>,
Andy Gospodarek <andy@...yhouse.net>,
dborkman <dborkman@...hat.com>, ogerlitz <ogerlitz@...lanox.com>,
jesse <jesse@...ira.com>, pshelar <pshelar@...ira.com>,
azhou <azhou@...ira.com>, Ben Hutchings <ben@...adent.org.uk>,
Stephen Hemminger <stephen@...workplumber.org>,
jeffrey.t.kirsher@...el.com, vyasevic <vyasevic@...hat.com>,
Cong Wang <xiyou.wangcong@...il.com>,
John Fastabend <john.r.fastabend@...el.com>,
Eric Dumazet <edumazet@...gle.com>,
Scott Feldman <sfeldma@...ulusnetworks.com>,
Lennert Buytenhek <buytenh@...tstofly.org>,
Shrijeet Mukherjee <shm@...ulusnetworks.com>,
Felix Fietkau <nbd@...nwrt.org>
Subject: Re: [patch net-next RFC 0/4] introduce infrastructure for support of
switch chip datapath
Thu, Mar 27, 2014 at 10:20:06PM CET, f.fainelli@...il.com wrote:
>2014-03-27 13:32 GMT-07:00 Sergey Ryazanov <ryazanov.s.a@...il.com>:
>> 2014-03-27 20:41 GMT+04:00 Florian Fainelli <f.fainelli@...il.com>:
>>> 2014-03-27 7:10 GMT-07:00 Sergey Ryazanov <ryazanov.s.a@...il.com>:
>>>> Hi all,
>>>>
>>>> sorry for the intrusion, but let me place my 2 cents.
>>>>
>>>> 2014-03-27 10:56 GMT+04:00 Jiri Pirko <jiri@...nulli.us>:
>>>>> Wed, Mar 26, 2014 at 11:22:51PM CET, f.fainelli@...il.com wrote:
>>>>>>2014-03-26 14:51 GMT-07:00 Jamal Hadi Salim <jhs@...atatu.com>:
>>>>>>> On 03/26/14 14:14, Jiri Pirko wrote:
>>>>>>>>
>>>>>>>> Wed, Mar 26, 2014 at 06:58:32PM CET, f.fainelli@...il.com wrote:
>>>>>>>>>
>>>>>>>>> 2014-03-26 10:35 GMT-07:00 Jiri Pirko <jiri@...nulli.us>:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>> You are right, sw1p0 and sw1p1 were meant to be, say LAN ports in my
>>>>>>>>> example.
>>>>>>>>>
>>>>>>>>> I think there is an implicit convention that sw1 represents the
>>>>>>>>> Ethernet switch port connected to the CPU Ethernet MAC, and that it is
>>>>>>>>> always connected, hence there is no need to create a "fake" bridge to
>>>>>>>>> link sw1 to eth0 for instance?
>>>>>>>>
>>>>>>>>
>>>>>>>> I think you are kind of mixing apples and oranges (or I might not be
>>>>>>>> understanding you correctly).
>>>>>>>> This is how I see it, sticking to the names you use in the example:
>>>>>>>>
>>>>>>>>            (sw1)  (abstract place-holder netdev)
>>>>>>>>          --------
>>>>>>>>         switch chip                  CPU
>>>>>>>>   -----------------------           ------
>>>>>>>>   sw1p0 sw1p1 sw1p2 sw1p3            eth0
>>>>>>>>     |     |     |     |                |
>>>>>>>>    PHY   PHY   PHY    ------someMII-----
>>>>>>>>
>>>>>>>> You see that eth0 is the CPU part of the "connection" and sw1p3 is the
>>>>>>>> switch part (port representation).
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Florian - I am sure you explained this before; I just don't remember. Why
>>>>>>> is there a need to expose eth0? It seems to me sw1p0-3 are abstracted
>>>>>>> already in the kernel and the "cpu port" is merely a control interface.
>>>>>>
>>>>>>eth0 corresponds to a CPU Ethernet MAC facing e.g: sw1p3 switch port.
>>>>>>It is a "regular" Ethernet driver connected to the switch without
>>>>>>switch-specific logic. The goal is twofold:
>>>>>>
>>>>>>- allow any regular Ethernet driver to be connected to an external
>>>>>>switch via e.g: MDIO/MDC or other without specific switch knowledge
>>>>>>- represent accurately how the hardware is designed/connected
>>>>>>
>>>>>>but maybe, we can simplify and have e.g: sw1p3 and eth0 be the same interface...
>>>>>
>>>>> I believe that having both sw1p3 and eth0 is the correct way of
>>>>> modelling this. sw1p3 is an instance of the switch chip driver representing
>>>>> the actual port of a switch. eth0 is an instance of some other ordinary NIC
>>>>> driver (8139too is my favorite :))
>>>>>
>>>>> This model lets us draw the exact picture.
>>>>> Also, when you add the described possibility to use iplink to build
>>>>> vlans, bridges whatever on the switch ports, it makes perfect sense to
>>>>> have this model.
>>>>>
>>>>> Merging sw1p3 and eth0 would cause a loss of information and confusion.
>>>>>
>>>>
>>>> The CPU switch port and the switch fabric itself should be configured
>>>> consistently with the host; first of all I mean the set of VLANs. Also it
>>>> should be mentioned that some generic knobs such as port rate and
>>>> duplex mode are meaningless for the CPU switch port, and a lot of status
>>>> information (rx/tx counters etc.) duplicates the statistics of the host
>>>> interface which is connected to the switch port.
>>>
>>> It duplicates the information when things just work fine, consider an
>>> external switch connected via RGMII to a CPU Ethernet MAC, you might
>>> want to get statistics from both sides (the switch CPU port and the
>>> CPU Ethernet MAC) to diagnose why things are not working as expected,
>>> which unfortunately happens once in a while with RGMII.
>>>
>>> If we expose both net_devices, we will be able to retrieve statistics
>>> from both sides, without resorting to ad-hoc debugging tools,
>>> but maybe this is not worth the effort.
>>>
>> I also thought about this situation. Can we use the debugfs interface
>> for these purposes?
>
>Most of the time you are interested in MIB counters for debugging
>such issues, so ethtool quickly comes in handy for this task. Since we
>will provide per-port counters, the CPU port is not different, so
>there is no reason for restricting this.
I agree, there is no need to provide a parallel API.
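
To illustrate the point about having counters from both ends of the CPU
link, here is a minimal sketch in Python. It is not part of the patch set:
the helper name link_mismatches, the counter names and the numbers are all
made up for illustration, and real counter names are driver specific. It
only assumes that a snapshot of counters can be read per netdev (for
example with ethtool -S) for both sw1p3 and eth0.

```python
# Hypothetical sketch: compare counters taken from both ends of the
# switch-port <-> CPU-MAC link. Names and values are illustrative only.

def link_mismatches(tx_side, rx_side, pairs, tolerance=0):
    """Return (tx_name, rx_name, lost) for every counter pair where the
    sending end counted more frames than the receiving end saw."""
    problems = []
    for tx_name, rx_name in pairs:
        lost = tx_side[tx_name] - rx_side[rx_name]
        if lost > tolerance:
            problems.append((tx_name, rx_name, lost))
    return problems

# Example snapshots, one dictionary per netdev (assumed values).
sw1p3 = {"tx_packets": 1000, "rx_packets": 400}
eth0 = {"tx_packets": 500, "rx_packets": 990}

pairs = [("tx_packets", "rx_packets")]
to_cpu = link_mismatches(sw1p3, eth0, pairs)      # switch -> CPU direction
to_switch = link_mismatches(eth0, sw1p3, pairs)   # CPU -> switch direction
```

With only one of the two netdevs exposed, neither comparison could be made
with the ordinary tools, which is the argument for not restricting the CPU
port.
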
>
>>
>>>> So there is no reason
>>>> to force the user to configure this port manually, and automatic
>>>> configuration of the CPU switch port without exporting it as a netdev
>>>> seems like a good approach.
>>>
>>> Well, maybe that's the answer, since we know that e.g: sw1p3 is always
>>> connected to e.g: eth0, we could create an automatic bridge between
>>> those two, this would keep the netdev exposure to user-space, but a
>>> user would not have to know about that specific detail to get things
>>> to work.
>>>
>> I would like to go further and suggest treating the netdev that is
>> connected to the CPU switch port as the master, for cases when we need to
>> perform some action on the whole switch (e.g. dump the FIB).
>
>This is what the 'sw1' net_device in Jiri's proposal would do.
Except that sw1 is not the CPU port. It's just a placeholder that does not
represent any physical port/netdev.
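
The distinction can be sketched as a tiny data model, again in Python and
purely for illustration. The classes Netdev and Switch and the method
connect_cpu are hypothetical names, not anything from the RFC patches; the
sketch only mirrors the picture with sw1, sw1p0..sw1p3 and eth0.

```python
# Hypothetical model of the topology discussed in this thread.

class Netdev:
    def __init__(self, name, is_port=True, peer=None):
        self.name = name
        self.is_port = is_port   # False for the abstract placeholder
        self.peer = peer         # netdev on the other end of a fixed link

class Switch:
    def __init__(self, name, num_ports):
        # The placeholder netdev stands for the whole chip, not for a port.
        self.holder = Netdev(name, is_port=False)
        self.ports = [Netdev(f"{name}p{i}") for i in range(num_ports)]

    def connect_cpu(self, port_index, host_netdev):
        """Record that a switch port is wired to a host NIC (e.g. via MII)."""
        port = self.ports[port_index]
        port.peer = host_netdev
        host_netdev.peer = port

    def cpu_ports(self):
        return [p for p in self.ports if p.peer is not None]

sw1 = Switch("sw1", 4)
eth0 = Netdev("eth0")          # ordinary NIC driver instance
sw1.connect_cpu(3, eth0)       # sw1p3 <-> eth0
```

In this model a whole-switch operation would be addressed to sw1.holder,
while sw1p3 and eth0 stay two separate netdevs that merely know about each
other.
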
>
>> And even name the
>> switch ports using the master netdev name as a prefix (e.g. eth1p0, eth1p1,
>> ..., eth1pN for ports of a switch that is connected via eth1).
>
>I think the port naming using the switch abstract interface (sw1 here)
>is better because ports do belong to the switch.
>--
>Florian