Message-Id: <CD1A9D96-F1E1-4AF6-B089-41AEBF7F1699@cumulusnetworks.com>
Date:	Wed, 2 Apr 2014 09:15:55 -0700
From:	Scott Feldman <sfeldma@...ulusnetworks.com>
To:	"John W. Linville" <linville@...driver.com>
Cc:	Andy Gospodarek <andy@...yhouse.net>,
	Jiri Pirko <jiri@...nulli.us>,
	Roopa Prabhu <roopa@...ulusnetworks.com>,
	Jamal Hadi Salim <jhs@...atatu.com>,
	Florian Fainelli <f.fainelli@...il.com>,
	Neil Horman <nhorman@...driver.com>,
	Thomas Graf <tgraf@...g.ch>, netdev <netdev@...r.kernel.org>,
	David Miller <davem@...emloft.net>,
	dborkman <dborkman@...hat.com>, ogerlitz <ogerlitz@...lanox.com>,
	jesse <jesse@...ira.com>, pshelar <pshelar@...ira.com>,
	azhou <azhou@...ira.com>, Ben Hutchings <ben@...adent.org.uk>,
	Stephen Hemminger <stephen@...workplumber.org>,
	jeffrey.t.kirsher@...el.com, vyasevic <vyasevic@...hat.com>,
	Cong Wang <xiyou.wangcong@...il.com>,
	John Fastabend <john.r.fastabend@...el.com>,
	Eric Dumazet <edumazet@...gle.com>,
	Lennert Buytenhek <buytenh@...tstofly.org>,
	Shrijeet Mukherjee <shm@...ulusnetworks.com>
Subject: Re: [patch net-next RFC 0/4] introduce infrastructure for support of switch chip datapath


On Apr 2, 2014, at 8:25 AM, John W. Linville <linville@...driver.com> wrote:

> On Wed, Apr 02, 2014 at 10:32:49AM -0400, Andy Gospodarek wrote:
>> On 04/01/2014 03:13 PM, Scott Feldman wrote:
>>> On Mar 26, 2014, at 11:03 AM, Jiri Pirko <jiri@...nulli.us> wrote:
>>> 
>>>> Wed, Mar 26, 2014 at 06:47:15PM CET, roopa@...ulusnetworks.com wrote:
>>>>> On 3/26/14, 9:59 AM, Jiri Pirko wrote:
>>>>>> Wed, Mar 26, 2014 at 05:54:17PM CET, roopa@...ulusnetworks.com wrote:
>>>>>> So you implement bonding netlink api? Or you hook into bonding driver
>>>>>> itself? Can you show us the code?
>>>>> We use the netlink API and libnl. In our current model, our switch
>>>>> chip driver listens to netlink notifications and programs the switch
>>>>> chip. The switch chip driver uses libnl caches and libnl netlink apis
>>>>> to reflect the kernel state to switch chip.
>>>> 
>>>> So when you configure for example bonding over 2 ports, you actually use
>>>> bonding driver to do that. And your userspace app listens to
>>>> notifications and programs the switch chip accordingly. Am I close?
>>>> 
>>>> How about data? Can this new "bonding" interface have an IP assigned
>>>> to it and send/receive packets?
>>>> 
>>>> I'm still not sure I understand your concept. Do you have some
>>>> documentation for it available?
>>> Actually Jiri, this is the code you and I worked on recently to
>>> netlink-ify bonding/slave attributes and active/inactive
>>> notification.  You have it right: the user uses normal ip link tools
>>> and the bonding driver to create the bond, set attributes, and enslave
>>> switch ports.  RTM_NEWLINK is used to program the ASIC to offload the
>>> LAG to HW.  RTM_NEWLINK msgs contain bond attributes (mode, etc.) and
>>> the slave list, as well as slave status.  This is enough information
>>> to program the ASIC.  Once programmed, the ASIC offloads the data
>>> plane traffic and, in the case of egress, handles the LAG hash
>>> distribution.  Only the LACP control plane traffic makes it to the
>>> bonding driver; data plane traffic does not.
>>> 
>>> So, not trying to sound like a smart-ass, but the documentation is the bonding driver, specifically the netlink attributes/notifications.
>>> 
>>> -scott
>> 
>> Using netlink messages to notify drivers for these ASICs really
>> seems like a great way to handle things.  It would obviously require
>> some expansion of netlink, but that seems fine.
>> 
>> I would prefer that ASIC vendors write initial drivers for their
>> ASICs such that each physical port is detected and exported as a
>> netdev.  This would mean each *minimal* kernel driver for an ASIC
>> would need to have support for the following (off the top of my
>> head):
>> 
>> - detect link status on an interface
>> - set an interface's MAC address
>> - configure the chip to send all frames to the CPU
>> - register a napi handler for the interfaces (depending on
>> packet-buffering capabilities in the hardware)
>> 
>> As support for new hardware capabilities is moved from switch
>> vendor SDKs to their kernel drivers, the driver can begin to listen
>> for netlink messages that:
>> 
>> - setup bonds/teams
>> - add ports to bridge groups
>> - configure port-based or mac-based VLANs
>> - add unicast and multicast entries
>> - add and remove entries from a flow table
>> - ...
>> 
>> Maybe this all seems too matter-of-fact and the discussion has
>> evolved well beyond something this high-level, but there still seems
>> to be significant discussion about whether or not the ASIC should be
>> exported as a netdev and I'm just not seeing a compelling reason.
>> This was my attempt to explain why.  :)
> 
> Andy and I discussed this off-line, so I am admittedly partial to
> the conclusions we shared as reflected above... :-)
> 
> While I might be convinced that there should be _something_ to
> represent the switch chip for some purpose (e.g. topology mapping),
> I'm not at all convinced that thing should be a netdev.  I don't see
> where the switch chip by itself looks much like any other netdev at
> all, especially once you model the actual front-panel ports with
> their own netdevs.  I do know that having an extra "magic netdev"
> in the wireless space added a lot of confusion for no clear gain,
> leading to it later being abolished.
> 
> Modeling at the switch level might make more sense from a flow
> management perspective?  But if those flows are managed using a netlink
> protocol, does it matter what sort of entity is listening and acting
> on those messages?  If a switch-specific interface is needed for that,
> we should build it rather than pretending it looks like a netdev.
> I also think that throwing the DSA switches in with flow-based and
> "Enterprise" switches may just be confusing things.
> 
> I think that the opening bid should be a minimal hardware driver that
> models each front-panel port with a netdev and passes all traffic
> to/from the CPU.  Intelligence beyond that should be added on a
> 'can-do' basis, with individual drivers (or corresponding userland
> components) listening to existing netlink traffic and implementing
> support for existing protocols to the best of their abilities.
> Missing functionality in the netlink protocols or other functions
> (e.g. bonding, bridging, etc) can be evolved over time as we discover
> missing bits required for switch acceleration.

I agree completely with your/Andy’s view.  It’s the switch port, not the switch, that needs to be modeled as a netdev.  The switch port is the abstraction that other existing virtual devices (bridges, bonds, vxlans, etc.) can cuddle against.  Is a switch port a special netdev in some way?  At a high level, not really.  I mean, in a sense it’s just eth48 on a super NIC.  OK, there may be some advantage to setting an IFF_SWITCH_PORT flag on the switch port netdev, so cuddling netdevs could get a hint that their data plane might be offloaded.

I’ve been back-and-forth on the switch netdev.  Today I’m not for it.  But I’m still searching for a reason.  At one point I thought a switch netdev would be nice in an L3 router case where we needed a router IP address to do things like OSPF unnumbered interfaces, but even in that case, we can just put the router IP on lo.  Another reason would be to use the switch netdev as a place for switch-wide settings and status.  For example, ethtool -S stats on the switch netdev would show switch-wide stats like ACL drops or something like that.  Maybe a switch device is modeled as a new device class?  I guess it comes down to how much is duplicated between different vendors' switch driver implementations.

Agree on the missing netlink functionality point: add it as we go.  Outside the bonding stuff we recently added, we (Cumulus) find netlink pretty complete as-is to program modern, enterprise-class switch chips.

-scott
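
For readers who want to see what "listening to RTM_NEWLINK" looks like in practice, here is a minimal sketch in Python.  It is not the Cumulus driver (which, per the thread, uses libnl); it only decodes the fixed rtnetlink wire layout to pull out each link's ifindex, name, and IFLA_MASTER (the ifindex of the bond or bridge a port is enslaved to).  The function name parse_newlink and the helper align4 are made up for illustration, and the bond-specific attributes nested under IFLA_LINKINFO are deliberately not decoded.

```python
import struct

# Constants from <linux/rtnetlink.h> and <linux/if_link.h>.
RTM_NEWLINK = 16
IFLA_IFNAME = 3
IFLA_MASTER = 10

NLMSG_HDR = struct.Struct("=IHHII")  # nlmsghdr: len, type, flags, seq, pid
IFINFO = struct.Struct("=BxHiII")    # ifinfomsg: family, pad, type, index, flags, change


def align4(n):
    """Round n up to the 4-byte netlink alignment boundary."""
    return (n + 3) & ~3


def parse_newlink(buf):
    """Yield (ifindex, ifname, master_ifindex) for each RTM_NEWLINK in buf.

    ifname or master_ifindex is None when the attribute is absent.
    Hypothetical helper for illustration; not taken from any driver.
    """
    off = 0
    while off + NLMSG_HDR.size <= len(buf):
        msg_len, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(buf, off)
        if msg_len < NLMSG_HDR.size or off + msg_len > len(buf):
            break  # truncated or malformed message, stop parsing
        end = off + msg_len
        body = off + NLMSG_HDR.size
        if msg_type == RTM_NEWLINK and body + IFINFO.size <= end:
            _fam, _type, ifindex, _ifflags, _change = IFINFO.unpack_from(buf, body)
            name, master = None, None
            pos = body + IFINFO.size
            # Walk the rtattr list: 2-byte length, 2-byte type, payload, padding.
            while pos + 4 <= end:
                alen, atype = struct.unpack_from("=HH", buf, pos)
                if alen < 4 or pos + alen > end:
                    break
                payload = buf[pos + 4:pos + alen]
                if atype == IFLA_IFNAME:
                    name = payload.rstrip(b"\0").decode()
                elif atype == IFLA_MASTER and len(payload) >= 4:
                    master = struct.unpack("=I", payload[:4])[0]
                pos += align4(alen)
            yield ifindex, name, master
        off += align4(msg_len)
```

On Linux, the input buffer would typically come from a socket opened with socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE) and bound to the RTMGRP_LINK multicast group (value 1); each recv() result can be passed straight to parse_newlink.  A userspace switch driver would then map the (port, master) pairs onto LAG membership in the ASIC.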


