[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140904123323.GF1867@nanopsycho.lan>
Date: Thu, 4 Sep 2014 14:33:23 +0200
From: Jiri Pirko <jiri@...nulli.us>
To: Pravin Shelar <pshelar@...ira.com>
Cc: netdev <netdev@...r.kernel.org>,
David Miller <davem@...emloft.net>, nhorman@...driver.com,
andy@...yhouse.net, Thomas Graf <tgraf@...g.ch>,
Daniel Borkmann <dborkman@...hat.com>,
Or Gerlitz <ogerlitz@...lanox.com>,
Jesse Gross <jesse@...ira.com>, Andy Zhou <azhou@...ira.com>,
Ben Hutchings <ben@...adent.org.uk>,
Stephen Hemminger <stephen@...workplumber.org>,
jeffrey.t.kirsher@...el.com, vyasevic@...hat.com,
Cong Wang <xiyou.wangcong@...il.com>,
john.r.fastabend@...el.com, Eric Dumazet <edumazet@...gle.com>,
jhs@...atatu.com, sfeldma@...ulusnetworks.com,
f.fainelli@...il.com, roopa@...ulusnetworks.com,
John Linville <linville@...driver.com>,
"dev@...nvswitch.org" <dev@...nvswitch.org>, jasowang@...hat.com,
ebiederm@...ssion.com, Nicolas Dichtel <nicolas.dichtel@...nd.com>,
ryazanov.s.a@...il.com, buytenh@...tstofly.org,
aviadr@...lanox.com, nbd@...nwrt.org, alexei.starovoitov@...il.com,
Neil.Jerram@...aswitch.com, Rony Efraim <ronye@...lanox.com>
Subject: Re: [patch net-next 01/13] openvswitch: split flow structures into
ovs specific and generic ones
Wed, Sep 03, 2014 at 08:41:39PM CEST, pshelar@...ira.com wrote:
>On Wed, Sep 3, 2014 at 2:24 AM, Jiri Pirko <jiri@...nulli.us> wrote:
>> After this, flow related structures can be used in other code.
>>
>> Signed-off-by: Jiri Pirko <jiri@...nulli.us>
>> ---
>> include/net/sw_flow.h | 99 ++++++++++++++++++++++++++++++++++
>> net/openvswitch/actions.c | 3 +-
>> net/openvswitch/datapath.c | 74 +++++++++++++-------------
>> net/openvswitch/datapath.h | 4 +-
>> net/openvswitch/flow.c | 6 +--
>> net/openvswitch/flow.h | 102 +++++++----------------------------
>> net/openvswitch/flow_netlink.c | 53 +++++++++---------
>> net/openvswitch/flow_netlink.h | 10 ++--
>> net/openvswitch/flow_table.c | 118 ++++++++++++++++++++++-------------------
>> net/openvswitch/flow_table.h | 30 +++++------
>> net/openvswitch/vport-gre.c | 4 +-
>> net/openvswitch/vport-vxlan.c | 2 +-
>> net/openvswitch/vport.c | 2 +-
>> net/openvswitch/vport.h | 2 +-
>> 14 files changed, 276 insertions(+), 233 deletions(-)
>> create mode 100644 include/net/sw_flow.h
>>
>> diff --git a/include/net/sw_flow.h b/include/net/sw_flow.h
>> new file mode 100644
>> index 0000000..21724f1
>> --- /dev/null
>> +++ b/include/net/sw_flow.h
>> @@ -0,0 +1,99 @@
>> +/*
>> + * include/net/sw_flow.h - Generic switch flow structures
>> + * Copyright (c) 2007-2012 Nicira, Inc.
>> + * Copyright (c) 2014 Jiri Pirko <jiri@...nulli.us>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + */
>> +
>> +#ifndef _NET_SW_FLOW_H_
>> +#define _NET_SW_FLOW_H_
>> +
>> +struct sw_flow_key_ipv4_tunnel {
>> + __be64 tun_id;
>> + __be32 ipv4_src;
>> + __be32 ipv4_dst;
>> + __be16 tun_flags;
>> + u8 ipv4_tos;
>> + u8 ipv4_ttl;
>> +};
>> +
>> +struct sw_flow_key {
>> + struct sw_flow_key_ipv4_tunnel tun_key; /* Encapsulating tunnel key. */
>> + struct {
>> + u32 priority; /* Packet QoS priority. */
>> + u32 skb_mark; /* SKB mark. */
>> + u16 in_port; /* Input switch port (or DP_MAX_PORTS). */
>> + } __packed phy; /* Safe when right after 'tun_key'. */
>> + struct {
>> + u8 src[ETH_ALEN]; /* Ethernet source address. */
>> + u8 dst[ETH_ALEN]; /* Ethernet destination address. */
>> + __be16 tci; /* 0 if no VLAN, VLAN_TAG_PRESENT set otherwise. */
>> + __be16 type; /* Ethernet frame type. */
>> + } eth;
>> + struct {
>> + u8 proto; /* IP protocol or lower 8 bits of ARP opcode. */
>> + u8 tos; /* IP ToS. */
>> + u8 ttl; /* IP TTL/hop limit. */
>> + u8 frag; /* One of OVS_FRAG_TYPE_*. */
>> + } ip;
>> + struct {
>> + __be16 src; /* TCP/UDP/SCTP source port. */
>> + __be16 dst; /* TCP/UDP/SCTP destination port. */
>> + __be16 flags; /* TCP flags. */
>> + } tp;
>> + union {
>> + struct {
>> + struct {
>> + __be32 src; /* IP source address. */
>> + __be32 dst; /* IP destination address. */
>> + } addr;
>> + struct {
>> + u8 sha[ETH_ALEN]; /* ARP source hardware address. */
>> + u8 tha[ETH_ALEN]; /* ARP target hardware address. */
>> + } arp;
>> + } ipv4;
>> + struct {
>> + struct {
>> + struct in6_addr src; /* IPv6 source address. */
>> + struct in6_addr dst; /* IPv6 destination address. */
>> + } addr;
>> + __be32 label; /* IPv6 flow label. */
>> + struct {
>> + struct in6_addr target; /* ND target address. */
>> + u8 sll[ETH_ALEN]; /* ND source link layer address. */
>> + u8 tll[ETH_ALEN]; /* ND target link layer address. */
>> + } nd;
>> + } ipv6;
>> + };
>> +} __aligned(BITS_PER_LONG/8); /* Ensure that we can do comparisons as longs. */
>> +
>
>HW offload API should be separate from OVS module. This has following
>advantages.
>1. It can be managed by OVS userspace vswitchd process which has much
>better context to setup hardware flow table. Once we add capabilities
>for swdev, it is much more easier for vswitchd process to choose
>correct (hw or sw) flow table for given flow.
The idea is to add a nl attr in ovs genl iface so the vswitchd can
speficify the flow the to be in sw only, in hw only, in both.
I believe that is is more convenient to let switchd to communicate flows
via single iface.
>2. Other application that wants to use HW offload does not have
>dependency on OVS kernel module.
That is not the case for this patchset. Userspace can insert/remove
flows using the switchdev generic netlink api - see:
[patch net-next 13/13] switchdev: introduce Netlink API
>3. Hardware and software datapath remains separate, these two
>components has no dependency on each other, both can be developed
>independent of each other.
The general idea is to have the offloads handled in-kernel. Therefore I
hooked on to ovs kernel dp code.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists