[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEP_g=8EpR_Z54ZvGZzFAeCVwVED3SUrddvLt7-aQ76Q919ekQ@mail.gmail.com>
Date: Sat, 19 Nov 2011 15:06:53 -0800
From: Jesse Gross <jesse@...ira.com>
To: John Fastabend <john.r.fastabend@...el.com>
Cc: "David S. Miller" <davem@...emloft.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"dev@...nvswitch.org" <dev@...nvswitch.org>
Subject: Re: [PATCH net-next 4/4] net: Add Open vSwitch kernel components.
On Fri, Nov 18, 2011 at 9:30 PM, John Fastabend
<john.r.fastabend@...el.com> wrote:
> On 11/18/2011 3:12 PM, Jesse Gross wrote:
>> + */
>> +enum ovs_frag_type {
>> + OVS_FRAG_TYPE_NONE,
>> + OVS_FRAG_TYPE_FIRST,
>> + OVS_FRAG_TYPE_LATER,
>> + __OVS_FRAG_TYPE_MAX
>> +};
>> +
>> +#define OVS_FRAG_TYPE_MAX (__OVS_FRAG_TYPE_MAX - 1)
>> +
>> +struct ovs_key_ethernet {
>> + __u8 eth_src[6];
>> + __u8 eth_dst[6];
>> +};
>> +
>> +struct ovs_key_ipv4 {
>> + __be32 ipv4_src;
>> + __be32 ipv4_dst;
>> + __u8 ipv4_proto;
>> + __u8 ipv4_tos;
>> + __u8 ipv4_ttl;
>> + __u8 ipv4_frag; /* One of OVS_FRAG_TYPE_*. */
>> +};
>> +
>> +struct ovs_key_ipv6 {
>> + __be32 ipv6_src[4];
>> + __be32 ipv6_dst[4];
>> + __be32 ipv6_label; /* 20-bits in least-significant bits. */
>> + __u8 ipv6_proto;
>> + __u8 ipv6_tclass;
>> + __u8 ipv6_hlimit;
>> + __u8 ipv6_frag; /* One of OVS_FRAG_TYPE_*. */
>> +};
>> +
>> +struct ovs_key_tcp {
>> + __be16 tcp_src;
>> + __be16 tcp_dst;
>> +};
>> +
>> +struct ovs_key_udp {
>> + __be16 udp_src;
>> + __be16 udp_dst;
>> +};
>> +
>> +struct ovs_key_icmp {
>> + __u8 icmp_type;
>> + __u8 icmp_code;
>> +};
>> +
>> +struct ovs_key_icmpv6 {
>> + __u8 icmpv6_type;
>> + __u8 icmpv6_code;
>> +};
>> +
>> +struct ovs_key_arp {
>> + __be32 arp_sip;
>> + __be32 arp_tip;
>> + __be16 arp_op;
>> + __u8 arp_sha[6];
>> + __u8 arp_tha[6];
>> +};
>> +
>> +struct ovs_key_nd {
>> + __u32 nd_target[4];
>> + __u8 nd_sll[6];
>> + __u8 nd_tll[6];
>> +};
>> +
>
> We already have defines for many of these headers
> struct arphdr {
> __be16 ar_hrd; /* format of hardware address */
> __be16 ar_pro; /* format of protocol address */
> unsigned char ar_hln; /* length of hardware address */
> unsigned char ar_pln; /* length of protocol address */
> __be16 ar_op; /* ARP opcode (command) */
>
> #if 0
> /*
> * Ethernet looks like this : This bit is variable sized however...
> */
> unsigned char ar_sha[ETH_ALEN]; /* sender hardware address */
> unsigned char ar_sip[4]; /* sender IP address */
> unsigned char ar_tha[ETH_ALEN]; /* target hardware address */
> unsigned char ar_tip[4]; /* target IP address */
> #endif
>
> };
>
> Do we have to redefine them here?
These aren't packet format definitions, they're keys for describing a
flow to userspace. For example, they don't contain checksums or
lengths because those vary on a per-packet basis.
>> --- /dev/null
>> +++ b/net/openvswitch/actions.c
>> @@ -0,0 +1,415 @@
>> +/*
>> + * Copyright (c) 2007-2011 Nicira Networks.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of version 2 of the GNU General Public
>> + * License as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful, but
>> + * WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
>> + * General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
>> + * 02110-1301, USA
>> + */
>> +
>
> It seems like most of the actions could be implemented with packet
> actions ./net/sched or someplace else more generically.
It seems like a nice idea in concept but I think in practice it's not
worth the complexity. The actual packet modification code here is
really fairly simple and the integration that you describe would
probably be more complicated.
>> --- /dev/null
>> +++ b/net/openvswitch/datapath.c
[...]
>> +/* Must be called with rcu_read_lock, genl_mutex, or RTNL lock. */
>> +static struct datapath *get_dp(int dp_ifindex)
>> +{
>> + struct datapath *dp = NULL;
>> + struct net_device *dev;
>> +
>> + rcu_read_lock();
>
> comment seems incorrect in light of this rcu_read_lock()
The locking requirements that it is describing are to ensure that the
returned structure does not disappear while the caller is using it.
The use of rcu_read_lock inside the function is for its internal use
which the caller should not have to worry about and not sufficient to
protect the caller since the RCU lock is released before returning.
>> +/* Must be called with genl_mutex. */
>> +static struct flow_table *get_table_protected(struct datapath *dp)
>
> I think, 'ovstable_dereference()' would be a better name. It matches existing
> semantics of rtnl_dereference(). Bit of a nit though.
[...]
>> +/* Must be called with rcu_read_lock or RTNL lock. */
>> +static struct vport *get_vport_protected(struct datapath *dp, u16 port_no)
>> +{
>
> hmm but rcu_dereference_rtnl is not the same as rtnl_dereference_protected(). So
> which one did you actually mean? The function name makes me thing you really wanted
> the protected variant.
For both of these, the code and comments are correct but I can see how
the names would be confusing. I think the right approach is to add a
genl_dereference() function and then use that (or the corresponding
RTNL/RCU functions) as appropriate and then drop these two functions.
>> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
>> new file mode 100644
>> index 0000000..77c16c7
>> --- /dev/null
>> +++ b/net/openvswitch/flow.c
[...]
>> +static int check_header(struct sk_buff *skb, int len)
>> +{
>> + if (unlikely(skb->len < len))
>> + return -EINVAL;
>
> I believe pskb_may_pull() checks skb->len so this is just a
> return value change?
Yes but the difference is important: -ENOMEM results in the packet
being dropped because it's a local processing problem; -EINVAL means
that the packet itself was invalid and is marked as such but not
immediately dropped. The policy as to what to do is left to userspace
depending on what it is doing because, for example, L2 processing
shouldn't care about an invalid IP header but L3 processing would.
>> diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
>> new file mode 100644
>> index 0000000..6cc8719
>> --- /dev/null
>> +++ b/net/openvswitch/vport-netdev.h
[...]
>> +struct netdev_vport {
>> + struct net_device *dev;
>> +};
>
> This seems looks like a pretty worthless abstraction on the
> surface at least.
>
>> +
>> +static inline struct netdev_vport *
>> +netdev_vport_priv(const struct vport *vport)
>> +{
>> + return vport_priv(vport);
>> +}
>
> Again it seems straight forward enough to just call vport_priv()
It's obviously possible to drop these but I think they're a little
nicer than doing a bunch of casts all over and there's no cost to
them.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists