lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1428435900.6449.25.camel@redhat.com>
Date:	Tue, 07 Apr 2015 14:45:00 -0500
From:	Dan Williams <dcbw@...hat.com>
To:	Mahesh Bandewar <maheshb@...gle.com>
Cc:	linux-netdev <netdev@...r.kernel.org>, jbenc@...hat.com
Subject: Re: [PATCH] ipvlan: fix up broadcast MAC filtering for ARP and DHCP

On Tue, 2015-04-07 at 11:32 -0700, Mahesh Bandewar wrote:
> On Mon, Apr 6, 2015 at 10:17 AM, Dan Williams <dcbw@...hat.com> wrote:
> > On Thu, 2015-04-02 at 18:39 -0700, Mahesh Bandewar wrote:
> >> On Thu, Apr 2, 2015 at 7:40 AM, Dan Williams <dcbw@...hat.com> wrote:
> >> > On Wed, 2015-04-01 at 18:30 -0700, Mahesh Bandewar wrote:
> >> >> On Wed, Apr 1, 2015 at 1:55 PM, Dan Williams <dcbw@...hat.com> wrote:
> >> >> > On Wed, 2015-04-01 at 13:24 -0700, Mahesh Bandewar wrote:
> >> >> >> On Mon, Mar 30, 2015 at 9:22 PM, Dan Williams <dcbw@...hat.com> wrote:
> >> >> >> > The broadcast MAC is supposed to be allowed whenever the device
> >> >> >> > has an IPv4 address, otherwise ARP requests get dropped on the
> >> >> >> > floor.  If ndo_set_rx_mode (and thus
> >> >> >> > ipvlan_set_multicast_mac_filter()) got called after the address
> >> >> >> > was added, it would blow away the broadcast MAC address in
> >> >> >> > mac_filters that was added at IPv4 address addition.
> >> >> >> >
> >> >> >> > Next, ipvlan currently fails DHCP addressing for two reasons:
> >> >> >> >
> >> >> >> I would prefer DHCPv4 instead of DHCP since it could also mean DHCPv6
> >> >> >> which does not use broadcast.
> >> >> >>
> >> >> >> > 1) DHCP offers are typically unicast back to the device's MAC
> >> >> >> > address, and at the IP layer have a destination IP address
> >> >> >> > matching the new lease address.  In ipvlan unicast packets
> >> >> >> > are checked against existing addresses assigned to the ipvlan
> >> >> >> > interface, so clearly this fails hard because the ipvlan
> >> >> >> > interface has no IP addresses yet.  Workaround: request
> >> >> >> > that the server broadcast replies (-B for dhclient) which
> >> >> >> > don't get checked against the address list.
> >> >> >> >
> >> >> >> > 2) Even when that's done, mac_filters only allows the
> >> >> >> > broadcast MAC address when the interface has >= 1 IPv4
> >> >> >> > addresses, so double-fail, and the incoming DHCP offer
> >> >> >> > gets dropped on the floor again.
> >> >> >> >
> >> >> >> > To fix these issues, Watch for outgoing DHCP requests and
> >> >> >> > enable the broadcast MAC address indefinitely when one is seen.
> >> >> >> >
> >> >> >> This approach will work but adds a performance drag for all UDP
> >> >> >> traffic whether the link needs DHCPv4 or not (especially when it does
> >> >> >> not need). Logic works well only when the link (always) needs DHCPv4.
> >> >> >> When the link does not care about DHCPv4 the packet-inspection always
> >> >> >> happens.
> >> >> >
> >> >> > If DHCPv4 is detected on a link, doesn't that mean the link needs the
> >> >> > broadcast filter enabled?
> >> >> Yes and it should stay enabled as it's going to have IPv4. Also once
> >> >> broadcast bit is set there should not be any additional jugglery /
> >> >> packet inspection which hurts performance.
> >> >
> >> > Yes, which is why my patch includes the check for dhcp4_seen and skips
> >> > inspecting the packet.  I could modify it to skip the L3 header check
> >> > too in that case.
> >> >
> >> dhcp4_seen works correctly when there is dhcpv4 happening, but
> >> snooping continues if autoconf is not used.
> >>
> >> >> > If you start DHCPv4 on the link you obviously
> >> >> > expect it to work, unless there's no DHCPv4 server available.  The patch
> >> >> > should also reset the broadcast filter on close, so network management
> >> >> > could simply start DHCPv4, watch it fail, close/open and set static IP.
> >> >> >
> >> >> > Alternatively we could have the broadcast filter on a 2 minute timer
> >> >> > that starts when DHCPv4 is seen.
> >> >> >
> >> >> Well, but this would be error-prone. I don't remember but DHCP timeout
> >> >> is not just 2 minutes.
> >> >
> >> > 2m is IIRC an RFC recommended timeout for initial DHCPv4 requests.  If
> >> > you don't get a reply to your requests within 2m, you aren't likely to
> >> > get one ever on this network and you can stop trying for a while.  You
> >> > either get a reply in which case you get an IPv4 address and broadcast
> >> > is always enabled, or you don't, in which case you don't really need
> >> > broadcast.  Then when userspace next tries DHCP, the timer would allow
> >> > broadcast for another 2 minutes.
> >> >
> >> >> >> Do we really need to special case DHCPv4? Do you see problems in
> >> >> >> inverting the current logic of adding broadcast bit (i.e. from
> >> >> >> enable-if-IPv4 to disable-if-IPv6-only)?
> >> >> >
> >> >> > The problem with this is dual-stack configuration.  Typically both
> >> >> > DHCPv4 and SLAAC are started in this case, and often SLAAC will complete
> >> >> > earlier.  So now the interface only has an IPv6 address but DHCPv4 is
> >> >> > still ongoing; as I understand your proposal the filter would now block
> >> >> > broadcast packets that DHCPv4 is listening for.
> >> >> >
> >> >> That was a proposal and we can tweak it if necessary. My primary
> >> >> concern is that the solution should not make us pay per packet penalty
> >> >> once (and even if not) the DHCPv4 handshake is complete, which is not
> >> >> the case as it stands now.
> >> >
> >> > I'm not sure what you mean by "not make us pay a per packet penalty once
> >> > DHCPv4 is complete".  Once an IPv4 address is assigned, the broadcast
> >> > filter must allow broadcast packets.  Before an IPv4 address is
> >> > assigned, the driver must watch for DHCPv4 and allow broadcast packets
> >> > so that the DHCP reply isn't missed.
> >> >
> >> If the link is not using DHCPv4 to configure, the snooping continues
> >> and that is per packet cost the needs to be paid. Probably this can be
> >> avoided by setting dhcp4_seen = true in add_addr4 handler.
> >>
> >> > If DHCPv4 gets no response, then there are two options:
> >> >
> >> > a) down the interface (which resets the broadcast filter to block
> >> > broadcast packets) and re-up it, and don't do DHCPv4 again
> >> >
> >> > b) have a timer in the driver that waits a period of time for an IPv4
> >> > address after a DHCPv4 request, and blocks broadcast when the timer
> >> > expires
> >> >
> >> Probably something similar where turn on the broadcast bit and wait
> >> for the interface to get configured (2 min timer) and at the expiry
> >> decide what is to be done about braodcast bit. If the addresses
> >> configured are all IPv6, then eliminate it and if any of them are IPv4
> >> then don't change it. In this case no special casing nor snooping is
> >> required and this should work for dual-stack scenario as well.
> >
> > This does not work for the case where, after configuring, the DHCPv4
> > address lease expires and the IPv4 address is removed, and then DHCPv4
> > is started again.  Possibly because the DHCP server was gone for a short
> > time.  The only way to handle that is to snoop again.
> >
> If the DHCPv4 lease is expired, then why DHCPv4-renew wont work (also
> it's unicast)? Also if the address applied is  IPv4 then the broadcast
> bit will stay. In this case I don't understand why this wont work.

You're right.  I mis-read your proposal above.

But having re-read it, are you proposing a 2m timer on interface up?  If
so (and ignore this if not) I don't think that works well either,
because there's no guarantee that interface configuration will happen
only close to interface up.  Maybe userspace adds IPv6 addresses
initially, and then later tries to do DHCP for some reason.  We simply
cannot rely on specific ordering of operations that userspace might want
to do.

If you don't mean a 2m timer from interface up, then ignore that, and
then what kind of time do you mean? :)

> If the link is stripped off of address(es), then it should again go
> back into the config-mode where it would turn on the broadcast bit and
> enable timer to get it configured. May be I'm missing something.

As above, I was wrong about the DHCPv4 lease expiration thing, but this
may still run afoul of userspace operation ordering if you are talking
about a 2m timer from interface up.

Just had another thought though; what if instead of snooping for all the
DHCP stuff, the code just snooped outgoing IPv4 packets for a broadcast
destination address?  Then turn on the broadcast bit filter for 2m.
That would look something like this:

static bool is_bcast4(struct sk_buff *skb)
{
	struct iphdr *ip4h;

	switch (skb->protocol) {
	case htons(ETH_P_ALL):
		/* Raw sockets */
		if (eth_hdr(skb)->h_proto != htons(ETH_P_IP))
			break;
		/* Fall through */
	case htons(ETH_P_IP):
		if (unlikely(!pskb_may_pull(skb, sizeof(*ip4h))))
			return NULL;
		ip4h = ip_hdr(skb);
		if (ip4h->ihl < 5 || ip4h->version != 4)
			return NULL;
		return ip4h->daddr == INADDR_BROADCAST;
	}
	return false;
}

static int ipvlan_xmit_mode_l2(...)
{
	if (!ipvlan->ipv4cnt && !ipvlan->bcast4_seen && is_bcast4(skb)) {
		ipvlan->bcast4_seen = true;
		ipvlan_set_broadcast_mac_filter(ipvlan, true);
	}

Yes, it's still snooping for all packets, but it's a lot fewer compares
than looking for DHCP specifically.

> > Reaching for maximum performance is great, but if that is done by
> > ignoring/breaking a whole class of normal use-cases, I don't think
> > that's reasonable.  It's like saying "gee, I'd love UDP to be faster, so
> > I'll remove anything TCP-related in the kernel".
> >
> What is the amount of traffic that a link will be sending for config
> (DHCPv4 in this case)? I would guarantee you that it's so negligible
> to the extent that it's would be non-existent. But all the suggestions
> include snooping of every packet that goes through the link. However
> I'm not suggesting that this is non-important or can stay broken. But
> would rather have a solution that would not affect 99.9% of the
> traffic (per packet penalty that I mentioned earlier).

I agree, and it's great that we're brainstorming solutions.
Unfortunately as a kernel driver, it cannot rely on specific ordering or
timing of address addition/deletion since that's not something you can
predict at all.

> > Also, I don't think the snooping is as bad for performance as you may
> > think.  The only relevant issue here is L2 + IPv6-only,
> Also IPv4 case when the link is not using autoconf.

Yes, that's true.  But a kernel driver simply doesn't know what its
setup will be so it cannot assume much of anything about the addressing
setup despite whatever performance gains there might be...

Dan

> > and in that case
> > it's 4 extra compares (dhcp4_seen, ipv4cnt, lyr3h, and addr_type) for
> > the external case.  How much is that really going to slow things down,
> > versus breaking a huge part of IPv4 address configuration?
> >
> > Dan
> >
> >> > Dan
> >> >
> >> >> > Dan
> >> >> >
> >> >> >> > Signed-off-by: Dan Williams <dcbw@...hat.com>
> >> >> >> > ---
> >> >> >> > NOTE: this patch supercedes my previous two ipvlan patches:
> >> >> >> > [PATCH 1/2] ipvlan: don't loose broadcast MAC when setting MAC filters
> >> >> >> > [PATCH 2/2] ipvlan: always allow the broadcast MAC address
> >> >> >> >
> >> >> >> >  drivers/net/ipvlan/ipvlan.h      |  2 ++
> >> >> >> >  drivers/net/ipvlan/ipvlan_core.c | 36 ++++++++++++++++++++++++++++++++++--
> >> >> >> >  drivers/net/ipvlan/ipvlan_main.c | 12 +++++++++---
> >> >> >> >  3 files changed, 45 insertions(+), 5 deletions(-)
> >> >> >> >
> >> >> >> > diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
> >> >> >> > index 924ea98..7e0b8ff 100644
> >> >> >> > --- a/drivers/net/ipvlan/ipvlan.h
> >> >> >> > +++ b/drivers/net/ipvlan/ipvlan.h
> >> >> >> > @@ -67,6 +67,7 @@ struct ipvl_dev {
> >> >> >> >         struct list_head        addrs;
> >> >> >> >         int                     ipv4cnt;
> >> >> >> >         int                     ipv6cnt;
> >> >> >> > +       bool                    dhcp4_seen;
> >> >> >> >         struct ipvl_pcpu_stats  __percpu *pcpu_stats;
> >> >> >> >         DECLARE_BITMAP(mac_filters, IPVLAN_MAC_FILTER_SIZE);
> >> >> >> >         netdev_features_t       sfeatures;
> >> >> >> > @@ -118,4 +119,5 @@ bool ipvlan_addr_busy(struct ipvl_dev *ipvlan, void *iaddr, bool is_v6);
> >> >> >> >  struct ipvl_addr *ipvlan_ht_addr_lookup(const struct ipvl_port *port,
> >> >> >> >                                         const void *iaddr, bool is_v6);
> >> >> >> >  void ipvlan_ht_addr_del(struct ipvl_addr *addr, bool sync);
> >> >> >> > +void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set);
> >> >> >> >  #endif /* __IPVLAN_H */
> >> >> >> > diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
> >> >> >> > index 2a17500..6dd992b 100644
> >> >> >> > --- a/drivers/net/ipvlan/ipvlan_core.c
> >> >> >> > +++ b/drivers/net/ipvlan/ipvlan_core.c
> >> >> >> > @@ -126,6 +126,12 @@ static void *ipvlan_get_L3_hdr(struct sk_buff *skb, int *type)
> >> >> >> >                 lyr3h = arph;
> >> >> >> >                 break;
> >> >> >> >         }
> >> >> >> > +       case htons(ETH_P_ALL): {
> >> >> >> > +               /* Raw sockets */
> >> >> >> > +               if (eth_hdr(skb)->h_proto != htons(ETH_P_IP))
> >> >> >> > +                       break;
> >> >> >> > +               /* Fall through */
> >> >> >> > +       }
> >> >> >> >         case htons(ETH_P_IP): {
> >> >> >> >                 u32 pktlen;
> >> >> >> >                 struct iphdr *ip4h;
> >> >> >> > @@ -454,16 +460,42 @@ out:
> >> >> >> >         return ipvlan_process_outbound(skb, ipvlan);
> >> >> >> >  }
> >> >> >> >
> >> >> >> > +#define DHCP_PACKET_MIN_LEN 282
> >> >> >> > +
> >> >> >> > +static bool is_dhcp(struct sk_buff *skb)
> >> >> >> > +{
> >> >> >> > +       struct iphdr *ip4h = ip_hdr(skb);
> >> >> >> > +       struct udphdr *udph;
> >> >> >> > +
> >> >> >> > +       if (skb->len < DHCP_PACKET_MIN_LEN || ip4h->protocol != IPPROTO_UDP
> >> >> >> > +               || ip_is_fragment(ip4h))
> >> >> >> > +               return false;
> >> >> >> > +
> >> >> >> > +       /* In the case of RAW sockets the transport header is not set by
> >> >> >> > +        * the IP stack so we must set it ourselves.
> >> >> >> > +        */
> >> >> >> > +       if (skb->transport_header == skb->network_header)
> >> >> >> > +               skb_set_transport_header(skb, ETH_HLEN + sizeof(struct iphdr));
> >> >> >> > +       udph = udp_hdr(skb);
> >> >> >> > +       return udph && udph->dest == htons(67) && udph->source == htons(68);
> >> >> >> > +}
> >> >> >> > +
> >> >> >> >  static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev)
> >> >> >> >  {
> >> >> >> > -       const struct ipvl_dev *ipvlan = netdev_priv(dev);
> >> >> >> > +       struct ipvl_dev *ipvlan = netdev_priv(dev);
> >> >> >> >         struct ethhdr *eth = eth_hdr(skb);
> >> >> >> >         struct ipvl_addr *addr;
> >> >> >> >         void *lyr3h;
> >> >> >> >         int addr_type;
> >> >> >> >
> >> >> >> > +       lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
> >> >> >> > +       if (!ipvlan->dhcp4_seen && lyr3h && addr_type == IPVL_IPV4
> >> >> >> > +           && is_dhcp(skb)) {
> >> >> >> > +               ipvlan->dhcp4_seen = true;
> >> >> >> > +               ipvlan_set_broadcast_mac_filter(ipvlan, true);
> >> >> >> > +       }
> >> >> >> > +
> >> >> >> >         if (ether_addr_equal(eth->h_dest, eth->h_source)) {
> >> >> >> > -               lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
> >> >> >> >                 if (lyr3h) {
> >> >> >> >                         addr = ipvlan_addr_lookup(ipvlan->port, lyr3h, addr_type, true);
> >> >> >> >                         if (addr)
> >> >> >> > diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> >> >> >> > index 4f4099d..a497fb9 100644
> >> >> >> > --- a/drivers/net/ipvlan/ipvlan_main.c
> >> >> >> > +++ b/drivers/net/ipvlan/ipvlan_main.c
> >> >> >> > @@ -167,6 +167,8 @@ static int ipvlan_stop(struct net_device *dev)
> >> >> >> >
> >> >> >> >         dev_uc_del(phy_dev, phy_dev->dev_addr);
> >> >> >> >
> >> >> >> > +       ipvlan->dhcp4_seen = false;
> >> >> >> > +
> >> >> >> >         if (ipvlan->ipv6cnt > 0 || ipvlan->ipv4cnt > 0) {
> >> >> >> >                 list_for_each_entry(addr, &ipvlan->addrs, anode)
> >> >> >> >                         ipvlan_ht_addr_del(addr, !dev->dismantle);
> >> >> >> > @@ -214,7 +216,7 @@ static void ipvlan_change_rx_flags(struct net_device *dev, int change)
> >> >> >> >                 dev_set_allmulti(phy_dev, dev->flags & IFF_ALLMULTI? 1 : -1);
> >> >> >> >  }
> >> >> >> >
> >> >> >> > -static void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
> >> >> >> > +void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
> >> >> >> >  {
> >> >> >> >         struct net_device *dev = ipvlan->dev;
> >> >> >> >         unsigned int hashbit = ipvlan_mac_hash(dev->broadcast);
> >> >> >> > @@ -239,6 +241,9 @@ static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
> >> >> >> >                 netdev_for_each_mc_addr(ha, dev)
> >> >> >> >                         __set_bit(ipvlan_mac_hash(ha->addr), mc_filters);
> >> >> >> >
> >> >> >> > +               if (ipvlan->ipv4cnt || ipvlan->dhcp4_seen)
> >> >> >> > +                       __set_bit(ipvlan_mac_hash(dev->broadcast), mc_filters);
> >> >> >> > +
> >> >> >> >                 bitmap_copy(ipvlan->mac_filters, mc_filters,
> >> >> >> >                             IPVLAN_MAC_FILTER_SIZE);
> >> >> >> >         }
> >> >> >> > @@ -467,6 +472,7 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
> >> >> >> >         INIT_LIST_HEAD(&ipvlan->addrs);
> >> >> >> >         ipvlan->ipv4cnt = 0;
> >> >> >> >         ipvlan->ipv6cnt = 0;
> >> >> >> > +       ipvlan->dhcp4_seen = false;
> >> >> >> >
> >> >> >> >         /* TODO Probably put random address here to be presented to the
> >> >> >> >          * world but keep using the physical-dev address for the outgoing
> >> >> >> > @@ -708,8 +714,8 @@ static void ipvlan_del_addr4(struct ipvl_dev *ipvlan, struct in_addr *ip4_addr)
> >> >> >> >         list_del_rcu(&addr->anode);
> >> >> >> >         ipvlan->ipv4cnt--;
> >> >> >> >         WARN_ON(ipvlan->ipv4cnt < 0);
> >> >> >> > -       if (!ipvlan->ipv4cnt)
> >> >> >> > -           ipvlan_set_broadcast_mac_filter(ipvlan, false);
> >> >> >> > +       if (!ipvlan->ipv4cnt && !ipvlan->dhcp4_seen)
> >> >> >> > +               ipvlan_set_broadcast_mac_filter(ipvlan, false);
> >> >> >> >         kfree_rcu(addr, rcu);
> >> >> >> >
> >> >> >> >         return;
> >> >> >> > --
> >> >> >> > 2.1.0
> >> >> >> >
> >> >> >> >
> >> >> >> --
> >> >> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >> >> >> the body of a message to majordomo@...r.kernel.org
> >> >> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> >> >
> >> >> >
> >> >> --
> >> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >> >> the body of a message to majordomo@...r.kernel.org
> >> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> >
> >> >
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >> the body of a message to majordomo@...r.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> >
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ