lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHo-OoxoboVcn=MJj_XhaqeYkS_woGnuQ8vS9+BoDvdoae0YDA@mail.gmail.com>
Date:   Tue, 29 Jan 2019 01:39:52 -0800
From:   Maciej Żenczykowski <zenczykowski@...il.com>
To:     Michal Soltys <soltys@....info>
Cc:     David Miller <davem@...emloft.net>,
        Linux NetDev <netdev@...r.kernel.org>,
        Jay Vosburgh <jay.vosburgh@...onical.com>,
        Vincent Bernat <vincent@...nat.ch>,
        Mahesh Bandewar <maheshb@...gle.com>,
        Chonggang Li <chonggangli@...gle.com>
Subject: Re: [PATCH net 1/1] bonding: fix PACKET_ORIGDEV regression on bonding masters

Sounds good to me.

On Mon, Jan 28, 2019 at 5:47 PM Michal Soltys <soltys@....info> wrote:
>
> On 19/01/18 07:58, Maciej Żenczykowski wrote:
> > I'm not sure there's a truly good answer.
> >
> > Also note, that there's subtle differences between af_packet sockets
> > tied to ALL protocols vs tied to specific protocols (vs none/0 of
> > course).
> > However ETH_P_ALL protocol raw sockets need to be avoided if at all
> > possible due to perf impact.
> > Certainly we don't want any of these created outside of debugging.
> >
> > I believe we only cared about the utterly unbound to any device case
> > working (ie. delivering all LLDP packets all nics are receiving).
> > So that a single simple daemon could collect and export all the link
> > information.
> > [btw. now that I know about the PACKET_ORIGDEV option, I do -
> > unsurprisingly - see our daemon sets it]
> > We really didn't want the complexity of having to bind to individual
> > interfaces and having to try to dynamically adjust the set of raw
> > sockets.
> > (not to mention that less raw sockets is always good, also it's never
> > actually entirely clear which interfaces would need to be monitored
> > due to the preponderance of tunnels/macvlan/ipvlan/veth/dummy and
> > other virtual interface types)
> >
> > I think from a logic perspective:
> >
> > - if you bind to slave (physical interface) you should see *all*
> > packets on that interface,
> > regardless of whether the slave is active or not, and whether the
> > packet is link-local or not.
> > ie. this should show you exactly what nic is receiving (& sending:
> > PACKET_OUTGOING)
> > I should see *exactly* one copy of any packet.
> > This mode has to be useful for debugging.
> > This includes seeing packets which aren't even destined for me if nic
> > receives them.
> > (ie. destined to unicast/multicast mac we'd filter out: PACKET_OTHERHOST)
> > Although by itself this socket's existence shouldn't affect nic mac
> > filters and promiscuous mode
> > (I believe there are all sorts of various additional socket options to
> > change those).
> >
> > - if you don't bind to an interface then I think I'd expect to see
> > packets delivered to stack + link local packets
> >    - ie. should not see non-link-local packets discarded due to being
> > on inactive slaves
> >      similarly to how I should not see packets filtered out by virtue
> > of mac not matching interface mac filters
> >    - IMHO you should see *all* link local packets arriving at system
> > (ie. the original change's purpose)
> >      [and I think, but am not certain, I shouldn't need to use any
> > socket options to register the link local macs,
> >       although glancing at code I do think the daemon uses
> > PACKET_ADD_MEMBERSHIP to register lldp mac,
> >       so perhaps filtering should apply as normal?]
> >    - with PACKET_ORIGDEV they should always show up as from the
> > physical interfaces (ie. slaves)
> >    - without PACKET_ORIGDEV:
> >       - non link local packets should show up as from bonding/team master
>
> >       - link local packets on active slaves could show up on master or
> > slave (probably master is required ifirc some earlier fix???)
>
> As far as I can see, they (LLCs coming on active slave) will show via
> master as from master (w/o the socket option) or as from slave (with).
> Think Vincent's tests confirmed it.
>
> >       - link local packets on inactive slaves should show up on slave -
> > and definitely not on master
> >    - I don't think I should ever see any PACKET_OTHERHOST packets
> >
> > - if you bind to master you should see packets from active slaves (ie.
> > those that will get delivered to stack)
> >    - clearly you should not see non-link-local inactive slave packets
> > (they'll be dropped)
>
> >    - behaviour wrt. link local packets is more dicey... (I believe
> > somewhere in these threads patches there was some description of what
> > standard requires, but I don't off the top of my head remember)
>
> They must be seen, otherwise bonds attached to a linux bridge (I assume
> enslaving an interface to a bridge essentially counts as bind) will be
> blind to them (among those - e.g. to spanning tree information - this
> was what originally caused issues for me last year). Even the bridge
> code goes to extra length to not carelessly consume stp packets, if it's
> not actively participating in stp (of any kind). Aside that, certain
> [recent] bridge features like group_fwd_mask would be non-functional on
> bond ports as well.
>
> As for standard (the one I quoted in the oldest thread) expected the
> link-local packet to be both readable via master and slaves depending on
> need (though it didn't go into exact gory details). Your current patch I
> think cover all possible cases quite greacefully.
>
> >    - for an active-backup bond it would seem logical to see only active
> > links link local
> >    - for a multiple-active aggregate bond I'd be fine with seeing none,
> > or all active slaves link local packets - I guess even though the
> > later is confusing it makes sense
> >    - it might be okay to see link-local inactive slave packets, but I
> > think I'd prefer not to (could be configurable though) - this would
> > seem confusing/wrong to me.
> >    - and again I don't think I should see any PACKET_OTHERHOST packets here...
> >    - PACKET_ORIGDEV as above...
> >
> > I gave the above a fair amount of thought... but I'm not guaranteeing
> > I didn't make typos, or write something utterly stupid or
> > unimplementable or not how stuff should work for other reasons...
> > Comments welcome.
> >
> > I think this continues to be in line with my proposal from earlier in
> > the thread?
>
> (sorry for late reply)
>
> Yes, pretty much spot on. With some confirmation comments above.
>
> I did bridging tests yesterday (2 LACP bonds attached via separate
> switches treating linux bridge as a shared segment in mstp scenario) and
> all is working fine on my side as well.
>
> If everyone agrees with the proposed code, I will submit v2 patch with
> added comment explaining basic logic (or you could submit if you prefer).
>
> >
> > On Thu, Jan 17, 2019 at 4:27 PM Michal Soltys <soltys@....info> wrote:
> >>
> >> On 19/01/14 03:01, Maciej Żenczykowski wrote:
> >> > So I don't remember the specifics...
> >> >
> >> > (note I'm writing this all from memory without looking it up/testing
> >> > it - I may be utterly wrong or dreaming)
> >> >
> >> > But I seem to recall that the core problem we were trying to solve was
> >> > that a daemon listening
> >> > on an AF_PACKET ethertype 88CC [LLDP] socket not bound to any device
> >> > would not receive LLDP packets
> >> > arriving on inactive bond slaves (either active-backup or lag).
> >> >
> >> > [inactive = link/carrier up, but not part of active aggregator]
> >> >
> >> > This made monitoring for miscabling harder (IFIRC the only non kernel
> >> > fix was to get the daemon to create
> >> > a separate AF_PACKET/88CC socket bound to every physical interface in
> >> > the system, or monitor for
> >> > inactive slaves and add extra packet sockets as needed).
> >> >
> >> > They would get re-parented to the master and then since the slave was
> >> > inactive they would be considered RX_HANDLER_EXACT match only and not
> >> > match the * interface.
> >> >
> >> > Honestly I wasn't aware of PACKET_ORIGDEV, although I don't think it
> >> > helps in this case - AFAICR the packets never made it to the packet
> >> > socket.
> >> >
> >> > Perhaps going from:
> >> >    /* don't change skb->dev for link-local packets */
> >> >    if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) return RX_HANDLER_PASS;
> >> >    if (bond_should_deliver_exact_match(skb, slave, bond)) return
> >> > RX_HANDLER_EXACT;
> >> >
> >> > to something more like:
> >> >    if (bond_should_deliver_exact_match(skb, slave, bond)) {
> >> >      /* don't change skb->dev for link-local packets on inactive slaves */
> >> >      if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) return RX_HANDLER_PASS;
> >> >      return RX_HANDLER_EXACT;
> >> >    }
> >>
> >> Having checked the code (if I get the flow correctly), one
> >> thing/question - currently with Mahesh's fixes, not bound LLDP listener
> >> will receive all packets - both from active and inactive slaves directly
> >> (as the check for suppression is done after the link-local check).
> >>
> >> The version above will do the suppression check first - so all inactive
> >> slaves - excluding non-multi/non-broad ALB - will pass it and return
> >> RX_HANDLER_PASS if the packet is link-local. So those will be available
> >> w/o binding, but active slaves' packets will be available via master
> >> device (but with working PACKET_ORIGDEV now - so slave device can be
> >> retrieved easily). This is fine in your scenario I presume ?
> >>
> >
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ