[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20f00da1-74a3-90a2-6759-186f6b415685@ziu.info>
Date: Tue, 29 Jan 2019 02:47:23 +0100
From: Michal Soltys <soltys@....info>
To: Maciej Żenczykowski <zenczykowski@...il.com>
Cc: David Miller <davem@...emloft.net>,
Linux NetDev <netdev@...r.kernel.org>,
Jay Vosburgh <jay.vosburgh@...onical.com>,
Vincent Bernat <vincent@...nat.ch>,
Mahesh Bandewar <maheshb@...gle.com>,
Chonggang Li <chonggangli@...gle.com>
Subject: Re: [PATCH net 1/1] bonding: fix PACKET_ORIGDEV regression on bonding
masters
On 19/01/18 07:58, Maciej Żenczykowski wrote:
> I'm not sure there's a truly good answer.
>
> Also note, that there's subtle differences between af_packet sockets
> tied to ALL protocols vs tied to specific protocols (vs none/0 of
> course).
> However ETH_P_ALL protocol raw sockets need to be avoided if at all
> possible due to perf impact.
> Certainly we don't want any of these created outside of debugging.
>
> I believe we only cared about the utterly unbound to any device case
> working (ie. delivering all LLDP packets all nics are receiving).
> So that a single simple daemon could collect and export all the link
> information.
> [btw. now that I know about the PACKET_ORIGDEV option, I do -
> unsurprisingly - see our daemon sets it]
> We really didn't want the complexity of having to bind to individual
> interfaces and having to try to dynamically adjust the set of raw
> sockets.
> (not to mention that less raw sockets is always good, also it's never
> actually entirely clear which interfaces would need to be monitored
> due to the preponderance of tunnels/macvlan/ipvlan/veth/dummy and
> other virtual interface types)
>
> I think from a logic perspective:
>
> - if you bind to slave (physical interface) you should see *all*
> packets on that interface,
> regardless of whether the slave is active or not, and whether the
> packet is link-local or not.
> ie. this should show you exactly what nic is receiving (& sending:
> PACKET_OUTGOING)
> I should see *exactly* one copy of any packet.
> This mode has to be useful for debugging.
> This includes seeing packets which aren't even destined for me if nic
> receives them.
> (ie. destined to unicast/multicast mac we'd filter out: PACKET_OTHERHOST)
> Although by itself this socket's existence shouldn't affect nic mac
> filters and promiscuous mode
> (I believe there are all sorts of various additional socket options to
> change those).
>
> - if you don't bind to an interface then I think I'd expect to see
> packets delivered to stack + link local packets
> - ie. should not see non-link-local packets discarded due to being
> on inactive slaves
> similarly to how I should not see packets filtered out by virtue
> of mac not matching interface mac filters
> - IMHO you should see *all* link local packets arriving at system
> (ie. the original change's purpose)
> [and I think, but am not certain, I shouldn't need to use any
> socket options to register the link local macs,
> although glancing at code I do think the daemon uses
> PACKET_ADD_MEMBERSHIP to register lldp mac,
> so perhaps filtering should apply as normal?]
> - with PACKET_ORIGDEV they should always show up as from the
> physical interfaces (ie. slaves)
> - without PACKET_ORIGDEV:
> - non link local packets should show up as from bonding/team master
> - link local packets on active slaves could show up on master or
> slave (probably master is required ifirc some earlier fix???)
As far as I can see, they (LLCs coming on active slave) will show via
master as from master (w/o the socket option) or as from slave (with).
Think Vincent's tests confirmed it.
> - link local packets on inactive slaves should show up on slave -
> and definitely not on master
> - I don't think I should ever see any PACKET_OTHERHOST packets
>
> - if you bind to master you should see packets from active slaves (ie.
> those that will get delivered to stack)
> - clearly you should not see non-link-local inactive slave packets
> (they'll be dropped)
> - behaviour wrt. link local packets is more dicey... (I believe
> somewhere in these threads patches there was some description of what
> standard requires, but I don't off the top of my head remember)
They must be seen, otherwise bonds attached to a linux bridge (I assume
enslaving an interface to a bridge essentially counts as bind) will be
blind to them (among those - e.g. to spanning tree information - this
was what originally caused issues for me last year). Even the bridge
code goes to extra length to not carelessly consume stp packets, if it's
not actively participating in stp (of any kind). Aside that, certain
[recent] bridge features like group_fwd_mask would be non-functional on
bond ports as well.
As for standard (the one I quoted in the oldest thread) expected the
link-local packet to be both readable via master and slaves depending on
need (though it didn't go into exact gory details). Your current patch I
think cover all possible cases quite greacefully.
> - for an active-backup bond it would seem logical to see only active
> links link local
> - for a multiple-active aggregate bond I'd be fine with seeing none,
> or all active slaves link local packets - I guess even though the
> later is confusing it makes sense
> - it might be okay to see link-local inactive slave packets, but I
> think I'd prefer not to (could be configurable though) - this would
> seem confusing/wrong to me.
> - and again I don't think I should see any PACKET_OTHERHOST packets here...
> - PACKET_ORIGDEV as above...
>
> I gave the above a fair amount of thought... but I'm not guaranteeing
> I didn't make typos, or write something utterly stupid or
> unimplementable or not how stuff should work for other reasons...
> Comments welcome.
>
> I think this continues to be in line with my proposal from earlier in
> the thread?
(sorry for late reply)
Yes, pretty much spot on. With some confirmation comments above.
I did bridging tests yesterday (2 LACP bonds attached via separate
switches treating linux bridge as a shared segment in mstp scenario) and
all is working fine on my side as well.
If everyone agrees with the proposed code, I will submit v2 patch with
added comment explaining basic logic (or you could submit if you prefer).
>
> On Thu, Jan 17, 2019 at 4:27 PM Michal Soltys <soltys@....info> wrote:
>>
>> On 19/01/14 03:01, Maciej Żenczykowski wrote:
>> > So I don't remember the specifics...
>> >
>> > (note I'm writing this all from memory without looking it up/testing
>> > it - I may be utterly wrong or dreaming)
>> >
>> > But I seem to recall that the core problem we were trying to solve was
>> > that a daemon listening
>> > on an AF_PACKET ethertype 88CC [LLDP] socket not bound to any device
>> > would not receive LLDP packets
>> > arriving on inactive bond slaves (either active-backup or lag).
>> >
>> > [inactive = link/carrier up, but not part of active aggregator]
>> >
>> > This made monitoring for miscabling harder (IFIRC the only non kernel
>> > fix was to get the daemon to create
>> > a separate AF_PACKET/88CC socket bound to every physical interface in
>> > the system, or monitor for
>> > inactive slaves and add extra packet sockets as needed).
>> >
>> > They would get re-parented to the master and then since the slave was
>> > inactive they would be considered RX_HANDLER_EXACT match only and not
>> > match the * interface.
>> >
>> > Honestly I wasn't aware of PACKET_ORIGDEV, although I don't think it
>> > helps in this case - AFAICR the packets never made it to the packet
>> > socket.
>> >
>> > Perhaps going from:
>> > /* don't change skb->dev for link-local packets */
>> > if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) return RX_HANDLER_PASS;
>> > if (bond_should_deliver_exact_match(skb, slave, bond)) return
>> > RX_HANDLER_EXACT;
>> >
>> > to something more like:
>> > if (bond_should_deliver_exact_match(skb, slave, bond)) {
>> > /* don't change skb->dev for link-local packets on inactive slaves */
>> > if (is_link_local_ether_addr(eth_hdr(skb)->h_dest)) return RX_HANDLER_PASS;
>> > return RX_HANDLER_EXACT;
>> > }
>>
>> Having checked the code (if I get the flow correctly), one
>> thing/question - currently with Mahesh's fixes, not bound LLDP listener
>> will receive all packets - both from active and inactive slaves directly
>> (as the check for suppression is done after the link-local check).
>>
>> The version above will do the suppression check first - so all inactive
>> slaves - excluding non-multi/non-broad ALB - will pass it and return
>> RX_HANDLER_PASS if the packet is link-local. So those will be available
>> w/o binding, but active slaves' packets will be available via master
>> device (but with working PACKET_ORIGDEV now - so slave device can be
>> retrieved easily). This is fine in your scenario I presume ?
>>
>
Powered by blists - more mailing lists