>From 9570dd1bd728399bf47722a9e966ace42d6441fa Mon Sep 17 00:00:00 2001 From: "Jorge Boncompte [DTI2]" Date: Mon, 26 Mar 2012 12:52:13 +0200 Subject: [PATCH] [PATCH] garp: fix livelock on link bouncing MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fixes a bug in the GARP code when a VLAN interface changes from down to up state quickly. The tree insertion routine doesn't check that the attr it's still on the tree and enters and infinite loop (livelocks). Instead of returning an error we just reuse the existing attr to not disturb the switch in front of us. Conversation with Patrick follows... El 30/11/2011 14:02, Patrick McHardy escribió: > On 11/29/2011 06:01 PM, Jorge Boncompte [DTI2] wrote: >> El 29/11/2011 14:50, Patrick McHardy escribió: >>> On 11/28/2011 04:39 PM, Jorge Boncompte [DTI2] wrote: >>>> El 28/11/2011 16:11, Patrick McHardy escribió: >>>>> On 25.11.2011 18:13, Jorge Boncompte [DTI2] wrote: >>>>>> Patrick, could you take a look at the attached patch? It fixes >>>>>> a bug in the GARP code when you put a VLAN interface down/up in a >>>>>> quick sucession. I've noticed it because some of my boxes where >>>>>> hanging when moving interfaces between netns's. >>>>> I'm not sure I understand the condition leading to this. >>>>> Basically the GVRP join should only happen under the >>>>> RTNL mutex, so I don't see how we can race during that. >>>> It has nothing to do with the RTNL, the problem is that >>>> garp_request_leave() arms the timer and the leave packet it's not sent >>>> inmediately and the attribute it's still on the tree when the new >>>> garp_request_join() happens, for example moving a VLAN interface from >>>> netns or by a quick down/up sucession. >>>> garp_attr_insert() doesn't handle the condition that the attr exists >>>> already and livelocks in the while loop. >>>> >>>> So, instead of sending the leave packet synchronously from the >>>> leave path I have opted for reinitializing the already existing >>>> attribute that should not disturb the switch where we're attached. >>> Yes, that seems like the better option. But I don't think this is fully >>> correct (might be missing something though, it has been quite a while >>> since I last worked on this code). If the applicant is in >>> GARP_APPLICANT_LA state and a GARP_EVENT_TRANSMIT_PDU event is >>> generated, the action will be GARP_ACTION_S_LEAVE_EMPTY, after which >>> the attribute is destroyed immediately. Since we skip insertion with >>> your patch, the attribute will incorrectly be gone. So I think what we >>> need is either refcounting on the attribute to handle this case or >>> alternatively send the message immediately. >>> >>> Does that make sense? >> What you describe happens from the timer, after garp_request_leave() >> has moved the applicant to _LA state and _REQ_LEAVE event. That's what >> happens in the normal case when you put the interface down for example. >> >> In the case I describe the garp_request_join() comes betweeen the >> call to garp_request_leave() and before the timer fires. The attr is >> still on the tree so without my fix you livelock. With my changes >> garp_request_join() calls garp_attr_create() that founds and return the >> existing attr, and then calls garp_attr_event() with the _REQ_JOIN event. >> This moves the attr from the _LA state to the _VA one. After that the >> timer fires and founds the attr in the _VA state instead of _LA and so >> the attr is not destroyed (and no need to refcount it). >> >> All this happens under the app->lock so the timer sees the attr on >> the _LA state or on the VA state and in either case it does the right >> thing from my POV. I am no expert on GARP but I've researched a bit the >> protocol and the state transition from _LA to _VA looks ok to me. >> >> Here you have some output of a my patch with some test printk()'s >> in case you can understand it better that my non-native english :) > > OK, this seems fine to me. If you change the WARN_ON to a > WARN_ON(err) with err = garp_attr_insert(...) you can add > my Acked-by: Patrick McHardy . > > Thanks! Signed-off-by: Jorge Boncompte [DTI2] Acked-by: Patrick McHardy --- net/802/garp.c | 16 ++++++++++++++-- 1 files changed, 14 insertions(+), 2 deletions(-) diff --git a/net/802/garp.c b/net/802/garp.c index 8e21b6d..ad72ac4 100644 --- a/net/802/garp.c +++ b/net/802/garp.c @@ -167,7 +167,7 @@ static struct garp_attr *garp_attr_lookup(const struct garp_applicant *app, return NULL; } -static void garp_attr_insert(struct garp_applicant *app, struct garp_attr *new) +static int garp_attr_insert(struct garp_applicant *app, struct garp_attr *new) { struct rb_node *parent = NULL, **p = &app->gid.rb_node; struct garp_attr *attr; @@ -181,15 +181,24 @@ static void garp_attr_insert(struct garp_applicant *app, struct garp_attr *new) p = &parent->rb_left; else if (d > 0) p = &parent->rb_right; + else + return -EEXIST; } rb_link_node(&new->node, parent, p); rb_insert_color(&new->node, &app->gid); + + return 0; } static struct garp_attr *garp_attr_create(struct garp_applicant *app, const void *data, u8 len, u8 type) { struct garp_attr *attr; + int err; + + attr = garp_attr_lookup(app, data, len, type); + if (attr) + return attr; attr = kmalloc(sizeof(*attr) + len, GFP_ATOMIC); if (!attr) @@ -198,7 +207,10 @@ static struct garp_attr *garp_attr_create(struct garp_applicant *app, attr->type = type; attr->dlen = len; memcpy(attr->data, data, len); - garp_attr_insert(app, attr); + err = garp_attr_insert(app, attr); + + WARN_ON(err); + return attr; } -- 1.7.8.3