Message-ID: <1438235159.20182.125.camel@edumazet-glaptop2.roam.corp.google.com>
Date: Thu, 30 Jul 2015 07:45:59 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: Gregory Hoggarth <Gregory.Hoggarth@...iedtelesis.co.nz>
Cc: Shawn Bohrer <sbohrer@...advisors.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"alexgartrell@...il.com" <alexgartrell@...il.com>
Subject: Re: Panic with demuxed ipv4 multicast udp sockets on 4.0.4
On Thu, 2015-07-30 at 07:42 +0200, Eric Dumazet wrote:
> On Thu, 2015-07-30 at 01:41 +0000, Gregory Hoggarth wrote:
> > Hi,
> >
> > My company has also started having what appears to be the same problem since we upgraded our embedded system to
> > Linux kernel 3.16.
> >
> > I tried applying the suggested READ_ONCE fix (I also had to add the necessary code to compiler.h, as 3.16
> > didn't have it), but unfortunately it did not fix the issue at all.
> >
> > Unfortunately we do not have an easy reproduction method, and do not know precisely what is going on in the system
> > when the issue occurs. We know it is a multicast UDP packet but that is about it. For us, the crash happens during
> > a critical stage in our system initialisation, making additional debugging and instrumentation difficult. Our
> > reproduction rate is approximately 1 out of 100 test runs; testing overnight we will usually see 3-5 instances of
> > the crash happening. All our attempts to increase the reproduction rate, or reproduce the issue in a simpler/more
> > controlled way have failed.
> >
> > Because we have customised the Linux kernel, in some places radically, we assumed this problem was specific to us
> > and were trying to fix it ourselves. Now that this appears to be a generic upstream problem, we've
> > simply disabled UDP early demux in our system (since it's a new optimisation that we have lived without up till
> > now) and will wait for this issue to be fixed upstream instead.
> >
> >
> > So I'm sharing the debug patch I've written to help gather data on what is going on in the system, and some
> > of the output we've gotten from the debug, in case this is useful for anyone else who is seeing this problem or
> > would like to try and fix it.
> >
> > Feel free to ask questions; I'm not sure how much help I can be but will do my best. We'll be happy to assist in
> > testing any proposed fixes. I also have some more examples of kernel oops and debug output if that could be useful,
> > although the debug is from earlier iterations of the patch so that historical output is not as detailed as the
> > output generated by the latest version of the patch attached here.
> >
> > Thanks,
> > Greg Hoggarth
>
> CC'ing the UDP early demux author: Shawn Bohrer
>
> I believe this is a race condition where a dst escapes the RCU-protected
> region.
>
> I will send a patch.
>
Please try the following fixes:
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 83aa604f9273..02baaa6d97b3 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1778,9 +1778,10 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 		struct dst_entry *dst = skb_dst(skb);
 		int ret;
 
-		if (unlikely(sk->sk_rx_dst != dst))
+		if (unlikely(sk->sk_rx_dst != dst)) {
+			skb_dst_force(skb);
 			udp_sk_rx_dst_set(sk, dst);
-
+		}
 		ret = udp_queue_rcv_skb(sk, skb);
 		sock_put(sk);
 		/* a return value > 0 means to resubmit the input, but
@@ -1995,7 +1996,7 @@ void udp_v4_early_demux(struct sk_buff *skb)
 
 	skb->sk = sk;
 	skb->destructor = sock_efree;
-	dst = sk->sk_rx_dst;
+	dst = READ_ONCE(sk->sk_rx_dst);
 
 	if (dst)
 		dst = dst_check(dst, 0);
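Both changes follow a common pattern for a pointer cached in a shared structure. First, read the shared field once into a local and only test and use that local, which is what READ_ONCE() enforces in udp_v4_early_demux(). Second, make sure a reference is held before a pointer obtained without one is kept beyond the protected region, roughly the purpose of adding skb_dst_force() before udp_sk_rx_dst_set(). The following is a minimal single-threaded userspace sketch of that pattern using C11 atomics. It is only an analogy, not kernel code and not part of the patch: demo_dst, demo_sock and every demo_* function are hypothetical stand-ins, and the refcounting is much simpler than the kernel's real dst handling.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct dst_entry: just a refcount and an
 * "obsolete" flag, not the kernel's real layout or API. */
struct demo_dst {
	atomic_int refcnt;
	atomic_bool obsolete;
};

/* Hypothetical stand-in for a socket's cached sk_rx_dst pointer. */
struct demo_sock {
	_Atomic(struct demo_dst *) rx_dst;
};

/* Allocate a dst holding one reference owned by the caller. */
static struct demo_dst *demo_dst_alloc(void)
{
	struct demo_dst *dst = malloc(sizeof(*dst));

	if (dst) {
		atomic_init(&dst->refcnt, 1);
		atomic_init(&dst->obsolete, false);
	}
	return dst;
}

/* Drop one reference. This single-threaded demo frees immediately; a
 * concurrent implementation must defer the free until no reader can still
 * hold the pointer (the kernel uses RCU for that). */
static void demo_dst_release(struct demo_dst *dst)
{
	if (dst && atomic_fetch_sub(&dst->refcnt, 1) == 1)
		free(dst);
}

static void demo_sock_init(struct demo_sock *sk)
{
	atomic_init(&sk->rx_dst, NULL);
}

/* Loose analog of dst_check(): NULL if the entry has been invalidated. */
static struct demo_dst *demo_dst_check(struct demo_dst *dst)
{
	return atomic_load(&dst->obsolete) ? NULL : dst;
}

/* Analog of the READ_ONCE() change: load the shared pointer exactly once
 * into a local, then test and use only the local copy, so a concurrent
 * update cannot make the NULL test and the use see different values.
 * The result carries no reference of its own (like a noref dst). */
static struct demo_dst *demo_early_demux_lookup(struct demo_sock *sk)
{
	struct demo_dst *dst = atomic_load(&sk->rx_dst);

	if (dst)
		dst = demo_dst_check(dst);
	return dst;
}

/* Take a reference before publishing dst in the long-lived cache, then
 * drop the reference that the cache held on the previous entry. */
static void demo_rx_dst_set(struct demo_sock *sk, struct demo_dst *dst)
{
	struct demo_dst *old;

	atomic_fetch_add(&dst->refcnt, 1);
	old = atomic_exchange(&sk->rx_dst, dst);
	demo_dst_release(old);
}
```

In this sketch the cache owns exactly one reference per published entry and the caller keeps its own, so replacing the cached entry only drops the old entry back to the caller's reference instead of freeing it out from under the caller.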