netdev - Re: Panic with demuxed ipv4 multicast udp sockets on 4.0.4


Date:	Thu, 30 Jul 2015 07:42:49 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Gregory Hoggarth <Gregory.Hoggarth@...iedtelesis.co.nz>,
	Shawn Bohrer <sbohrer@...advisors.com>
Cc:	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"alexgartrell@...il.com" <alexgartrell@...il.com>
Subject: Re: Panic with demuxed ipv4 multicast udp sockets on 4.0.4

On Thu, 2015-07-30 at 01:41 +0000, Gregory Hoggarth wrote:
> Hi,
> 
> My company has also started seeing what appears to be the same problem since we upgraded our embedded system to
> Linux kernel 3.16.
> 
> I tried applying the suggested fix of READ_ONCE (and also had to add in the necessary code to compiler.h as 3.16
> didn't have it) and unfortunately it did not fix the issue at all.
> 
> Unfortunately we do not have an easy reproduction method, and do not know precisely what is going on in the system
> when the issue occurs. We know it is a multicast UDP packet but that is about it. For us, the crash happens during
> a critical stage in our system initialisation, making additional debugging and instrumentation difficult. Our 
> reproduction rate is approximately 1 out of 100 test runs; testing overnight we will usually see 3-5 instances of 
> the crash happening. All our attempts to increase the reproduction rate, or reproduce the issue in a simpler/more 
> controlled way have failed.
> 
> Because we have customised the Linux kernel, in some places radically, we assumed this was a problem only we
> were seeing, so we were trying to fix it ourselves. Now that this appears to be a generic problem upstream, we've 
> simply disabled UDP early demux in our system (since it's a new optimisation that we have lived without up till 
> now) and will wait for this issue to be fixed upstream instead.
> 
> 
> So I'm sharing the debug patch I've written to help gather data on what is going on in the system, and some
> of the output we've gotten from the debug, in case this is useful for anyone else who is seeing this problem or
> would like to try and fix it.
> 
> Feel free to ask questions, I'm not sure how much help I can be but will do my best. We'll be happy to assist in
> testing any proposed fixes. I also have some more examples of kernel oops and debug output if that could be useful, 
> although the debug is from earlier iterations of the patch so that historical output is not as detailed as the 
> output generated by the latest version of the patch attached here.
> 
> Thanks,
> Greg Hoggarth

CC'ing the UDP early demux author: Shawn Bohrer

I believe this is a race condition with a dst escaping the RCU-protected
region.

I will send a patch.
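
To illustrate the kind of race described above, here is a minimal
userspace sketch, not the kernel patch itself. It assumes a refcounted
object that is looked up without holding a reference and may be freed
concurrently; the usual defence is to take a reference only if the count
is still nonzero. The names fake_dst, fake_dst_hold_safe and
fake_dst_release are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted route entry (NOT struct dst_entry). */
struct fake_dst {
    atomic_int refcnt;   /* 0 means the object is already on its way to being freed */
};

/*
 * Take a reference only if the count is still nonzero.
 * Returns true if a reference was taken; false means the caller
 * must not keep the pointer beyond the protected region.
 */
static bool fake_dst_hold_safe(struct fake_dst *dst)
{
    int old = atomic_load(&dst->refcnt);

    while (old != 0) {
        /* On failure, 'old' is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(&dst->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference previously taken with fake_dst_hold_safe(). */
static void fake_dst_release(struct fake_dst *dst)
{
    int prev = atomic_fetch_sub(&dst->refcnt, 1);

    assert(prev > 0);   /* releasing without holding would be a bug */
    (void)prev;         /* silence unused warning when NDEBUG is defined */
}
```

A caller would attach the pointer to a longer-lived structure only when
fake_dst_hold_safe() returns true, and otherwise fall back to a fresh
lookup.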

