linux-kernel - Re: 2.6.23-rc1: BUG_ON in kmap_atomic

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20070723152712.02ded067.akpm@linux-foundation.org>
Date:	Mon, 23 Jul 2007 15:27:12 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Alexey Dobriyan <adobriyan@...il.com>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()

On Tue, 24 Jul 2007 02:04:46 +0400
Alexey Dobriyan <adobriyan@...il.com> wrote:

> On Mon, Jul 23, 2007 at 02:11:37PM -0700, Andrew Morton wrote:
> > On Tue, 24 Jul 2007 01:01:53 +0400
> > Alexey Dobriyan <adobriyan@...il.com> wrote:
> > 
> > > On Tue, Jul 24, 2007 at 12:40:45AM +0400, Alexey Dobriyan wrote:
> > > > > I had more complete info: http://article.gmane.org/gmane.linux.network/66966
> > > > > 
> > > > > You're using DEBUG_PAGEALLOC, but I was not, so I think we can rule that out.
> > > > > 
> > > > > I haven't worked out where that kmap_atomic() call is coming from yet. 
> > > > > Both traces point up into the page allocator, but I _think_ that's stack
> > > > > gunk.
> > > > 
> > > > Ahh, you suspect networking.
> > > > 
> > > > Here, setup is 2 cheap-ass 100Mb realtek 8139 NICs, one to campus network
> > > > receiving ~20 junk packets per second, one gathering netconsole output
> > > > and ssh to it, no conntracks and fancy stuff.
> > > > 
> > > > [reboots with cables physically unplugged]
> > > 
> > > OK, I run gdb recompile, cat(1) every file in /usr/portage (shitload of
> > > small files) with both cables unplugged. It all went fine for ~5 minutes
> > > after that it crashed exactly same way after 10 secs after plugging one
> > > of them.
> > 
> > It'd be nice to get a clean trace.  Are you able to obtain the full
> > trace with CONFIG_FRAME_POINTER=y?
> 
> Sorry, no camera shot, finding camera requires wakening up M. :)
> 
> It took longer that usual, but here it is
> 
> 	kmap_atomic
> 	get_page_from_freelist
> 	__alloc_pages
> 	cache_alloc_refill
> 	__alloc_pages
> 	cache_alloc_refill
> 	kmem_cache_alloc
> 	dst_alloc
> 	ip_route_input
> 	ip_rcv
> 	netif_receive_skb
> 	rtl8139_poll
> 	net_rx_action
> 	__do_softirq
> 	do_softirq
> 	irq_exit
> 	do_IRQ
> 	common_interrupt
> 	handle_mm_fault
> 	do_page_fault
> 	error_core
> 
> much more loaded x86_64 box near also running 2.6.23-rc1 with debugging
> turned on, using atl1 driver doesn't experience any crashes.
> 
> And I found 2.6.22-b91cba52e9b7b3f1c0037908a192d93a869ca9e5-x entry on
> top of grub config which means b91cba52e9b7b3f1c0037908a192d93a869ca9e5
> _without_ any debugging was OK.

I worked out that the crash I saw was in

        BUG_ON(!pte_none(*(kmap_pte-idx)));

in the read of kmap_pte[idx].  Which would be weird as the caller is using 
a literal KM_USER0.

So maybe I goofed, and that BUG_ON is triggering (it scrolled off, and I am
unable to reproduce it now).

If that BUG_ON _is_ triggering then it might indicate that someone is doing
a __GFP_HIGHMEM|__GFP_ZERO allocation while holding KM_USER0.

If they're holding an atomic kmap then they'll be running in_atomic so it
is unlikely that they accidentally added __GFP_WAIT because lots of people
would be getting lots of might_sleep() warnings.

Hence that first VM_BUG_ON in prep_zero_page() _should_ be triggering.

Do you have CONFIG_DEBUG_VM enabled?



Also, it might be useful to apply -mm's kmap_atomic-debugging.patch.  it
will detect lots of abuse.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/