linux-kernel - Re: Top kernel oopses/warnings for the week of May 30th 2008

Message-ID: <alpine.LFD.1.10.0805301403540.3141@woody.linux-foundation.org>
Date:	Fri, 30 May 2008 14:43:32 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Hugh Dickins <hugh@...itas.com>
cc:	Arjan van de Ven <arjan@...ux.intel.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...e.hu>, Greg KH <greg@...ah.com>,
	Jeff Garzik <jeff@...zik.org>
Subject: Re: Top kernel oopses/warnings for the week of May 30th 2008

On Fri, 30 May 2008, Hugh Dickins wrote:
>
> On Fri, 30 May 2008, Arjan van de Ven wrote:
> > 
> > Rank 7: set_page_address (oops)
> > 	Reported 53 times (65 total reports)
> > 	crash coming from flush_all_zero_pkmaps; was this fixed by Hugh the
> > 	other day?
> 
> No, not at all.  But I'll have a little ponder over it.

It's a BUG_ON(), but sadly the oops gatherer doesn't seem to gather that 
part. You can see it from the code portion: the "<0f> 0b" gives it away 
(that's the ud2 opcode).
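
As a rough illustration of the check described above, here is a minimal 
user-space sketch in C that scans an oops "Code:" line for the byte 
marked with angle brackets and tests whether it begins the two-byte 
sequence 0f 0b (ud2). The function name code_line_hits_ud2 is 
hypothetical, and the sketch assumes the x86 layout in which the byte at 
the faulting instruction pointer is the one wrapped in <..>.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not kernel code: returns 1 if the byte marked
 * <..> in an oops "Code:" line and the byte after it are 0f 0b (ud2),
 * and 0 otherwise.
 */
int code_line_hits_ud2(const char *code_line)
{
    const char *mark = strchr(code_line, '<');
    char *end;
    unsigned long first, second;

    if (!mark)
        return 0;               /* no faulting byte marked */

    first = strtoul(mark + 1, &end, 16);
    if (end == mark + 1 || *end != '>')
        return 0;               /* malformed marker */

    /* strtoul skips the whitespace before the next byte */
    second = strtoul(end + 1, &end, 16);

    return first == 0x0f && second == 0x0b;
}
```

Passing it a line such as "Code: 8b 45 f0 <0f> 0b eb fe" (made-up 
bytes apart from the marked pair) would report a ud2, which is what a 
BUG_ON() compiles down to on x86.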

There are two BUG_ON()s in that function, and I think it's the second one, 
based on at least the code generation that my particular compiler version 
produces. IOW, it would be the

	BUG_ON(list_empty(&page_address_pool));

thing.

Why would we run out of the page-address pool? Or perhaps the right 
question is what actually protects us from running out? 

We seem to depend on the page_address_pool always being in sync with the 
pkmap_count[] array, but the fact is, they are not protected by the same 
locks. The array is protected by kmap_lock, and the page_address_pool is 
protected by the "pool_lock".

And even if they were to nest properly (I don't think they do), we 
actually do the list_empty(&page_address_pool) outside the pool lock, 
so...
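
To make the concern concrete, here is a minimal user-space sketch in C, 
not the mm/highmem.c code itself. It shows a free-list pool where the 
emptiness check and the removal happen under the same lock, so the 
answer cannot go stale between the two steps. All names (struct pool, 
pool_take, pool_put) are hypothetical, and a C11 atomic_flag spinlock 
stands in for the kernel's spinlock.

```c
#include <stdatomic.h>
#include <stddef.h>

struct slot {
    struct slot *next;
};

struct pool {
    atomic_flag lock;           /* stand-in for a kernel spinlock */
    struct slot *free_list;     /* protected by lock */
};

static void pool_lock(struct pool *p)
{
    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
        ;                       /* spin until the flag was clear */
}

static void pool_unlock(struct pool *p)
{
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

void pool_init(struct pool *p, struct slot *slots, size_t n)
{
    atomic_flag_clear(&p->lock);
    p->free_list = NULL;
    for (size_t i = 0; i < n; i++) {
        slots[i].next = p->free_list;
        p->free_list = &slots[i];
    }
}

/*
 * Check for emptiness and unlink the entry inside one critical
 * section. Returns NULL when the pool is empty, so the caller decides
 * what to do instead of relying on a check made before taking the lock.
 */
struct slot *pool_take(struct pool *p)
{
    struct slot *s;

    pool_lock(p);
    s = p->free_list;
    if (s)
        p->free_list = s->next;
    pool_unlock(p);
    return s;
}

void pool_put(struct pool *p, struct slot *s)
{
    pool_lock(p);
    s->next = p->free_list;
    p->free_list = s;
    pool_unlock(p);
}
```

This only addresses the check-outside-the-lock part. It does nothing 
about keeping the pool coherent with a second structure guarded by a 
different lock, which is the larger question here.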

I dunno. That code is really messy. Why does it have two locks for the 
data structures when it then seems to absolutely require that they are 
always coherent? And if we want to have separate locks, we cannot require 
that they stay in lock-step; perhaps we should have more pages in the 
page_address_pool than strictly required, since the two may not be 1:1?

I do hate that mm/highmem.c mess, but I also wonder what made it start to 
trigger if it's a bug there. That code hasn't changed in ages, afaik.

I don't think this is Hugh's fault, but on the other hand I think it would 
be great if Hugh looked at it. I think most of that code predates even the 
BK repo - because I'm not finding any history for it even in the 
historical archives. Who dares look at it?

			Linus