Date:	Sat, 26 May 2012 11:06:47 -0700 (PDT)
From:	Hugh Dickins <hughd@...gle.com>
To:	Sam Portolla <samportolla@...oo.com>
cc:	Eric Dumazet <eric.dumazet@...il.com>,
	"kaber@...sh.net" <kaber@...sh.net>,
	"jarkao2@...il.com" <jarkao2@...il.com>,
	"davem@...emloft.net" <davem@...emloft.net>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: exit_mmap BUG_ON in 2.6.23 (and Add qdisc __NET_XMIT_STOLEN)

On Fri, 25 May 2012, Sam Portolla wrote:
> 
> commit 378a2f090f7a478704a372a4869b8a9ac206234e
> Date:   Mon Aug 4 22:31:03 2008 -0700
> net_sched: Add qdisc __NET_XMIT_STOLEN flag
...
> 
>  I wonder if the lack of above patch in our code base could explain the
>  exit_mmap() BUG_ON as well due to memory corruption causing MMU to not
>  be able to locate the page(s) it had to free. NR_PTES keeps track of
>  that? Could you explain that more?

I concur with Eric in thinking it unlikely - though (unlike Eric)
I know far too little about networking to comment with authority.

I'd guess that literally hundreds of fixes have gone into the kernel
since 2.6.23, each of them more likely than this one to be the fix for
such memory corruption.  And I could be wrong to attribute your BUG to
memory corruption at all: perhaps I'm forgetting an mm fix.

You ask me to explain more: mm->nr_ptes keeps track of the number of
page tables that have been allocated; when we free the mm, we should
be freeing exactly the number of page tables we allocated earlier,
but a bug in the code maintaining the vmas or the page tables might
break that, hence the BUG_ON to check it.  But equally, if there has been
memory corruption of vmas or of higher-level page tables, we may now
be unable to locate all the page tables we allocated earlier, and so
hit the BUG_ON for that reason.
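The two failure modes described above can be illustrated with a small userspace model. This is a hypothetical sketch, not kernel code: `struct toy_mm`, `toy_alloc_pt()`, `toy_corrupt_slot()` and `toy_exit_mmap()` are invented names, and a flat array stands in for a higher-level page table. Every allocated page table bumps a counter. Teardown frees only what it can still reach and reports the leftover count. If a stray write wipes a higher-level entry, the corresponding table becomes unreachable and the counter no longer drops to zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of mm->nr_ptes bookkeeping; not kernel code. */
#define TOY_TOP_SLOTS 4            /* entries in a toy "higher-level" table */

struct toy_mm {
    bool pt_present[TOY_TOP_SLOTS]; /* does top-level slot i point at a page table? */
    long nr_ptes;                   /* page tables allocated and not yet freed */
};

/* Allocate a page table under top-level slot i and count it. */
static void toy_alloc_pt(struct toy_mm *mm, size_t i)
{
    assert(i < TOY_TOP_SLOTS && !mm->pt_present[i]);
    mm->pt_present[i] = true;
    mm->nr_ptes++;
}

/* Simulate corruption of the higher-level table: the pointer to slot i's
 * page table is wiped, although that table was never freed or uncounted. */
static void toy_corrupt_slot(struct toy_mm *mm, size_t i)
{
    assert(i < TOY_TOP_SLOTS);
    mm->pt_present[i] = false;
}

/* Teardown: free every page table still reachable from the top level and
 * return how many the counter says are unaccounted for. 0 means the books
 * balance; nonzero is roughly the situation the exit_mmap() BUG_ON catches. */
static long toy_exit_mmap(struct toy_mm *mm)
{
    for (size_t i = 0; i < TOY_TOP_SLOTS; i++) {
        if (mm->pt_present[i]) {
            mm->pt_present[i] = false;
            mm->nr_ptes--;
        }
    }
    return mm->nr_ptes;
}
```

For example, allocating tables in slots 0 and 2 and then tearing down leaves 0 over. Corrupting slot 2 before teardown leaves 1 over, even though the allocation and freeing code itself has no bug. From the counter alone, the two causes look the same.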

Would I be unfair to characterize this as a problem seen once at a
customer site in the 4.5 years since 2.6.23 was released?

As I said before, please just change that BUG_ON to WARN_ON, and
wait to see if more such issues come up: if they do, then you can
start to look for a pattern.
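The suggested BUG_ON-to-WARN_ON change can be sketched in userspace. The names `TOY_BUG_ON`, `TOY_WARN_ON`, `toy_warn_on()` and `toy_check_leftover()` are hypothetical stand-ins, not the kernel macros, which also dump a backtrace and differ in other details. The point is the difference in behaviour. The BUG_ON-style check stops everything. The WARN_ON-style check reports the problem, counts it so that repeat occurrences can be tallied, and lets execution continue. Like the kernel's WARN_ON(), it also evaluates to the truth of its condition.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for BUG_ON()/WARN_ON(); not the kernel macros. */
static unsigned long toy_warn_count; /* how many warnings have fired so far */

/* BUG_ON-style: a true condition stops the program outright. */
#define TOY_BUG_ON(cond) \
    do { if (cond) { fprintf(stderr, "BUG: %s\n", #cond); abort(); } } while (0)

/* WARN_ON-style helper: report and count, then let the caller carry on. */
static int toy_warn_on(int cond, const char *expr, const char *file, int line)
{
    if (cond) {
        toy_warn_count++;
        fprintf(stderr, "WARNING at %s:%d: %s\n", file, line, expr);
    }
    return cond; /* evaluates to the condition's truth, as WARN_ON() does */
}
#define TOY_WARN_ON(cond) toy_warn_on(!!(cond), #cond, __FILE__, __LINE__)

/* The suggested change, in miniature: leftover page tables at teardown are
 * reported rather than fatal. Returns 1 if a warning fired. */
static int toy_check_leftover(long nr_ptes)
{
    /* was: TOY_BUG_ON(nr_ptes != 0); */
    return TOY_WARN_ON(nr_ptes != 0);
}
```

With the warning in place, a machine that hits the condition keeps running and leaves a log entry behind. That log is what lets a pattern emerge if the problem recurs.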

Hugh
