Date:	Sat, 19 May 2012 13:45:57 -0700 (PDT)
From:	Hugh Dickins <hughd@...gle.com>
To:	Sam Portolla <samportolla@...oo.com>
cc:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"aarcange@...hat.com" <aarcange@...hat.com>
Subject: Re: exit_mmap BUG_ON in 2.6.23

On Fri, 18 May 2012, Sam Portolla wrote:
> [please cc samPortolla@...oo.com on your replies, not subscribed to the linux-kernel mailer]
> 
> Hi, I have read the thread on the same issue in 3.1:
> but this is happening on an earlier GNU/Linux version, 2.6.23 for x86_64,
> which does not have THP (I believe), nor does it have huge_memory.c.
> Is there a fix one of you experts could supply?  Issue is not reproducible
> so far, but happened on a customer site. Some info below.
> 
> kernel BUG at .../bfc/linux/kernel-2.6.x/mm/mmap.c:2049!
> 
> Line 2049 is in exit_mmap():
> 
> BUG_ON(mm->nr_ptes > (FIRST_USER_ADDRESS+PMD_SIZE-1)>>PMD_SHIFT);
> 
>  RIP: 0010:[<ffffffff80277840>]  [<ffffffff80277840>] exit_mmap+0xf0/0x100 
> [snip]
>  Call Trace:
>  [<ffffffff8022ee14>] mmput+0x44/0xd0
>  [<ffffffff802340a1>] exit_mm+0x91/0x100
>  [<ffffffff802347ea>] do_exit+0x17a/0x960
>  [<ffffffff8023c4bc>] __dequeue_signal+0xec/0x1b0
>  [<ffffffff80235048>] do_group_exit+0x38/0x90
>  [<ffffffff8023e3c6>] get_signal_to_deliver+0x2d6/0x4b0
>  [<ffffffff8020b69a>] do_notify_resume+0xaa/0x760
>  [<ffffffff8020c818>] retint_signal+0x3d/0x85
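
For reference, a small sketch of the arithmetic behind that BUG_ON
condition. The helper name nr_ptes_threshold() is made up for illustration,
and the values plugged in below (FIRST_USER_ADDRESS of 0, PMD_SHIFT of 21)
are my assumption for x86_64 with 4K pages, not something taken from the
report. Under those assumptions the right-hand side works out to 0, so the
check amounts to "no page tables may remain accounted at exit".

```c
#include <assert.h>

/*
 * Hypothetical helper mirroring the right-hand side of
 *   BUG_ON(mm->nr_ptes > (FIRST_USER_ADDRESS+PMD_SIZE-1)>>PMD_SHIFT);
 * The two parameters stand in for the kernel constants; PMD_SIZE is
 * derived as 1 << PMD_SHIFT.
 */
static unsigned long nr_ptes_threshold(unsigned long first_user_address,
                                       unsigned long pmd_shift)
{
    unsigned long pmd_size = 1UL << pmd_shift;

    assert(pmd_shift < 8 * sizeof(unsigned long));
    /* '+' binds tighter than '>>', so the whole sum is shifted. */
    return (first_user_address + pmd_size - 1) >> pmd_shift;
}
```

With a first user address of 0 and a shift of 21 the helper returns 0; any
nonzero first user address below PMD_SIZE would give 1 instead, allowing
for one page table left covering the region below it.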

I've checked back through old ChangeLogs, and (apart from a UserModeLinux
case) I don't see any fix for a BUG_ON(nr_ptes) issue in between 2.6.19
and the much later THP issue, which you're right to think cannot be yours.

But the 2.6.19 case, and one which a video driver writer had more recently,
were both caused by unrelated code zeroing beyond what it had allocated:
happening to zero part of a higher-level page table, making it impossible
for task exit to locate all the page tables (and pages) it had to free.
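
To illustrate the mechanism, here is a userspace toy model, not kernel
code. Every name in it (toy_mm, toy_map, toy_stray_zero, toy_teardown) is
hypothetical, and the single upper-level array is a deliberate
simplification of the real multi-level page tables. It only shows why a
stray zeroing of an upper-level entry leaves the table count nonzero after
teardown.

```c
#include <stddef.h>
#include <stdlib.h>

#define TOY_ENTRIES 8  /* slots in the toy upper-level table (arbitrary) */

struct toy_mm {
    void *upper[TOY_ENTRIES];  /* non-NULL slot points at a lower table */
    unsigned long nr_ptes;     /* lower-level tables currently accounted */
};

/* Allocate a lower-level table for a slot and account for it. */
static int toy_map(struct toy_mm *mm, size_t slot)
{
    if (slot >= TOY_ENTRIES)
        return -1;
    if (mm->upper[slot])
        return 0;
    mm->upper[slot] = calloc(1, 64);
    if (!mm->upper[slot])
        return -1;
    mm->nr_ptes++;
    return 0;
}

/* Model unrelated code clearing entries it does not own. The lower
 * tables they pointed to become unreachable (and leak in this toy). */
static void toy_stray_zero(struct toy_mm *mm, size_t slot, size_t n)
{
    for (size_t i = slot; i < slot + n && i < TOY_ENTRIES; i++)
        mm->upper[i] = NULL;
}

/* Free every lower table still reachable; return what stays accounted. */
static unsigned long toy_teardown(struct toy_mm *mm)
{
    for (size_t i = 0; i < TOY_ENTRIES; i++) {
        if (mm->upper[i]) {
            free(mm->upper[i]);
            mm->upper[i] = NULL;
            mm->nr_ptes--;
        }
    }
    return mm->nr_ptes;  /* nonzero means some tables were never found */
}
```

Mapping three slots, zeroing one of them with toy_stray_zero(), and then
calling toy_teardown() leaves a count of 1: the analogue of nr_ptes still
being above the threshold when exit_mmap() reaches its check.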

Though I can't be sure, these BUG_ON(nr_ptes) reports do seem perhaps
too infrequent to be caused by bad logic in mm itself: I suspect memory
corruption in your case too.

There's no clue here as to what the cause might be, I'm afraid.
Rebuilding your kernel with CONFIG_DEBUG_PAGEALLOC=y, and slab debugging
on, might shed more light: but that's probably not something you want to
get into on a customer site, for a problem only seen once or twice.

The best I can suggest is for you to change that BUG_ON to a WARN_ON,
so at least the kernel doesn't crash there, and you might gather more
information from each time it happens; but you'll probably leak pages,
and may very well crash soon for other reasons (e.g. when evicting an
inode cannot locate all the maps of its pages).
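
A rough userspace model of what that change buys you, assuming nothing
about your tree beyond the BUG_ON line you quoted. MODEL_WARN_ON,
model_warn() and model_exit_check() are hypothetical stand-ins, not the
kernel's macros: the point is only that the warning path reports the
condition and lets execution continue, at the cost of whatever page tables
could not be found.

```c
#include <stdio.h>

static int warn_count;  /* how many times the model warning has fired */

/* Report the failed expectation, then hand the result back to the caller. */
static int model_warn(int hit, const char *expr, const char *file, int line)
{
    if (hit) {
        warn_count++;
        fprintf(stderr, "WARNING at %s:%d: %s\n", file, line, expr);
    }
    return hit;
}

/* Stand-in for a warn-and-continue check (as opposed to a fatal one). */
#define MODEL_WARN_ON(cond) \
    model_warn((cond) != 0, #cond, __FILE__, __LINE__)

/*
 * Hypothetical tail of a teardown routine: instead of stopping when
 * page tables remain accounted, log it and return how many are
 * presumed leaked so the caller can record that too.
 */
static unsigned long model_exit_check(unsigned long nr_ptes,
                                      unsigned long threshold)
{
    if (MODEL_WARN_ON(nr_ptes > threshold))
        return nr_ptes - threshold;
    return 0;
}
```

In the kernel itself the edit would just be swapping BUG_ON for WARN_ON on
that one line in exit_mmap(); the model above only makes the "continue but
leak" consequence explicit.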

Hugh
