Message-ID: <Pine.LNX.4.64.0805161110210.565@blonde.site>
Date:	Fri, 16 May 2008 12:09:19 +0100 (BST)
From:	Hugh Dickins <hugh@...itas.com>
To:	Randy Johnson <theraptor2005@...il.com>
cc:	akpm@...ux-foundation.org, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: 2.6.25.1: Kernel BUG at mm/rmap.c:669, General Protection Faults,
 and generic hard locks

On Mon, 12 May 2008, Randy Johnson wrote:
> Sent this to linux-kernel, then realized I probably should have sent
> this here as well...
> 
> Hi,
> 
> Recently moved from 2.6.22 up to 2.6.25.1 to solve some AHCI issues.
> Following this update, Matlab has caused numerous hard lockups. I've
> gotten lucky twice and been able to remote in and get the logs, which
> follow below. System is an AM2 with 6G ram installed, but booted with
> mem=3200M to circumvent some IOMMU issues. It is possible to

I expect your "mem=3200M" is just fine, I'm fond of "mem=" myself;
but be aware that you can get into trouble with it, and I've heard
"memmap=" recommended instead.  If you're unfamiliar with that,
try Documentation/kernel-parameters.txt or googling.

> eventually replicate the issue, but not with a specific sequence of
> activities that I've found. General activity from Matlab when it
> occurs is heavy disk IO (reading, no writing), and large memory
> consumption. Latest version of memtest86+ was run overnight and shows
> no issues.

memtest86+ overnight was certainly the right thing to try;
but I'm not convinced by its success.  Maybe there's a pattern
in Matlab which is tickling a bad RAM issue more effectively
than memtest does (sometimes gcc hits problems which memtest
hasn't shown).  And since (sadly!) you have plenty of memory
to spare, it'd be well worth switching boards around: your
lowest bank does look suspect (and I'm guessing 2.6.25.1 just
places things differently from 2.6.22, some important data now
being placed on bad RAM where something unused went before).

I could perfectly well be wrong about all that: maybe you do have
a kernel bug corrupting your memory; but I've no idea where if so.

> 
> Any thoughts?
> 
> -Randy Johnson
> 
> 
> log #1
> 
> Eeek! page_mapcount(page) went negative! (-1946157056)

That's the most interesting line of it: page_mapcount(page) isn't
off by one or something like that; instead, its high byte has been
corrupted at some point from 0x00 to 0x8c.
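
That reading can be checked by reinterpreting the printed decimal as an
unsigned 32-bit word.  The sketch below is illustrative Python, not kernel
code; the helper name as_u32 is made up for the example, and it assumes
the count was printed as a signed 32-bit integer.

```python
import struct

def as_u32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as an unsigned 32-bit word."""
    return struct.unpack("<I", struct.pack("<i", value))[0]

# Value printed in the "Eeek!" message from log #1.
printed = -1946157056

word = as_u32(printed)
high_byte = word >> 24       # top 8 bits of the 32-bit word
low_24 = word & 0xFFFFFF     # remaining 24 bits

print(hex(word))       # 0x8c000000
print(hex(high_byte))  # 0x8c
print(hex(low_24))     # 0x0
```

Only the top byte is nonzero, which is what you'd expect if a single
byte of an otherwise small count had been overwritten.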

(Unfortunately, what with all the printk'ing that's gone on, I'm not
at all confident whether or where the address of the page in question
is in the registers or stack displayed: the messages suit tracking
a relevant kernel bug rather than a random corruption.)

> 
> And log #2
> 
> general protection fault: 0000 [1] SMP
> CPU 1
> Modules linked in: af_packet aic7xxx fan button thermal processor unix
> Pid: 6232, comm: MATLAB Not tainted 2.6.25.1 #1
> RIP: 0010:[<ffffffff802652e3>]  [<ffffffff802652e3>]
> get_page_from_freelist+0x303/0x670
> RSP: 0000:ffff8100b2421d78  EFLAGS: 00010002
> RAX: ffff8100bf64bb10 RBX: ffff8100bf64bb10 RCX: ffffe200029538d8
> RDX: 7fffe200004bee10 RSI: 0000000000000000 RDI: 000000000000001d
       ^
There it's doing the list_del(&page->lru) in buffered_rmqueue(),
and hitting a corrupted prev pointer: the top bit of the address has
been cleared, causing that and subsequent general protection faults
(same list pointer RCX and prev contents RDX each time).
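
To see why a single cleared bit turns into a general protection fault:
on x86-64 a non-canonical address faults on use.  The sketch below is
illustrative Python, assuming 48-bit virtual addresses (the va_bits
default and the helper name is_canonical are assumptions for the
example); the register values are copied from log #2.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63 down to va_bits-1 are all 0 or all 1."""
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones

rcx = 0xffffe200029538d8   # list pointer, looks sane
rdx = 0x7fffe200004bee10   # prev contents, as logged

# Setting bit 63 again gives the value the pointer presumably had.
repaired = rdx | (1 << 63)

print(is_canonical(rcx))       # True
print(is_canonical(rdx))       # False
print(hex(repaired))           # 0xffffe200004bee10
print(is_canonical(repaired))  # True
print(hex(rdx ^ repaired))     # 0x8000000000000000
```

The logged and repaired values differ in exactly one bit, bit 63.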

But I'm afraid that tells me nothing about the cause of these
corruptions.  If you've gathered more crash logs during the week,
please do post the logs or send them to me privately, I'll try
to decipher what I can - but that may not help you much.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
