linux-kernel - Re: -mm merge plans for 2.6.20

Message-ID: <45878E8A.6000506@yahoo.com.au>
Date:	Tue, 19 Dec 2006 18:02:34 +1100
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	Dave Jones <davej@...hat.com>
CC:	Andrew Morton <akpm@...l.org>, linux-kernel@...r.kernel.org,
	Hugh Dickins <hugh@...itas.com>,
	Chris Rankin <cj.rankin@...world.com>
Subject: Re: -mm merge plans for 2.6.20

Dave Jones wrote:
> On Tue, Dec 19, 2006 at 04:20:37PM +1100, Nick Piggin wrote:
>  > Dave Jones wrote:
>  > 
>  > > Eeek! page_mapcount(page) went negative! (-2)
>  > 
>  > Hmm, probably happened once before, too.
> 
> You're right. Going back further in the log, I noticed
> that it had also happened earlier, exactly at the time that cron restarted vpnc.
> The first time, the flags were different.
> 
>  Dec  4 00:01:03 firewall kernel: Eeek! page_mapcount(page) went negative! (-1)
>  Dec  4 00:01:03 firewall kernel:   page->flags = 400
>  Dec  4 00:01:03 firewall kernel:   page->count = 1
>  Dec  4 00:01:03 firewall kernel:   page->mapping = 00000000

Still reserved, with a NULL mapping. I'd say it could be the same page.
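
For readers following the flag values in this thread: the kernel prints page->flags in hex, so 400 and 404 can be decoded bit by bit. Below is a minimal userspace sketch, assuming the bit positions from include/linux/page-flags.h as I recall them for kernels of this era (PG_referenced = 2, PG_reserved = 10). The table and the helper name decode_page_flags are illustrative assumptions, not part of the attached patch, and should be checked against the tree actually running.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Assumed low page-flag bit positions for a 2.6.19-era kernel.
 * Verify against include/linux/page-flags.h before relying on them.
 */
static const char *const pg_names[] = {
    "PG_locked",     /* 0 */
    "PG_error",      /* 1 */
    "PG_referenced", /* 2 */
    "PG_uptodate",   /* 3 */
    "PG_dirty",      /* 4 */
    "PG_lru",        /* 5 */
    "PG_active",     /* 6 */
    "PG_slab",       /* 7 */
    "PG_checked",    /* 8 */
    "PG_arch_1",     /* 9 */
    "PG_reserved",   /* 10 */
    "PG_private",    /* 11 */
};

/*
 * Hypothetical helper: write the names of all set bits into buf,
 * joined by '|'. Bits beyond the table are reported as "bitN".
 * Returns buf; output is cut short if buf is too small.
 */
static char *decode_page_flags(unsigned long flags, char *buf, size_t len)
{
    size_t used = 0;
    unsigned int bit;

    if (len == 0)
        return buf;
    buf[0] = '\0';

    for (bit = 0; bit < 8 * sizeof(flags); bit++) {
        int n;

        if (!(flags & (1UL << bit)))
            continue;
        if (bit < sizeof(pg_names) / sizeof(pg_names[0]))
            n = snprintf(buf + used, len - used, "%s%s",
                         used ? "|" : "", pg_names[bit]);
        else
            n = snprintf(buf + used, len - used, "%sbit%u",
                         used ? "|" : "", bit);
        if (n < 0 || (size_t)n >= len - used)
            break; /* out of space: stop here */
        used += (size_t)n;
    }
    return buf;
}
```

Under those assumed bit positions, decode_page_flags(0x400, ...) yields "PG_reserved" and decode_page_flags(0x404, ...) yields "PG_referenced|PG_reserved", which matches the reading given in this thread.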

> 
>  > >   page->flags = 404
>  > 
>  > What's that? PG_referenced|PG_reserved? So I'd say it is likely
>  > that some driver has got its refcounting wrong.
> 
> At the time that it bit me, here's what was loaded..
> 
> tun ipt_MASQUERADE iptable_nat ip_nat ipt_LOG xt_limit ipv6
> ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink xt_tcpudp
> iptable_filter ip_tables x_tables video sbs i2c_ec button battery asus_acpi ac
> parport_pc lp parport pcspkr ide_cd i2c_viapro i2c_core cdrom 3c59x via_rhine
> via_ircc mii irda crc_ccitt serio_raw dm_snapshot dm_zero dm_mirror dm_mod ext3
> jbd ehci_hcd ohci_hcd uhci_hcd
> 
> The scary ones (i2c, irda) weren't in use at all, and had never been opened afaik,
> so the potential for those to be corrupting memory is slim, but not out of the
> question. (Why the hell asus_acpi is loaded is a mystery, this isn't an Asus,
> or a laptop. Probably dumb initscripts).

OK, that could be useful if I do some grepping and see which ones are
setting PG_reserved.

>  > And I see we've got another report for 2.6.19.1 from Chris, which
>  > is equally vague.
> 
> I'll be moving that box to 2.6.19.x at some point real soon, so I'll holler
> if I see it again on a later kernel.
> 
>  > IMO the pattern is much too consistent to be able to attribute
>  > them all to hardware problems. And considering it takes so long
>  > for these things to appear, can we get something like the attached
>  > patch upstream at least until we manage to stamp them out?
> 
> Sounds like a good idea to me.
> 
> ACKed-by: Dave Jones <davej@...hat.com>

Thanks.

> 
>  > Any other debugging info we can add?
> 
> Would it be useful to print the pfn of the page?
> In cases like mine, where it bit twice before it killed the box, it
> might be interesting to see if it's always the same page.  Not sure
> what that would prove/disprove though.

Might help. I guess the site where it is allocated from might be
another one, although I'm hoping that if we know what ->nopage is
being used then we'll be able to track it. OTOH it may be using
remap_pfn_range from fops->mmap, rather than nopage... I wonder
how we could get at that info? vma->vm_file->f_op->mmap?
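
The pointer chase above can be sketched in userspace. This is only an illustration under stated assumptions: the structs below are cut-down, hypothetical stand-ins that keep just the fields named in this thread (vm_ops->nopage, vm_file->f_op->mmap), and the helpers vma_has_nopage, vma_has_mmap, format_report and demo_mmap are invented for the example. It is not the code in the attached patch. The point is that each link in the chain may be NULL and has to be checked before the next dereference.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for the kernel structures. */
struct file_operations { int (*mmap)(void); };
struct file { const struct file_operations *f_op; };
struct vm_operations_struct { void *(*nopage)(void); };
struct vm_area_struct {
    const struct vm_operations_struct *vm_ops;
    const struct file *vm_file;
};

/* Dummy handler so the example has something to point at. */
static int demo_mmap(void) { return 0; }

/* 1 if the vma has a ->nopage handler, else 0. */
static int vma_has_nopage(const struct vm_area_struct *vma)
{
    return vma && vma->vm_ops && vma->vm_ops->nopage ? 1 : 0;
}

/* 1 if the backing file has an ->mmap method, else 0. */
static int vma_has_mmap(const struct vm_area_struct *vma)
{
    return vma && vma->vm_file && vma->vm_file->f_op &&
           vma->vm_file->f_op->mmap ? 1 : 0;
}

/*
 * Format one debug line: the pfn plus which handlers exist.
 * Returns the snprintf result.
 */
static int format_report(char *buf, size_t len, unsigned long pfn,
                         const struct vm_area_struct *vma)
{
    return snprintf(buf, len, "pfn=%lu nopage=%s f_op->mmap=%s", pfn,
                    vma_has_nopage(vma) ? "yes" : "no",
                    vma_has_mmap(vma) ? "yes" : "no");
}
```

In the kernel itself the function addresses would presumably be printed symbolically rather than reduced to yes/no; the yes/no form here just keeps the sketch portable. A vma whose file provides demo_mmap but which has no vm_ops would format as "pfn=1234 nopage=no f_op->mmap=yes" for pfn 1234.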

-- 
SUSE Labs, Novell Inc.

View attachment "mm-rmap-debug-more.patch" of type "text/plain" (4402 bytes)