linux-kernel - Re: Strange minor page fault repeats when SPECjbb2005 is executed

Message-ID: <4D6FC6C7.8060001@redhat.com>
Date:	Thu, 03 Mar 2011 11:50:15 -0500
From:	Rik van Riel <riel@...hat.com>
To:	Yasunori Goto <y-goto@...fujitsu.com>
CC:	Linux Kernel ML <linux-kernel@...r.kernel.org>,
	linux-mm <linux-mm@...ck.org>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Hiroyuki KAMEZAWA <kamezawa.hiroyu@...fujitsu.com>,
	Motohiro Kosaki <kosaki.motohiro@...fujitsu.com>
Subject: Re: Strange minor page fault repeats when SPECjbb2005 is executed

On 03/03/2011 06:01 AM, Yasunori Goto wrote:

> In this log, cpu4 and 6 repeat page faults.
> ----
> handle_mm_fault jiffies64=4295160616 cpu=4 address=40019a38 pmdval=0000000070832067 ptehigh=00000000 ptelow=55171067
> handle_mm_fault jiffies64=4295160616 cpu=6 address=40003a38 pmdval=0000000070832067 ptehigh=00000000 ptelow=551ef067
> handle_mm_fault jiffies64=4295160616 cpu=6 address=40003a38 pmdval=0000000070832067 ptehigh=00000000 ptelow=551ef067
> handle_mm_fault jiffies64=4295160616 cpu=4 address=40019a38 pmdval=0000000070832067 ptehigh=00000000 ptelow=55171067
> handle_mm_fault jiffies64=4295160616 cpu=4 address=40019a38 pmdval=0000000070832067 ptehigh=00000000 ptelow=55171067

> I confirmed this phenomenon is reproduced on 2.6.31 and 2.6.38-rc5
> of x86 kernel, and I heard this phenomenon doesn't occur on
> x86-64 kernel from another engineer who found this problem first.
>
> In addition, this phenomenon occurred on 4 boxes, so I think the cause
> is not hardware malfunction.

On what CPU model(s) does this happen?

Obviously the PTE is present and allows read, write and
execute accesses, so the PTE should not cause any faults.
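To make the reading of the logged values concrete, here is a small user-space sketch that decodes them. It assumes the ptehigh/ptelow pair in the debug output is the two halves of a 64-bit PAE entry, so NX would be bit 31 of ptehigh; the helper name pte_allows_rwx is hypothetical and is not kernel code.

```c
#include <stdint.h>

/* Architectural x86 PTE flag bits in the low word. */
#define PTE_PRESENT  (1u << 0)
#define PTE_RW       (1u << 1)
#define PTE_USER     (1u << 2)
#define PTE_ACCESSED (1u << 5)
#define PTE_DIRTY    (1u << 6)
/* Assumption: PAE layout, where NX is bit 63, i.e. bit 31 of the high word. */
#define PTE_NX_HIGH  (1u << 31)

/* Hypothetical helper: returns 1 if the entry is present and permits
 * user-mode read, write and execute, 0 otherwise. */
int pte_allows_rwx(uint32_t ptehigh, uint32_t ptelow)
{
    return (ptelow & PTE_PRESENT) && (ptelow & PTE_RW) &&
           (ptelow & PTE_USER) && !(ptehigh & PTE_NX_HIGH);
}
```

Both logged entries (ptelow=55171067 and ptelow=551ef067, with ptehigh=00000000) end in 0x067, which is present, writable, user, accessed and dirty, with NX clear.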

That leaves the TLB. It looks almost like the CPU keeps
re-faulting on an (old?) TLB entry, possibly with wrong
permissions, and does not re-load it from the PTE.

I know this "should not happen" on x86, but I cannot think
of an alternative explanation right now.  Can you try flushing
the TLB entry in question from handle_pte_fault?

It looks like the code already does this for write faults, but
maybe the garbage collection code uses PROT_NONE a lot and is
running into this issue with a read or exec fault?
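For reference, the kind of protection toggling meant here looks roughly like the following from user space. This is only a sketch of the pattern; whether the JVM's garbage collector actually does this is a guess, and the function name prot_none_round_trip is made up for illustration.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map one anonymous page, store a byte, revoke all
 * access with PROT_NONE, restore read/write access, and read the byte back.
 * Returns the byte read, or -1 on error. */
int prot_none_round_trip(unsigned char value)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    unsigned char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = value;                       /* first touch populates the page */

    /* Revoke access, then grant it again; stale translations for this
     * page must not survive either change. */
    if (mprotect(p, (size_t)page, PROT_NONE) != 0 ||
        mprotect(p, (size_t)page, PROT_READ | PROT_WRITE) != 0) {
        munmap(p, (size_t)page);
        return -1;
    }

    int got = p[0];                     /* read through the restored mapping */
    munmap(p, (size_t)page);
    return got;
}
```

A read after the second mprotect() is the access that would keep faulting if a stale entry with the old permissions were still being used.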

It would be good to print the fault flags as well in your debug
print, so we know what kind of fault is being repeated...
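A sketch of how the printed flags value could be turned into names when reading the log. The FAULT_FLAG_* values below are written from memory of include/linux/mm.h in kernels of this era and should be treated as assumptions to check against the tree under test; describe_fault_flags is a hypothetical user-space helper.

```c
#include <stdio.h>
#include <string.h>

/* Assumed values; verify against include/linux/mm.h in your kernel. */
#define FAULT_FLAG_WRITE       0x01
#define FAULT_FLAG_NONLINEAR   0x02
#define FAULT_FLAG_MKWRITE     0x04
#define FAULT_FLAG_ALLOW_RETRY 0x08

/* Hypothetical helper: writes a '|'-separated list of flag names into buf,
 * or "none" if no known flag is set. Returns buf. */
const char *describe_fault_flags(unsigned int flags, char *buf, size_t len)
{
    static const struct { unsigned int bit; const char *name; } names[] = {
        { FAULT_FLAG_WRITE,       "WRITE" },
        { FAULT_FLAG_NONLINEAR,   "NONLINEAR" },
        { FAULT_FLAG_MKWRITE,     "MKWRITE" },
        { FAULT_FLAG_ALLOW_RETRY, "ALLOW_RETRY" },
    };
    size_t used = 0;

    if (len == 0)
        return buf;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        if (!(flags & names[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? "|" : "", names[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;                      /* output truncated, stop here */
        used += (size_t)n;
    }
    if (used == 0)
        snprintf(buf, len, "none");
    return buf;
}
```

A value without the WRITE bit would point at a read or exec fault, which is the case the existing write-fault flush would not cover.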
