Message-ID: <519E095A.4000105@redhat.com>
Date:	Thu, 23 May 2013 08:19:38 -0400
From:	Rik van Riel <riel@...hat.com>
To:	Stanislav Meduna <stano@...una.org>
CC:	"H. Peter Anvin" <hpa@...or.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	"linux-rt-users@...r.kernel.org" <linux-rt-users@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	the arch/x86 maintainers <x86@...nel.org>,
	Hai Huang <hhuang@...hat.com>
Subject: Re: [PATCH] mm: fix up a spurious page fault whenever it happens

On 05/23/2013 04:07 AM, Stanislav Meduna wrote:
> On 22.05.2013 20:43, Rik van Riel wrote:
>
>>> Some CPUs have had errata when it comes to flushing large pages that
>>> have been split into small pages by hardware, e.g. due to MTRR
>>> conflicts.  In that case, fragments of the large page may have been left
>>> in the TLB.
>
> Can I somehow find if this is the case? The memory mapping
> for the failing process has two regions slightly larger than
> 4 MB - code and heap.
>
> The process also does not access any funny memory regions
> from userspace - it is basically networking (both TCP/IP
> and raw sockets) and crunching of the data received.
> No mmapped devices or something like that.
>
>> static inline void __native_flush_tlb_single(unsigned long addr)
>> {
>>          __flush_tlb();
>> }
>>
>> This on top of the other two patches.
>
> It did not crash overnight, but it also does not show any
> minor fault counted for the threads, so I'm afraid the situation
> just did not happen - there should be at least one visible in
> the ps -o min_flt output, right?

If all the page faults are done by the main thread,
and the TLB gets properly flushed now, the other
threads might not see minor faults.

> I will give it some more testing time.

That is a good idea.

Now to figure out how we properly fix this
issue in the kernel...

We can add a bit in the architecture bits that
we use to check against other CPU and system
errata, and conditionally flush the whole TLB
from __native_flush_tlb_single().
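
As a rough illustration of that idea, here is a user-space sketch,
not kernel code. The flag name BUG_TLB_SINGLE_FLUSH, the
cpu_bug_bits word, and the *_mock helpers are all made up for this
example; they only stand in for whatever errata bit and flush
primitives the real fix would use.

```c
#include <stdbool.h>

/* Hypothetical erratum bit; not an existing kernel flag. */
#define BUG_TLB_SINGLE_FLUSH (1u << 0)

/* Stand-in for the per-CPU errata bits set at boot. */
static unsigned int cpu_bug_bits;

/* Counters so the sketch can be observed without real hardware. */
static unsigned int full_flushes;
static unsigned int single_flushes;

static bool cpu_has_bug(unsigned int bit)
{
	return (cpu_bug_bits & bit) != 0;
}

/* Mock of a full TLB flush (what __flush_tlb() does in the kernel). */
static void flush_tlb_all_mock(void)
{
	full_flushes++;
}

/* Mock of a single-address invalidation (invlpg in the kernel). */
static void invlpg_mock(unsigned long addr)
{
	(void)addr;
	single_flushes++;
}

/* Flush everything on affected CPUs, one entry otherwise. */
static void flush_tlb_single_sketch(unsigned long addr)
{
	if (cpu_has_bug(BUG_TLB_SINGLE_FLUSH))
		flush_tlb_all_mock();
	else
		invlpg_mock(addr);
}
```

With the bit clear, only the single-entry path runs, so unaffected
CPUs keep the cheap flush. The sketch says nothing about how the bit
would get set, which is the open question below.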

The question is, how do we identify what CPUs
need the extra flushing?

And in what circumstances do they require it?