linux-kernel - Re: [patch -mm] i386: use pte_update_defer in ptep_test_and_clear

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4623F212.5030904@vmware.com>
Date:	Mon, 16 Apr 2007 15:00:50 -0700
From:	Zachary Amsden <zach@...are.com>
To:	Hugh Dickins <hugh@...itas.com>
CC:	David Rientjes <rientjes@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [patch -mm] i386: use pte_update_defer in ptep_test_and_clear_{dirty,young}

Hugh Dickins wrote:
> You're right to want to defer your pte_updates, David is right to want
> to batch his TLB flushes.  It bothers me that you have a surprising case,
> and that unless you abandon your optimization, it imposes a new constraint
> on how to proceed in common code (without #ifdef'ing around).
>
> But perhaps in this case David might concede that the longer we delay
> the TLB flush, the more likely a referenced bit is to be missed - that is,
> it gets cleared from the pte, but if that page is accessed again before
> the TLB is flushed, the processor may well omit to reinstate the accessed
> bit, and our stats drift away from reality.
>
> Compromise patch below: would that be satisfactory to you, David?
>   

Although I appreciate the heroics, you needn't do this on our account; 
the win of a couple thousand cycles is not worth the cost in complexity, 
IMHO, and the penalty on native quite potentially overshadows this.  If 
you still issue the flush inside the spinlock, as required for this 
paravirt optimization, you are taking the risk of holding the spinlock 
an extra long time while issuing a TLB shootdown - which means waiting 
for an IPI.

It might not matter that much on i386, but on big iron (or realtime) 
systems, this could have significant negative scaling effects for 
workloads where the page page table was hot on some set of CPUs (say, 
remapping file pages for database access).  In time, the benefits of 
this optimization to the hypervisor will decrease, while the benefits of 
optimizing the other way for shorter spinlock time may increase, both in 
a VM and on native hardware.

So I would rather just drop the pte_update_defer down to a pte_update if 
the flush is not immediately following - as it is nice and simply 
correct without getting in the way.  I see there is more in this thread 
that I haven't read yet, so I preemptively reserve the right to issue an 
invalidation of this opinion...

Thanks,

Zach
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/