lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 16 Apr 2007 15:00:50 -0700
From:	Zachary Amsden <zach@...are.com>
To:	Hugh Dickins <hugh@...itas.com>
CC:	David Rientjes <rientjes@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [patch -mm] i386: use pte_update_defer in ptep_test_and_clear_{dirty,young}

Hugh Dickins wrote:
> You're right to want to defer your pte_updates, David is right to want
> to batch his TLB flushes.  It bothers me that you have a surprising case,
> and that unless you abandon your optimization, it imposes a new constraint
> on how to proceed in common code (without #ifdef'ing around).
>
> But perhaps in this case David might concede that the longer we delay
> the TLB flush, the more likely a referenced bit is to be missed - that is,
> it gets cleared from the pte, but if that page is accessed again before
> the TLB is flushed, the processor may well omit to reinstate the accessed
> bit, and our stats drift away from reality.
>
> Compromise patch below: would that be satisfactory to you, David?
>   

Although I appreciate the heroics, you needn't do this on our account; 
the win of a couple thousand cycles is not worth the cost in complexity, 
IMHO, and the penalty on native quite potentially overshadows this.  If 
you still issue the flush inside the spinlock, as required for this 
paravirt optimization, you are taking the risk of holding the spinlock 
an extra long time while issuing a TLB shootdown - which means waiting 
for an IPI.

It might not matter that much on i386, but on big iron (or realtime) 
systems, this could have significant negative scaling effects for 
workloads where the page page table was hot on some set of CPUs (say, 
remapping file pages for database access).  In time, the benefits of 
this optimization to the hypervisor will decrease, while the benefits of 
optimizing the other way for shorter spinlock time may increase, both in 
a VM and on native hardware.

So I would rather just drop the pte_update_defer down to a pte_update if 
the flush is not immediately following - as it is nice and simply 
correct without getting in the way.  I see there is more in this thread 
that I haven't read yet, so I preemptively reserve the right to issue an 
invalidation of this opinion...

Thanks,

Zach
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ