Message-ID: <4623F212.5030904@vmware.com>
Date: Mon, 16 Apr 2007 15:00:50 -0700
From: Zachary Amsden <zach@...are.com>
To: Hugh Dickins <hugh@...itas.com>
CC: David Rientjes <rientjes@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org
Subject: Re: [patch -mm] i386: use pte_update_defer in ptep_test_and_clear_{dirty,young}

Hugh Dickins wrote:
> You're right to want to defer your pte_updates, David is right to want
> to batch his TLB flushes. It bothers me that you have a surprising case,
> and that unless you abandon your optimization, it imposes a new constraint
> on how to proceed in common code (without #ifdef'ing around).
>
> But perhaps in this case David might concede that the longer we delay
> the TLB flush, the more likely a referenced bit is to be missed - that is,
> it gets cleared from the pte, but if that page is accessed again before
> the TLB is flushed, the processor may well omit to reinstate the accessed
> bit, and our stats drift away from reality.
>
> Compromise patch below: would that be satisfactory to you, David?
>

Although I appreciate the heroics, you needn't do this on our account;
the win of a couple thousand cycles is not worth the cost in complexity,
IMHO, and the penalty on native hardware could well overshadow it. If
you still issue the flush inside the spinlock, as required for this
paravirt optimization, you are taking the risk of holding the spinlock
an extra long time while issuing a TLB shootdown - which means waiting
for an IPI.

It might not matter that much on i386, but on big iron (or realtime)
systems, this could have significant negative scaling effects for
workloads where the page table was hot on some set of CPUs (say,
remapping file pages for database access). In time, the benefits of
this optimization to the hypervisor will decrease, while the benefits of
optimizing the other way for shorter spinlock time may increase, both in
a VM and on native hardware.

So I would rather just drop the pte_update_defer down to a pte_update if
the flush is not immediately following - as it is nice and simply
correct without getting in the way. I see there is more in this thread
that I haven't read yet, so I preemptively reserve the right to issue an
invalidation of this opinion...
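
To make that last suggestion concrete, here is a rough user-space sketch
in C, not the actual i386 code. The names pte_update and pte_update_defer
come from the patch under discussion, but the one-argument signatures,
the two counters, the flush_follows flag, the PTE_ACCESSED constant and
test_and_clear_young itself are simplifications assumed for illustration.
The real hooks take more arguments and notify the hypervisor rather than
bump a counter.

```c
#include <stdbool.h>

/* Hypothetical model of a pte; the accessed bit is bit 5, as on i386. */
#define PTE_ACCESSED 0x20UL

typedef struct { unsigned long val; } pte_t;

/* Stand-ins for the paravirt hooks: they only count how often they run. */
static int immediate_updates;  /* hypervisor told right away */
static int deferred_updates;   /* notification left for the coming flush */

static void pte_update(pte_t *ptep)       { (void)ptep; immediate_updates++; }
static void pte_update_defer(pte_t *ptep) { (void)ptep; deferred_updates++; }

/*
 * Clear the accessed bit and report whether it was set.
 * flush_follows is an assumed flag: true only when the caller issues
 * the TLB flush immediately afterwards, so deferring is safe.
 * Otherwise fall back to the plain, always-correct pte_update.
 */
static bool test_and_clear_young(pte_t *ptep, bool flush_follows)
{
    if (!(ptep->val & PTE_ACCESSED))
        return false;               /* nothing changed, nothing to report */

    ptep->val &= ~PTE_ACCESSED;

    if (flush_follows)
        pte_update_defer(ptep);
    else
        pte_update(ptep);

    return true;
}
```

The point of the sketch is only that the deferred variant is chosen at
the one call site that flushes straight away, and everything else takes
the simple path, so common code is not constrained in where it flushes.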
Thanks,
Zach