Date:	Fri, 1 May 2009 09:44:55 -0400 (EDT)
From:	Christoph Lameter <cl@...ux.com>
To:	Mathieu Desnoyers <compudj@...stal.dyndns.org>
cc:	Nick Piggin <nickpiggin@...oo.com.au>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Yuriy Lalym <ylalym@...il.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	ltt-dev@...ts.casi.polymtl.ca, Tejun Heo <tj@...nel.org>,
	Ingo Molnar <mingo@...e.hu>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [ltt-dev] [PATCH] Fix dirty page accounting in
 redirty_page_for_writepage()

On Thu, 30 Apr 2009, Mathieu Desnoyers wrote:

> By ZVC update, you mean Zone ... Counter update? (which code exactly?)

The code that you were modifying in vmstat.c.

> Hrm, I must admit I'm not sure I follow how your reasoning applies to my
> code. I am using percpu_add_return_irq() exactly for this reason: it
> only ever touches the percpu variable once, and atomically. The test for
> overflow is done on the value returned by percpu_add_return_irq().

If the percpu differential goes over a certain boundary, then the
differential has to be updated twice: once for the increment itself, and
once to reset it after folding it into the global count.
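
For reference, a simplified sketch of the vmstat.c ZVC fast path being
discussed (names from 2.6-era vmstat.c; details approximate):

	/*
	 * Sketch of __inc_zone_state(): note the two writes to the
	 * per-cpu differential, the increment and then the reset after
	 * folding into the global count. An interrupt landing between
	 * them could corrupt the counter, so the path must run with
	 * interrupts disabled (or use cmpxchg).
	 */
	void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
	{
		struct per_cpu_pageset *pcp = zone_pcp(zone, smp_processor_id());
		s8 *p = pcp->vm_stat_diff + item;

		(*p)++;					/* first write */

		if (unlikely(*p > pcp->stat_threshold)) {
			int overstep = pcp->stat_threshold / 2;

			zone_page_state_add(*p + overstep, zone, item);
			*p = -overstep;			/* second write */
		}
	}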

> Therefore, an interrupt scenario close to what I understand of your
> concern would be:
>
> * Thread A
>
> inc_zone_page_state()
>   p_ret = percpu_add_return(p, 1); (let's suppose this increment
>                                     overflows the threshold, therefore
>                                     (p_ret & mask) == 0)
>
> ----> interrupt comes in, preempts the current thread, execution in a
>       different thread context (* Thread B):
>
>      inc_zone_page_state()
>        p_ret = percpu_add_return(p, 1);  ((p_ret & mask) == 1)
>        if (!(p_ret & mask))
>          increment global zone count. (not executed)
>
> ----> interrupt comes in, preempts the current thread, execution back to
>       the original thread context (Thread A), on the same or on a
>       different CPU:
>
>   if (!(p_ret & mask))
>     increment global zone count.   -----> will therefore increment the
>                                           global zone count only after
>                                           scheduling back the original
>                                           thread.
>
> So I guess what you say here is that if Thread A is preempted for too
> long, we will have to wait until it gets scheduled back before the
> global count is incremented. Do we really need such a degree of
> precision for those counters?
>
> (I fear I'm not understanding your concern fully though)

inc_zone_page_state() modifies the differential, which is a u8 and can
easily overflow.
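
For concreteness, the wrap being described (plain C, not kernel code):

	unsigned char diff = 255;	/* u8 differential at its limit */
	diff += 1;			/* the 256th count wraps it to 0 */
	/* Mathieu's test (p_ret & mask) == 0 fires exactly on this wrap. */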

Hmmm. But if you check for overflow to zero this way, it may work without
the need for cmpxchg. But if you rely on overflow, then we only update the
global count after every 256 counts on the percpu differential. The tuning
of the counter's accuracy won't work anymore. The global counter could
become wildly inaccurate with a lot of processors.
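
To put the objection in code, a hypothetical rendering of the
overflow-to-zero scheme (percpu_add_return_irq() is the primitive from
the patch under discussion, its usage approximated; the other names are
invented for illustration):

	/*
	 * Hypothetical sketch, not mainline code. The differential is
	 * folded into the global counter only when the u8 wraps to 0,
	 * i.e. once per 256 increments on each CPU, so stat_threshold
	 * tuning no longer bounds the global counter's error.
	 */
	static void inc_counter_sketch(atomic_long_t *global, u8 *pcpu_diff)
	{
		u8 p_ret;

		p_ret = percpu_add_return_irq(pcpu_diff, 1);
		if (!p_ret)			/* (p_ret & mask) == 0 */
			atomic_long_add(256, global);
	}

With many processors the lag compounds: each CPU can hold up to 255
uncounted events, so the global value can trail the true count by up to
255 * NR_CPUS.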

