Message-ID: <Pine.LNX.4.58.0801161041010.15998@gandalf.stny.rr.com>
Date:	Wed, 16 Jan 2008 10:58:43 -0500 (EST)
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
cc:	LKML <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...e.hu>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Christoph Hellwig <hch@...radead.org>,
	Gregory Haskins <ghaskins@...ell.com>,
	Arnaldo Carvalho de Melo <acme@...stprotocols.net>,
	Thomas Gleixner <tglx@...utronix.de>,
	Tim Bird <tim.bird@...sony.com>,
	Sam Ravnborg <sam@...nborg.org>,
	"Frank Ch. Eigler" <fche@...hat.com>,
	Steven Rostedt <srostedt@...hat.com>,
	Paul Mackerras <paulus@...ba.org>,
	Daniel Walker <dwalker@...sta.com>
Subject: Re: [RFC PATCH 16/22 -v2] add get_monotonic_cycles



On Wed, 16 Jan 2008, Mathieu Desnoyers wrote:
> > No, there's probably issues there too, but no need to worry about it,
> > since I already showed that allowing for clocksource_accumulate to happen
> > inside the get_monotonic_cycles loop is already flawed.
> >
>
> Yep, I just re-read through your previous email, and totally agree that
> the algorithm is flawed in the way you pointed out.

Yeah, but if we replace the loop with a seqlock, then it would work,
albeit with more cacheline bouncing (caused by writes). (Maybe not; see below.)

> > do you actually use the RCU internals? or do you just reimplement an RCU
> > algorithm?
> >
>
> Nope, I don't use RCU internals in this code. Preempt disable seemed
> like the best way to handle this utterly short code path and I wanted
> the write side to be fast enough to be called periodically. What I do is:

grmble. Then how do you trace preempt_disable?  My tracer does that
(see the last patch in the series).

>
> - Disable preemption on the read side:
>   it makes sure the pointer I get will point to a data structure that
>   will never change while I am in the preempt disabled code. (see *)
> - I use per-cpu data to allow the read side to be as fast as possible
>   (it only needs to disable preemption, does not race against other CPUs
>   and won't generate cache line bouncing). It also allows dealing with
>   unsynchronized TSCs if needed.
> - Periodic write side: it's called from an IPI running on each CPU.
>
> (*) We expect the read-side (preempt off region) to be shorter than
> the interval between IPI updates, so we can guarantee the data structure
> it uses won't be modified underneath it. Since the IPI update is
> launched every second or so (depending on the frequency of the counter we
> are trying to extend), that's more than ok.
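
To make sure I follow, it's something like this rough sketch? (The names
here are made up, not your actual code; read_raw_counter() is a stand-in
for the narrow hardware counter read, and the counter width and update
period are guesses.)

	#include <linux/percpu.h>
	#include <linux/preempt.h>
	#include <linux/compiler.h>
	#include <linux/types.h>

	/* One snapshot of the extended ("synthetic") counter state. */
	struct synth_tsc_snapshot {
		u64	base;		/* extended value at the last IPI update */
		u32	last_lsb;	/* raw narrow counter at the last update */
	};

	/* Double-buffered per-CPU state, flipped by the periodic IPI. */
	struct synth_tsc {
		struct synth_tsc_snapshot	snap[2];
		unsigned int			index;	/* currently valid snapshot */
	};

	static DEFINE_PER_CPU(struct synth_tsc, synth_tsc);

	/* Read side: preempt off, touches only this CPU's data. */
	static u64 read_synthetic_tsc(void)
	{
		struct synth_tsc_snapshot *snap;
		struct synth_tsc *s;
		u64 ret;
		u32 lsb;

		preempt_disable();
		s = this_cpu_ptr(&synth_tsc);
		snap = &s->snap[READ_ONCE(s->index)];
		lsb = read_raw_counter();	/* hypothetical hw counter read */
		ret = snap->base + (u32)(lsb - snap->last_lsb);
		preempt_enable();
		return ret;
	}

	/*
	 * Write side, run from an IPI on each CPU roughly once per second.
	 * It fills the spare snapshot and then flips the index, so a reader
	 * interrupted by this IPI keeps using the snapshot it already
	 * picked, which is never modified here.
	 */
	static void update_synthetic_tsc(void *unused)
	{
		struct synth_tsc *s = this_cpu_ptr(&synth_tsc);
		unsigned int next = s->index ^ 1;
		u32 lsb = read_raw_counter();

		s->snap[next].base = s->snap[s->index].base +
				     (u32)(lsb - s->snap[s->index].last_lsb);
		s->snap[next].last_lsb = lsb;
		barrier();	/* publish the snapshot before flipping the index */
		WRITE_ONCE(s->index, next);
	}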

One thing I want to clear up. The major difference between this
latency_tracer and LTTng is what we consider fast paths.  The latency
tracer is recording things like enabling and disabling interrupts, preempt
count changes, or simply profiling all function calls.  Those are what I
consider fast paths.  The slow paths WRT the latency_tracer are things like
context switches.  This is why I don't have a problem with copying the
comm at context switch time: that _is_ a slow path for the latency
tracer.

Placing a read_seqlock in get_monotonic_cycles would not be that bad,
since the only slowdown would be the rmb.  read_seqlocks don't modify
global data; only the write_seqlock does. So the cache line bouncing would
only happen on updates in clocksource_accumulate, and once the caches are
all balanced again, the reads will continue fine.
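
Something like this, just as a sketch and not the actual patch: it
assumes the cycle_lock I propose below, read_clocksource() stands in for
however the hardware counter actually gets read, and the field usage is
a guess from the structs below.

	static cycle_t get_monotonic_cycles(struct clocksource *cs)
	{
		cycle_t now, offset, mono;
		unsigned long seq;

		do {
			seq = read_seqbegin(&cs->cycle_lock);
			now = read_clocksource(cs);	/* hw counter read */
			offset = (now - cs->cycle_last) & cs->mask;
			mono = cs->cycle_monotonic;
		} while (read_seqretry(&cs->cycle_lock, seq));

		return mono + offset;
	}

The read_seqbegin/read_seqretry pair only reads the sequence counter and
does the rmb, so readers never dirty the line.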

Question: Is a cache miss a greater cost than a read of a clocksource
          (besides the TSC)?

Also note how I arrange these variables in the clock struct:

	struct {
		cycle_t cycle_last, cycle_accumulated, cycle_monotonic;
		cycle_t cycle_interval;
	} ____cacheline_aligned_in_smp;

I could do the following:

	struct {
		seqlock_t cycle_lock;
		cycle_t cycle_last, cycle_accumulated, cycle_monotonic;
		cycle_t cycle_interval;
	} ____cacheline_aligned_in_smp;

Which would help keep all of these in the same cache line. These fields are
all updated at the same time, so hopefully the cache line bouncing stays
limited to a single cacheline.
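
The write side, e.g. in clocksource_accumulate, would then just take the
write_seqlock around the updates, so the writer only dirties that one
cacheline. Again only a sketch with assumed field usage, not the actual
patch:

	static void clocksource_accumulate(struct clocksource *cs, cycle_t now)
	{
		cycle_t offset;

		write_seqlock(&cs->cycle_lock);
		offset = (now - cs->cycle_last) & cs->mask;
		cs->cycle_last = now;
		cs->cycle_accumulated += offset;
		cs->cycle_monotonic += offset;
		write_sequnlock(&cs->cycle_lock);
	}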

-- Steve

