[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALCETrUuM0B2Eo6OD6TJUE+XyEgsWJatyCYNSYBFuniJ5ww8KA@mail.gmail.com>
Date: Tue, 9 Jun 2015 10:55:15 -0700
From: Andy Lutomirski <luto@...capital.net>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Thomas Gleixner <tglx@...utronix.de>,
John Stultz <john.stultz@...aro.org>,
Aaron Lu <aaron.lu@...el.com>, Tony Li <tony.li@....com>,
"Rafael J. Wysocki" <rjw@...ysocki.net>,
Suravee Suthikulanit <suravee.suthikulpanit@....com>,
Frédéric Weisbecker <fweisbec@...il.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Borislav Petkov <bp@...e.de>,
Fengguang Wu <fengguang.wu@...el.com>,
Len Brown <lenb@...nel.org>, Ken Xue <ken.xue@....com>,
Huang Rui <ray.huang@....com>, X86 ML <x86@...nel.org>
Subject: Re: [PATCH v2 3/4] x86, mwaitt: introduce mwaix delay with a
configurable timer
On Tue, Jun 9, 2015 at 10:13 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> On Tue, 2015-06-09 at 09:46 -0700, Andy Lutomirski wrote:
>> On Jun 9, 2015 2:30 AM, "Peter Zijlstra" <peterz@...radead.org> wrote:
>
>> > How about you think instead and do something like:
>> >
>> > rdtsc(start);
>> > rdtsc_barrier();
>>
>> Other way around. We really need a function static inline u64
>> rdtsc_with_barrier().
>
> So admittedly I have not actually looked at how the tsc barrier stuff
> works, but what? We don't care if the rdtsc goes up, we just want to
> make sure its done before going further.
>
When I looked at the rdtsc ordering a couple years ago, I thought
about what it meant for rdtsc to be properly ordered. I decided that
proper rdtsc ordering meant that no one should ever be able to tell if
rdtsc ends up reordered. Concretely, I think that rdtsc should be
ordered like an x86 load from a shared memory location. The manuals
are vague but, after a decent amount of experimentation,
rdtsc_barrier(); rdtsc() seems to achieve that on all CPUs. With the
barrier, the rdtsc won't happen before a prior load in the same
thread, and no CPU seems to ever execute rdtsc after a subsequent
memory access.
For delay in particular, we care about I/O delays as well, so
presumably we want to guarantee that a prior load or store to be
visible for at least the requested amount of time before the next I/O.
This should be fine here, too, as MMIO stores aren't ordered anyway
(drivers need a dummy load afterwards) and I bet that the in and out
instructions are already ordered strongly enough.
I'll send a patch to improve the headers.
--Andy
>>
>> >
>> > for (;;) {
>> > delay = min(MWAIT_MAX_LOOPS, loops);
>> >
>> > __monitorx(&addr, 0, 0);
>> > mwaitx(delay, true);
>>
>> I don't like this hack. The compiler is entirely within is rights to
>> poke addr's cacheline (i.e. the stack) between the two instructions.
>> I'd suggest either making the thing a full cacheline long or using a
>> single asm statement.
>>
>> Also, "addr" is a bad name for a dummy variable that isn't an address
>> at all. How about "dummy"?
>
> Sure, and I like your question on why monitorx exists at all. But none
> of that was the point here, the main point being that if loops was too
> big, we should do multiple mwaitx invocations, not punt and busy loop.
>
>> > rdtsc_barrier();
>> > rdtsc(end);
>> > rdtsc_barrier();
>>
>> The second barrier is unnecessary.
>
> By virtue of the address dependency?
No, it's just that CPUs seem to work this way.
--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists