Message-Id: <66B551EF-8CC6-4145-9618-8DC4F4498138@amacapital.net>
Date: Thu, 5 Dec 2019 09:51:39 -0800
From: Andy Lutomirski <luto@...capital.net>
To: David Laight <David.Laight@...lab.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
"x86@...nel.org" <x86@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: Running an Ivy Bridge cpu at fixed frequency
> On Dec 5, 2019, at 7:54 AM, David Laight <David.Laight@...lab.com> wrote:
>
> From: Peter Zijlstra
>> Sent: 05 December 2019 09:46
>> As Andy already wrote, perf is really good for this.
>> Find attached, it probably is less shiny than what Andy handed you, but
>> contains all the bits required to frob something.
>
> You are in a maze of incomplete documentation all disjoint.
I don’t see any documentation. Maybe you shouldn’t have turned your flashlight on.
>
> The x86 instruction set doc (eg 325462.pdf) defines the rdpmc instruction, tells you
> how many counters each cpu type has, but doesn't even contain a reference
> to how they are incremented.
> I guess there are some processor-specific MSR for that.
>
> perf_event_open(2) tells you a few things, but doesn't actually say what anything is.
> It contains all but the last 'if' clause of this function, without really saying
> what any of it does - or why you might do it this way.
>
> static inline u64 mmap_read_self(void *addr)
> {
>     struct perf_event_mmap_page *pc = addr;
>     u32 seq, idx, time_mult = 0, time_shift = 0, width = 0;
>     u64 count, cyc = 0, time_offset = 0, enabled, running, delta;
>     s64 pmc = 0;
>
>     do {
>         seq = pc->lock;
>         barrier();
>
>         enabled = pc->time_enabled;
>         running = pc->time_running;
>
>         if (pc->cap_user_time && enabled != running) {
>             cyc = rdtsc();
>             time_mult = pc->time_mult;
>             time_shift = pc->time_shift;
>             time_offset = pc->time_offset;
>         }
>
>         idx = pc->index;
>         count = pc->offset;
>         if (pc->cap_user_rdpmc && idx) {
>             width = pc->pmc_width;
>             pmc = rdpmc(idx - 1);
>         }
>
>         barrier();
>     } while (pc->lock != seq);
>
>     if (idx) {
>         pmc <<= 64 - width;
>         pmc >>= 64 - width; /* shift right signed */
>         count += pmc;
>     }
>
>     if (enabled != running) {
>         u64 quot, rem;
>
>         quot = (cyc >> time_shift);
>         /* 64-bit mask: a plain int 1 overflows once time_shift reaches 31 */
>         rem = cyc & (((u64)1 << time_shift) - 1);
>         delta = time_offset + quot * time_mult +
>                 ((rem * time_mult) >> time_shift);
>
>         enabled += delta;
>         if (idx)
>             running += delta;
>
>         quot = count / running;
>         rem = count % running;
>         count = quot * enabled + (rem * enabled) / running;
>     }
>
>     return count;
> }
>
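The sign-extension step in the quoted function (`pmc <<= 64 - width; pmc >>= 64 - width;`) can be tried in isolation. The following is a minimal sketch, not code from the thread: the name `pmc_sign_extend` is hypothetical, `width` stands in for `pc->pmc_width`, and the left shift is done on an unsigned value to avoid shifting a negative signed integer.

```c
#include <stdint.h>

/*
 * Sign-extend the low `width` bits of a raw counter value to 64 bits.
 * `width` plays the role of pc->pmc_width and must be in the range 1..64.
 * The right shift of a negative int64_t is implementation-defined in C;
 * GCC and Clang implement it as an arithmetic shift, which this relies on.
 */
static inline int64_t pmc_sign_extend(uint64_t raw, unsigned int width)
{
    unsigned int shift = 64 - width;

    /* Move bit (width - 1) into the sign position, then shift back down. */
    return (int64_t)(raw << shift) >> shift;
}
```

With a 48-bit counter, a raw value with all 48 low bits set comes back as -1, while small positive values are returned unchanged.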
> AFAICT:
> 1) The last clause is scaling the count up to allow for time when the hardware counter
> couldn't be allocated.
> I'm not convinced that is useful; better to ignore the entire measurement.
> Half of this got deleted from the man page, leaving strange 'set but unused' variables.
>
> 2) The hardware counters are disabled while the process is asleep.
> On wake a different pmc counter might be used (maybe on a different cpu).
> The new cpu might not even have a counter available.
>
> 3) If you don't want to scale up for missing periods it is probably enough to do:
> do {
>     seq = pc->lock;
>     barrier();
>     idx = pc->index;
>     if (!idx)
>         return -1;
>     count = pc->offset + rdpmc(idx - 1);
>     barrier();
> } while (pc->lock != seq);
> return (unsigned int)count;
>
> Not tried it yet :-)
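For reference, the scaling in the last clause of the quoted function amounts to `count * enabled / running`, split into quotient and remainder so that the intermediate product is less likely to overflow 64 bits. A minimal sketch of just that arithmetic follows; the function name `scale_count` and the `running == 0` guard are assumptions added for illustration and are not in the quoted code.

```c
#include <stdint.h>

/*
 * Extrapolate a count observed during `running` time units to the full
 * `enabled` time, i.e. roughly count * enabled / running.
 * Dividing first keeps quot * enabled small; only the remainder term
 * (rem < running) is multiplied before the division.
 */
static inline uint64_t scale_count(uint64_t count, uint64_t enabled,
                                   uint64_t running)
{
    uint64_t quot, rem;

    if (running == 0)   /* assumption: never scheduled, nothing to scale */
        return 0;

    quot = count / running;
    rem = count % running;
    return quot * enabled + (rem * enabled) / running;
}
```

For example, a count of 150 seen while the counter ran for 100 units out of 300 enabled units is scaled to 450. When `enabled == running` the count comes back unchanged.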
Use my version :). I just throw out the sample if we were preempted or if it was otherwise suspicious.
—Andy
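Andy's version is not included in this message, so the following is only a guess at the general shape of "throw out the sample" rather than his code. It is a sketch under stated assumptions: `struct fake_page` is a hypothetical stand-in for the three `perf_event_mmap_page` fields used, the `read_pmc` callback stands in for `rdpmc()` so the logic can run without hardware counters, and `pmc_steady` / `pmc_preempted` are made-up test doubles.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the perf_event_mmap_page fields used here. */
struct fake_page {
    volatile uint32_t lock;   /* sequence count, bumped by the kernel */
    uint32_t index;           /* 0 means no hardware counter assigned */
    int64_t offset;           /* value to add to the raw counter */
};

/* Compiler barrier, as in the quoted code (GCC/Clang syntax). */
#define barrier() __asm__ __volatile__("" ::: "memory")

/*
 * Take one sample. Instead of looping until the sequence count is stable,
 * report failure so the caller can drop the measurement.
 */
static bool sample_once(struct fake_page *pc,
                        uint64_t (*read_pmc)(struct fake_page *, uint32_t),
                        uint64_t *out)
{
    uint32_t seq = pc->lock;
    uint32_t idx;
    uint64_t count;

    barrier();
    idx = pc->index;
    if (idx == 0)
        return false;               /* no counter: discard */

    count = (uint64_t)pc->offset + read_pmc(pc, idx - 1);
    barrier();

    if (pc->lock != seq)
        return false;               /* page changed underneath us: discard */

    *out = count;
    return true;
}

/* Test double: counter read with no interference. */
static uint64_t pmc_steady(struct fake_page *pc, uint32_t idx)
{
    (void)pc; (void)idx;
    return 1000;
}

/* Test double: simulates the kernel updating the page during the read. */
static uint64_t pmc_preempted(struct fake_page *pc, uint32_t idx)
{
    (void)idx;
    pc->lock += 2;
    return 1000;
}
```

With `offset = 500`, a steady read yields 1500, while a read during which `lock` changes, or one with `index == 0`, is rejected and leaves the output untouched.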
>
> David
>