Message-ID: <238c0da5-c542-55b1-aaed-d4345516b84b@zhaoxin.com>
Date: Wed, 8 Apr 2020 15:20:46 +0800
From: CodyYao-oc <CodyYao-oc@...oxin.com>
To: Peter Zijlstra <peterz@...radead.org>
CC: <mingo@...hat.com>, <acme@...nel.org>, <mark.rutland@....com>,
<alexander.shishkin@...ux.intel.com>, <jolsa@...hat.com>,
<namhyung@...nel.org>, <tglx@...utronix.de>, <bp@...en8.de>,
<hpa@...or.com>, <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
<cooperyan@...oxin.com>
Subject: Re: [PATCH] x86/perf: Add hardware performance events support for
Zhaoxin CPU.
On 2020/3/31 6:18 PM, Peter Zijlstra wrote:
> On Tue, Mar 31, 2020 at 05:39:59PM +0800, CodyYao-oc wrote:
>> Zhaoxin CPUs provide facilities for monitoring performance via a
>> PMU (Performance Monitoring Unit), but the functionality is unused so
>> far. Therefore, add support for the Zhaoxin PMU to make
>> performance-related hardware events available.
>
> This looks like an Intel Architectural PMU v2 or so, is that correct? Do
> you have a link to documentation for your CPU?
>
Sorry for the late reply. Yes, the usage is the same as the Intel
architectural PMU v2, but there is no public documentation at present.
For your convenience, here are some event descriptions:
--------------------------------------------------------------------------------
Event                     | Event  | Umask | Description
                          | Select |       |
--------------------------------------------------------------------------------
cpu-cycles                |  82h   |  00h  | unhalted core clock
instructions              |  00h   |  00h  | number of instructions at
                          |        |       | retirement
cache-references          |  15h   |  05h  | number of fillq pushes at the
                          |        |       | current cycle
cache-misses              |  1ah   |  05h  | number of L2 misses pushed by
                          |        |       | fillq
branch-instructions       |  28h   |  00h  | number of branch instructions
                          |        |       | retired
branch-misses             |  29h   |  00h  | mispredicted branch instructions
                          |        |       | at retirement
bus-cycles                |  83h   |  00h  | unhalted bus clock
stalled-cycles-frontend   |  01h   |  01h  | increments each cycle the number
                          |        |       | of uops issued by the RAT to RS
stalled-cycles-backend    |  0fh   |  04h  | RS0/1/2/3/45 empty
L1-dcache-loads           |  68h   |  05h  | number of retired/committed loads
L1-dcache-load-misses     |  4bh   |  05h  | retired load uops whose data
                          |        |       | source followed an L1 miss
L1-dcache-stores          |  69h   |  06h  | number of retired/committed
                          |        |       | stores, no LEA
L1-dcache-store-misses    |  62h   |  05h  | cache lines in M state evicted
                          |        |       | out of L1D due to snoop HitM or
                          |        |       | dirty line replacement
L1-icache-loads           |  00h   |  03h  | number of L1I cache accesses for
                          |        |       | valid normal fetches, including
                          |        |       | uncacheable accesses
L1-icache-load-misses     |  01h   |  03h  | number of L1I cache misses for
                          |        |       | valid normal fetches, including
                          |        |       | uncacheable misses
L1-icache-prefetches      |  0ah   |  03h  | number of prefetches
L1-icache-prefetch-misses |  0bh   |  03h  | number of prefetch misses
dTLB-loads                |  68h   |  05h  | number of retired/committed loads
dTLB-load-misses          |  2ch   |  05h  | number of load operations that
                          |        |       | miss all-level TLBs and cause a
                          |        |       | tablewalk
dTLB-stores               |  69h   |  06h  | number of retired/committed
                          |        |       | stores, no LEA
dTLB-store-misses         |  30h   |  05h  | number of store operations that
                          |        |       | miss all-level TLBs and cause a
                          |        |       | tablewalk
dTLB-prefetches           |  64h   |  05h  | number of hardware PTE prefetch
                          |        |       | requests dispatched out of the
                          |        |       | prefetch FIFO
dTLB-prefetch-misses      |  65h   |  05h  | number of hardware PTE prefetch
                          |        |       | requests that miss the L1D cache
iTLB-load                 |  00h   |  00h  | actually counts instructions
iTLB-load-misses          |  34h   |  05h  | number of code fetch operations
                          |        |       | that miss all-level TLBs and
                          |        |       | cause a tablewalk
--------------------------------------------------------------------------------
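
Since the counters follow the Intel architectural PERFEVTSEL layout
(event select in bits 0-7, unit mask in bits 8-15), the events above can
also be requested as raw events. A minimal user-space sketch, not part
of the patch, counting cache-misses (event select 1ah, umask 05h):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

int main(void)
{
	struct perf_event_attr attr;
	long long count;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_RAW;
	attr.size = sizeof(attr);
	attr.config = (0x05 << 8) | 0x1a;	/* umask 05h, event select 1ah */
	attr.disabled = 1;
	attr.exclude_kernel = 1;

	/* Count on the current task, any CPU. */
	fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0)
		return 1;

	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
	/* ... workload to be measured ... */
	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

	if (read(fd, &count, sizeof(count)) == sizeof(count))
		printf("cache-misses: %lld\n", count);
	close(fd);
	return 0;
}

The same encoding works from the command line as "perf stat -e r051a".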
>> +static void zhaoxin_pmu_disable_all(void)
>> +{
>> +	wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
>> +}
>> +
>> +static void zhaoxin_pmu_enable_all(int added)
>> +{
>> +	wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, x86_pmu.intel_ctrl);
>> +}
>> +
>> +static inline u64 zhaoxin_pmu_get_status(void)
>> +{
>> +	u64 status;
>> +
>> +	rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, status);
>> +
>> +	return status;
>> +}
>> +
>> +static inline void zhaoxin_pmu_ack_status(u64 ack)
>> +{
>> +	wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, ack);
>> +}
>
>> +static int zhaoxin_pmu_handle_irq(struct pt_regs *regs)
>> +{
>> +	struct perf_sample_data data;
>> +	struct cpu_hw_events *cpuc;
>> +	int bit;
>> +	u64 status;
>> +	bool is_zxc = false;
>> +	int handled = 0;
>> +
>> +	cpuc = this_cpu_ptr(&cpu_hw_events);
>> +	apic_write(APIC_LVTPC, APIC_DM_NMI);
>> +	zhaoxin_pmu_disable_all();
>> +	status = zhaoxin_pmu_get_status();
>> +	if (!status)
>> +		goto done;
>> +
>> +	if (boot_cpu_data.x86 == 0x06 &&
>> +	    (boot_cpu_data.x86_model == 0x0f ||
>> +	     boot_cpu_data.x86_model == 0x19))
>> +		is_zxc = true;
>> +again:
>> +
>> +	/* Clearing status works only if global control is enabled on zxc. */
>> +	if (is_zxc)
>> +		wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, x86_pmu.intel_ctrl);
>> +
>> +	zhaoxin_pmu_ack_status(status);
>> +
>> +	if (is_zxc)
>> +		wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
>
> That's an unfortunate errata; perhaps write it like so:
>
> static inline void zxc_pmu_ack_status(u64 status)
> {
> 	/*
> 	 * ZXC needs global control enabled in order to clear status bits.
> 	 */
> 	zhaoxin_pmu_enable_all(0);
> 	zhaoxin_pmu_ack_status(status);
> 	zhaoxin_pmu_disable_all();
> }
> 
> 	if (is_zxc)
> 		zxc_pmu_ack_status(status);
> 	else
> 		zhaoxin_pmu_ack_status(status);
> 
Thank you very much for your advice; I have changed the code accordingly.

> Alternatively; you can do a whole zxc specific handle_irq() and move the
> get/ack status before disable_all(). If you do that, then factor this:
>
>> +	/*
>> +	 * CondChgd bit 63 doesn't mean any overflow status. Ignore
>> +	 * and clear the bit.
>> +	 */
>> +	if (__test_and_clear_bit(63, (unsigned long *)&status)) {
>> +		if (!status)
>> +			goto done;
>> +	}
>> +
>> +	for_each_set_bit(bit, (unsigned long *)&status, X86_PMC_IDX_MAX) {
>> +		struct perf_event *event = cpuc->events[bit];
>> +
>> +		handled++;
>> +
>> +		if (!test_bit(bit, cpuc->active_mask))
>> +			continue;
>> +
>> +		x86_perf_event_update(event);
>> +		perf_sample_data_init(&data, 0, event->hw.last_period);
>> +
>> +		if (!x86_perf_event_set_period(event))
>> +			continue;
>> +
>> +		if (perf_event_overflow(event, &data, regs))
>> +			x86_pmu_stop(event, 0);
>> +	}
>
> bit into its own function so you don't have to duplicate it. Then the
> two handle_irq() functions only differ in the status handling.
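
For reference, factoring the overflow loop out as you suggest would look
roughly like this (a sketch; the helper name is mine, not from the patch):

static int zhaoxin_pmu_handle_overflows(struct cpu_hw_events *cpuc,
					struct pt_regs *regs, u64 status)
{
	struct perf_sample_data data;
	int handled = 0;
	int bit;

	for_each_set_bit(bit, (unsigned long *)&status, X86_PMC_IDX_MAX) {
		struct perf_event *event = cpuc->events[bit];

		handled++;

		if (!test_bit(bit, cpuc->active_mask))
			continue;

		x86_perf_event_update(event);
		perf_sample_data_init(&data, 0, event->hw.last_period);

		if (!x86_perf_event_set_period(event))
			continue;

		if (perf_event_overflow(event, &data, regs))
			x86_pmu_stop(event, 0);
	}

	return handled;
}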
>
>> +
>> +	/*
>> +	 * Repeat if there is more work to be done:
>> +	 */
>> +	status = zhaoxin_pmu_get_status();
>> +	if (status)
>> +		goto again;
>> +
>> +done:
>> +	zhaoxin_pmu_enable_all(0);
>> +	return handled;
>> +}
> .
>
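
The zxc-specific handle_irq() would then read and ack status before
disable_all(), roughly like this (again a sketch with hypothetical
naming, reusing the zhaoxin_pmu_handle_overflows() helper sketched above):

static int zxc_pmu_handle_irq(struct pt_regs *regs)
{
	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
	int handled = 0;
	u64 status;

	apic_write(APIC_LVTPC, APIC_DM_NMI);

	/*
	 * On zxc, status can only be cleared while global control is
	 * enabled, so ack before disabling all counters.
	 */
	status = zhaoxin_pmu_get_status();
	while (status) {
		zhaoxin_pmu_ack_status(status);
		zhaoxin_pmu_disable_all();

		/* CondChgd bit 63 doesn't mean any overflow status. */
		__clear_bit(63, (unsigned long *)&status);
		handled += zhaoxin_pmu_handle_overflows(cpuc, regs, status);

		zhaoxin_pmu_enable_all(0);
		status = zhaoxin_pmu_get_status();
	}

	return handled;
}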