lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 30 Jan 2018 09:41:40 -0500
From:   "Liang, Kan" <kan.liang@...ux.intel.com>
To:     Stephane Eranian <eranian@...gle.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Jiri Olsa <jolsa@...hat.com>, Andi Kleen <ak@...ux.intel.com>
Subject: Re: [PATCH V3 0/5] bugs fix for large PEBS mmap read and rdpmc read



On 1/30/2018 4:16 AM, Stephane Eranian wrote:
> Hi,
> 
> On Mon, Jan 29, 2018 at 8:29 AM, <kan.liang@...ux.intel.com> wrote:
>>
>> From: Kan Liang <kan.liang@...ux.intel.com>
>>
>> ------
>>
>> Changes since V2:
>>   - Refined the changelog
>>   - Introduced specific read function for large PEBS.
>>     The previous generic PEBS read function is confusing.
>>     Disabled PMU in pmu::read() path for large PEBS.
>>     Handled the corner case when reload_times == 0.
>>   - Modified the parameter of intel_pmu_save_and_restart_reload()
>>     Discarded local64_cmpxchg
>>   - Added fixes tag
>>   - Added WARN to handle reload_times == 0 || reload_val == 0
>>
>> Changes since V1:
>>   - Check PERF_X86_EVENT_AUTO_RELOAD before call
>>     intel_pmu_save_and_restore()
> 
> It is not yet clear to me why PERF_SAMPLE_PERIOD is not allowed
> with large PEBS. Large PEBS requires fixed period. So the kernel could
> make up the period from the event and store it in the sampling buffer.

The PERF_SAMPLE_PERIOD will be implicitly set in freq mode.
By now, large PEBS doesn't support freq mode yet.

> 
> I tried using large PEBS recently, and despite trying different option
> combination of perf record, I was not able to get it to work.
> 
> $ perf record  -c 1  -e cpu/event=0xd1,umask=0x10,period=19936/pp
> --no-timestamp --no-period -a -C 0
> 
> But I was able to make this work with a much older kernel.

Is there any error message?

> 
> Another annoyance I ran into is with perf record requiring -c period
> in order not to set
> PERF_SAMPLE_PERIOD in the event.
> 
> If I do:
> perf record  -c 1  -e cpu/event=0xd1,umask=0x10,period=19936/pp
> --no-timestamp --no-period -a -C 0
> 
> I get
> 
> perf_event_attr:
>    type                             4
>    size                             112
>    config                           0x10d1
>    { sample_period, sample_freq }   199936
>    sample_type                      IP|TID|CPU
> 
> But if I do:
> perf record  -e cpu/event=0xd1,umask=0x10,period=19936/pp
> --no-timestamp --no-period -a -C 0
> 
> I get
> 
> perf_event_attr:
>    type                             4
>    size                             112
>    config                           0x10d1
>    { sample_period, sample_freq }   199936
>    sample_type                      IP|TID|CPU|PERIOD
> 
> Perf should check if all events have a period=, then it should not
> pass PERF_SAMPLE_PERIOD, even
> more so when only one event is defined.

As my understanding, the "period=" is per-event term. It should not 
change the global setting.
I think it's better still use "--no-period" to control the PERIOD bit.

> 
> Also it does not seem to honor --no-period.
>

Right, perf tool doesn't clear the PERIOD for --no-period.
	if (opts->period)
		perf_evsel__set_sample_bit(evsel, PERIOD);

I will submit a patch to fix it.

> All of this makes it hard to use large PEBS in general.
> 
> So I think your series should also address this part.

The issue looks like perf tool problem. I will take a look at it, but 
probably address it in a separate patch set.

Thanks,
Kan
> 
>>   - Introduce a special purpose intel_pmu_save_and_restart()
>>     just for AUTO_RELOAD.
>>   - New patch to disable userspace RDPMC usage if large PEBS is enabled.
>>
>> ------
>>
>> There is a bug when mmap read event->count with large PEBS enabled.
>> Here is an example.
>>   #./read_count
>>   0x71f0
>>   0x122c0
>>   0x1000000001c54
>>   0x100000001257d
>>   0x200000000bdc5
>>
>> The bug is caused by two issues.
>> - In x86_perf_event_update, the calculation of event->count does not
>>    take the auto-reload values into account.
>> - In x86_pmu_read, it doesn't count the undrained values in large PEBS
>>    buffers.
>>
>> The first issue was introduced with the auto-reload mechanism enabled
>> since commit 851559e35fd5 ("perf/x86/intel: Use the PEBS auto reload
>> mechanism when possible")
>>
>> Patch 1 fixed the issue in x86_perf_event_update.
>>
>> The second issue was introduced since commit b8241d20699e
>> ("perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS
>> interrupt threshold)")
>>
>> Patch 2-4 fixed the issue in x86_pmu_read.
>>
>> Besides the two issues, the userspace RDPMC usage is broken for large
>> PEBS as well.
>> The RDPMC issue was also introduced since commit b8241d20699e
>> ("perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS
>> interrupt threshold)")
>>
>> Patch 5 fixed the RDPMC issue.
>>
>> The source code of read_count is as below.
>>
>> struct cpu {
>>          int fd;
>>          struct perf_event_mmap_page *buf;
>> };
>>
>> int perf_open(struct cpu *ctx, int cpu)
>> {
>>          struct perf_event_attr attr = {
>>                  .type = PERF_TYPE_HARDWARE,
>>                  .size = sizeof(struct perf_event_attr),
>>                  .sample_period = 100000,
>>                  .config = 0,
>>                  .sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID |
>>                                  PERF_SAMPLE_TIME | PERF_SAMPLE_CPU,
>>                  .precise_ip = 3,
>>                  .mmap = 1,
>>                  .comm = 1,
>>                  .task = 1,
>>                  .mmap2 = 1,
>>                  .sample_id_all = 1,
>>                  .comm_exec = 1,
>>          };
>>          ctx->buf = NULL;
>>          ctx->fd = syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0);
>>          if (ctx->fd < 0) {
>>                  perror("perf_event_open");
>>                  return -1;
>>          }
>>          return 0;
>> }
>>
>> void perf_close(struct cpu *ctx)
>> {
>>          close(ctx->fd);
>>          if (ctx->buf)
>>                  munmap(ctx->buf, pagesize);
>> }
>>
>> int main(int ac, char **av)
>> {
>>          struct cpu ctx;
>>          u64 count;
>>
>>          perf_open(&ctx, 0);
>>
>>          while (1) {
>>                  sleep(5);
>>
>>                  if (read(ctx.fd, &count, 8) != 8) {
>>                          perror("counter read");
>>                          break;
>>                  }
>>                  printf("0x%llx\n", count);
>>
>>          }
>>          perf_close(&ctx);
>> }
>>
>> Kan Liang (5):
>>    perf/x86/intel: fix event update for auto-reload
>>    perf/x86: introduce read function for x86_pmu
>>    perf/x86/intel/ds: introduce read function for large pebs
>>    perf/x86/intel: introduce read function for intel_pmu
>>    perf/x86: fix: disable userspace RDPMC usage for large PEBS
>>
>>   arch/x86/events/core.c       |  5 ++-
>>   arch/x86/events/intel/core.c |  9 +++++
>>   arch/x86/events/intel/ds.c   | 85 ++++++++++++++++++++++++++++++++++++++++++--
>>   arch/x86/events/perf_event.h |  3 ++
>>   4 files changed, 99 insertions(+), 3 deletions(-)
>>
>> --
>> 2.7.4
>>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ