Message-ID: <b495ee6c-84f3-0bec-4c7b-dd7730e17076@linux.intel.com>
Date:   Tue, 11 Sep 2018 16:42:09 +0300
From:   Alexey Budankov <alexey.budankov@...ux.intel.com>
To:     Jiri Olsa <jolsa@...hat.com>
Cc:     Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Namhyung Kim <namhyung@...nel.org>,
        Andi Kleen <ak@...ux.intel.com>,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v8 0/3]: perf: reduce data loss when profiling highly
 parallel CPU bound workloads

Hi,

On 11.09.2018 11:34, Jiri Olsa wrote:
> On Tue, Sep 11, 2018 at 11:16:45AM +0300, Alexey Budankov wrote:
>>
>> Hi Ingo,
>>
>> On 11.09.2018 9:35, Ingo Molnar wrote:
>>>
>>> * Alexey Budankov <alexey.budankov@...ux.intel.com> wrote:
>>>
>>>> It may sound too optimistic, but the glibc API is expected to stay backward 
>>>> compatible, and that covers its POSIX AIO part too. The internal implementation 
>>>> also tends to evolve toward better options over time, most likely building on 
>>>> the modern kernel capabilities mentioned here: 
>>>> http://man7.org/linux/man-pages/man2/io_submit.2.html
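
For reference, the POSIX AIO surface this is about is small. A minimal
sketch of submitting and reaping one write; the queue_write/reap_write
names are just illustrative, nothing from the patchset:

#include <aio.h>
#include <errno.h>
#include <string.h>

/* Queue one asynchronous write; returns 0 when submission succeeded. */
static int queue_write(int fd, void *buf, size_t size, off_t off,
                       struct aiocb *cb)
{
        memset(cb, 0, sizeof(*cb));
        cb->aio_fildes = fd;
        cb->aio_buf    = buf;
        cb->aio_nbytes = size;
        cb->aio_offset = off;
        return aio_write(cb);           /* -1 with errno set on failure */
}

/* Poll a queued write: >0 bytes written, 0 still in flight, -1 error. */
static ssize_t reap_write(struct aiocb *cb)
{
        int err = aio_error(cb);

        if (err == EINPROGRESS)
                return 0;
        if (err) {
                errno = err;
                return -1;
        }
        return aio_return(cb);          /* final write(2)-style result */
}

With glibc this links with -lrt and completes on the internal thread
pool mentioned below.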
>>>
>>> I'm not talking about compatibility, and I'm not just talking about glibc; perf works under 
>>> other libcs as well - and let me phrase it another way: basic event handling, threading, and 
>>> scheduling internals should be a *core competency* of a tracing/profiling tool.
>>
>> Well, the requirement of independence from any specific libc implementation, 
>> as well as the *core competency* design approach, clarifies a lot. Thanks!
>>
>>>
>>> I.e. we might end up using the exact same per-event-fd thread pool design that glibc uses 
>>> currently. Or not. Having that internal and open-coded in perf, as Jiri has started 
>>> implementing it, allows people to experiment with it.
>>
>> My point here is that following standardized programming models and APIs 
>> (like POSIX) in the tool code, even if the tool itself provides an internal 
>> open-coded implementation of those APIs, would simplify experimenting with 
>> the tool as well as lower the barrier for newcomers. The perf project could 
>> benefit from that.
>>
>>>
>>> This isn't some GUI toolkit, this is at the essence of perf, and we are not very good on large 
>>> systems right now, and I think the design should be open-coded threading, not relying on a 
>>> (perf-)external AIO library to get it right.
>>>
>>> The glibc thread pool implementation of POSIX AIO is basically a fall-back 
>>> implementation, for the case where there's no native KAIO interface to rely on.
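
For contrast, the native KAIO interface has no glibc wrapper, so it has
to be driven with raw syscalls. A rough sketch, not from any posted
patch; note that for buffered file writes the kernel may still complete
the request synchronously:

#include <linux/aio_abi.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>

/* Submit one write through the kernel AIO ABI and wait for it. */
static long kaio_write(int fd, void *buf, size_t size, long long off)
{
        aio_context_t ctx = 0;
        struct iocb cb;
        struct iocb *cbs[1] = { &cb };
        struct io_event ev;
        long ret = -1;

        if (syscall(SYS_io_setup, 1, &ctx) < 0)
                return -1;

        memset(&cb, 0, sizeof(cb));
        cb.aio_lio_opcode = IOCB_CMD_PWRITE;
        cb.aio_fildes     = fd;
        cb.aio_buf        = (unsigned long)buf;
        cb.aio_nbytes     = size;
        cb.aio_offset     = off;

        if (syscall(SYS_io_submit, ctx, 1, cbs) == 1 &&
            syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL) == 1)
                ret = ev.res;   /* bytes written, or negative errno */

        syscall(SYS_io_destroy, ctx);
        return ret;
}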
>>>
>>>> Well, explicit threading in the tool for AIO, in the simplest case, means 
>>>> incorporating some POSIX API implementation into the tool, avoiding 
>>>> code reuse in the first place. That tends to be error-prone and costly.
>>>
>>> It's a core competency, we better do it right and not outsource it.
>>
>> Yep, makes sense.
> 
> on the other hand, we are already trying to tie this up under the perf_mmap
> object, which is what the threaded patchset operates on.. so I'm quite
> confident that with little effort we could make those two things live next
> to each other and let the user decide which one to take and compare
> 
> the possibilities would look like this (not sure yet the last one makes sense, but still..):
> 
>   # perf record --threads=...  ...
>   # perf record --aio ...
>   # perf record --threads=... --aio ...
> 
> how about that?

That might be an option. What are the semantics of --threads?

Be aware that when experimenting with serial trace writing on an 8-core 
client machine running an HPC benchmark that heavily utilizes all 8 cores, 
we noticed that the single Perf tool thread contended with the benchmark 
threads.

That manifested as libiomp.so (the Intel OpenMP implementation) functions 
appearing among the top hotspot functions, which indicated imbalance 
induced by the tool during profiling.

That's why we decided to go with the AIO approach first, as posted, and 
get the most out of it through multiple outstanding AIO requests, before 
turning to the more resource-consuming multi-threading alternative.
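
To illustrate the "multi AIO" part: the idea is to keep several write
requests in flight per mmap buffer, roughly like the sketch below. The
structure and names are illustrative, not the posted code (<aio.h> and
<errno.h> as in the sketch above):

#define AIO_SLOTS 4     /* outstanding requests per buffer; illustrative */

struct mmap_aio {
        struct aiocb cb[AIO_SLOTS];
        int          busy[AIO_SLOTS];
};

/*
 * Find a free slot, reaping completed requests along the way.
 * Returns a slot index, or -1 when all writes are still in flight
 * (the caller then retries or falls back to a synchronous write).
 */
static int grab_slot(struct mmap_aio *ma)
{
        int i;

        for (i = 0; i < AIO_SLOTS; i++) {
                if (ma->busy[i] && aio_error(&ma->cb[i]) != EINPROGRESS) {
                        aio_return(&ma->cb[i]); /* consume the result */
                        ma->busy[i] = 0;
                }
                if (!ma->busy[i])
                        return i;
        }
        return -1;
}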

> 
> I just rebased the thread patchset, will make some tests (it's been a few months,
> so it needs some kicking/checking) and post it out hopefully this week
> 
> jirka
> 
