Message-Id: <20181031.090816.2117345408719881030.davem@davemloft.net>
Date:   Wed, 31 Oct 2018 09:08:16 -0700 (PDT)
From:   David Miller <davem@...emloft.net>
To:     jolsa@...hat.com
Cc:     acme@...nel.org, linux-kernel@...r.kernel.org, namhyung@...nel.org,
        jolsa@...nel.org
Subject: Re: [PATCH RFC] hist lookups

From: Jiri Olsa <jolsa@...hat.com>
Date: Wed, 31 Oct 2018 16:39:07 +0100

> it'd be great to make hist processing faster, but is your main target here
> to get the load out of the reader thread, so we don't lose events during the
> hist processing?
> 
> we could queue events directly from reader thread into another thread and
> keep it (the reader thread) free of processing, focusing only on event
> reading/passing 

Indeed, we could create threads that take samples from the thread processing
the ring buffers, and insert them into the histogram.

In fact, since there is pthread locking already around the histogram
data structures, we could parallelize that as much as we want.
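
A minimal sketch of this hand-off, under simplifying assumptions. The reader only enqueues samples into a bounded, mutex/condvar-protected queue, and several worker threads dequeue them and insert them into a histogram guarded by its own lock. All names here (`struct sample`, `struct pipeline`, `queue_push`, `queue_pop`, `hist_worker`, `run_pipeline`) are hypothetical and are not perf's actual code. The "histogram" is just an array of counters.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_CAP    64
#define HIST_BUCKETS 16
#define MAX_WORKERS  8

/* Hypothetical stand-ins for perf's sample and histogram types. */
struct sample { uint64_t ip; };

struct sample_queue {
	struct sample buf[QUEUE_CAP];
	size_t head, tail, count;
	bool closed;                    /* reader has no more samples */
	pthread_mutex_t lock;
	pthread_cond_t not_empty, not_full;
};

struct pipeline {
	struct sample_queue q;
	uint64_t buckets[HIST_BUCKETS];
	pthread_mutex_t hist_lock;      /* plays the role of the existing hist lock */
};

/* Reader side: only enqueues, never touches the histogram. */
static void queue_push(struct sample_queue *q, struct sample s)
{
	pthread_mutex_lock(&q->lock);
	while (q->count == QUEUE_CAP)
		pthread_cond_wait(&q->not_full, &q->lock);
	q->buf[q->tail] = s;
	q->tail = (q->tail + 1) % QUEUE_CAP;
	q->count++;
	pthread_cond_signal(&q->not_empty);
	pthread_mutex_unlock(&q->lock);
}

/* Worker side: returns false once the queue is closed and fully drained. */
static bool queue_pop(struct sample_queue *q, struct sample *out)
{
	bool ok = false;
	pthread_mutex_lock(&q->lock);
	while (q->count == 0 && !q->closed)
		pthread_cond_wait(&q->not_empty, &q->lock);
	if (q->count > 0) {
		*out = q->buf[q->head];
		q->head = (q->head + 1) % QUEUE_CAP;
		q->count--;
		pthread_cond_signal(&q->not_full);
		ok = true;
	}
	pthread_mutex_unlock(&q->lock);
	return ok;
}

/* Each worker drains samples and inserts them under the histogram lock. */
static void *hist_worker(void *arg)
{
	struct pipeline *p = arg;
	struct sample s;
	while (queue_pop(&p->q, &s)) {
		pthread_mutex_lock(&p->hist_lock);
		p->buckets[s.ip % HIST_BUCKETS]++;
		pthread_mutex_unlock(&p->hist_lock);
	}
	return NULL;
}

/* Push n_samples fake samples through n_workers threads; return total hits. */
static uint64_t run_pipeline(size_t n_samples, size_t n_workers)
{
	struct pipeline p = { .q = { .closed = false } };
	pthread_t tids[MAX_WORKERS];
	uint64_t total = 0;

	assert(n_workers > 0 && n_workers <= MAX_WORKERS);
	pthread_mutex_init(&p.q.lock, NULL);
	pthread_cond_init(&p.q.not_empty, NULL);
	pthread_cond_init(&p.q.not_full, NULL);
	pthread_mutex_init(&p.hist_lock, NULL);

	/* pthread_create() errors are not checked in this sketch. */
	for (size_t i = 0; i < n_workers; i++)
		pthread_create(&tids[i], NULL, hist_worker, &p);
	for (size_t i = 0; i < n_samples; i++)  /* this thread acts as the reader */
		queue_push(&p.q, (struct sample){ .ip = i });

	pthread_mutex_lock(&p.q.lock);          /* signal end of stream */
	p.q.closed = true;
	pthread_cond_broadcast(&p.q.not_empty);
	pthread_mutex_unlock(&p.q.lock);

	for (size_t i = 0; i < n_workers; i++)
		pthread_join(tids[i], NULL);
	for (size_t b = 0; b < HIST_BUCKETS; b++)
		total += p.buckets[b];
	return total;
}
```

The reader's cost per sample is one short critical section on the queue lock. All histogram work happens in the workers, and `run_pipeline(1000, 4)` should account for all 1000 samples.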

If beneficial we could also parallelize the ring buffer processing
into a small number of threads too.

My understanding is that in its default mode perf gets one event ring
buffer per CPU being analyzed.  So we could split the rings into groups
of some fixed size, like 16 or so, and assign one thread to each
group.
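
The grouping arithmetic could look roughly like the following. It assumes one ring per CPU and a fixed group size of 16. `RINGS_PER_THREAD`, `nr_reader_threads`, `ring_owner` and `thread_ring_range` are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define RINGS_PER_THREAD 16  /* assumed group size ("16 or something") */

/* Number of reader threads needed to cover nr_rings ring buffers (ceiling). */
static size_t nr_reader_threads(size_t nr_rings)
{
	return (nr_rings + RINGS_PER_THREAD - 1) / RINGS_PER_THREAD;
}

/* Index of the reader thread that owns ring buffer `ring`. */
static size_t ring_owner(size_t ring)
{
	return ring / RINGS_PER_THREAD;
}

/* Half-open range [*first, *end) of rings handled by reader thread t. */
static void thread_ring_range(size_t t, size_t nr_rings,
			      size_t *first, size_t *end)
{
	assert(t < nr_reader_threads(nr_rings));
	*first = t * RINGS_PER_THREAD;
	*end = *first + RINGS_PER_THREAD;
	if (*end > nr_rings)            /* last group may be partial */
		*end = nr_rings;
}
```

For example, a 64-CPU system would get 4 reader threads. With 65 CPUs, a fifth thread handles only ring 64.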

There is one major concern about this though.  Creating threads makes
perf a bit more "invasive" to the workload it is observing.  And that
is something we've always worked to minimize.

I think your idea to add threads for the histogram work is great.

But I still think that the histogram code is really bloated, and doing
a full 262-byte memset on every histogram lookup is unnecessary
overhead.
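
The message does not show the lookup code itself. As a hedged illustration of the general idea only, a lookup can compare just the key fields of existing entries. Then the hit path never builds or zeroes a large temporary, and only a miss pays for initializing a new entry.

The layout here is hypothetical: `struct entry_key`, `struct hist_entry`, `struct hist_table` and `hist_findnew` are made up for this sketch. A linear scan stands in for perf's rb-tree lookup.

Comparing field by field also removes a common reason for memsetting a temporary key: making its padding bytes deterministic so that a `memcmp()`-style comparison works.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entry: a small key plus a large payload. */
struct entry_key {
	uint64_t ip;
	int32_t pid;            /* followed by padding on most ABIs */
};

struct hist_entry {
	struct entry_key key;
	uint64_t count;
	char payload[240];      /* stands in for the rest of a bulky entry */
};

#define MAX_ENTRIES 128

struct hist_table {
	struct hist_entry entries[MAX_ENTRIES];
	size_t nr;
};

/* Field-wise comparison: padding bytes never matter, so no memset needed. */
static int key_cmp(const struct entry_key *a, const struct entry_key *b)
{
	if (a->ip != b->ip)
		return a->ip < b->ip ? -1 : 1;
	if (a->pid != b->pid)
		return a->pid < b->pid ? -1 : 1;
	return 0;
}

/*
 * Find the entry for *key and bump its count, or create it.  The hit path
 * touches only the key fields; only a newly created entry is zeroed.
 */
static struct hist_entry *hist_findnew(struct hist_table *t,
				       const struct entry_key *key)
{
	assert(t != NULL && key != NULL);
	for (size_t i = 0; i < t->nr; i++) {
		if (key_cmp(&t->entries[i].key, key) == 0) {
			t->entries[i].count++;
			return &t->entries[i];
		}
	}
	if (t->nr == MAX_ENTRIES)
		return NULL;            /* table full in this toy version */
	struct hist_entry *he = &t->entries[t->nr++];
	memset(he, 0, sizeof(*he));     /* paid once per new entry, not per lookup */
	he->key = *key;
	he->count = 1;
	return he;
}
```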
