Open Source and information security mailing list archives
 
Date:	Tue, 10 May 2016 14:15:38 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Vikas Shivappa <vikas.shivappa@...ux.intel.com>
Cc:	vikas.shivappa@...el.com, x86@...nel.org,
	linux-kernel@...r.kernel.org, hpa@...or.com, tglx@...utronix.de,
	mingo@...nel.org, ravi.v.shankar@...el.com, tony.luck@...el.com,
	fenghua.yu@...el.com
Subject: Re: [PATCH 2/3] perf/x86/mbm: Fix mbm counting for RMID reuse

On Fri, May 06, 2016 at 04:44:14PM -0700, Vikas Shivappa wrote:
> This patch tries to fix the issue where multiple perf instances try to
> monitor the same PID.

> MBM cannot count directly in the usual perf way of continuously adding
> the diff of current h/w counter and the prev count to the event->count
> because of some h/w dependencies:

And yet the patch appears to do exactly that; *confused*.

>  (1) the mbm h/w counters overflow.

As do most other counters; so what is your point? You also have the
software timer firing well within the overflow period.

>  (2) There are limited h/w RMIDs and hence we recycle the RMIDs due to
>      which an event may count from different RMIDs.

This fails to explain why this is a problem.

>  (3) Also we may not want to count at every sched_in and sched_out
>      because the MSR reads involve quite a bit of overhead.

Every single other PMU driver just does this; why are you special?

You list 3 reasons why you think you cannot do the regular thing, but
completely fail to explain why these issues are a problem.

> However we try to do something similar to usual perf way in this patch
> and mainly handle (1) and (3).

> update_sample takes care of the overflow in the hardware counters and
> provides abstraction by returning total bytes counted as if there was no
> overflow. We use this abstraction to count as below:
> 
> init:
>   event->prev_count = update_sample(rmid) //returns current total_bytes
> 
> count: // MBM right now uses count instead of read
>   cur_count = update_sample(rmid)
>   event->count += cur_count - event->prev_count
>   event->prev_count = cur_count
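
For reference, the scheme quoted above amounts to something like the
following standalone sketch. The names and the 24-bit counter width here
are illustrative only (the real width is model-specific, reported via
CPUID), not the actual driver code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counter width for illustration; the real MBM counter
 * width is enumerated via CPUID and is model-specific. */
#define MBM_CNTR_WIDTH	24
#define MBM_CNTR_MAX	((1ULL << MBM_CNTR_WIDTH) - 1)

struct mbm_state {
	uint64_t prev_msr;	/* last raw h/w counter value read */
	uint64_t total_bytes;	/* monotonic abstraction over wraps */
};

struct mbm_event {
	uint64_t count;
	uint64_t prev_count;
};

/* update_sample(): fold a new raw counter read into total_bytes,
 * compensating for at most one counter wrap, and return the total. */
static uint64_t update_sample(struct mbm_state *s, uint64_t raw)
{
	uint64_t shift;

	if (raw >= s->prev_msr)
		shift = raw - s->prev_msr;
	else	/* counter wrapped since the last read */
		shift = (MBM_CNTR_MAX - s->prev_msr) + raw + 1;

	s->prev_msr = raw;
	s->total_bytes += shift;
	return s->total_bytes;
}

/* The count step from the quoted commit message: accumulate the delta
 * of the overflow-free totals into event->count. */
static void mbm_event_count(struct mbm_event *e, struct mbm_state *s,
			    uint64_t raw)
{
	uint64_t cur = update_sample(s, raw);

	e->count += cur - e->prev_count;
	e->prev_count = cur;
}
```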

So where does cqm_prev_count come from and why do you need it? What's
wrong with event->hw.prev_count?

In fact, I cannot seem to find any event->hw.prev_count usage in this or
the next patch, so can we simply use that and not add pointless new
members?
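
For comparison, the regular update every other PMU driver does is the
prev_count cmpxchg loop (in the kernel this operates on the local64_t
event->hw.prev_count, as in x86_perf_event_update()). Below is a
userspace analogue of that pattern using C11 atomics; the struct and
function names are illustrative, not the kernel's:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Userspace stand-in for the relevant parts of struct perf_event:
 * the kernel uses local64_t and local64_cmpxchg() here instead. */
struct hw_event {
	_Atomic uint64_t prev_count;
	_Atomic uint64_t count;
};

/* The usual PMU update: read the h/w counter, publish it as the new
 * prev_count, and accumulate the delta into count. The CAS loop
 * retries if a concurrent update raced with us. */
static void event_update(struct hw_event *ev, uint64_t (*read_hw)(void))
{
	uint64_t prev, now;

	do {
		prev = atomic_load(&ev->prev_count);
		now = read_hw();
	} while (!atomic_compare_exchange_weak(&ev->prev_count, &prev, now));

	atomic_fetch_add(&ev->count, now - prev);
}
```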

