linux-kernel - Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

Message-ID: <fe49c9b5-88fb-c827-9262-1931bcf80b6c@linux.intel.com>
Date:   Thu, 4 Oct 2018 20:11:28 +0300
From:   Alexey Budankov <alexey.budankov@...ux.intel.com>
To:     Jann Horn <jannh@...gle.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Mark Rutland <mark.rutland@....com>,
        Peter Zijlstra <peterz@...radead.org>,
        Kees Cook <keescook@...omium.org>,
        Andi Kleen <ak@...ux.intel.com>, tursulin@...ulin.net,
        kernel list <linux-kernel@...r.kernel.org>,
        tvrtko.ursulin@...ux.intel.com,
        the arch/x86 maintainers <x86@...nel.org>,
        "H . Peter Anvin" <hpa@...or.com>, acme@...nel.org,
        alexander.shishkin@...ux.intel.com, jolsa@...hat.com,
        namhyung@...nel.org, maddy@...ux.vnet.ibm.com
Subject: Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)

Hi,

On 03.10.2018 20:01, Jann Horn wrote:
> On Mon, Oct 1, 2018 at 10:53 PM Alexey Budankov
> <alexey.budankov@...ux.intel.com> wrote:
<SNIP>
>> 3. Every time an event for ${PMU} is created over perf_event_open():
>>    a) the calling thread's euid is checked to belong to ${PMU}_users group
>>       and if it does then the event's fd is allocated;
>>    b) then traditional checks against perf_event_paranoid content are applied;
>>    c) if the file doesn't exist the access is governed by global setting
>>       at /proc/sys/kernel/perf_event_paranoid;
> 
> You'll also have to make sure that this thing in kernel/events/core.c
> doesn't have any bad effect:
> 
>     /*
>     * Special case software events and allow them to be part of
>     * any hardware group.
>     */
> 
> As in, make sure that you can't smuggle in arbitrary software events
> by attaching them to a whitelisted hardware event.

Yes, makes sense. Please see and comment below.
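
To make the concern above concrete, here is a minimal sketch of the rule it implies: every event joining a group is checked against the policy of its own PMU, so a permitted hardware leader does not grant access to a software sibling. The names Event, may_join_group and allowed_pmus are hypothetical and only model the idea; this is not code from kernel/events/core.c.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Name of the PMU that backs this event, e.g. "cpu" or "software".
    pmu: str


def may_join_group(new_event, leader, allowed_pmus):
    """Return True if new_event may be added to the leader's group.

    allowed_pmus is a hypothetical stand-in for the set of PMUs the
    caller passed the per-PMU access check for.
    """
    # The group leader itself must be permitted for the caller.
    if leader.pmu not in allowed_pmus:
        return False
    # Software events may join any hardware group, but that special case
    # must not bypass the access check for the software PMU itself.
    return new_event.pmu in allowed_pmus
```

With only the cpu PMU permitted, may_join_group(Event("software"), Event("cpu"), {"cpu"}) is rejected, which is the behaviour asked for above.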

> 
<SNIP>
>> Security analysis for uncore IMC, QPI/UPI, PCIe PMUs is still required
>> to be enabled for fine grain control.
> 
> And you can't whitelist anything that permits using sampling events
> with arbitrary sample_type.
> 

The need for control appears to depend on the sensitivity of the data that a PMU captures 
for later analysis. Currently the following kinds of data can be captured 
(please correct or extend if something is missing from the list below):

1) Monitored process details:
   - system information on a process as a container of threads, memory data and 
     IDs (e.g. open fds) from process-specific namespaces, etc.;
   - system information on threads as containers of execution context details;
2) Execution context details:
   - memory addresses;
   - memory data;
   - calculation results;
   - calculation state in HW;
3) Monitored process and execution context telemetry data, which are used for building 
   various performance metrics and can come from:
   - user mode code and the OS kernel;
   - various parts of HW, e.g. core, uncore, peripherals, etc.

Group 2) is the potential source of sensitive process data leakage, so if a PMU, 
in some mode, samples execution context details, then the PMU, working in that mode, 
is subject to *access* and *scope* control.

On the other hand, if captured data contain only the monitored process details 
and/or associated execution telemetry, there is probably no sensitive data leakage 
through that captured data.

For example, if the cpu PMU samples PC addresses over time, e.g. to provide a 
hotspots-by-function profile, then this must be controlled from both the access and 
the scope perspective, because PC addresses are execution context details that 
can contain sensitive data.

However, if the cpu PMU counts some metric value, or if a software PMU reads 
the thread active time from the OS, possibly over time, for later building some 
rating profile, or some HW counter value is read without attribution to any 
execution context details, that is probably not as risky as 
PC address sampling.

Uncore PMUs, e.g. memory controller (IMC), interconnect (QPI/UPI) and peripheral (PCIe), 
currently only read counter values that are captured system wide by HW, and provide 
no attribution to any specific execution context details and thus to sensitive process data.

Based on that,

A) paranoid knob is required for a PMU if it can capture data from group 2)
B) paranoid knob limits scope of capturing sensitive data:
   -3 - *scope* is defined by some high level setting
   -2 - disabled - no allowed *scope*
   -1 - no restrictions - max *scope*
    0 - system wide
    1 - process user and kernel space
    2 - process user space only
C) paranoid knob has to be checked every time the PMU is going to start 
   capturing sensitive data to avoid capturing beyond the allowed scope.
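
As a rough illustration of B) and C), the check made each time capturing starts could be modelled as below. This is only a sketch: scope_allows and its parameters are hypothetical names, the difference between -1 and 0 is not modelled, and -3 is assumed to be resolved against the higher level setting before this check is reached.

```python
# Scope levels from list B) above; -3 is handled separately.
DISABLED = -2
UNRESTRICTED = -1
SYSTEM_WIDE = 0
PROCESS_USER_KERNEL = 1
PROCESS_USER = 2


def scope_allows(level, system_wide, kernel):
    """Return True if a capture request fits within the given scope level.

    system_wide: the request monitors all processes, not just one.
    kernel:      the request captures kernel space data as well.
    """
    if level == DISABLED:
        return False
    if level in (UNRESTRICTED, SYSTEM_WIDE):
        # Assumption: both levels permit system wide user+kernel capture.
        return True
    if level == PROCESS_USER_KERNEL:
        return not system_wide
    if level == PROCESS_USER:
        return not system_wide and not kernel
    # -3 or any unknown value has to be resolved by the caller first.
    raise ValueError("unresolved or unknown scope level: %r" % (level,))
```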

PMU *access* semantics is derived from fs ACLs and could look like this:

r - read PMU architectural and configuration details, read PMU *access* settings
w - modify PMU *access* settings
x - modify PMU configuration and collect data

So levels of *access* to PMU could look like this:

root=rwx, ${PMU}_users=r-x, other=r--.
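
A minimal sketch of how these three levels could be derived for a caller, assuming group membership is the only input besides uid 0. The functions pmu_access and may_collect are hypothetical and only mirror the root / ${PMU}_users / other split above.

```python
def pmu_access(uid, groups, pmu):
    """Return the rwx string a caller gets for the given PMU.

    groups is the set of group names the caller belongs to;
    "<pmu>_users" follows the ${PMU}_users naming used above.
    """
    if uid == 0:
        return "rwx"            # root: read, modify access settings, collect
    if (pmu + "_users") in groups:
        return "r-x"            # group members: read and collect
    return "r--"                # everyone else: read only


def may_collect(uid, groups, pmu):
    # Collecting data requires the x bit.
    return "x" in pmu_access(uid, groups, pmu)
```

For instance, a non-root caller in CPU_users may collect on CPU but not on IMC.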

Possible examples of *scope* control settings could look like this:

1) system wide user+kernel mode CPU sampling with context switches 
   and uncore counting:

	/proc/sys/kernel/perf_event_paranoid (-2, 2): 0
	SW.paranoid  (-3, 2):(root=rwx, SW_users=r-x,other=r--): -3
	CPU.paranoid (-3, 2):(root=rwx,CPU_users=r-x,other=r--): -3
	IMC.paranoid (-3,-1):(root=rwx,IMC_users=r-x,other=r--): -3
	UPI.paranoid (-3,-1):(root=rwx,UPI_users=r-x,other=r--): -3
	PCI.paranoid (-3,-1):(root=rwx,PCI_users=r-x,other=r--): -3

2) per-process CPU sampling with context switches and uncore counting:

	/proc/sys/kernel/perf_event_paranoid (-2, 2): 1|2
	SW.paranoid  (-3, 2):(root=rwx, SW_users=r-x,other=r--): -3
	CPU.paranoid (-3, 2):(root=rwx,CPU_users=r-x,other=r--): -3
	IMC.paranoid (-3,-1):(root=rwx,IMC_users=r-x,other=r--): -1
	UPI.paranoid (-3,-1):(root=rwx,UPI_users=r-x,other=r--): -1
	PCI.paranoid (-3,-1):(root=rwx,PCI_users=r-x,other=r--): -1

3) per-process user mode CPU sampling allowed to specific ${PMU}_users groups only:

	/proc/sys/kernel/perf_event_paranoid (-2, 2): -2
	SW.paranoid  (-3, 2):(root=rwx, SW_users=r-x,other=r--):  2
	CPU.paranoid (-3, 2):(root=rwx,CPU_users=r-x,other=r--):  2
	IMC.paranoid (-3,-1):(root=rwx,IMC_users=r-x,other=r--): -3
	UPI.paranoid (-3,-1):(root=rwx,UPI_users=r-x,other=r--): -3
	PCI.paranoid (-3,-1):(root=rwx,PCI_users=r-x,other=r--): -3

4) uncore HW counters monitoring, possibly over time:

	/proc/sys/kernel/perf_event_paranoid (-2, 2): -2
	SW.paranoid  (-3, 2):(root=rwx, SW_users=r-x,other=r--): -3
	CPU.paranoid (-3, 2):(root=rwx,CPU_users=r-x,other=r--): -3
	IMC.paranoid (-3,-1):(root=rwx,IMC_users=r-x,other=r--): -1
	UPI.paranoid (-3,-1):(root=rwx,UPI_users=r-x,other=r--): -1
	PCI.paranoid (-3,-1):(root=rwx,PCI_users=r-x,other=r--): -1
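
To check how the examples above resolve, here is a small sketch that assumes the "high level setting" behind -3 is the global /proc/sys/kernel/perf_event_paranoid value. The helper effective_scope is a hypothetical name; the input values are taken from examples 3) and 4).

```python
INHERIT = -3  # "scope is defined by some high level setting"


def effective_scope(global_level, pmu_level):
    """Resolve a per-PMU paranoid value against the global setting."""
    return global_level if pmu_level == INHERIT else pmu_level


# Example 3): global setting is -2, SW and CPU are set to 2, uncore inherits.
example3 = {"SW": 2, "CPU": 2, "IMC": -3, "UPI": -3, "PCI": -3}
resolved3 = {pmu: effective_scope(-2, lvl) for pmu, lvl in example3.items()}

# Example 4): global setting is -2, SW and CPU inherit, uncore is set to -1.
example4 = {"SW": -3, "CPU": -3, "IMC": -1, "UPI": -1, "PCI": -1}
resolved4 = {pmu: effective_scope(-2, lvl) for pmu, lvl in example4.items()}
```

Under that assumption example 3) leaves SW and CPU at 2 with the uncore PMUs disabled, and example 4) disables SW and CPU while the uncore PMUs stay unrestricted.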

Please share more thoughts so that this could eventually go into 
Documentation/admin-guide/perf-security.rst.

Thanks,
Alexey