[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <55875cff-fe7f-a468-0f6a-29f569fe995d@linux.vnet.ibm.com>
Date: Mon, 20 Feb 2017 09:41:48 +0530
From: Hari Bathini <hbathini@...ux.vnet.ibm.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: ast@...com, lkml <linux-kernel@...r.kernel.org>, acme@...nel.org,
alexander.shishkin@...ux.intel.com, mingo@...hat.com,
daniel@...earbox.net, rostedt@...dmis.org,
Ananth N Mavinakayanahalli <ananth@...ux.vnet.ibm.com>,
ebiederm@...ssion.com, sargun@...gun.me,
Aravinda Prasad <aravinda@...ux.vnet.ibm.com>,
brendan.d.gregg@...il.com, jolsa@...hat.com
Subject: Re: [PATCH v6 1/3] perf: add PERF_RECORD_NAMESPACES to include
namespaces related info
Hi Peter,
On Thursday 16 February 2017 03:58 PM, Peter Zijlstra wrote:
> On Wed, Feb 08, 2017 at 02:01:24PM +0530, Hari Bathini wrote:
>> With the advert of container technologies like docker, that depend
>> on namespaces for isolation, there is a need for tracing support for
>> namespaces. This patch introduces new PERF_RECORD_NAMESPACES event
>> for tracing based on namespaces related info. This event records
>> the device and inode numbers for every namespace of all processes.
>>
>> While device number is same for all namespaces currently, that may
> 'While device numbers are the same ..." ?
>
>> change in future, to avoid the need for a namespace of namespaces.
> Unfinished sentence
Unnecessary comma spoiled the sentence, I guess.
>> Recording device number along with inode number will take care of such
>> scenario.
> More words on why this is so? Because I've no clue.
This should have read:
"Each namespace has a combination of device and inode numbers. Though
every namespace has the same device number currently, that may change
in future to avoid the need for a namespace of namespaces. Considering
such possibility, record both device and inode numbers separately for
each namespace."
Will update changelog accordingly..
>> @@ -6584,6 +6590,135 @@ void perf_event_comm(struct task_struct *task, bool exec)
>> }
>>
>> /*
>> + * namespaces tracking
>> + */
>> +
>> +struct namespaces_event_id {
>> + struct perf_event_header header;
>> +
>> + u32 pid;
>> + u32 tid;
>> + u32 nr_namespaces;
>> + struct perf_ns_link_info link_info[NAMESPACES_MAX];
>> +};
> Contains the same hole and makes the record depend on architecture
> alignment requirements.
Ugh! Let me change nr_namespaces type to u64 and also drop name field
in perf_ns_link_info struct.
>> +static void perf_fill_ns_link_info(struct perf_ns_link_info *ns_link_info,
>> + struct task_struct *task,
>> + const struct proc_ns_operations *ns_ops)
>> +{
>> + struct path ns_path;
>> + struct inode *ns_inode;
>> + void *error;
>> +
>> + error = ns_get_path(&ns_path, task, ns_ops);
>> + if (!error) {
>> + snprintf(ns_link_info->name, NS_NAME_SIZE,
>> + "%s", ns_path.dentry->d_iname);
>> +
>> + ns_inode = ns_path.dentry->d_inode;
>> + ns_link_info->dev = new_encode_dev(ns_inode->i_sb->s_dev);
>> + ns_link_info->ino = ns_inode->i_ino;
>> + }
>> +}
>> +
>> +static void perf_event_namespaces_output(struct perf_event *event,
>> + void *data)
>> +{
>> + struct perf_namespaces_event *namespaces_event = data;
>> + struct perf_output_handle handle;
>> + struct perf_sample_data sample;
>> + struct namespaces_event_id *ei;
>> + struct task_struct *task = namespaces_event->task;
>> + int ret;
>> +
>> + if (!perf_event_namespaces_match(event))
>> + return;
>> +
>> + ei = &namespaces_event->event_id;
>> + perf_event_header__init_id(&ei->header, &sample, event);
>> + ret = perf_output_begin(&handle, event, ei->header.size);
>> + if (ret)
>> + return;
>> +
>> + ei->pid = perf_event_pid(event, task);
>> + ei->tid = perf_event_tid(event, task);
>> +
>> + ei->nr_namespaces = NAMESPACES_MAX;
>> +
>> + perf_fill_ns_link_info(&ei->link_info[MNT_NS_INDEX],
>> + task, &mntns_operations);
>> +
>> +#ifdef CONFIG_USER_NS
>> + perf_fill_ns_link_info(&ei->link_info[USER_NS_INDEX],
>> + task, &userns_operations);
>> +#endif
>> +#ifdef CONFIG_NET_NS
>> + perf_fill_ns_link_info(&ei->link_info[NET_NS_INDEX],
>> + task, &netns_operations);
>> +#endif
>> +#ifdef CONFIG_UTS_NS
>> + perf_fill_ns_link_info(&ei->link_info[UTS_NS_INDEX],
>> + task, &utsns_operations);
>> +#endif
>> +#ifdef CONFIG_IPC_NS
>> + perf_fill_ns_link_info(&ei->link_info[IPC_NS_INDEX],
>> + task, &ipcns_operations);
>> +#endif
>> +#ifdef CONFIG_PID_NS
>> + perf_fill_ns_link_info(&ei->link_info[PID_NS_INDEX],
>> + task, &pidns_operations);
>> +#endif
>> +#ifdef CONFIG_CGROUPS
>> + perf_fill_ns_link_info(&ei->link_info[CGROUP_NS_INDEX],
>> + task, &cgroupns_operations);
>> +#endif
> Two points here, since you've given these thingies a string identifier,
> do you still need to rely on their positional index? If not, you could
> generate smaller events if we lack some of these CONFIG knobs.
Dropping name field. So, I don't think this applies now..
> Secondly, does anything in here depend on @event? Afaict you're
> generating the exact same information for each event.
>
> The reason we set {pid,tid} for each event is because of PID_NS, we
> report the pid number as per the namespace associated with the event.
>
> But from what I can tell, there is no such namespace of namespaces
> dependency here and this should live in the function below.
Right. I will move it to perf_event_namespaces() function.
>> @@ -9667,6 +9804,11 @@ SYSCALL_DEFINE5(perf_event_open,
>> return -EACCES;
>> }
>>
>> + if (attr.namespaces) {
>> + if (!capable(CAP_SYS_ADMIN))
>> + return -EACCES;
>> + }
> Many times we allow users access to such information; should this be
> qualified with something like perf_paranoid_kernel() or somesuch?
Hmmm.. Need access to namespace info of every process. Synthesized events
try and access /proc/1/ns/ and the like needing CAP_SYS_ADMIN. So, the
question is, should running perf tool be allowed with perf_paranoid_kernel()
check and complain later when access to /proc/1/ns and the like fails
(on running
perf tool without CAP_SYS_ADMIN)?
>> +
>> if (attr.freq) {
>> if (attr.sample_freq > sysctl_perf_event_sample_rate)
>> return -EINVAL;
>> diff --git a/kernel/fork.c b/kernel/fork.c
>> index 11c5c8a..fd77e67 100644
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -2289,6 +2289,9 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
>> free_fs_struct(new_fs);
>>
>> bad_unshare_out:
>> + if (!err)
>> + perf_event_namespaces(current);
>> +
>> return err;
>> }
> That seems like a very odd place, why not put it right before the
> bad_unshare_cleanup_cred: label?
>
>> @@ -264,6 +265,10 @@ SYSCALL_DEFINE2(setns, int, fd, int, nstype)
>> switch_task_namespaces(tsk, new_nsproxy);
>> out:
>> fput(file);
>> +
>> + if (!err)
>> + perf_event_namespaces(tsk);
>> +
>> return err;
>> }
> Same again..
My bad! Will change that..
Thanks
Hari
Powered by blists - more mailing lists