[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <56D1CA6B.7040809@gmail.com>
Date: Sun, 28 Feb 2016 01:10:19 +0900
From: Taeung Song <treeze.taeung@...il.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>, linux-kernel@...r.kernel.org,
Jiri Olsa <jolsa@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Lai Jiangshan <jiangshanlai@...il.com>
Subject: Re: [RFC] Re: [PATCH v3 1/2] tracing/syscalls: Rename variable 'nr'
to '__syscall_nr'
Hi, Steven
On 02/27/2016 03:23 AM, Steven Rostedt wrote:
> On Fri, 26 Feb 2016 10:57:13 -0300
> Arnaldo Carvalho de Melo <acme@...nel.org> wrote:
>
>> Em Fri, Feb 26, 2016 at 10:14:06PM +0900, Taeung Song escreveu:
>>> There is a problem about duplicated variable name i.e.
>>>
>>> # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_io_getevents/format
>>> name: sys_enter_io_getevents
>>> ID: 739
>>> format:
>>
>> Steven, what do you think?
>>
>> Should we break this ABI while disambiguating the 'nr' field, using
>> '__syscall_nr' in an attempt to use a name that is unlikely to be used
>> by a real syscall argument name?
>>
>> If we stand by published ABIs, we should keep it written in stone and
>> state that the first 'nr' means '__syscall_nr' while keeping it as-is,
>> the change for 'perf trace' in that case is to do nothing, it work
>> as-is, we have just to fix the python binding to do that rename.
>
> ABIs only matter if they break something, and people complain. Linus
> has been somewhat accepting of us fixing those tools that break and we
> push out the fixes. If an ABI breaks in the forest and nobody is around
> to complain about it, did it really break?
>
> I would say, lets make the change and fix perf. If people complain, we
> send them the fixes for their tools. If they need the distros to have
> the fixes, then let the change be reverted, and we wait till the
> distros have the update (this may take a few years), then re-submit.
>
> This worked for me to get rid of padding that was in every trace event.
> The change was reverted, I fixed the tools that broke, waited till all
> the major distros had the updates. And resubmitted the change. Linus
> took it.
>
>
>>
>> Perhaps we can live with that, to avoid having three different cases:
>> !nr, nr and __syscall_nr.
>
> We could, do this as well. Want me to add something to event-parse?
>
>>
>> Ingo, Peter, have you guys followed this case?
>>
>> Summary: Some tracepoint have multiple fields with the same name, 'nr',
>> the first one is a unique syscall ID, the other is a syscall
>> argument:
>>
>> [root@...et ~]# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_io_getevents/format
>> name: sys_enter_io_getevents
>> ID: 747
>> format:
>> field:unsigned short common_type; offset:0; size:2; signed:0;
>> field:unsigned char common_flags; offset:2; size:1; signed:0;
>> field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
>> field:int common_pid; offset:4; size:4; signed:1;
>>
>> field:int nr; offset:8; size:4; signed:1;
>> field:aio_context_t ctx_id; offset:16; size:8; signed:0;
>> field:long min_nr; offset:24; size:8; signed:0;
>> field:long nr; offset:32; size:8; signed:0;
>> field:struct io_event * events; offset:40; size:8; signed:0;
>> field:struct timespec * timeout; offset:48; size:8; signed:0;
>>
>> print fmt: "ctx_id: 0x%08lx, min_nr: 0x%08lx, nr: 0x%08lx, events: 0x%08lx, timeout: 0x%08lx", ((unsigned long)(REC->ctx_id)), ((unsigned long)(REC->min_nr)), ((unsigned long)(REC->nr)), ((unsigned long)(REC->events)), ((unsigned long)(REC->timeout))
>> [root@...et ~]#
>>
>
> BTW, here's a less intrusive change, because honestly, I hate the
> kernel structure having underscores in the name.
>
> This could be signed off by Taeung Song and myself.
>
> -- Steve
>
> diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
> index 0655afbea83f..d1663083d903 100644
> --- a/kernel/trace/trace_syscalls.c
> +++ b/kernel/trace/trace_syscalls.c
> @@ -186,11 +186,11 @@ print_syscall_exit(struct trace_iterator *iter, int flags,
>
> extern char *__bad_type_size(void);
>
> -#define SYSCALL_FIELD(type, name) \
> - sizeof(type) != sizeof(trace.name) ? \
> +#define SYSCALL_FIELD(type, field, name) \
> + sizeof(type) != sizeof(trace.field) ? \
> __bad_type_size() : \
> - #type, #name, offsetof(typeof(trace), name), \
> - sizeof(trace.name), is_signed_type(type)
> + #type, #name, offsetof(typeof(trace), field), \
> + sizeof(trace.field), is_signed_type(type)
>
> static int __init
> __set_enter_print_fmt(struct syscall_metadata *entry, char *buf, int len)
> @@ -261,7 +261,8 @@ static int __init syscall_enter_define_fields(struct trace_event_call *call)
> int i;
> int offset = offsetof(typeof(trace), args);
>
> - ret = trace_define_field(call, SYSCALL_FIELD(int, nr), FILTER_OTHER);
> + ret = trace_define_field(call, SYSCALL_FIELD(int, nr, __syscall_nr),
> + FILTER_OTHER);
> if (ret)
> return ret;
>
> @@ -281,11 +282,12 @@ static int __init syscall_exit_define_fields(struct trace_event_call *call)
> struct syscall_trace_exit trace;
> int ret;
>
> - ret = trace_define_field(call, SYSCALL_FIELD(int, nr), FILTER_OTHER);
> + ret = trace_define_field(call, SYSCALL_FIELD(int, nr, __syscall_nr),
> + FILTER_OTHER);
> if (ret)
> return ret;
>
> - ret = trace_define_field(call, SYSCALL_FIELD(long, ret),
> + ret = trace_define_field(call, SYSCALL_FIELD(long, ret, ret),
> FILTER_OTHER);
>
> return ret;
>
Would you mean to avoid struct syscall_trace_enter or _exit
has '__syscall_nr' variable ?
So not including a portion of this patch([PATCH v3 1/2]
tracing/syscalls: ...)
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 8414fa4..98b3c66 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -88,13 +88,13 @@ enum trace_type {
*/
struct syscall_trace_enter {
struct trace_entry ent;
- int nr;
+ int __syscall_nr;
unsigned long args[];
};
struct syscall_trace_exit {
struct trace_entry ent;
- int nr;
+ int __syscall_nr;
long ret;
};
I got it :-)
In conclusion, output of format
(/sys/kernel/debug/tracing/events/syscalls/*/format)
has __syscall_nr for syscall number but the kernel structure
'syscall_trace_enter' and 'syscall_trace_exit' have not __syscall_nr
variable.
Is it right ?
I'll resend modified patch after testing new patch you suggest soon.
Thanks,
Taeung
Powered by blists - more mailing lists