[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <727920fa-219c-6d8d-3ba6-2f0553b2cbdc@redhat.com>
Date: Thu, 27 Apr 2017 19:32:43 +0530
From: Pratyush Anand <panand@...hat.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: linux-kernel@...r.kernel.org, dyoung@...hat.com, xlpang@...hat.com
Subject: Re: [trace-cmd Patch RFC] trace-cmd: top: A new interface to detect
peak memory
Hi Steve,
On Wednesday 26 April 2017 07:31 PM, Steven Rostedt wrote:
>
> Sorry for the late reply. I finally have time to start looking at
> trace-cmd again.
Thanks a lot for your review :-)
>
> On Wed, 1 Mar 2017 20:32:37 +0530
> Pratyush Anand <panand@...hat.com> wrote:
>
[...]
> This is as nice idea, but it needs some clean ups. I tried it out just
> to play with it.
>
> First, I just did:
>
> trace-cmd top
>
> And nothing happened. Either a help message needs to be printed, or it
> starts something. Maybe have it be interactive, that is, print out data
> as it comes in?
>
[...]
>
> But then for some reason it -p stopped doing anything. And -e didn't
> stop it either. But after killing it manually and removing the pid
> file, it appeared to work normal again. I can't seem to repeat it, so
> lets not worry about that now.
>
Yes, this version was sharing info between two processes using files. I have a
new version [1] which has sharing through socket and that works better. Help
issue is also fixed in that version.
> More below.
>
[...]
>> +void top_disable_events(void)
>> +{
>> + char *path;
>> + char c;
>> + int fd;
>> + int ret;
>> +
>> + /*
>> + * WARNING: We want only few events related to memory allocation to
>> + * be enabled. Disable all events here. Latter, selective events
>> + * would be enabled
>
> trace-cmd reset does this. You may want to call that instead of doing
> it manually here. Someone else wanted to pull out trace-cmd reset into
> its own file. I'll have to get those patches. But then we can call that
> and enable tracing again.
>
> Or better yet, if we are only tracing memory events, just create your
> own instance and use that. Why bother with normal tracing?
>
> mkdir /sys/kernel/debug/tracing/instances/trace-cmd-memory
Yes, it seems a good idea. Will do.
>
>> + */
>> + c = '0';
>> + path = get_instance_file(&top_instance, "events/enable");
>> + fd = open(path, O_WRONLY);
>> + if (fd < 0)
>> + die("failed to open '%s'", path);
>> + ret = write(fd, &c, 1);
>> + close(fd);
>> + tracecmd_put_tracing_file(path);
>> + if (ret < 1)
>> + die("failed to write 0 to events/enable");
>> +
>> + path = get_instance_file(&top_instance, "tracing_on");
>> + fd = open(path, O_WRONLY);
>> + if (fd < 0)
>> + die("failed to open '%s'", path);
>> + ret = write(fd, &c, 1);
>> + close(fd);
>> + tracecmd_put_tracing_file(path);
>> + if (ret < 1)
>> + die("failed to write 0 to tracing_on\n");
>> +
>> + path = get_instance_file(&top_instance, "set_event");
>> + fd = open(path, O_WRONLY);
>> + if (fd < 0)
>> + die("failed to open '%s'", path);
>> + close(fd);
>> + tracecmd_put_tracing_file(path);
>> +}
>> +
>> +void top_exit(void)
>> +{
>> + int i;
>> + /*
>> + * free all the dynamic memories which could have been allocated by
>> + * this program
>> + */
>> + if (kbuf)
>> + kbuffer_free(kbuf);
>> + if (record)
>> + free_record(record);
>> + if (pevent)
>> + pevent_free(pevent);
>
> The above three functions all can handle a NULL pointer. Remove the if
> statements.
ok
>
>> + if (top_task_stats) {
>> + for (i = 0; i < top_task_cnt; i++)
>> + free(top_task_stats[i].comm);
>> + free(top_task_stats);
>> + }
>> +
>> + top_disable_events();
>> + remove(TRACE_CMD_TOP_PID_FILE);
>> +}
>> +
>> +void top_initialize_events(void)
>> +{
>> + char *path;
>> + int fd;
>> + char c;
>> + int ret;
>> + char *buf;
>> + int size;
>> + enum kbuffer_long_size long_size;
>> + enum kbuffer_endian endian;
>> + char id_str[8];
>> +
>
> Might want to look at trace_stream_init() and use that instead. It is
> made for live streaming of events.
OK, will look into trace_stream_init().
>
>> + /* trace data is read from the trace_raw_pipe directly */
>> + if (tracecmd_host_bigendian())
>> + endian = KBUFFER_ENDIAN_BIG;
>> + else
>> + endian = KBUFFER_ENDIAN_LITTLE;
>> +
[...]
>> +void top_start(void)
>> +{
>> + pid_t pid;
>> + int fd, ret;
>> +
>> + top_disable_events();
>> + top_initialize_events();
>> + page_size = getpagesize();
>> +
>> + pid = fork();
>> + if (!pid) {
>> + do {
>> + top_process_events();
>> + sleep(1);
>
> We should probably just sleep on the buffer, waiting for data.
OK
>
>> + if (set_print_only) {
>> + top_print_stats();
>> + set_print_only = false;
>> + }
>> + if (set_print_and_exit) {
>> + top_print_stats();
>> + top_exit();
>> + break;
>> + }
>> + } while (1);
>> + } else {
>> + fd = open(TRACE_CMD_TOP_PID_FILE, O_WRONLY | O_CREAT);
>> + if (fd < 0)
>> + goto kill_child;
>> + ret = write(fd, &pid, sizeof(pid));
>> + if (ret < sizeof(pid))
>> + goto close_fd;
>> + close(fd);
>> + }
>> +
>> + return;
>> +
>> +close_fd:
>> + close(fd);
>> +kill_child:
>> + kill(pid, SIGUSR2);
>> + die("Failed to start trace-top");
>> + return;
>> +}
>> +
[...]
> Looks good, I think we can extend on this.
Thanks :-)
I will implement your review comments and will send next revision.
However, I had couple of observation which I was unable to justify:
# ./trace-cmd top -s /tmp/test
# ./trace-cmd top -p /tmp/test | grep trace-cmd
15292 trace-cmd 22144 15808
Here,15292 is the pid of trace-cmd task
22144 KB is the peak memory usage
15808 KB is the current memory usage
Now check rss component from statm
# cat /proc/15292/statm
50 35 23 7 0 12 0 36
This is a result on ARM64/64KB page size. Therefore, as per statm rss is 35
pages = 35*64 = 2240KB
I patched my kernel [2] for test purpose, so that statm reports peak memory as
well. Here, the last extra entry in statm output is peak memory and it is 36
pages = 2304KB.
So, this is a huge difference between what has been reported by statm and what
we get from trace-cmd.
I understand that `trace-cmd top` output would only be approximate, because
many of the memory could be allocated by task and freed in interrupt context.
So, many a time it can differ. But, is such a huge difference justified? If
yes, can we count on the output of this utility to find early boot time oom
issues?
[1]
https://github.com/pratyushanand/trace-cmd/commit/602c2cd96aa613633ad20c6d382e41f7db37a2a4
[2]
https://github.com/pratyushanand/linux/commit/197e2045361b6b70085c46c78ea6607d8afce517
Powered by blists - more mailing lists