linux-kernel - Re: Help with trace-cmd/ftrace recording process ID information

Date:   Mon, 17 Jul 2017 15:53:52 -0400
From:   Steven Rostedt <rostedt@...dmis.org>
To:     Will Hawkins <hawkinsw@...laugic.com>
Cc:     linux-kernel@...r.kernel.org
Subject: Re: Help with trace-cmd/ftrace recording process ID information

On Mon, 17 Jul 2017 15:18:18 -0400
Will Hawkins <hawkinsw@...laugic.com> wrote:

> Hello everyone, especially Mr. Rostedt,
> 
> I have had great success with ftrace debugging performance issues on 
> Linux systems. Together, ftrace and trace-cmd are absolutely 
> amazing tools for digging in to exactly what is going on in a system and 
> where performance problems exist.
> 
> I recently switched to a different host and attempted to run trace-cmd 
> record to get a record of page faults:
> 
> /path/to//trace-cmd/trace-cmd record -e page_fault_user /bin/ls
> 
> When I "report" on that trace, I get entries like the following:
> 
> <...>-41850 [010] 27484983.185200: page_fault_user: address=__per_cpu_end ip=__per_cpu_end error_code=0x14
> 
> It's exactly what I want. However, it does not list the process that 
> generated that fault. Instead, it uses <...>. I dug into the trace-cmd 
> code and see where this is generated and why it is generated.
> 
> What I don't understand is why on a different system, when I run the 
> same record command, I get the following output:
> 
>   ls-19887 [005] 2438162.263793: page_fault_user: address=__per_cpu_end ip=__per_cpu_end error_code=0x14
> 
> Again, it's exactly what I want and it lists the process name that 
> generated the fault.
> 
> From the code, I see that the <...> is printed instead of the name of 
> the process when the pid is not in the pevent's command lines. What I 
> can't seem to figure out is why the process would be in that list on one 
> host and not on the other.

Are you using the same kernel version and trace-cmd version on both
hosts?

> 
> When I looked at the trace.dat file directly, I did notice that on the 
> "good" host, there is a list of pids and names. On the "bad" host, 
> there is no such list in the trace.dat file. I am sure that is the 
> reason for the <...>s being printed, but I can't figure out why that 
> list is not getting in the trace.dat file.
> 
> I gave a quick look to try to find where that pid/comm list is generated 
> and written to the trace.dat file, but couldn't find anything.
> 
> I figured that I would send an email before I dug any further in case 
> someone has seen this already. I am happy to pass along other pertinent 
> information if it is helpful to debug the problem. I just don't want to 
> spam the list with information that is irrelevant.
> 
> Again, the combination of ftrace/trace-cmd is borderline magic. I 
> appreciate all the work that has gone into it!
> 
> Thanks in advance for helping me sort through this issue!

The comm (the program name) is not saved with each event. Instead,
there's a "cache" of them. On a schedule switch, when tracing is
active, the comm is stored in a table. The trace file uses this table
too. When trace-cmd finishes tracing, it reads that table from the
saved_cmdlines file in the tracefs directory.
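
As a rough illustration of how that table turns a pid into a name, here
is a minimal Python sketch. It assumes each line of saved_cmdlines has
the form "<pid> <comm>". The function names parse_saved_cmdlines and
task_label are made up for this example and are not part of trace-cmd.

```python
from typing import Dict


def parse_saved_cmdlines(text: str) -> Dict[int, str]:
    """Parse saved_cmdlines content (assumed '<pid> <comm>' per line)."""
    table: Dict[int, str] = {}
    for line in text.splitlines():
        # Split on the first space only, so the rest of the line is the comm.
        pid, sep, comm = line.strip().partition(" ")
        if not sep or not pid.isdigit():
            continue  # skip blank or malformed lines
        table[int(pid)] = comm
    return table


def task_label(table: Dict[int, str], pid: int) -> str:
    """Format a task the way a report shows it, using <...> for unknown pids."""
    return f"{table.get(pid, '<...>')}-{pid}"
```

With a table that contains pid 19887, task_label gives "ls-19887". A pid
that is missing from the table, which is the case on the "bad" host,
comes out as "<...>-41850".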

By default, it saves 128 comms. To save more or fewer, echo the new
size into the saved_cmdlines_size file.
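
The same resize can be scripted. The Python sketch below is only an
illustration: it assumes tracefs is mounted at /sys/kernel/tracing (on
some systems it is under /sys/kernel/debug/tracing instead), that the
caller has permission to write there, and the helper name
set_saved_cmdlines_size is hypothetical.

```python
from pathlib import Path

# Assumption: tracefs mount point; adjust for your system.
TRACEFS = Path("/sys/kernel/tracing")


def set_saved_cmdlines_size(entries: int, tracefs: Path = TRACEFS) -> int:
    """Write a new comm-cache size and return the value read back."""
    if entries < 1:
        raise ValueError("entries must be a positive integer")
    size_file = Path(tracefs) / "saved_cmdlines_size"
    # Equivalent to: echo <entries> > saved_cmdlines_size
    size_file.write_text(f"{entries}\n")
    return int(size_file.read_text().strip())
```

For example, set_saved_cmdlines_size(1024) run as root before the
recording asks for room for 1024 comms. The kernel may reject values it
considers out of range, in which case the write raises OSError.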

I'm not sure why trace-cmd didn't save that file, unless it was an
older version that did the recording.


-- Steve