Date:	Sat, 12 Jul 2008 12:16:27 +0200
From:	"John Kacur" <jkacur@...il.com>
To:	"Andrew Morton" <akpm@...ux-foundation.org>
Cc:	"Steven Rostedt" <rostedt@...dmis.org>,
	"Randy Dunlap" <randy.dunlap@...cle.com>,
	"Elias Oltmanns" <eo@...ensachen.de>,
	LKML <linux-kernel@...r.kernel.org>,
	"Ingo Molnar" <mingo@...e.hu>,
	"Thomas Gleixner" <tglx@...utronix.de>,
	"Peter Zijlstra" <peterz@...radead.org>,
	"Clark Williams" <clark.williams@...il.com>,
	"Linus Torvalds" <torvalds@...ux-foundation.org>,
	"Jon Masters" <jonathan@...masters.org>,
	"Eric W. Biederman" <ebiederm@...ssion.com>
Subject: Re: [PATCH -v2] ftrace: Documentation

On Sat, Jul 12, 2008 at 12:37 AM, Andrew Morton
<akpm@...ux-foundation.org> wrote:
>
> On Fri, 11 Jul 2008 16:59:53 -0400 (EDT) Steven Rostedt <rostedt@...dmis.org> wrote:
>
> >
> > > > +
> > > > +  tracing_cpumask : This is a mask that lets the user only trace
> > > > +         on specified CPUS. The format is a hex string
> > > > +         representing the CPUS.
> > >
> > > Why is this feature useful?  (I'd have asked this prior to merging, if I'd
> > > known it existed!)
> >
> > I can't comment on this. I didn't write that code, I just added it to
> > the document because I saw it existed. This was added by Ingo and Thomas,
> > without much description to why. I think it allows you to limit which
> > CPUS to perform the trace on.
>
> Information such as "why this code exists" seems fairly important ;)
> It's surprising how often people forget to mention it (in comments, and
> changelogs).
>
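
To make the "hex string" format concrete, here is a minimal userspace
sketch. It assumes the mask fits in 64 bits and is written as a single
hex number in which bit N stands for CPU N. cpu_in_mask() is a
hypothetical helper for illustration, not something in the kernel tree,
and masks for larger CPU counts would need more than strtoull().

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not kernel code): returns true if bit `cpu`
 * is set in a mask given as a hex string such as "3" or "a".
 * Assumes the whole mask fits in 64 bits.
 */
static bool cpu_in_mask(const char *hexmask, unsigned int cpu)
{
	unsigned long long mask = strtoull(hexmask, NULL, 16);

	if (cpu >= 64)
		return false;
	return (mask >> cpu) & 1ULL;
}
```

With this reading, a mask of "3" selects CPUs 0 and 1, and "a" selects
CPUs 1 and 3.
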
> > >
> > > > +  preemptirqsoff - Similar to irqsoff and preemptoff, but traces and
> > > > +          records the largest time irqs and/or preemption is
> > > > +          disabled.
> > >
> > > s/time/time for which/
> > >
> > > This interface has a strange mix of wordsruntogether and
> > > words_separated_by_underscores.  Oh well - another consequence of
> > > post-facto changelogging.
> >
> > I should make sched_switch to schedswitch and that way we have the files
> > having underscores and the tracers without them. Or should I add
> > underscores to all of them?
>
> Adding underscores is better, but it might not be worth the churn now, dunno.
>
> > > > +
> > > > +Here's an example of the output format of the file "trace"
> > > > +
> > > > +                             --------
> > > > +# tracer: ftrace
> > > > +#
> > > > +#           TASK-PID   CPU#    TIMESTAMP  FUNCTION
> > > > +#              | |      |          |         |
> > > > +            bash-4251  [01] 10152.583854: path_put <-path_walk
> > > > +            bash-4251  [01] 10152.583855: dput <-path_put
> > > > +            bash-4251  [01] 10152.583855: _atomic_dec_and_lock <-dput
> > > > +                             --------
> > >
> > > pids are no longer unique system-wide, and any part of the kernel ABI which
> > > exports them to userspace is, basically, broken.  Oh well.
> >
> > What should be used instead?  Of course we're not using a kernel ABI, we
> > are using an API (text based ;-) But more on that later.
>
> Well that's an interesting question and it has come up before.  There
> are times when the kernel wants to display a process identifier at
> least in a printk.  Oopses are one prominent example.
>
> Perhaps we do need a way of doing this in a post-pid-namespace-world.
> Presumably it would be of the form "pidns-identifier:pid", and just
> plain old "pid" if no pid namespaces are in operation, for some
> back-compatibility where possible.
>
> Eric, any thoughts?
>
> > > > +# tracer: irqsoff
> > > > +#
> > > > +irqsoff latency trace v1.1.5 on 2.6.26-rc8
> > > > +--------------------------------------------------------------------
> > > > + latency: 97 us, #3/3, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
> > > > +    -----------------
> > > > +    | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0)
> > > > +    -----------------
> > > > + => started at: apic_timer_interrupt
> > > > + => ended at:   do_softirq
> > > > +
> > > > +#                _------=> CPU#
> > > > +#               / _-----=> irqs-off
> > > > +#              | / _----=> need-resched
> > > > +#              || / _---=> hardirq/softirq
> > > > +#              ||| / _--=> preempt-depth
> > > > +#              |||| /
> > > > +#              |||||     delay
> > > > +#  cmd     pid ||||| time  |   caller
> > > > +#     \   /    |||||   \   |   /
> > > > +  <idle>-0     0d..1    0us+: trace_hardirqs_off_thunk (apic_timer_interrupt)
> > > > +  <idle>-0     0d.s.   97us : __do_softirq (do_softirq)
> > > > +  <idle>-0     0d.s1   98us : trace_hardirqs_on (do_softirq)
> > >
> > > The kernel prints all that stuff out of a debugfs file?
> > >
> > > What have we done? :(
> >
> > This is very helpful on embedded systems.
>
> Well...  why?  embedded platforms can run userspace programs too.  But
> the ornate nature of this kernel->userspace interface has gone and made
> implementation of userspace parsers hard.
>
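
As a rough illustration of what a userspace parser has to do, here is a
sketch for the simpler function-trace lines quoted earlier
("bash-4251  [01] 10152.583854: path_put <-path_walk"). The struct and
parse_trace_line() are hypothetical names, and the layout is assumed
from that one example only. The latency format would need considerably
more work.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one function-trace line. */
struct trace_entry {
	char comm[64];		/* task name */
	int pid;
	int cpu;
	double timestamp;	/* seconds */
	char func[64];		/* traced function */
	char parent[64];	/* its caller */
};

/* Returns 0 on success, -1 if the line does not match the assumed layout. */
static int parse_trace_line(const char *line, struct trace_entry *e)
{
	char taskpid[64];
	char *dash;

	/* "TASK-PID [CPU] TIMESTAMP: FUNCTION <-PARENT" */
	if (sscanf(line, " %63[^[][%d] %lf: %63s <-%63s", taskpid,
		   &e->cpu, &e->timestamp, e->func, e->parent) != 5)
		return -1;

	/* The task name may itself contain '-', so split at the last one. */
	dash = strrchr(taskpid, '-');
	if (!dash)
		return -1;
	e->pid = atoi(dash + 1);
	*dash = '\0';
	snprintf(e->comm, sizeof(e->comm), "%s", taskpid);
	return 0;
}
```

Header lines starting with '#' simply fail to match and return -1.
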
> > If you are suggesting that the kernel comes with its own user land app
> > (in scripts/ ?) to handle all the new tracers, then maybe it would be
> > OK.
>
> This also comes up again and again.  Kernel programmers have no
> convenient route for delivering userspace code to users, so they end up
> putting userspace functionality into the kernel.
>
> getdelays.c is a counter-example.  We've maintained that as new
> taskstats capabilities have come along and as it turned out, this was
> quite easy and people find getdelays.c to be quite useful.  Its name is
> outdated though.
>
> >
> > > > +first followed by the next task or task waking up. The format for both
> > > > +of these is PID:KERNEL-PRIO:TASK-STATE. Remember that the KERNEL-PRIO
> > > > +is the inverse of the actual priority with zero (0) being the highest
> > > > +priority and the nice values starting at 100 (nice -20). Below is
> > > > +a quick chart to map the kernel priority to user land priorities.
> > > > +
> > > > +  Kernel priority: 0 to 99    ==> user RT priority 99 to 0
> > > > +  Kernel priority: 100 to 139 ==> user nice -20 to 19
> > > > +  Kernel priority: 140        ==> idle task priority
> > > > +
> > > > +The task states are:
> > > > +
> > > > + R - running : wants to run, may not actually be running
> > > > + S - sleep   : process is waiting to be woken up (handles signals)
> > > > + D - deep sleep : process must be woken up (ignores signals)
> > >
> > > "uninterruptible sleep", please.  no need to invent new (and hence
> > > unfamiliar) terms!
> >
> > This is my own ignorance.  I didn't know the best way to say it. Why do
> > we use 'D' for "uninterruptible sleep"? I don't see a 'D' in there? But
> > "deep sleep" is more obvious. OK, I'll shut up and change it to
> > "uninterruptible sleep".
> >
>
> Heh.  Maybe "D" does indeed refer to "deep sleep".  That's all before
> my time.  But yes, "uninterruptible sleep" is the well-known term for
> this state.
----SNIP----
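
The priority chart quoted above can be written down as a small
function. This is only a sketch of the mapping as the chart states it;
the enum, struct and kernel_to_user_prio() are hypothetical names made
up for illustration.

```c
/* Hypothetical types for illustration, not kernel definitions. */
enum prio_kind { PRIO_RT, PRIO_NICE, PRIO_IDLE, PRIO_INVALID };

struct user_prio {
	enum prio_kind kind;
	int value;	/* RT priority or nice value; unused for idle */
};

/*
 * Map a kernel priority to the user-land view, per the chart:
 *   0..99    -> RT priority 99..0
 *   100..139 -> nice -20..19
 *   140      -> idle task
 */
static struct user_prio kernel_to_user_prio(int kprio)
{
	struct user_prio up = { PRIO_INVALID, 0 };

	if (kprio >= 0 && kprio <= 99) {
		up.kind = PRIO_RT;
		up.value = 99 - kprio;
	} else if (kprio >= 100 && kprio <= 139) {
		up.kind = PRIO_NICE;
		up.value = kprio - 120;
	} else if (kprio == 140) {
		up.kind = PRIO_IDLE;
	}
	return up;
}
```

So a kernel priority of 120 comes out as nice 0, and 0 as RT priority 99.
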
According to fs/proc/array.c in the kernel, 'D' stands for "disk sleep":

static const char *task_state_array[] = {
	"R (running)",		/*  0 */
	"M (running-mutex)",	/*  1 */
	"S (sleeping)",		/*  2 */
	"D (disk sleep)",	/*  4 */
	"T (stopped)",		/*  8 */
	"T (tracing stop)",	/* 16 */
	"Z (zombie)",		/* 32 */
	"X (dead)"		/* 64 */
};
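
For illustration, a sketch of how a state value could be turned into
one of those strings. It assumes the numbers in the comments are the
state bit values (0 for running, then one bit per state), so the array
index is the position of the highest set bit plus one. state_name() is
a hypothetical helper, not the kernel's own function.

```c
#include <stddef.h>

static const char *task_state_array[] = {
	"R (running)",		/*  0 */
	"M (running-mutex)",	/*  1 */
	"S (sleeping)",		/*  2 */
	"D (disk sleep)",	/*  4 */
	"T (stopped)",		/*  8 */
	"T (tracing stop)",	/* 16 */
	"Z (zombie)",		/* 32 */
	"X (dead)"		/* 64 */
};

/* Hypothetical helper: pick the array entry for a state bit value. */
static const char *state_name(unsigned int state)
{
	size_t idx = 0;

	/* Count shifts until the value is zero: 0 -> 0, 1 -> 1, 2 -> 2, 4 -> 3, ... */
	while (state) {
		idx++;
		state >>= 1;
	}
	if (idx >= sizeof(task_state_array) / sizeof(task_state_array[0]))
		return "? (unknown)";
	return task_state_array[idx];
}
```

With this, a state value of 4 yields "D (disk sleep)".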
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
