Date:	Sat, 25 Apr 2009 00:59:10 +0200
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Zhaolei <zhaolei@...fujitsu.com>, mingo@...e.hu,
	kosaki.motohiro@...fujitsu.com, rostedt@...dmis.org,
	tzanussi@...il.com, linux-kernel@...r.kernel.org, oleg@...hat.com
Subject: Re: [PATCH 0/4] workqueue_tracepoint: Add worklet tracepoints for
	worklet lifecycle tracing

On Fri, Apr 24, 2009 at 01:06:16PM -0700, Andrew Morton wrote:
> On Fri, 24 Apr 2009 19:42:19 +0800
> Zhaolei <zhaolei@...fujitsu.com> wrote:
> 
> > These patches add tracepoints for per-worklet tracing.
> > Now we have enough tracepoints to start making trace_workqueue.c support
> > worklet time measurement.
> 
> I'm not seeing anywhere in this patchset a description of the user
> interface.  What does the operator/programmer actually get to see from
> the kernel as a result of these changes?
> 
> A complete usage example would be an appropriate way of communicating
> all of this.
> 
> The patches introduce a moderate amount of tracing-specific hooks into
> the core workqueue code, which inevitably increases the maintenance
> load for that code.  It is important that it be demonstrated that the
> benefits of the code are worth that cost.  Hence it is important that
> these benefits be demonstrated to us, by yourself.  Please.
> 
> Another way of looking at it: which previously-encountered problems
> would this facility have helped us to solve?  How will this facility
> help us to solve problems in the future?  Looking at this patch series
> I cannot answer those questions!
> 


Hi Andrew,

Although I'm not the author of this patchset, I'm somewhat
involved in the workqueue tracer and I would like to express
my opinion on what is happening.

Until recently, the workqueue tracer was a basic machine.
It was designed to trace at the workqueue level only; we were
not yet thinking about the worklet level. It traced four events
(a declaration sketch follows the list):

- creation of a workqueue thread
- cleanup/destruction of these threads
- insertion of a work into a workqueue
- execution of that work
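
Each of these is backed by a tracepoint in the core workqueue code.
For the unfamiliar, here is a minimal sketch of how such a hook can
be declared with the tracepoint API; the name and arguments here are
illustrative, not copied from trace_workqueue.c:

#include <linux/sched.h>
#include <linux/tracepoint.h>
#include <linux/workqueue.h>

/*
 * Illustrative workqueue-level tracepoint declaration
 * (hypothetical name, modeled on the DECLARE_TRACE API).
 */
DECLARE_TRACE(workqueue_insertion,
        TP_PROTO(struct task_struct *wq_thread, struct work_struct *work),
        TP_ARGS(wq_thread, work));

The core code then calls trace_workqueue_insertion(thread, work) at
the insertion site, and the tracer registers a probe on it.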

The output looked like the following histogram:

#   CPU   INSERTED  EXECUTED   WORKQUEUE THREAD
#
      1    125        125       reiserfs/1
      1      0          0       scsi_tgtd/1
      1      0          0       aio/1
      1      0          0       ata/1
      1    114        114       kblockd/1
      1      0          0       kintegrityd/1
      1   2147       2147       events/1
    
      0      0          0       kpsmoused
      0    105        105       reiserfs/0
      0      0          0       scsi_tgtd/0
      0      0          0       aio/0
      0      0          0       ata_aux
      0      0          0       ata/0
      0      0          0       cqueue
      0      0          0       kacpi_notify
      0      0          0       kacpid
      0    149        149       kblockd/0
      0      0          0       kintegrityd/0
      0   1000       1000       khelper
      0   2270       2270       events/0


Its purpose and the information it gave were limited, though
somewhat useful: it could report how frequently works were
inserted into each workqueue.

Why does that matter? Because it gives a kernel developer useful
data when deciding how to defer async jobs: should they use the
traditional, general-purpose events workqueues or a dedicated
workqueue? (Both options are sketched in code after the list of
criteria below.)

The decision has to be made based on:

- the workload it induces. If too many works can be inserted at
  a time, we can create unacceptable latencies that delay other
  queued works. The shared events/%d threads are especially
  fragile in this respect.

- not only the work insertion frequency but also how much time
  is spent processing each work. Does it try to do too many
  things and burn too much CPU time? Does it sleep too long,
  waiting for an event, a heavily contended lock, or the like?

- any other kind of behavior that may encumber the events
  threads and create bad latencies.
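
To make the two options concrete, here is a minimal sketch of both
deferral styles using the workqueue API; my_async_job and my_wq are
hypothetical names used for illustration:

#include <linux/init.h>
#include <linux/module.h>
#include <linux/workqueue.h>

/* Hypothetical deferred job used for illustration. */
static void my_async_job(struct work_struct *work)
{
        /* the actual deferred processing goes here */
}

static DECLARE_WORK(my_work, my_async_job);

/*
 * Option 1: piggyback on the shared, general-purpose events/%d
 * threads.  Cheap, but a slow work here delays everyone else's
 * queued works.
 */
static void defer_on_events(void)
{
        schedule_work(&my_work);
}

/*
 * Option 2: a dedicated workqueue with its own kernel thread.
 * Isolates the latency, at the cost of an extra thread.
 */
static struct workqueue_struct *my_wq;

static int __init my_module_init(void)
{
        my_wq = create_singlethread_workqueue("my_wq");
        if (!my_wq)
                return -ENOMEM;
        queue_work(my_wq, &my_work);
        return 0;
}

The tracer's numbers are what tell you whether the shared threads
are overloaded enough to justify that dedicated thread.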

After reading this list of requirements, you will have noticed
that the frequency of inserted works is only one variable in the
workload recipe.
Capturing it alone not only gives incomplete information, it can
actively mislead the user, who may focus on large frequencies
when in fact a single work can create more burden than hundreds
of others.


Then Zhaolei and Kosaki Motohiro recently came up with a good
update for this tracer, posting a patchset that adds the
per-worklet tracing dimension.
The output on my laptop became the following:

$ cat /debug/tracing/trace_stat/workqueues 
# CPU  INSERTED  EXECUTED   TASKNAME:PID
# |      |         |        `-WORKFUNC
# |      |         |          |
  0   3822       3898       events/0:9
  0      1          1        |-hpet_work+0x0/0x340
  0     47         46        |-rt_worker_func+0x0/0x400
  0   2655       2654        |-vmstat_update+0x0/0x40
  0      1          1        |-power_supply_changed_work+0x0/0x60
  0    212        212        |-fb_flashcursor+0x0/0x3c0
  0     80         80        |-test_work+0x0/0x64
  0     10          9        |-rekey_seq_generator+0x0/0x70
  0      8          7        |-lru_add_drain_per_cpu+0x0/0x10
  0    455        455        |-console_callback+0x0/0x1e0
  0    249        249        |-flush_to_ldisc+0x0/0x2f0
  0      1          1        |-atkbd_event_work+0x0/0xb0
  0    103         96        `-dst_gc_task+0x0/0x340
  0      0          0       cpuset:11
  0   1378       1378       khelper:12
  0   1378       1378        `-__call_usermodehelper+0x0/0xe0
  0      0          0       kintegrityd/0:178
  0    333        333       kblockd/0:184
  0    233        233        |-blk_unplug_work+0x0/0x110
  0    100        100        `-cfq_kick_queue+0x0/0x50
  0      0          0       kacpid:190
  0      0          0       kacpi_notify:191
  0      0          0       tifm:498
  0   8048       8048       ata/0:507
  0   8048       8048        `-ata_pio_task+0x0/0x280
  0      0          0       ata_aux:511
  0      0          0       aio/0:871
  0      0          0       crypto/0:897
  0      0          0       scsi_tgtd/0:2368
  0      0          0       iscsi_eh:2387
  0      0          0       kpsmoused:2521
  0      0          0       hd-audio0:2660
  0      0          0       krxrpcd/0:2745
  0     59         59       reiserfs/0:2895
  0     59         59        `-flush_async_commits+0x0/0x90
  1   5480       5503       events/1:10
  1      1          1        |-hpet_work+0x0/0x340
  1   2819       2818        |-vmstat_update+0x0/0x40
  1    464        464        |-console_callback+0x0/0x1e0
  1     80         80        |-test_work+0x0/0x64
  1      1          1        |-sysfs_add_func+0x0/0xb0
  1      8          8        |-lru_add_drain_per_cpu+0x0/0x10
  1      1          1        |-sync_cmos_clock+0x0/0x140
  1   1823       1823        |-fb_flashcursor+0x0/0x3c0
  1    282        282        |-flush_to_ldisc+0x0/0x2f0
  1      1          1        `-atkbd_event_work+0x0/0xb0
  1      0          0       kintegrityd/1:181
  1    254        254       kblockd/1:185
  1    148        148        |-blk_unplug_work+0x0/0x110
  1    106        106        `-cfq_kick_queue+0x0/0x50
  1  10988      10988       ata/1:509
  1  10988      10988        `-ata_pio_task+0x0/0x280
  1      0          0       aio/1:878
  1      0          0       crypto/1:898
  1      0          0       scsi_tgtd/1:2370
  1      0          0       krxrpcd/1:2746
  1     25         25       reiserfs/1:2896
  1     25         25        `-flush_async_commits+0x0/0x90


The tracer has evolved by decomposing these frequency stats
per worklet. Not only does it help a driver writer instrument
their own async jobs, it also helps any developer find the
source of a latency coming from a workqueue.

It can at least point out a first direction of investigation,
because we are still only talking about frequency.
And speaking of incomplete information, this output is only a
step toward the main goal, which is to provide the average and
maximum execution time of each worklet:


# CPU  INSERTED  EXECUTED  AVG TIME  MAX TIME	 TASKNAME:PID
# |      |         |          |         |        `-WORKFUNC
# |      |         |          |         |          |
  0   3822       3898        100 us    200us       events/0:9
  0      1          1        12 us     15us        |-hpet_work+0x0/0x340
  0     47         46        10 us     45us        |-rt_worker_func+0x0/0x400


With that we'll have much better indications for starting a latency
investigation. But to get this end result, we needed the changes in
this latest patchset from Zhaolei and Kosaki which, among other
useful changes, splits the workqueue execution event into a pair:

- trace_work_execute
- trace_work_complete

This split is the piece required to measure execution time; a rough
sketch of the idea follows.
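
As a sketch of what the stat side can do with this pair, assuming
probe signatures of the (thread, work) shape for illustration (the
exact prototypes come from the patchset, not from me):

#include <linux/ktime.h>
#include <linux/sched.h>
#include <linux/workqueue.h>

/*
 * Hypothetical probes pairing the execute/complete events.
 * A real implementation would key the start time per
 * (thread, work) pair instead of using one global slot;
 * this only shows the idea.
 */
static ktime_t exec_start;

static void probe_work_execute(struct task_struct *wq_thread,
                               struct work_struct *work)
{
        exec_start = ktime_get();
}

static void probe_work_complete(struct task_struct *wq_thread,
                                struct work_struct *work)
{
        s64 delta_us = ktime_to_us(ktime_sub(ktime_get(), exec_start));

        /* fold delta_us into this worklet's avg/max statistics */
}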

Also, as I said, the statistical/histogram output is only the first
step of worklet instrumentation. Once we focus on a specific culprit,
we can trace its whole lifecycle using the trace events:
insertion, execution, completion, delayed insertion, cancellation,
and so on.

These events provide all the fine-grained instrumentation required
to learn the details of any worklet's lifecycle. Combined with
well-chosen event toggling and event field filtering, we can focus
on only the relevant events at a time.

Example of such trace:

# cd /debug/tracing

# ls events/workqueue/
filter          worklet_complete  worklet_enqueue_delayed  workqueue_creation     workqueue_flush
worklet_cancel  worklet_enqueue   worklet_execute          workqueue_destruction

# for e in $(ls events/workqueue); do echo 1 > events/workqueue/$e/enable; done

# tail -f trace

 events/1-10    [001]  4687.187314: worklet_execute: thread=events/1:10 work=ffff880001b27200 func=vmstat_update+0x0/0x40
 events/1-10    [001]  4687.187330: worklet_enqueue_delayed: thread=events/1:10 func=vmstat_update+0x0/0x40 cpu=-1 delay=250
 events/1-10    [001]  4687.187331: worklet_complete: thread=events/1:10 work=ffff880001b27200
 events/0-9     [000]  4687.250236: worklet_execute: thread=events/0:9 work=ffff880001b0c200 func=vmstat_update+0x0/0x40
 events/0-9     [000]  4687.250250: worklet_enqueue_delayed: thread=events/0:9 func=vmstat_update+0x0/0x40 cpu=-1 delay=237
 events/0-9     [000]  4687.250251: worklet_complete: thread=events/0:9 work=ffff880001b0c200

If more detail is needed, such events could fit nicely within a
function graph tracer output.

So this latest patchset provides all the required information at the
event tracing level. The next step is likely on the stat tracing
side, to provide the avg/max execution times.


Thanks,
Frederic.
