[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJfuBxz7LXmOujY5giqZ96hKYm3cMUMmbf3DnEm9sJMzJg9UzA@mail.gmail.com>
Date: Tue, 2 Jan 2024 12:13:24 -0700
From: jim.cromie@...il.com
To: Łukasz Bartosik <lb@...ihalf.com>
Cc: Jason Baron <jbaron@...mai.com>, Andrew Morton <akpm@...ux-foundation.org>,
Kees Cook <keescook@...omium.org>, Douglas Anderson <dianders@...omium.org>,
Guenter Roeck <groeck@...gle.com>, Yaniv Tzoreff <yanivt@...gle.com>, Benson Leung <bleung@...gle.com>,
Steven Rostedt <rostedt@...dmis.org>, Vincent Whitchurch <vincent.whitchurch@...s.com>,
Pekka Paalanen <ppaalanen@...il.com>, Sean Paul <seanpaul@...omium.org>,
Daniel Vetter <daniel@...ll.ch>, Simon Ser <contact@...rsion.fr>,
John Ogness <john.ogness@...utronix.de>, Petr Mladek <pmladek@...e.com>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>, linux-kernel@...r.kernel.org,
upstream@...ihalf.com, Greg KH <gregkh@...uxfoundation.org>
Subject: Re: [PATCH v3 00/22] dyndbg: add support for writing debug logs to trace
ok, to catch up here,
Lukas & I have been nudging the test script forward,
some of this is settled..
On Thu, Dec 28, 2023 at 5:00 PM Łukasz Bartosik <lb@...ihalf.com> wrote:
>
> śr., 27 gru 2023 o 05:23 <jim.cromie@...il.com> napisał(a):
> >
> > re-adding gregkh, who fell off the cc's again.
> > he is one of our upstreams.
> >
>
> hi Jim,
>
> It's good to hear from you.
>
> Thanks. I missed to add Greg.
>
> > On Fri, Dec 22, 2023 at 6:51 PM Łukasz Bartosik <lb@...ihalf.com> wrote:
> > >
> > > Add support for writing debug logs to trace events and trace instances.
> > > The rationale behing this feature is to be able to redirect debug logs
> > > (per each callsite individually) to trace to aid in debugging. The debug
> > > logs output to trace can be enabled with T flag. Additionally the T flag
> > > accepts trace destination which can be provided to the T flag after ":".
> > > The trace destination field is used to determine where debug logs will be
> > > written.
> > >
> > > In order to redirect callsite debug log to a trace instance it has to be
> > > initialized first with a newly introduced "open" command. For example,
> > > "usb" trace instance in <debugfs>/tracing/instances/ can be initialized
> > > by issuing the command:
> > >
> > > localhost ~ # echo "open usb" >
> > > <debugfs>/dynamic_debug/control
> > >
> > > If the trace instance <debugfs>/tracing/instances/usb already exists
> > > then it will be reused otherwise new usbcore instance will be created.
> > > Maximum 63 trace instances can be "opened". The trace instance name can
> > > include any of ^\w+ and underscore characters. There is also a special
> > > name "0" reserved for writing debug logs to trace prdbg and devdbg events.
> > > The reserved name "0" does not require to be "opened" before use.
> >
> > theres implication here that p only works when :0.
> > Is that intended ?
> > it seems wrong, or at least unnecessary.
> >
>
> I don't see clearly the implication that p only works when :0.
> Which statement implies this to you ? Please keep in mind that English
> is not my native.
heh - Im not sure now - its mostly me probing for unseen inconsistencies.
> > In fact, theres no specific reason to exclude +p:trcbuf.mf,
> > esp if we keep default-trace-buf
> > since +pTfml could have that meaning implicitly.
> > We can call it allowed and "unimplemented until proven useful" on +p
> >
> > +p completely independent of +T is a simplicity,
> > lets keep that until proven useful otherwise.
> >
> > >
> > > To enable writing usbcore module debug logs to the "usb" trace instance
> > > opened above a user can issue the command:
> > >
> > > localhost ~ # echo "module usbcore =pT:usb" >
> > > <debugfs>/dynamic_debug/control
> > >
> > > Please note that output of debug logs to syslog (p flag) and trace (T flag)
> > > can be independently enabled/disabled for each callsite. Therefore the
> > > above command will enable writing of debug logs not only to the trace
> > > instance "usb" but also to syslog.
> > >
> > > When trace destination is followed by another flag the next flag has to
> > > be preeceded with ".", for example
> > >
> > > localhost ~ # echo "module usbcore =pT:usb.l" >
> > > <debugfs>/dynamic_debug/control
> > >
> > > To simplify processing trace destination can be omitted when default
> > > trace destination is set, for example the command
> > >
> > > localhost ~ # echo "module usbcore =pTl" >
> > > <debugfs>/dynamic_debug/control
> > >
> > > will use default trace destination. If default trace destination is not
> > > set the above command will fail. The default trace destination is set
> > > when a command with any of the flags [-+=] and explicitly provided trace
> > > destination matches at least one callsite, for example the command
> > >
> > > localhost ~ # echo "module usbcore -T:usb" >
> > > <debugfs>/dynamic_debug/control
> > >
> > > will disable trace for all usbcore module callsites (if any was enabled)
> > > and will set "usb" instance as default trace destination. To enable writing
> > > thunderbolt module debug logs to the "usb" trace instance as well a user
> > > can issue the command:
> > >
> > > localhost ~ # echo "module thunderbolt =T" >
> > > <debugfs>/dynamic_debug/control
> > >
> > > Since default trace destination was previously set therefore "usb" instance
> > > name does not have to be explicitly provided.
> > >
> > > When no callsite writes debug logs to a trace instance then it can be
> > > released (its reference counter decrased) with the "close" command.
> > decreased
> >
>
> Ack
>
> > > Closing a trace instance make it unavailable for dynamic debug and also
> > makes
>
> Ack
>
> > > allows a user to delete such a trace instance at convenient time later
> > > with rmdir command. For example when "usb" instance is not used anymore
> > > a user can issue the command:
> > >
> > > localhost ~ # echo "close usb" >
> > > <debugfs>/dynamic_debug/control
> > > and then to delete it with:
> > >
> > > localhost ~ # rmdir <debugfs>/tracing/instances/
> > >
> > > To enable writing usbcore module debug logs to trace dyndbg:prdbg and
> > > dyndbg:devdbg events user can issue the command:
> > >
> > > localhost ~ # echo "module usbcore =T:0" >
> > > <debugfs>/dynamic_debug/control
> > >
> > > Then dyndbg trace events can be for example captured with the command:
> > >
> > > localhost ~ # trace-cmd start -e dyndbg
> > >
> > > And displayed with the command:
> > >
> > > localhost ~ # trace-cmd show
> > >
> >
> > is there a recipe to show private traces ? prolly worth a mention here.
> >
>
> Trace-cmd stat shows among other things open trace instances. I will add it.
>
> >
> > >
> > >
> > > Jim, I made a few changes to the test script dd-selftest.rc you created
> > > and I also added new test scenarios. You can find the entire patchset
> > > with the test script changes on top here
> > > https://chromium.googlesource.com/chromiumos/third_party/kernel/+log/refs/sandbox/ukaszb/dyndbg_trace_v3
> > > I wonder where would be the proper place to store the test script
> > > (upstream, your github.com:jimc/linux.git)
> > > Do you have a suggestion for that ?
> > >
> >
> > I got it (your HEAD), renamed and put it in:
> > tools/testing/selftests/dynamic_debug/
> > and copied & modified Makefile and config from ../users/
> > I added your SOB, that seems proper.
> >
>
> Great, thanks ;)
>
> > nice additions. and I like the colors. maybe they belong in lib.mk,
> > but thats 4 later.
> >
> > and tweaked the modules / DUTs to *hopefully* get stable callsite counts
> > from the tests on them, in virtually any usable config.
> > DUTs: file init/main, module mm_init, and module test_dynamic_debug (a
> > config constraint)
> > The ref-counts are taken from a default virtme-ng -k config on my x86-64 box
> >
> > I now pass tests up to:
> >
> > [root@v6 wk-proof]# ./tools/testing/selftests/dynamic_debug/dyndbg_selftest.sh
> > ...
> > # TEST_PRIVATE_TRACE_6
> > [ 1009.296557] dyndbg: read 3 bytes from userspace
> > [ 1009.296876] dyndbg: query 0: <=_> on module: <*>
> > [ 1009.297171] dyndbg: processed 1 queries, with 1559 matches, 0 errs
> > [ 1009.311621] dyndbg: read 39 bytes from userspace
> > [ 1009.312032] dyndbg: query 0: <open
> > A_bit_lengthy_trace_instance_name> on module: <*>
> > [ 1009.312569] dyndbg: instance is already opened
> > name:A_bit_lengthy_trace_instance_name
> > [ 1009.313063] dyndbg: processed 1 queries, with 0 matches, 1 errs
> > : ./tools/testing/selftests/dynamic_debug/dyndbg_selftest.sh:454
> > ddcmd() expected to exit with code 0
> > Error: 'File exists'
> >
> > it seems invocation dependent
> > I suspect 2 possibilities:
> > 1- invoking as shell-script, repeatedly vs vng -e script-name
> > (one-shot per boot)
> > 2- virtme-ng's minimal-default config has no CONFIG_USB.
> > this changes post-conditions of the script, affecting preconditions
> > of the next run.
> >
>
> I guess you must have enabled CONFIG_USB in your setup otherwise
> dyndbg_selftest.sh should fail earlier in TEST_PRIVATE_TRACE_5 ?
>
Its usually on, but I have at least one config,
IIRC from default vng -vk, that left it off.
whats the shell syntax to check CONFIG_MODULES ?
ATM, basic-tests uses "module main" (maybe worth narrowing)
comma-terminator-tests uses "module params"
test-percent-splitting uses test_dynamic_debug to demo a useful case,
so it needs MODULES
so does the self-start-various-end tests
the private-trace tests appear to use a mix of
test_dynamic-debug, & module params,
maybe that can be materially improved, I dont know.
IIRC for builtins I think we're now using 'file kernel/
> > 2 means that we should replace usbcore with some other always *builtin* module.
> >
>
> Sounds good. Do you have a module candidate ?
>
> I'm also concerned that we might need to validate dyndbg_selftest.sh
> against specific kernel version because
> I took the dyndbg_selftest.sh from lukas-v3.0 branch and I had to do
> the following changes to make it work on kernel v6.6:
>
Yes, I hit this one too.
I decided to avoid using mm_init for this reason.
(without actually looking whether it was a config-based variance,
which I assumed, or a commit-based one)
I chose 'module params' instead,
despite the need to escape the brackets in the search:
check_match_ct '\[params\]' 4
> --- a/tools/testing/selftests/dynamic_debug/dyndbg_selftest.sh
> +++ b/tools/testing/selftests/dynamic_debug/dyndbg_selftest.sh
> @@ -166,12 +166,12 @@ EOF
> function comma_terminator_tests {
> echo -e "${GREEN}# COMMA_TERMINATOR_TESTS ${NC}"
> # try combos of space & comma
> - check_match_ct mm_init 5
> + check_match_ct mm_init 4
> ddcmd module,mm_init,=_ # commas as spaces
> ddcmd module,mm_init,+mpf # turn on non-classed
> - check_match_ct =pmf 5
> + check_match_ct =pmf 4
> ddcmd ,module ,, , mm_init, -p # extra commas & spaces
> - check_match_ct =mf 5
> + check_match_ct =mf 4
> ddcmd =_
> }
> function test_percent_splitting {
> @@ -350,7 +350,7 @@ function test_private_trace_3 {
> ddcmd class D2_CORE +T:foo \; class D2_KMS +T:foo
> ddcmd "class,D2_CORE,+T:foo;,class,D2_KMS,+T:foo"
> ddcmd class,D2_CORE,+T:foo\;class,D2_KMS,+T:foo
> - check_match_ct =T 2 -v
> + check_match_ct =T 3 -v
> check_match_ct =Tl 0
> check_match_ct =Tmf 0
> echo 1 >/sys/kernel/tracing/tracing_on
> @@ -379,7 +379,7 @@ function test_private_trace_4 {
> modprobe test_dynamic_debug dyndbg=class,D2_CORE,+T:foo%class,D2_KMS,+T:foo
> check_match_ct =Tl 0
> check_match_ct =Tmf 0
> - check_match_ct =T 2
> + check_match_ct =T 3
> echo 1 >/sys/kernel/tracing/tracing_on
>
> What kernel version did you use for the testing ?
turns out Im on top of -rc7
(I suppose I should pay closer attention :-)
> > I chose module params, on the idea that its builtin, but also invoked later,
> > when modules are modprobed with params, so its callsites can be enabled &
> > then invoked via do_prints, to test for writes to the private trace-buf.
> >
> > I've pushed it to lukas-v3.0 at jimc.github.com/linux
> >
> > Would you look at err on _6, above,
> > and figure out if its insufficiently robust to test preconditions, or
> > something else ?
>
> The following change fixes failure in TEST_PRIVATE_TRACE_6:
> @@ -455,7 +455,7 @@ function test_private_trace_6 {
> ddcmd open $name
> is_trace_instance_opened $name
> check_trace_instance_dir $name 1
> - ddcmd "module module =T:$name.l" -v
> + ddcmd "module usbcore =T:$name.l"
> check_match_ct =T:A_bit_lengthy_trace....l 164
>
> When dyndbg_selftest.sh completes with success which is announced by the line
> # Done on: Thu Dec 28 23:04:17 CET 2023
> then it leaves dynamic debug and trace in a state where next
> dyndbg_selftest.sh run should also complete with success.
yes .this works well for vng -e <script-path> runs
> But when any of the tests fails manual intervention or reboot is
> required at the moment.
> Do you think we should have a cleanup function and call it whenever
> dyndbg_selftest.sh fails in order
> to prepare dynamic debug and trace for the next run of dyndbg_selftest.sh ?
>
yes, I think so.
In part to prove that we can "un-wedge" whatever's wrong w/o a reboot.
and/or a rmmod, modprobe cycle.
self_end_* are doing various unwinds,
but a full-powered cleanup fn, with diagnostics and hints
would be a good addition, with demonstration value.
> > and also:
> > rename test_private_trace_N() to better names
> > - and/or comments to intent & pattern of use
> > more tweaks invited in its commit-msg
> >
>
> I will rename the functions and add description of the tests.
>
> > since cycle_test_problem() is not actually a problem, whats a better name ?
> > _not_best_practices ?
> >
>
> Ack
>
> > I think the final test script enhancement needed is (maybe separately):
> > private-buffer-expected-write-verification after do_prints,
> > and around modprobe test_dynamic_debug dyndbg=$multicmd
> > subject to 2 below.
> >
>
> Ack
we definitely have this in part, spread around the dozen or so functions
>
> > > TODOs:
> > > - update dynamic debug documentation,
> > > - finalize discussion related to default trace destination
> > > (whether we want it at all, if yes then conclude on the details)
> > >
> >
> > good list. before docs;
> >
> > wrt default-trace, at least in some details, it seems extra rules to know.
> > for example, if the default were set on successful open,
> > rather than successful rule application (iirc), then it would read a lot
> > like "with" or "using" blocks, esp with <<EOX form
> >
> > with that new form, it might be pretty expressive:
> >
> > echo <<YMMV > /proc/dynamic_debug/control
> > open sesame # ie with
> > module magic_carpet function forward +Tfml
> > module magic_lamp function rub_vigorously +Tfmt
> > open voicetrace
> > module voice2ai function text2cmdprompt +T
> > module undo function oh_crap +T
> > YMMV
> >
I was hoping this example was stupid enough to goad someone
into responding (with a better one). Alas no.
How might DRM want to use this ?
one practical example is to segregate hi-rate drm_dbg events to a
separate buffer,
so theyre not cluttering your trace-cmd view until you want them.
But thats a little contrived, cuz vblank events are already trace-events
into the global buffer, they dont have drm_dbg's for this.
echo <<DRM_TRACE_BLK > /proc/dynamic_debug/control
open vblank_stream # segregate noisy/hi-vol stuff
module amdgpu format *_vblank +:vblank_stream # one
way to select them
module amdgpu format "[DML]:DML_" +:vblank_strem # pretty
much same set ?
open drm_generic
class DRM_CORE +:drm_generic.mf
class DRM_DRIVER_ +:drm_generic.tmf
DRM_TRACE_BLK
> > I think set-default-on-good-rule-applied is flawed:
> > it relies on the most complex expression in the grammar to be correct
> > and without spelling errors, where module unknown is
> > OK grammatically, but nonsense.
>
> in case of "unknown" module default trace destination will not be
> updated because
> the code checks for nfound before updating default destination:
> if (nfound)
> update_tr_default_dst(modifiers);
>
> > set-default-on-open is simpler to explain and test separately.
> >
>
> Actually I think we discussed set-default-on-open some time ago (in
> the beginning).
we could have - I miss a lot :-}
anyway youve named our choices:
> But to sum up the default trace destination there were 3 directions
> proposed/discussed:
> - set-default-on-open,
> - set-default-on-good-rule-applied,
note in my DRM_TRC_BLK example above, I made a subtle error.
DRM_DRIVER_ doesnt exist cuz of the trailing _.
non-matching keywords are silently skipped. (not just class)
the rule wont apply.
If Id misspelled it as DRM_DRIVERS it would be just as quiet.
So now the good/bad status creates an order-dependence;
if the rule is 1st in a set, good/bad chooses the new/old default-dest.
and this affects any following rule, unless that has an explicit dest.
This sounds complicated.
set-default-on-open doesnt have these problems:
each rule gets the same setting
> - preset-default-per-each-callsite with -T:trace_dest_name,
this doesnt have a default-setting, it keeps what it had
>
> All 3 pass information explicitly but IMHO
> preset-default-per-each-callsite is most confusing as user would have
> to remember what was preset for a given callsite when later passing a
> T flag for the same callsite.
>
thats true, but for:
We could display the current destination in control, independent of the flags
(with the :0. default elided)
init/main.c:1117 [main]initcall_blacklist =pT:startup_trc.mf
"blacklisting initcall %s\n"
init/main.c:1156 [main]initcall_blacklisted =T:startup_trc.mf
"initcall %s blacklisted\n"
init/main.c:1351 [main]run_init_process =:startup_trc.mf " with arguments:\n"
1st line writes to syslog and to private startup_trc, both with
"module:function:" dynamic-prefix
2nd only to the private trace, with same prefix
3rd is not enabled, but has its dest preset to the private trace,
again with the prefix.
And the part I glossed over:
a new "trace_name" keyword would allow selection on the composed set
of sites with that preset.
ddcmd trace_name drm_generic +Ts # enable the set which is
pointing at drm_generic
> Then if it comes to set-default-on-open vs
> set-default-on-good-rule-applied I don't have strong opinion but I
> would
> lean towards set-default-on-open because in this case it is pretty
> straightforward, I mean last successful open sets default trace
> destination.
of the 1st 2 options, I lean strongly to on-open.
To test the limits, what if we renamed "open" to "using" ?
using could:
a) force a dest update, from any previous state.
b) only upgrade from :0. dont change :foo.
a is unnecessary, since +T:dest. explicitly does that.
b is more subtle, but more useful.
>
> If you also hesitate which one to choose maybe we should start with
> very simple approach like that +T always defaults to trace events (:0)
> ?
that makes sense.
keep-presets-per-callsite does need keyword to be complete.
>
> > We should probably consider modifying the continue-looping behavior
> > of ddebug_exec_queries, depending upon the return-code.
> > open, close errors could reasonably warrant different treatment,
> > like terminating the mult-cmd loop prematurely on open/close errs.
> > Esp if we keep the set-default-on-good-rule-applied
> > What do you think ?
this is still open, and worth a review, but minor.
> Let's first conclude on default destination.
>
> > I'm inclined to reword a commit-msg or 2, drop RFC, obsolete comments etc.
> >
> > I can do that in a lukas-v3.1, or as in-thread responses,
> > do you have a preference ?
>
> Please go ahead and make changes in the lukas-v3.1 branch. I find it
> very convenient. I will just grab your changes from the branch.
excellent, it does feel pretty smooth.
is your repo also public, or did I just miss the clonable url?
I'll write some trace-thing for howto in the next few days,
(in particular, I'll stay out of dyndbg_selftest.sh, I think its your turn :-)
theres also s/coma/comma/ in the comma-separator patch.
thats all I got now,
Thanks!!
Jim
Powered by blists - more mailing lists