lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.2.21.1904151348530.7076@planxty>
Date:   Mon, 15 Apr 2019 14:02:16 +0200 (CEST)
From:   John Kacur <jkacur@...hat.com>
To:     Stephen Rostedt <rostedt@...dmis.org>
cc:     linux-kernel@...r.kernel.org, Josef Bacik <jbacik@...com>
Subject: BUG: trace-cmd: trace-cmd record -e failure

This is a long standing but in trace-cmd.

[root@...-0 tmp]# trace-cmd record -p function -e sched:* sleep 1
  plugin 'function'
none
trace-cmd: Invalid argument
  Failed filter of 
/sys/kernel/debug/tracing/events/sched/sched_kthread_stop/filter

trace-cmd: No such file or directory
  can not stat 'trace.dat.cpu0'

-------------------------------------------

This only happens on large machines, I can only get it to trigger at 
around 200 cpus. However, it's not complicated, it's simply we're writing
to the filter file, one per recorder thread until the size of the string
exceeds the limit.

We're writing something crazy like this.

str = 
(common_pid!=50883)||(common_pid!=50882)||(common_pid!=50881)||(common_pid!=50880)||(common_pid!=50879)||(common_pid!=50878)||(common_pid!=50877)||(common_pid!=50876)||(common_pid!=50875)||(common_pid!=50874)||(common_pid!=50873)||(common_pid!=50872)||(common_pid!=50871)||(common_pid!=50870)||(common_pid!=50869)||(common_pid!=50868)||(common_pid!=50867)||(common_pid!=50866)||(common_pid!=50865)||(common_pid!=50864)||(common_pid!=50863)||(common_pid!=50862)||(common_pid!=50861)||(common_pid!=50860)||(common_pid!=50859)||(common_pid!=50858)||(common_pid!=50857)||(common_pid!=50856)||(common_pid!=50855)||(common_pid!=50854)||(common_pid!=50853)||(common_pid!=50852)||(common_pid!=50851)||(common_pid!=50850)||(common_pid!=50849)||(common_pid!=50848)||(common_pid!=50847)||(common_pid!=50846)||(common_pid!=50845)||(common_pid!=50844)||(common_pid!=50843)||(common_pid!=50842)||(common_pid!=50841)||(common_pid!=50840)||(common_pid!=50839)||(common_pid!=50838)||(common_pid!=50837)||(common_pid!=50836)||(common_pid!=50835)||(common_pid!=50834)||(common_pid!=50833)||(common_pid!=50832)||(common_pid!=50831)||(common_pid!=50830)||(common_pid!=50829)||(common_pid!=50828)||(common_pid!=50827)||(common_pid!=50826)||(common_pid!=50825)||(common_pid!=50824)||(common_pid!=50823)||(common_pid!=50822)||(common_pid!=50821)||(common_pid!=50820)||(common_pid!=50819)||(common_pid!=50818)||(common_pid!=50817)||(common_pid!=50816)||(common_pid!=50815)||(common_pid!=50814)||(common_pid!=50813)||(common_pid!=50812)||(common_pid!=50811)||(common_pid!=50810)||(common_pid!=50809)||(common_pid!=50808)||(common_pid!=50807)||(common_pid!=50806)||(common_pid!=50805)||(common_pid!=50804)||(common_pid!=50803)||(common_pid!=50802)||(common_pid!=50801)||(common_pid!=50800)||(common_pid!=50799)||(common_pid!=50798)||(common_pid!=50797)||(common_pid!=50796)||(common_pid!=50795)||(common_pid!=50794)||(common_pid!=50793)||(common_pid!=50792)||(common_pid!=50791)||(common_pid!=50790)||(common_pid!=50789)||(common_pid!=50788)||(common_pid!=50787)||(common_pid!=50786)||(common_pid!=50785)||(common_pid!=50784)||(common_pid!=50783)||(common_pid!=50782)||(common_pid!=50781)||(common_pid!=50780)||(common_pid!=50779)||(common_pid!=50778)||(common_pid!=50777)||(common_pid!=50776)||(common_pid!=50775)||(common_pid!=50774)||(common_pid!=50773)||(common_pid!=50772)||(common_pid!=50771)||(common_pid!=50770)||(common_pid!=50769)||(common_pid!=50768)||(common_pid!=50767)||(common_pid!=50766)||(common_pid!=50765)||(common_pid!=50764)||(common_pid!=50763)||(common_pid!=50762)||(common_pid!=50761)||(common_pid!=50760)||(common_pid!=50759)||(common_pid!=50758)||(common_pid!=50757)||(common_pid!=50756)||(common_pid!=50755)||(common_pid!=50754)||(common_pid!=50753)||(common_pid!=50752)||(common_pid!=50751)||(common_pid!=50750)||(common_pid!=50749)||(common_pid!=50748)||(common_pid!=50747)||(common_pid!=50746)||(common_pid!=50745)||(common_pid!=50744)||(common_pid!=50743)||(common_pid!=50742)||(common_pid!=50741)||(common_pid!=50740)||(common_pid!=50739)||(common_pid!=50738)||(common_pid!=50737)||(common_pid!=50736)||(common_pid!=50735)||(common_pid!=50734)||(common_pid!=50733)||(common_pid!=50732)||(common_pid!=50731)||(common_pid!=50730)||(common_pid!=50729)||(common_pid!=50728)||(common_pid!=50727)||(common_pid!=50726)||(common_pid!=50725)||(common_pid!=50724)||(common_pid!=50723)||(common_pid!=50722)||(common_pid!=50721)||(common_pid!=50720)||(common_pid!=50719)||(common_pid!=50718)||(common_pid!=50717)||(common_pid!=50716)||(common_pid!=50715)||(common_pid!=50714)||(common_pid!=50713)||(common_pid!=50712)||(common_pid!=50711)||(common_pid!=50710)||(common_pid!=50709)||(common_pid!=50708)||(common_pid!=50707)||(common_pid!=50706)||(common_pid!=50705)||(common_pid!=50704)||(common_pid!=50703)||(common_pid!=50702)||(common_pid!=50701)||(common_pid!=50700)||(common_pid!=50699)||(common_pid!=50698)||(common_pid!=50697)||(common_pid!=50696)||(common_pid!=50695)||(common_pid!=50694)||(common_pid!=50693)||(common_pid!=50692)||(common_pid!=50691)||(common_pid!=50690)||(common_pid!=50689)||(common_pid!=50688)||(common_pid!=50687)||(common_pid!=50686)||(common_pid!=50685)||(common_pid!=50684)||(common_pid!=50683)||(common_pid!=50682)||(common_pid!=50681)||(common_pid!=50680)||(common_pid!=50679)||(common_pid!=50678)||(common_pid!=50677)||(common_pid!=50676)||(common_pid!=50675)||(common_pid!=50674)||(common_pid!=50673)||(common_pid!=50672)||(common_pid!=50671)||(common_pid!=50670)||(common_pid!=50669)||(common_pid!=50668)||(common_pid!=50667)||(common_pid!=50666)||(common_pid!=50665)||(common_pid!=50664)||(common_pid!=50663)||(common_pid!=50662)||(common_pid!=50661)||(common_pid!=50660)||(common_pid!=50659)||(common_pid!=50658)||(common_pid!=50657)||(common_pid!=50656)||(common_pid!=50655)||(common_pid!=50654)||(common_pid!=50653)||(common_pid!=50652)||(common_pid!=50651)||(common_pid!=50650)||(common_pid!=50649)||(common_pid!=50648)||(common_pid!=50647)||(common_pid!=50646)||(common_pid!=50645)||(common_pid!=50644)||(common_pid!=50643)||(common_pid!=50642)||(common_pid!=50641)||(common_pid!=50640)||(common_pid!=50639)||(common_pid!=50638)||(common_pid!=50637)||(common_pid!=50636)||(common_pid!=50635)||(common_pid!=50634)||(common_pid!=50633)||(common_pid!=50632)||(common_pid!=50631)||(common_pid!=50630)||(common_pid!=50629)||(common_pid!=50628)||(common_pid!=50627)||(common_pid!=50626)||(common_pid!=50625)||(common_pid!=50624)||(common_pid!=50623)||(common_pid!=50622)||(common_pid!=50621)||(common_pid!=50620)||(common_pid!=50619)||(common_pid!=50618)||(common_pid!=50617)||(common_pid!=50616)||(common_pid!=50615)||(common_pid!=50614)||(common_pid!=50613)||(common_pid!=50612)||(common_pid!=50611)||(common_pid!=50610)||(common_pid!=50609)||(common_pid!=50608)||(common_pid!=50607)||(common_pid!=50606)||(common_pid!=50605)||(common_pid!=50604)||(common_pid!=50603)||(common_pid!=50602)||(common_pid!=50601)||(common_pid!=50600)||(common_pid!=50599)||(common_pid!=50598)||(common_pid!=50597)||(common_pid!=50596), 
strlen = 6046b

on an older version of traceback (but this bug still exists in the 
latest), I did a gdb session like this

Breakpoint 2, write_file (
    file=0x670e40 
"/sys/kernel/debug/tracing/events/sched/sched_kthread_stop/filter", 
    str=0x67f8c0 
"(common_pid!=21759)||(common_pid!=21758)||(common_pid!=21757)||(common_pid!=21756)||(common_pid!=21755)||(common_pid!=21754)||(common_pid!=21753)||(common_pid!=21752)||(common_pid!=21751)||(common_pid"..., 
    type=0x447dc4 "filter") at /root/trace-cmd/trace-record.c:1586
1586			die("Failed %s of %s\n", type, file);
(gdb) print file
$7 = 0x670e40 
"/sys/kernel/debug/tracing/events/sched/sched_kthread_stop/filter"
(gdb) bt
#0  write_file (
    file=0x670e40 
"/sys/kernel/debug/tracing/events/sched/sched_kthread_stop/filter", 
    str=0x67f8c0 
"(common_pid!=21759)||(common_pid!=21758)||(common_pid!=21757)||(common_pid!=21756)||(common_pid!=21755)||(common_pid!=21754)||(common_pid!=21753)||(common_pid!=21752)||(common_pid!=21751)||(common_pid"..., 
    type=0x447dc4 "filter") at /root/trace-cmd/trace-record.c:1586
#1  0x000000000040abfd in write_filter (
    file=0x670e40 
"/sys/kernel/debug/tracing/events/sched/sched_kthread_stop/filter", 
    filter=0x67f8c0 
"(common_pid!=21759)||(common_pid!=21758)||(common_pid!=21757)||(common_pid!=21756)||(common_pid!=21755)||(common_pid!=21754)||(common_pid!=21753)||(common_pid!=21752)||(common_pid!=21751)||(common_pid"...)
    at /root/trace-cmd/trace-record.c:1652
#2  0x000000000040af81 in update_event (event=0x670d30, 
    filter=0x67f8c0 
"(common_pid!=21759)||(common_pid!=21758)||(common_pid!=21757)||(common_pid!=21756)||(common_pid!=21755)||(common_pid!=21754)||(common_pid!=21753)||(common_pid!=21752)||(common_pid!=21751)||(common_pid"..., 
    filter_only=1, update=49 '1') at /root/trace-cmd/trace-record.c:1785
#3  0x000000000040b653 in update_event_filters (
    instance=0x65e5e0 <top_instance>) at 
/root/trace-cmd/trace-record.c:2016
#4  0x000000000040b83b in update_pid_event_filters (
    instance=0x65e5e0 <top_instance>) at 
/root/trace-cmd/trace-record.c:2081
#5  0x0000000000409607 in update_task_filter ()
    at /root/trace-cmd/trace-record.c:971
#6  0x00000000004118be in trace_record (argc=6, argv=0x7fffffffe0f8)
    at /root/trace-cmd/trace-record.c:4818
#7  0x0000000000407cb7 in main (argc=6, argv=0x7fffffffe0f8)
    at /root/trace-cmd/trace-cmd.c:133

Note even if you don't invoke the filter, you still write the sting to the 
file.

Not sure how to fix this, but conceptually it's not hard, we need to add 
some error checking and stop writing to the filter if we exceed the limit.
The functions that do the writes however are shared in many paths.

In addition to that, perhaps we could write something saner to the file
so that it takes longer to hit the limit.

Rewriting "common_pid" over and over again for example, is probably 
unnecessary with some reworking.

John Kacur

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ