linux-kernel - Re: [PATCH 7/7] ring-buffer: Add lost event count to end of sub buffer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1270124616.19685.8829.camel@gandalf.stny.rr.com>
Date:	Thu, 01 Apr 2010 08:23:36 -0400
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Frederic Weisbecker <fweisbec@...il.com>
Cc:	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 7/7] ring-buffer: Add lost event count to end of sub
 buffer

On Thu, 2010-04-01 at 08:13 +0200, Frederic Weisbecker wrote:
> On Wed, Mar 31, 2010 at 11:17:42PM -0400, Steven Rostedt wrote:

> If you don't have enough room to put lost_events in the end of
> the buffer, you could keep as is cpu_buffer->last_overrun,
> so the next time you swap the reader page, the new delta
> will be added to the previous that you couldn't transmit.
> 
> That way you ensure the user always know when you lose events.

When is not the issue since we set the MISSING_EVENTS flag already. It's
how many were lost at that time that we do not store.

I actually found in most cases there is room, and trace-cmd can report
it. When there is no room, it just says events were dropped without
reporting the actual number.


This is what it shows when the number can be stored:

       trace-cmd-4980  [002] 32784.303699: lock_acquire:         0xffff88003ebf0e08 &(&sbi->s_rsv_window_lock)->rlock
       trace-cmd-4980  [002] 32784.303700: lock_release:         0xffff88003ebf0e08 &(&sbi->s_rsv_window_lock)->rlock
CPU:1 [415 EVENTS DROPPED]
       trace-cmd-4981  [001] 32784.303701: lock_release:         0xffff88000c49d7d0 &ei->truncate_mutex
       trace-cmd-4981  [001] 32784.303704: lock_acquire:         0xffffffff816591e0 read rcu_read_lock
       trace-cmd-4980  [002] 32784.303705: lock_acquire:         0xffff88003c0719d8 &(&bgl->locks[i].lock)->rlock

This is what it shows when the number can not:

          <idle>-0     [002] 32784.304459: lock_acquire:         0xffffffff816591e0 read rcu_read_lock
          <idle>-0     [002] 32784.304464: lock_release:         0xffffffff816591e0 rcu_read_lock
CPU:1 [EVENTS DROPPED]
              ls-4984  [001] 32784.304471: lock_acquire:         0xffff88003f8b8018 &(&base->lock)->rlock
              ls-4984  [001] 32784.304471: lock_release:         0xffff88003f8b8018 &(&base->lock)->rlock
              ls-4984  [001] 32784.304471: softirq_exit:         vec=1 [action=TIMER]


The problem with storing it later is that there is no guarantee that it
will be stored later. How far should the output look for that number. By
delaying when to print it, the output would need to search for that
number.

What happens if events were lost again before we found the number? Do we
add them together? Then we wont know how many events were lost between
the two spots where the events were lost. That is, if 448 events were
lost but we could not store the number, and then 318 events were lost,
and now we can store the number, do we add them together and report 766
events lost?

The end of the report still gives the total number lost, as it has
always done. Now I'm trying to report where they are lost.

Also note, in practice, the majority of the time we can record the
number:

  $ trace-cmd report | grep EVENTS
CPU:1 [82 EVENTS DROPPED]
CPU:0 [79 EVENTS DROPPED]
CPU:2 [75 EVENTS DROPPED]
CPU:1 [86 EVENTS DROPPED]
CPU:0 [87 EVENTS DROPPED]
CPU:2 [EVENTS DROPPED]
CPU:0 [87 EVENTS DROPPED]
CPU:0 [79 EVENTS DROPPED]
CPU:1 [177 EVENTS DROPPED]
CPU:0 [78 EVENTS DROPPED]
CPU:0 [73 EVENTS DROPPED]
CPU:1 [163 EVENTS DROPPED]
CPU:2 [86 EVENTS DROPPED]
CPU:0 [75 EVENTS DROPPED]
CPU:2 [89 EVENTS DROPPED]
CPU:0 [EVENTS DROPPED]
CPU:0 [75 EVENTS DROPPED]
CPU:1 [EVENTS DROPPED]
CPU:2 [EVENTS DROPPED]
CPU:0 [77 EVENTS DROPPED]
CPU:0 [EVENTS DROPPED]
CPU:0 [78 EVENTS DROPPED]
CPU:1 [90 EVENTS DROPPED]
CPU:2 [86 EVENTS DROPPED]
CPU:1 [87 EVENTS DROPPED]
CPU:2 [84 EVENTS DROPPED]
CPU:2 [82 EVENTS DROPPED]
CPU:1 [90 EVENTS DROPPED]
CPU:1 [89 EVENTS DROPPED]
CPU:2 [EVENTS DROPPED]
CPU:0 [EVENTS DROPPED]
CPU:2 [164 EVENTS DROPPED]
CPU:1 [EVENTS DROPPED]
CPU:0 [79 EVENTS DROPPED]
CPU:0 [89 EVENTS DROPPED]


If we really want this number every time, then we need to add it to the
sub buffer header, and not hope there was room to store it.

-- Steve


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/