linux-kernel - Re: [PATCH] perf record: Add snapshot mode support for perf's regular events

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <565579D4.2030108@intel.com>
Date:	Wed, 25 Nov 2015 11:05:24 +0200
From:	Adrian Hunter <adrian.hunter@...el.com>
To:	"Wangnan (F)" <wangnan0@...wei.com>
Cc:	Arnaldo Carvalho de Melo <acme@...nel.org>,
	David Ahern <dsahern@...il.com>,
	Yunlong Song <yunlong.song@...wei.com>, a.p.zijlstra@...llo.nl,
	paulus@...ba.org, mingo@...hat.com, linux-kernel@...r.kernel.org,
	namhyung@...nel.org, ast@...nel.org,
	masami.hiramatsu.pt@...achi.com, kan.liang@...el.com,
	jolsa@...nel.org, bp@...en8.de, jean.pihet@...aro.org,
	rric@...nel.org, xiakaixu@...wei.com, hekuang@...wei.com
Subject: Re: [PATCH] perf record: Add snapshot mode support for perf's regular
 events

On 25/11/15 10:43, Wangnan (F) wrote:
> 
> 
> On 2015/11/25 16:27, Adrian Hunter wrote:
>> On 25/11/15 09:47, Wangnan (F) wrote:
>>>
>>> On 2015/11/25 15:22, Adrian Hunter wrote:
>>>> On 25/11/15 05:50, Wangnan (F) wrote:
>>>>> On 2015/11/24 23:20, Arnaldo Carvalho de Melo wrote:
>>>>>> Em Tue, Nov 24, 2015 at 08:06:41AM -0700, David Ahern escreveu:
>>>>>>> On 11/24/15 7:00 AM, Yunlong Song wrote:
>>>>>>>> +static int record__write(struct record *rec, void *bf, size_t size)
>>>>>>>> +{
>>>>>>>> +    if (rec->memory.size && memory_enabled) {
>>>>>>>> +        if (perf_memory__write(&rec->memory, bf, size) < 0) {
>>>>>>>> +            pr_err("failed to write memory data, error: %m\n");
>>>>>>>> +            return -1;
>>>>>>>> +        }
>>>>>>>> +    } else {
>>>>>>>> +        if (perf_data_file__write(rec->session->file, bf, size) < 0) {
>>>>>>>> +            pr_err("failed to write perf data, error: %m\n");
>>>>>>>> +            return -1;
>>>>>>>> +        }
>>>>>>>> +        rec->bytes_written += size;
>>>>>>>>         }
>>>>>>>>
>>>>>>>> -    rec->bytes_written += size;
>>>>>>>>         return 0;
>>>>>>>>     }
>>>>>>>>
>>>>>>>> @@ -86,6 +214,8 @@ static int record__mmap_read(struct record *rec, int
>>>>>>>> idx)
>>>>>>>>         if (old == head)
>>>>>>>>             return 0;
>>>>>>>>
>>>>>>>> +    memory_enabled = 1;
>>>>>>>> +
>>>>>>>>         rec->samples++;
>>>>>>>>
>>>>>>>>         size = head - old;
>>>>>>>> @@ -113,6 +243,7 @@ static int record__mmap_read(struct record *rec,
>>>>>>>> int
>>>>>>>> idx)
>>>>>>>>         md->prev = old;
>>>>>>>>         perf_evlist__mmap_consume(rec->evlist, idx);
>>>>>>>>     out:
>>>>>>>> +    memory_enabled = 0;
>>>>>>>>         return rc;
>>>>>>>>     }
>>>>>>>>
>>>>>>> So you are basically ignoring all samples until SIGUSR2 is received.
>>>>>>> That
>>>>>> No, he is not, its just that his code is difficult to follow, has to be
>>>>>> rewritten, but he is ignoring just PERF_RECORD_SAMPLE events, so it
>>>>>> will..
>>>>>>
>>>>>>> means the resulting data file will have limited history of task
>>>>>>> events for
>>>>>> ... have a complete history of task events, since PERF_RECORD_FORK, etc
>>>>>> are not being ignored.
>>>>>>
>>>>>> No?
>>>>> Actually we are discussing about this problem.
>>>>>
>>>>> For such tracking events (PERF_RECORD_FORK...), we have dummy event so
>>>>> it is possible for us to receive tracking events from a separated
>>>>> channel, therefore we don't have to parse every events to pick those
>>>>> events out. Instead, we can process tracking events differently, then
>>>>> more interesting things can be done. For example, squashing those tracking
>>>>> events if it takes too much memory...
>>>>>
>>>>> Furthermore, there's another problem being discussed: if userspace
>>>>> ringbuffer
>>>>> is bytes based, parsing event is unavoidable. Without parsing event we are
>>>>> unable to find the new 'head' pointer when overwriting.
>>>> Have you considered trying to find the head by trial-and-error at the time
>>>> you make the snapshot i.e. look at the first 8 bytes (event records are 8
>>>> byte aligned) and see if it is a valid record header, if not try the next 8
>>>> bytes.  When you find a real event record it should parse without error and
>>>> the subsequent events should all parse without error too, all the way to
>>>> the
>>>> tail.  Then you can use timestamps and compare the events byte-by-byte to
>>>> avoid overlaps between 2 snapshots.
>>> It seems not work. Now we have BPF output event, it is possible that a
>>> BPF program output anything through that event. Even if we have a magic
>>> in head of each event, we can't prevent BPF output event output that
>>> magic, except we introduce some 'escape' method to prevent BPF output
>>> event output some data pattern. So although might work in reallife,
>>> this solution is logically incorrect. Or am I miss someting?
>> When you find the head, all the events will parse correctly.  It seems to me
>> highly unlikely that would happen if you guessed the head wrongly.
>> It is only incorrect if it gives the wrong result.
> 
> Right, so I said it might work in reallife. However, I think we
> should better to try to provide some logically correct solution.
> Also, 'guessing' means some sort of intelligence, or how do we
> deal with guessing error? Simply drop them?

It is not "intelligence" it is a linear search.  If it gives more than one
answer, it is a fatal error.  You can mitigate that by adding more
validation of the event records.

But it is only a suggestion.

> And what's your opinion on the bucket besed ring buffer? With that
> design we only need to maintain a ringbuffer of pointers. It should
> be much simpler. The only drawback I can image is the waste of memory
> because we have to alloc buckets pessimistically. Do you think
> that method have other problem I haven't considered?

The drawback is that you have to copy all the events all the time instead of
letting the kernel ring buffer wraparound without any userspace involvement
until you make a snapshot.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/