netdev - Re: [PATCH v5 perf, bpf-next 3/7] perf, bpf: introduce PERF_RECORD_BPF


Message-ID: <924AE46C-B2B9-4E17-A6FC-C678FEADC03B@fb.com>
Date:   Wed, 9 Jan 2019 11:32:50 +0000
From:   Song Liu <songliubraving@...com>
To:     Peter Zijlstra <peterz@...radead.org>
CC:     lkml <linux-kernel@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "acme@...nel.org" <acme@...nel.org>,
        "ast@...nel.org" <ast@...nel.org>,
        "daniel@...earbox.net" <daniel@...earbox.net>,
        Kernel Team <Kernel-team@...com>,
        Andi Kleen <andi@...stfloor.org>
Subject: Re: [PATCH v5 perf, bpf-next 3/7] perf, bpf: introduce
 PERF_RECORD_BPF_EVENT



> On Jan 9, 2019, at 2:18 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> 
> On Tue, Jan 08, 2019 at 11:54:04PM +0000, Song Liu wrote:
> 
>> I think Intel PT case is at instruction granularity (instead of ksymbol
>> granularity)? 
> 
> Yes.
> 
>> If this is true, modules, BPF, and PT could still share
>> the ksymbol record for basic profiling. And advanced use cases like 
>> annotation will depend on user space to record BPF_EVENT (and equivalent
>> for other cases) timely. But at least, the ksymbol is already there. 
>> 
>> Does this make sense?  
> 
> I'm not sure I follow; the idea was that on ksym events we copy out the
> instructions using kcore. The ksym event already has addr+len.

I was thinking about the scenario where text is modified in place. In that
case, we can use something like

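struct perf_record_text_modify {
    u64 addr;                 /* address of the patched instruction */
    u_big_enough old_instr;   /* placeholder type: wide enough for one instruction */
    u_big_enough new_instr;
    u64 timestamp;
};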

It is a fixed-size record, and we don't need to process it immediately
in user space. At the end of the perf run, a series of these events will
let us reconstruct the exact text at any point in time.
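
A rough user-space sketch of that replay, under assumptions that are not part of the proposal: INSTR_MAX stands in for "u_big_enough", the len field and the names text_modify_rec and text_at() are made up for illustration, the records are sorted by timestamp, and we start from a copy of the text taken at the beginning of the run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INSTR_MAX 16   /* assumption: stand-in for "u_big_enough" */

/* Hypothetical fixed-size record modeled on the struct above. */
struct text_modify_rec {
    uint64_t addr;
    uint8_t  old_instr[INSTR_MAX];
    uint8_t  new_instr[INSTR_MAX];
    uint8_t  len;          /* assumption: bytes actually patched */
    uint64_t timestamp;
};

/*
 * Replay records (sorted by timestamp) onto `image`, a copy of the text
 * region [base, base + size) taken at the start of the run, stopping at
 * the first record newer than `when`. Returns the number applied.
 */
static int text_at(uint8_t *image, uint64_t base, size_t size,
                   const struct text_modify_rec *recs, size_t n,
                   uint64_t when)
{
    int applied = 0;

    for (size_t i = 0; i < n && recs[i].timestamp <= when; i++) {
        const struct text_modify_rec *r = &recs[i];

        /* Skip malformed records and records outside the image. */
        if (r->len > INSTR_MAX || r->addr < base)
            continue;
        uint64_t off = r->addr - base;
        if (off > size || r->len > size - off)
            continue;

        memcpy(image + off, r->new_instr, r->len);
        applied++;
    }
    return applied;
}
```

Because old_instr is also in the record, the same walk could be run backwards from an end-of-run copy instead.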

> 
> All we need is some means of ensuring the symbol is still there by the
> time we see the event and do the copy.
> 
> I think we can do this with a new ioctl() on /proc/kcore itself:
> 
> - when we have kcore open, we queue all text-free operations on list-1.
> 
> - when we close kcore, we drain all (text-free) list-* and perform the
>   pending frees immediately.
> 
> - on ioctl(KCORE_QC) we perform the pending free of list-3 and advance
>   list-2 to list-3 and list-1 to list-2.
> 
> Perf would then open kcore at the start of the record, make a complete
> copy and keep the FD open. At the end of every buffer process, we issue
> KCORE_QC IFF we observed a ksym unreg in that buffer.
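
As a way to check one reading of the list rotation quoted above, here is a toy user-space model. It is only a sketch: kcore_model, text_free(), kcore_qc() and kcore_close() are hypothetical names, the "free" is just a counter, and nothing here is kernel code or an existing interface.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 32

struct free_list {
    int    ids[MAX_PENDING];
    size_t n;
};

/* Toy model: list[0] is "list-1", list[1] is "list-2", list[2] is "list-3". */
struct kcore_model {
    int open;
    struct free_list list[3];
    int freed;              /* number of frees actually performed */
};

static void do_free(struct kcore_model *k, struct free_list *l)
{
    k->freed += (int)l->n;
    l->n = 0;
}

/* A text-free request: deferred onto list-1 only while kcore is open. */
static void text_free(struct kcore_model *k, int id)
{
    if (!k->open || k->list[0].n == MAX_PENDING) {
        k->freed++;         /* nothing to wait for: free right away */
        return;
    }
    k->list[0].ids[k->list[0].n++] = id;
}

/* ioctl(KCORE_QC): free list-3, then advance list-2 -> list-3, list-1 -> list-2. */
static void kcore_qc(struct kcore_model *k)
{
    do_free(k, &k->list[2]);
    k->list[2] = k->list[1];
    k->list[1] = k->list[0];
    k->list[0].n = 0;
}

/* close(): drain every list and perform the pending frees immediately. */
static void kcore_close(struct kcore_model *k)
{
    for (int i = 0; i < 3; i++)
        do_free(k, &k->list[i]);
    k->open = 0;
}
```

In this model a freed text region survives two KCORE_QC calls and is released on the third, or earlier if kcore is closed.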

Does this mean we need to scan every buffer before writing it to perf.data 
during perf-record? 

Also, if we need ksym unreg here, I guess it is NOT really modifying text
in place, but creating a new version and swapping it in? Then can we include
something like this in perf.data:

struct perf_record_text_modify {
    u64 old_addr;
    u64 new_addr;
    u32 old_len; /* up to MAX_SIZE */
    u32 new_len; /* up to MAX_SIZE */
    u8 old_text[MAX_SIZE];
    u8 new_text[MAX_SIZE];
    u64 timestamp;
};

This way, the record is embedded in perf.data and doesn't require
extra processing during perf-record (only at the end of perf-record).
This would work for the text-modifying case, since modifying text is simply
a mapping from old text to new text.
 
A similar solution would not work for the BPF case, as bpf_prog_info is
getting a lot more members in the near future.

Does this make sense...?

Thanks,
Song