linux-kernel - Re: [PATCH v5 perf, bpf-next 3/7] perf, bpf: introduce PERF_RECORD_BPF

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190109125938.GJ1900@hirez.programming.kicks-ass.net>
Date:   Wed, 9 Jan 2019 13:59:38 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Song Liu <songliubraving@...com>
Cc:     lkml <linux-kernel@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "acme@...nel.org" <acme@...nel.org>,
        "ast@...nel.org" <ast@...nel.org>,
        "daniel@...earbox.net" <daniel@...earbox.net>,
        Kernel Team <Kernel-team@...com>,
        Andi Kleen <andi@...stfloor.org>
Subject: Re: [PATCH v5 perf, bpf-next 3/7] perf, bpf: introduce
 PERF_RECORD_BPF_EVENT

On Wed, Jan 09, 2019 at 11:32:50AM +0000, Song Liu wrote:
> I was thinking about modifying the text in-place scenario. In this case, 
> we can use something like
> 
> struct perf_record_text_modify {
>     u64 addr;
>     u_big_enough old_instr;
>     u_big_enough new_instr;

char[15] for x86 ;-)

Also, I don't think we need old, we should already have the old text,
either from a previous event or from the initial kcore snapshot.

>     timestamp ;

that lives in struct sample_id.

> };
> 
> It is a fixed size record, and we don't need process it immediately 
> in user space. At the end of perf run, a series of these events will 
> help us reconstruct exact text at any time. 

That works for text_poke users, see also:

  https://lkml.kernel.org/r/20190109103544.GH1900@hirez.programming.kicks-ass.net

But is useless for module / bpf / ftrace dynamic text.

> > All we need is some means of ensuring the symbol is still there by the
> > time we see the event and do the copy.
> > 
> > I think we can do this with a new ioctl() on /proc/kcore itself:
> > 
> > - when we have kcore open, we queue all text-free operations on list-1.
> > 
> > - when we close kcore, we drain all (text-free) list-* and perform the
> >   pending frees immediately.
> > 
> > - on ioctl(KCORE_QC) we perform the pending free of list-3 and advance
> >   list-2 to list-3 and list-1 to list-2.
> > 
> > Perf would then open kcore at the start of the record, make a complete
> > copy and keep the FD open. At the end of every buffer process, we issue
> > KCORE_QC IFF we observed a ksym unreg in that buffer.
> 
> Does this mean we need to scan every buffer before writing it to perf.data 
> during perf-record? 

Just like the BPF events, yes. Now for PT most of the actual data is not
in the regular buffer, so it shouldn't be too horrible, but just like
the BPF event, it can get its own buffer if it does become a problem.

> Also, if we need ksym unreg here, I guess it is NOT really modifying text 
> in-place, but creating new version and swap? Then can we include something 
> like this in perf.data:
> 
> struct perf_record_text_modify {
>     u64 old_addr;
>     u64 new_addr;
>     u32 old_len; /* up to MAX_SIZE */
>     u32 new_len; /* up to MAX_SIZE */
>     u8 old_text[MAX_SIZE];
>     u8 new_text[MAX_SIZE];
>     timestamp ;
> };
> 
> In this way, this record is embedded in perf.data, and doesn't require
> extra processing during perf-record (only at the end of perf-record). 
> This would work for text modifying case, as modifying text is simply
> old-text to new-text.
>  
> Similar solution would not work for BPF case, as bpf_prog_info is 
> getting a lot more members in the near future. 
> 
> Does this make sense...?

I don't think we actually need old_text here either. We're creating a
new text mapping, there was nothing there before.

But still, perf events are limited to 64k, so that means we cannot
support symbols larger than that (although I suppose that would be
fairly rare).

Something like that could work, but I'm not sure it is actually better.
Some PT person would have to play with things I suppose.