Message-ID: <mhng-9d460c8a-52e3-4a65-bd23-6210ad1fbf05@palmerdabbelt-glaptop>
Date: Wed, 22 Jan 2020 18:08:57 -0800 (PST)
From: Palmer Dabbelt <palmerdabbelt@...gle.com>
To: Bjorn Topel <bjorn.topel@...il.com>
CC: daniel@...earbox.net, ast@...nel.org, netdev@...r.kernel.org,
linux-riscv@...ts.infradead.org, lukenels@...washington.edu,
bpf@...r.kernel.org, xi.wang@...il.com
Subject: Re: [PATCH bpf-next v2 2/9] riscv, bpf: add support for far branching
On Tue, 07 Jan 2020 00:13:56 PST (-0800), Bjorn Topel wrote:
> Back from the holidays; Sorry about the delayed reply.
>
> On Mon, 23 Dec 2019 at 19:03, Palmer Dabbelt <palmerdabbelt@...gle.com> wrote:
>>
>> On Mon, 16 Dec 2019 01:13:36 PST (-0800), Bjorn Topel wrote:
>> > This commit adds branch relaxation to the BPF JIT, and with that
> [...]
>> > @@ -1557,6 +1569,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
>> > {
>> > bool tmp_blinded = false, extra_pass = false;
>> > struct bpf_prog *tmp, *orig_prog = prog;
>> > + int pass = 0, prev_ninsns = 0, i;
>> > struct rv_jit_data *jit_data;
>> > struct rv_jit_context *ctx;
>> > unsigned int image_size;
>> > @@ -1596,15 +1609,25 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
>> > prog = orig_prog;
>> > goto out_offset;
>> > }
>> > + for (i = 0; i < prog->len; i++) {
>> > + prev_ninsns += 32;
>> > + ctx->offset[i] = prev_ninsns;
>> > + }
>>
>> It feels like the first-order implementation is the same as binutils here: the
>> first round is worst cased, after which things can be more exact. We're only
>> doing one pass in binutils because most of the relaxation happens in the
>> linker, but this approach seems reasonable to me. I'd be interested in seeing
>> some benchmarks, as it may be worth relaxing these in the binutils linker as
>> well -- I can certainly come up with contrived test cases that aren't relaxed,
>> but I'm not sure how common this is.
>>
>
> Ah, interesting! Let me try to pull out some branch relaxation
> statistics/benchmarks for the BPF selftests.
>
>> My only worry is that that invariant should be more explicit. Specifically,
>> I'm thinking that every time offset is updated there should be some sort of
>> assertion that the offset is shrinking. This is enforced structurally in the
>> binutils code because we only generate code once and then move it around, but
>> since you're generating code every time it'd be easy for a bug to sneak in as
>> the JIT gets more complicated.
>>
>
> Hmm, yes. Maybe use a checksum for the program in addition to the
> length invariant, and converge condition would then be prev_len == len
> && prev_crc == crc?
That would work, but it breaks my unfinished optimization below. I was
thinking something more like "every time offset[i] is updated, check that it
gets smaller and otherwise barf".
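A minimal sketch of that check, written as plain userspace C rather than kernel code. update_offset() is a hypothetical helper and is not part of the posted patch; in the JIT itself the failure path would more likely be a warning plus bailing out of compilation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not in the posted patch): record the new offset
 * of BPF instruction i, refusing any update that would make it larger.
 * Returns 0 on success, -1 if the shrink-only invariant is violated.
 */
static int update_offset(int *offset, size_t i, int new_off)
{
	assert(offset != NULL);

	if (new_off > offset[i])
		return -1;	/* code grew between passes: treat as a JIT bug */

	offset[i] = new_off;
	return 0;
}
```

An offset that stays the same is accepted here, since a pass is allowed to leave an instruction untouched; only growth is rejected.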
>> Since most of the branches should be forward, you'll probably end up with way
>> fewer iterations if you do the optimization passes backwards.
>>
>
> Good idea!
>
>> > - /* First pass generates the ctx->offset, but does not emit an image. */
>> > - if (build_body(ctx, extra_pass)) {
>> > - prog = orig_prog;
>> > - goto out_offset;
>> > + for (i = 0; i < 16; i++) {
>> > + pass++;
>> > + ctx->ninsns = 0;
>> > + if (build_body(ctx, extra_pass)) {
>> > + prog = orig_prog;
>> > + goto out_offset;
>>
>> Isn't this returning a broken program if build_body() errors out the first time
>> through?
>>
>
> Hmm, care to elaborate? I don't see how?
Ya, I don't either any more. Hopefully I just got confused between prog and
ctx...
>> > + }
>> > + build_prologue(ctx);
>> > + ctx->epilogue_offset = ctx->ninsns;
>> > + build_epilogue(ctx);
>> > + if (ctx->ninsns == prev_ninsns)
>> > + break;
>> > + prev_ninsns = ctx->ninsns;
>>
>> IDK how important the performance of the JIT is, but you could probably get
>> away with skipping an iteration by keeping track of some simple metric that
>> determines if it would be possible to
>>
>
> ...to? Given that the programs are getting larger, performance of the
> JIT is important. So, any means the number of passes can be reduced is
> a good thing!
I guess I meant to say "determines if it would be possible to make any
modifications next time". I was thinking something along the lines of:

* as you run through the program, keep track of the shortest branch distance
* if you didn't remove enough bytes to make that branch cross a relaxation
  boundary, then you know that next time you won't be able to do any useful
  work

You're already computing all the branch lengths, so it's just an extra min().
Since we're assuming a small number of passes (after reversing the relaxation
direction), you'll probably save more work avoiding the extra pass than it'll
take to compute the extra information. I guess some sort of benchmark would
give a real answer, but it certainly smells like a good idea ;)
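Roughly, the bookkeeping could look like the sketch below. This is userspace C with made-up names (struct relax_stats, note_far_branch(), another_pass_may_help()); none of it is in the patch. It also assumes a single symmetric short-branch reach passed in by the caller, which is a simplification of the real RISC-V immediate ranges.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-pass statistics; names are illustrative only. */
struct relax_stats {
	long min_slack;		/* smallest (|distance| - short_reach) seen */
	long bytes_removed;	/* set by the caller: bytes shrunk this pass */
};

static void relax_stats_init(struct relax_stats *s)
{
	s->min_slack = LONG_MAX;	/* no far branch recorded yet */
	s->bytes_removed = 0;
}

/* Call once for every branch that had to be emitted in its long form. */
static void note_far_branch(struct relax_stats *s, long distance,
			    long short_reach)
{
	long slack;

	assert(short_reach > 0);
	slack = labs(distance) - short_reach;
	if (slack < s->min_slack)
		s->min_slack = slack;	/* the "extra min()" */
}

/*
 * A far branch can only become short if the code between it and its
 * target shrank by at least its slack. If the whole pass removed fewer
 * bytes than the smallest slack, the next pass cannot relax anything.
 */
static bool another_pass_may_help(const struct relax_stats *s)
{
	if (s->min_slack == LONG_MAX)
		return false;	/* no far branches left at all */
	return s->bytes_removed >= s->min_slack;
}
```

The caller would fill in bytes_removed from the difference between the previous and current image size, and stop iterating as soon as another_pass_may_help() returns false.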
>
>
> Thanks for the review/suggestions!
> Björn