[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YD5ZqzIa5TymNdB4@lore-desk>
Date: Tue, 2 Mar 2021 16:28:43 +0100
From: Lorenzo Bianconi <lorenzo@...nel.org>
To: Jesper Dangaard Brouer <brouer@...hat.com>
Cc: Shay Agroskin <shayagr@...zon.com>,
Lorenzo Bianconi <lorenzo.bianconi@...hat.com>,
bpf@...r.kernel.org, netdev@...r.kernel.org, davem@...emloft.net,
kuba@...nel.org, ast@...nel.org, daniel@...earbox.net,
toke@...hat.com, freysteinn.alfredsson@....se,
john.fastabend@...il.com, jasowang@...hat.com, mst@...hat.com,
thomas.petazzoni@...tlin.com, mw@...ihalf.com,
linux@...linux.org.uk, ilias.apalodimas@...aro.org,
netanel@...zon.com, akiyano@...zon.com, michael.chan@...adcom.com,
madalin.bucur@....com, ioana.ciornei@....com,
jesse.brandeburg@...el.com, anthony.l.nguyen@...el.com,
saeedm@...dia.com, grygorii.strashko@...com, ecree.xilinx@...il.com
Subject: Re: [PATCH v2 bpf-next] bpf: devmap: move drop error path to devmap
for XDP_REDIRECT
> On Mon, 1 Mar 2021 13:23:06 +0200
> Shay Agroskin <shayagr@...zon.com> wrote:
>
> > Jesper Dangaard Brouer <brouer@...hat.com> writes:
> >
> > > On Sun, 28 Feb 2021 23:27:25 +0100
> > > Lorenzo Bianconi <lorenzo.bianconi@...hat.com> wrote:
> > >
> > >> > > drops = bq->count - sent;
> > >> > > -out:
> > >> > > - bq->count = 0;
> > >> > > + if (unlikely(drops > 0)) {
> > >> > > + /* If not all frames have been
> > >> > > transmitted, it is our
> > >> > > + * responsibility to free them
> > >> > > + */
> > >> > > + for (i = sent; i < bq->count; i++)
> > >> > > +
> > >> > > xdp_return_frame_rx_napi(bq->q[i]);
> > >> > > + }
> > >> >
> > >> > Wouldn't the logic above be the same even w/o the 'if'
> > >> > condition ?
> > >>
> > >> it is just an optimization to avoid the for loop instruction if
> > >> sent = bq->count
> > >
> > > True, and I like this optimization.
> > > It will affect how the code layout is (and thereby I-cache
> > > usage).
> >
> > I'm not sure what I-cache optimization you mean here. Compiling
> > the following C code:
> >
> > # define unlikely(x) __builtin_expect(!!(x), 0)
> >
> > extern void xdp_return_frame_rx_napi(int q);
> >
> > struct bq_stuff {
> > int q[4];
> > int count;
> > };
> >
> > int test(int sent, struct bq_stuff *bq) {
> > int i;
> > int drops;
> >
> > drops = bq->count - sent;
> > if(unlikely(drops > 0))
> > for (i = sent; i < bq->count; i++)
> > xdp_return_frame_rx_napi(bq->q[i]);
> >
> > return 2;
> > }
> >
> > with x86_64 gcc 10.2 with -O3 flag in https://godbolt.org/ (which
> > provides the assembly code for different compilers) yields the
> > following assembly:
> >
> > test:
> > mov eax, DWORD PTR [rsi+16]
> > mov edx, eax
> > sub edx, edi
> > test edx, edx
> > jg .L10
> > .L6:
> > mov eax, 2
> > ret
>
> This exactly shows my point. Notice how 'ret' happens earlier in this
> function. This is the common case, thus the CPU don't have to load the
> asm instruction below.
>
> > .L10:
> > cmp eax, edi
> > jle .L6
> > push rbp
> > mov rbp, rsi
> > push rbx
> > movsx rbx, edi
> > sub rsp, 8
> > .L3:
> > mov edi, DWORD PTR [rbp+0+rbx*4]
> > add rbx, 1
> > call xdp_return_frame_rx_napi
> > cmp DWORD PTR [rbp+16], ebx
> > jg .L3
> > add rsp, 8
> > mov eax, 2
> > pop rbx
> > pop rbp
> > ret
> >
> >
> > When dropping the 'if' completely I get the following assembly
> > output
> > test:
> > cmp edi, DWORD PTR [rsi+16]
> > jge .L6
>
> Jump to .L6 which is the common case. The code in between is not used
> in common case, but the CPU will likely load this into I-cache, and
> then jumps over the code in common case.
>
> > push rbp
> > mov rbp, rsi
> > push rbx
> > movsx rbx, edi
> > sub rsp, 8
> > .L3:
> > mov edi, DWORD PTR [rbp+0+rbx*4]
> > add rbx, 1
> > call xdp_return_frame_rx_napi
> > cmp DWORD PTR [rbp+16], ebx
> > jg .L3
> > add rsp, 8
> > mov eax, 2
> > pop rbx
> > pop rbp
> > ret
> > .L6:
> > mov eax, 2
> > ret
> >
> > which exits earlier from the function if 'drops > 0' compared to
> > the original code (the 'for' loop looks a little different, but
> > this shouldn't affect icache).
> >
> > When removing the 'if' and surrounding the 'for' condition with
> > 'unlikely' statement:
> >
> > for (i = sent; unlikely(i < bq->count); i++)
> >
> > I get the following assembly code:
> >
> > test:
> > cmp edi, DWORD PTR [rsi+16]
> > jl .L10
> > mov eax, 2
> > ret
> > .L10:
> > push rbx
> > movsx rbx, edi
> > sub rsp, 16
> > .L3:
> > mov edi, DWORD PTR [rsi+rbx*4]
> > mov QWORD PTR [rsp+8], rsi
> > add rbx, 1
> > call xdp_return_frame_rx_napi
> > mov rsi, QWORD PTR [rsp+8]
> > cmp DWORD PTR [rsi+16], ebx
> > jg .L3
> > add rsp, 16
> > mov eax, 2
> > pop rbx
> > ret
> >
> > which is shorter than the other two (one line compared to the
> > second and 7 lines compared the original code) and seems as
> > optimized as the second.
>
> You are also using unlikely() and get the earlier return, with less
> instructions, which is great. Perhaps we can use this type of
> unlikely() in the for-statement? WDYT Lorenzo?
sure, we can do it..I will address it in v3. Thanks.
Regards,
Lorenzo
>
>
> > I'm far from being an assembly expert, and I tested a code snippet
> > I wrote myself rather than the kernel's code (for the sake of
> > simplicity only).
> > Can you please elaborate on what makes the original 'if' essential
> > (I took the time to do the assembly tests, please take the time on
> > your side to prove your point, I'm not trying to be grumpy here).
> >
> > Shay
>
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> LinkedIn: http://www.linkedin.com/in/brouer
>
Download attachment "signature.asc" of type "application/pgp-signature" (229 bytes)
Powered by blists - more mailing lists