lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 1 Mar 2021 13:23:06 +0200
From:   Shay Agroskin <shayagr@...zon.com>
To:     Jesper Dangaard Brouer <brouer@...hat.com>
CC:     Lorenzo Bianconi <lorenzo.bianconi@...hat.com>,
        Lorenzo Bianconi <lorenzo@...nel.org>, <bpf@...r.kernel.org>,
        <netdev@...r.kernel.org>, <davem@...emloft.net>, <kuba@...nel.org>,
        <ast@...nel.org>, <daniel@...earbox.net>, <toke@...hat.com>,
        <freysteinn.alfredsson@....se>, <john.fastabend@...il.com>,
        <jasowang@...hat.com>, <mst@...hat.com>,
        <thomas.petazzoni@...tlin.com>, <mw@...ihalf.com>,
        <linux@...linux.org.uk>, <ilias.apalodimas@...aro.org>,
        <netanel@...zon.com>, <akiyano@...zon.com>,
        <michael.chan@...adcom.com>, <madalin.bucur@....com>,
        <ioana.ciornei@....com>, <jesse.brandeburg@...el.com>,
        <anthony.l.nguyen@...el.com>, <saeedm@...dia.com>,
        <grygorii.strashko@...com>, <ecree.xilinx@...il.com>
Subject: Re: [PATCH v2 bpf-next] bpf: devmap: move drop error path to devmap
 for XDP_REDIRECT


Jesper Dangaard Brouer <brouer@...hat.com> writes:

> On Sun, 28 Feb 2021 23:27:25 +0100
> Lorenzo Bianconi <lorenzo.bianconi@...hat.com> wrote:
>
>> > >  	drops = bq->count - sent;
>> > > -out:
>> > > -	bq->count = 0;
>> > > +	if (unlikely(drops > 0)) {
>> > > +		/* If not all frames have been 
>> > > transmitted, it is our
>> > > +		 * responsibility to free them
>> > > +		 */
>> > > +		for (i = sent; i < bq->count; i++)
>> > > + 
>> > > xdp_return_frame_rx_napi(bq->q[i]);
>> > > +	}  
>> > 
>> > Wouldn't the logic above be the same even w/o the 'if' 
>> > condition ?  
>> 
>> it is just an optimization to avoid the for loop instruction if 
>> sent = bq->count
>
> True, and I like this optimization.
> It will affect how the code layout is (and thereby I-cache 
> usage).

I'm not sure what I-cache optimization you mean here. Compiling 
the following C code:

# define unlikely(x)	__builtin_expect(!!(x), 0)

extern void xdp_return_frame_rx_napi(int q);

struct bq_stuff {
    int q[4];
    int count;
};

int test(int sent, struct bq_stuff *bq) {
    int i;
    int drops;

    drops = bq->count - sent;
    if(unlikely(drops > 0))
        for (i = sent; i < bq->count; i++)
            xdp_return_frame_rx_napi(bq->q[i]);

    return 2;
}

with x86_64 gcc 10.2 with -O3 flag in https://godbolt.org/ (which 
provides the assembly code for different compilers) yields the 
following assembly:

test:
        mov     eax, DWORD PTR [rsi+16]
        mov     edx, eax
        sub     edx, edi
        test    edx, edx
        jg      .L10
.L6:
        mov     eax, 2
        ret
.L10:
        cmp     eax, edi
        jle     .L6
        push    rbp
        mov     rbp, rsi
        push    rbx
        movsx   rbx, edi
        sub     rsp, 8
.L3:
        mov     edi, DWORD PTR [rbp+0+rbx*4]
        add     rbx, 1
        call    xdp_return_frame_rx_napi
        cmp     DWORD PTR [rbp+16], ebx
        jg      .L3
        add     rsp, 8
        mov     eax, 2
        pop     rbx
        pop     rbp
        ret


When dropping the 'if' completely I get the following assembly 
output
test:
        cmp     edi, DWORD PTR [rsi+16]
        jge     .L6
        push    rbp
        mov     rbp, rsi
        push    rbx
        movsx   rbx, edi
        sub     rsp, 8
.L3:
        mov     edi, DWORD PTR [rbp+0+rbx*4]
        add     rbx, 1
        call    xdp_return_frame_rx_napi
        cmp     DWORD PTR [rbp+16], ebx
        jg      .L3
        add     rsp, 8
        mov     eax, 2
        pop     rbx
        pop     rbp
        ret
.L6:
        mov     eax, 2
        ret

which exits earlier from the function if 'drops > 0' compared to 
the original code (the 'for' loop looks a little different, but 
this shouldn't affect icache).

When removing the 'if' and surrounding the 'for' condition with 
'unlikely' statement:

for (i = sent; unlikely(i < bq->count); i++)

I get the following assembly code:

test:
        cmp     edi, DWORD PTR [rsi+16]
        jl      .L10
        mov     eax, 2
        ret
.L10:
        push    rbx
        movsx   rbx, edi
        sub     rsp, 16
.L3:
        mov     edi, DWORD PTR [rsi+rbx*4]
        mov     QWORD PTR [rsp+8], rsi
        add     rbx, 1
        call    xdp_return_frame_rx_napi
        mov     rsi, QWORD PTR [rsp+8]
        cmp     DWORD PTR [rsi+16], ebx
        jg      .L3
        add     rsp, 16
        mov     eax, 2
        pop     rbx
        ret

which is shorter than the other two (one line compared to the 
second and 7 lines compared the original code) and seems as 
optimized as the second.


I'm far from being an assembly expert, and I tested a code snippet 
I wrote myself rather than the kernel's code (for the sake of 
simplicity only).
Can you please elaborate on what makes the original 'if' essential 
(I took the time to do the assembly tests, please take the time on 
your side to prove your point, I'm not trying to be grumpy here).

Shay

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ