Message-ID: <CANn89iKvA6FBbN2o8dnisd_VFrFfm+WFqMFX50gdQCpRTdpX1Q@mail.gmail.com>
Date: Tue, 7 Feb 2017 11:18:27 -0800
From: Eric Dumazet <edumazet@...gle.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Eric Dumazet <eric.dumazet@...il.com>,
Tariq Toukan <ttoukan.linux@...il.com>,
"David S . Miller" <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>,
Tariq Toukan <tariqt@...lanox.com>,
Martin KaFai Lau <kafai@...com>,
Willem de Bruijn <willemb@...gle.com>,
Jesper Dangaard Brouer <brouer@...hat.com>,
Brenden Blanco <bblanco@...mgrid.com>,
Alexei Starovoitov <ast@...nel.org>
Subject: Re: [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling
On Tue, Feb 7, 2017 at 11:05 AM, Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
> On Tue, Feb 07, 2017 at 08:26:23AM -0800, Eric Dumazet wrote:
>> On Tue, 2017-02-07 at 08:06 -0800, Eric Dumazet wrote:
>>
>
> Awesome that you've started working on this. I think it's correct approach
> and mlx5 should be cleaned up in similar way.
> Long term we should be able to move all page alloc/free out of the drivers
> completely.
>
>> > /*
>> > * make sure we read the CQE after we read the ownership bit
>> > */
>> > dma_rmb();
>> > + prefetch(frags[0].page);
>>
>> Note that I would like to instead do a prefetch(frags[1].page)
>
> yeah, these two look weird:
> + prefetch(frags[0].page);
> + va = page_address(frags[0].page) + frags[0].page_offset;
>
> on most archs page_address() is just math (not a load from memory),
> but the result != frags[0].page, so I'm missing what you are trying to prefetch.
>
> prefetch(frags[1].page)
> is even more confusing. what will it prefetch?
The "struct page" of the following frame.
Remember we need:
release = page_count(page) != 1 ||
page_is_pfmemalloc(page) ||
page_to_nid(page) != numa_mem_id();
Then:
page_ref_inc(page);
My patch now does:
prefetch(frags[priv->num_frags].page);
>
> btw we had a patch that was doing prefetch of 'va' of next packet
> and it was very helpful. Like this:
I preferred to fetch the second cache line of this frame,
because TCP is mostly used with timestamps: a total of 66 bytes of
headers with IPv4, and more for IPv6 of course.
> pref_index = (index + 1) & ring->size_mask;
> pref = ring->rx_info + (pref_index << priv->log_rx_info);
> prefetch(page_address(pref->page) + pref->page_offset);
>
> but since you're redesigning ring->rx_info... not sure how it will fit.
>
>> So I will probably change how ring->rx_info is allocated
>>
>> wasting all that space and forcing vmalloc() is silly:
>>
>> tmp = size * roundup_pow_of_two(MLX4_EN_MAX_RX_FRAGS *
>> sizeof(struct mlx4_en_rx_alloc));
>
> I think you'd still need roundup_pow_of_two otherwise priv->log_rx_info
> optimization won't work.
No more log_rx_info trick.
Simply: frags = priv->rx_info + (index * priv->rx_info_bytes_per_slot);
A multiply is damn fast these days compared to cache misses.
Using 24*<rx_ring_size> bytes is better than 32*<rx_ring_size>, since our
L1/L2 caches are quite small.
Of course, this applies to the 'stress' mode, not the light mode where
we receive one single packet per IRQ.