Message-ID: <1489409689.28631.73.camel@edumazet-glaptop3.roam.corp.google.com>
Date: Mon, 13 Mar 2017 05:54:49 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: Tariq Toukan <ttoukan.linux@...il.com>
Cc: Eric Dumazet <edumazet@...gle.com>,
"David S . Miller" <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>,
Tariq Toukan <tariqt@...lanox.com>,
Saeed Mahameed <saeedm@...lanox.com>,
Willem de Bruijn <willemb@...gle.com>,
Alexei Starovoitov <ast@...nel.org>,
Alexander Duyck <alexander.duyck@...il.com>,
Jesper Dangaard Brouer <brouer@...hat.com>
Subject: Re: [PATCH net-next] mlx4: Better use of order-0 pages in RX path
On Mon, 2017-03-13 at 14:01 +0200, Tariq Toukan wrote:
> I think MM-list people won't be happy with this.
> We were doing a similar thing with order-5 pages in mlx5 Striding RQ:
> Allocate and split high-order pages, to have:
> - Physically contiguous memory,
> - Less page allocations,
> - Yet, keep a fine grained refcounts/truesize.
> In case no high-order page is available, fall back to using order-0 pages.
>
> However, we changed this behavior, as it was fragmenting the memory, and
> depleting the high-order pages available quickly.
Sure, I was not happy with this scheme either.
I was the first to complain and suggest split_page() one year ago.
mlx5 was using __GFP_MEMALLOC for its MLX5_MPWRQ_WQE_PAGE_ORDER
allocations, and failure had no fallback.
mlx5e_alloc_rx_mpwqe() was simply giving up immediately.
Very different behavior there, since:
1) we normally recycle 99% [1] of the pages, and rx_alloc_order quickly
decreases under memory pressure.
2) My high-order allocations use __GFP_NOMEMALLOC to cancel the
__GFP_MEMALLOC.
3) Also note that I chose to periodically reset rx_alloc_order to its
initial value from mlx4_en_recover_from_oom().
We could later change this to a slow recovery if really needed, but
my tests were fine with this.
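
To illustrate points 1)-3), here is a minimal sketch of the idea, not the
actual patch; the helper name rx_alloc_pages() and the cur_order parameter
are made up for the example.  Try a high-order allocation without touching
the memory reserves, lower the order on failure, then split the run into
plain order-0 pages so refcounts/truesize stay fine grained:

#include <linux/gfp.h>
#include <linux/mm.h>

/* Illustrative helper, not the mlx4 code itself.  Try a high-order
 * allocation first; __GFP_NOMEMALLOC makes sure it can never dip into
 * the emergency reserves that __GFP_MEMALLOC would allow, and
 * __GFP_NORETRY keeps the attempt cheap.  On failure the order shrinks,
 * down to a plain order-0 allocation.  split_page() then turns the run
 * into independently refcounted order-0 pages.
 */
static struct page *rx_alloc_pages(unsigned int *cur_order)
{
	unsigned int order = *cur_order;
	struct page *page;

	for (;;) {
		gfp_t gfp = GFP_ATOMIC | __GFP_NOWARN;

		if (order)
			gfp |= __GFP_NOMEMALLOC | __GFP_NORETRY;

		page = alloc_pages(gfp, order);
		if (page)
			break;
		if (!order)
			return NULL;
		order--;	/* memory pressure: fall back to a smaller order */
	}

	if (order)
		split_page(page, order);	/* order-0 refcounts from now on */

	*cur_order = order;	/* remember the order that worked */
	return page;
}

A caller could keep cur_order per RX ring, so the order naturally shrinks
under pressure, and mlx4_en_recover_from_oom() could simply write the
initial value back into it to reset the behavior.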
[1] This driver might need to change the default RX ring sizes.
1024 slots is a bit short for a 40Gbit NIC these days.
(We tune this to 4096.)
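(If useful, the RX ring size can be changed at runtime with ethtool,
e.g. something like "ethtool -G eth0 rx 4096", the interface name being
illustrative.)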
Thanks!