Message-ID: <20211004230003.GA2602856@nvidia.com>
Date: Mon, 4 Oct 2021 20:00:03 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Leon Romanovsky <leon@...nel.org>
Cc: Doug Ledford <dledford@...hat.com>,
Aharon Landau <aharonl@...dia.com>,
linux-kernel@...r.kernel.org, linux-rdma@...r.kernel.org
Subject: Re: [PATCH rdma-next] RDMA/mlx5: Avoid taking MRs from larger MR cache pools when a pool is empty
On Sun, Sep 26, 2021 at 11:31:43AM +0300, Leon Romanovsky wrote:
> From: Aharon Landau <aharonl@...dia.com>
>
> Currently, if a cache entry is empty, the driver tries to take MRs
> from larger cache entries. This consumes a lot of memory, since the
> MRs taken are larger than the ones requested. In addition, an entry
> is locked while it is searched for an mkey, so with the old behavior
> the threads of a multithreaded application block each other more
> often, which can hurt performance, as the table below shows.
>
> Therefore, avoid this by creating a new mkey whenever the requested
> cache entry is empty.
>
> The test was performed on a machine with an
> Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz (44 cores).
>
> Here are the time measurements for allocating MRs of 2^6 pages. The
> cache search started from entry 6.
>
> +------------+---------------------+---------------------+
> | | Old behavior | New behavior |
> | +----------+----------+----------+----------+
> | | 1 thread | 5 thread | 1 thread | 5 thread |
> +============+==========+==========+==========+==========+
> | 1,000 MRs | 14 ms | 30 ms | 14 ms | 80 ms |
> +------------+----------+----------+----------+----------+
> | 10,000 MRs | 135 ms | 6 sec | 173 ms | 880 ms |
> +------------+----------+----------+----------+----------+
> |100,000 MRs | 11.2 sec | 57 sec | 1.74 sec | 8.8 sec |
> +------------+----------+----------+----------+----------+
>
> Signed-off-by: Aharon Landau <aharonl@...dia.com>
> Signed-off-by: Leon Romanovsky <leonro@...dia.com>
> ---
> drivers/infiniband/hw/mlx5/mr.c | 26 +++++++++-----------------
> 1 file changed, 9 insertions(+), 17 deletions(-)
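
For reference, a minimal user-space sketch of the two lookup strategies the
commit message describes. All names below (cache_ent, create_new_mkey,
alloc_mkey_old, alloc_mkey_new) are invented for illustration and are not
the driver's real structures or functions; the actual change lives in
drivers/infiniband/hw/mlx5/mr.c and uses the driver's own entries and
locking.

/*
 * Toy model of the MR cache lookup, not the mlx5 driver code.
 * Build with: gcc -O2 -o mrcache mrcache.c -lpthread
 */
#include <pthread.h>
#include <stdio.h>

#define NUM_ENTRIES 8

struct cache_ent {
        pthread_mutex_t lock;   /* per-pool lock, contended by threads */
        int available;          /* pre-created mkeys in this pool */
};

static struct cache_ent cache[NUM_ENTRIES];

/* Stand-in for asking firmware to build a fresh mkey (the slow path). */
static int create_new_mkey(int order)
{
        return order;
}

/*
 * Old behavior: if pool 'order' is empty, walk the larger pools, taking
 * and dropping each pool's lock in turn.  Larger mkeys get consumed and
 * threads contend on every pool they visit.
 */
static int alloc_mkey_old(int order)
{
        for (int i = order; i < NUM_ENTRIES; i++) {
                pthread_mutex_lock(&cache[i].lock);
                if (cache[i].available > 0) {
                        cache[i].available--;
                        pthread_mutex_unlock(&cache[i].lock);
                        return i;       /* may be larger than requested */
                }
                pthread_mutex_unlock(&cache[i].lock);
        }
        return create_new_mkey(order);
}

/*
 * New behavior: only the requested pool is checked; if it is empty, a new
 * mkey is created instead of raiding (and locking) the larger pools.
 */
static int alloc_mkey_new(int order)
{
        pthread_mutex_lock(&cache[order].lock);
        if (cache[order].available > 0) {
                cache[order].available--;
                pthread_mutex_unlock(&cache[order].lock);
                return order;
        }
        pthread_mutex_unlock(&cache[order].lock);
        return create_new_mkey(order);
}

int main(void)
{
        for (int i = 0; i < NUM_ENTRIES; i++)
                pthread_mutex_init(&cache[i].lock, NULL);

        cache[7].available = 1;         /* only the largest pool has an mkey */
        printf("old: got order %d\n", alloc_mkey_old(6));  /* steals order 7 */
        printf("new: got order %d\n", alloc_mkey_new(6));  /* creates order 6 */
        return 0;
}

The sketch makes the trade-off visible: the new path never touches the
larger pools' locks and never hands back an oversized mkey, at the cost of
falling through to mkey creation more often, which is consistent with the
single-thread 10,000 MR case above getting slightly slower (173 ms vs
135 ms) while the contended and large cases improve.
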
I'm surprised the cost is so high; I assume this has a lot to do with
repeated calls to queue_adjust_cache_locked()? Maybe this should be
investigated further?
Anyhow, applied to for-next, thanks
Jason