Message-ID: <20211004230003.GA2602856@nvidia.com>
Date: Mon, 4 Oct 2021 20:00:03 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Leon Romanovsky <leon@...nel.org>
Cc: Doug Ledford <dledford@...hat.com>,
Aharon Landau <aharonl@...dia.com>,
linux-kernel@...r.kernel.org, linux-rdma@...r.kernel.org
Subject: Re: [PATCH rdma-next] RDMA/mlx5: Avoid taking MRs from larger MR cache pools when a pool is empty
On Sun, Sep 26, 2021 at 11:31:43AM +0300, Leon Romanovsky wrote:
> From: Aharon Landau <aharonl@...dia.com>
>
> Currently, if a cache entry is empty, the driver tries to take MRs
> from larger cache entries. This consumes a lot of memory, since the
> MRs taken are larger than the ones requested. In addition, an entry
> is locked while it is searched for an mkey, so with the old behavior
> the threads of a multithreaded application block each other more
> often, which can hurt performance, as the table below shows.
>
> Therefore, avoid this by creating a new mkey whenever the requested
> cache entry is empty.
>
> The test was performed on a machine with an
> Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz (44 cores).
>
> Here are the time measurements for allocating MRs of 2^6 pages. The
> cache search started from entry 6.
>
> +------------+---------------------+---------------------+
> | | Old behavior | New behavior |
> | +----------+----------+----------+----------+
> | | 1 thread | 5 thread | 1 thread | 5 thread |
> +============+==========+==========+==========+==========+
> | 1,000 MRs | 14 ms | 30 ms | 14 ms | 80 ms |
> +------------+----------+----------+----------+----------+
> | 10,000 MRs | 135 ms | 6 sec | 173 ms | 880 ms |
> +------------+----------+----------+----------+----------+
> |100,000 MRs | 11.2 sec | 57 sec | 1.74 sec | 8.8 sec |
> +------------+----------+----------+----------+----------+
>
> Signed-off-by: Aharon Landau <aharonl@...dia.com>
> Signed-off-by: Leon Romanovsky <leonro@...dia.com>
> ---
> drivers/infiniband/hw/mlx5/mr.c | 26 +++++++++-----------------
> 1 file changed, 9 insertions(+), 17 deletions(-)
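
For reference, a minimal user-space sketch of the two lookup strategies the
commit message describes. All names below (cache_ent, create_new_mkey,
alloc_mkey_old, alloc_mkey_new) are invented for illustration and are not
the driver's real structures or functions; the actual change lives in
drivers/infiniband/hw/mlx5/mr.c and uses the driver's own entries and
locking.

/*
 * Toy model of the MR cache lookup, not the mlx5 driver code.
 * Build with: gcc -O2 -o mrcache mrcache.c -lpthread
 */
#include <pthread.h>
#include <stdio.h>

#define NUM_ENTRIES 8

struct cache_ent {
        pthread_mutex_t lock;   /* per-pool lock, contended by threads */
        int available;          /* pre-created mkeys in this pool */
};

static struct cache_ent cache[NUM_ENTRIES];

/* Stand-in for asking firmware to build a fresh mkey (the slow path). */
static int create_new_mkey(int order)
{
        return order;
}

/*
 * Old behavior: if pool 'order' is empty, walk the larger pools, taking
 * and dropping each pool's lock in turn.  Larger mkeys get consumed and
 * threads contend on every pool they visit.
 */
static int alloc_mkey_old(int order)
{
        for (int i = order; i < NUM_ENTRIES; i++) {
                pthread_mutex_lock(&cache[i].lock);
                if (cache[i].available > 0) {
                        cache[i].available--;
                        pthread_mutex_unlock(&cache[i].lock);
                        return i;       /* may be larger than requested */
                }
                pthread_mutex_unlock(&cache[i].lock);
        }
        return create_new_mkey(order);
}

/*
 * New behavior: only the requested pool is checked; if it is empty, a new
 * mkey is created instead of raiding (and locking) the larger pools.
 */
static int alloc_mkey_new(int order)
{
        pthread_mutex_lock(&cache[order].lock);
        if (cache[order].available > 0) {
                cache[order].available--;
                pthread_mutex_unlock(&cache[order].lock);
                return order;
        }
        pthread_mutex_unlock(&cache[order].lock);
        return create_new_mkey(order);
}

int main(void)
{
        for (int i = 0; i < NUM_ENTRIES; i++)
                pthread_mutex_init(&cache[i].lock, NULL);

        cache[7].available = 1;         /* only the largest pool has an mkey */
        printf("old: got order %d\n", alloc_mkey_old(6));  /* steals order 7 */
        printf("new: got order %d\n", alloc_mkey_new(6));  /* creates order 6 */
        return 0;
}

The sketch makes the trade-off visible: the new path never touches the
larger pools' locks and never hands back an oversized mkey, at the cost of
falling through to mkey creation more often, which is consistent with the
single-thread 10,000 MR case above getting slightly slower (173 ms vs
135 ms) while the contended and large cases improve.
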
I'm surprised the cost is so high; I assume this has a lot to do with
repeated calls to queue_adjust_cache_locked()? Maybe this should be
investigated further?
Anyhow, applied to for-next, thanks
Jason