Message-ID: <YpE8Jx3cBwgXJnRc@kbusch-mbp.dhcp.thefacebook.com>
Date:   Fri, 27 May 2022 15:01:27 -0600
From:   Keith Busch <kbusch@...nel.org>
To:     Tony Battersby <tonyb@...ernetics.com>
Cc:     kernel-team@...com, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, willy@...radead.org
Subject: Re: [PATCH 0/2] dmapool performance enhancements

On Fri, May 27, 2022 at 03:35:47PM -0400, Tony Battersby wrote:
> I posted a similar patch series back in 2018:
> 
> https://lore.kernel.org/linux-mm/73ec1f52-d758-05df-fb6a-41d269e910d0@cybernetics.com/
> https://lore.kernel.org/linux-mm/15ff502d-d840-1003-6c45-bc17f0d81262@cybernetics.com/
> https://lore.kernel.org/linux-mm/1288e597-a67a-25b3-b7c6-db883ca67a25@cybernetics.com/
> 
> 
> I initially used a red-black tree keyed by the DMA address, but then for
> v2 of the patchset I put the dma pool info directly into struct page and
> used virt_to_page() to get at it.  But it turned out that was a bad idea
> because not all architectures have struct page backing
> dma_alloc_coherent():
> 
> https://lore.kernel.org/linux-kernel/20181206013054.GI6707@atomide.com/
> 
> I intended to go back and resubmit the red-black tree version, but I was
> too busy at the time and forgot about it.  A few days ago I finally
> decided to update the patches and submit them upstream.  I found your
> recent dmapool xarray patches by searching the mailing list archive to
> see if anyone else was working on something similar.
> 
> Using the following as a benchmark:
> 
> modprobe mpt3sas
> drivers/scsi/mpt3sas/mpt3sas_base.c
> _base_allocate_chain_dma_pool
> loop dma_pool_alloc(ioc->chain_dma_pool)
> 
> rmmod mpt3sas
> drivers/scsi/mpt3sas/mpt3sas_base.c
> _base_release_memory_pools()
> loop dma_pool_free(ioc->chain_dma_pool)
> 
> Here are the benchmark results showing the speedup from the patchsets:
> 
>         modprobe  rmmod
> orig          1x     1x
> xarray      5.2x   186x
> rbtree      9.3x   269x
> 
> It looks like my red-black tree version is faster than the v1 of the
> xarray patch on this benchmark at least, although the mpt3sas usage of
> dmapool is hardly typical.  I will try to get some testing done on my
> patchset and post it next week.

Thanks for the info.

Just comparing with xarray, I actually found that the list was still faster
until you get >100 pages in the pool, after which xarray becomes the clear
winner.

But it turns out that many real use cases don't allocate nearly that many pages, so I'm trying to take this in a different direction by replacing the lookup structures with an intrusive stack. That is safe to do since pages are never freed for the lifetime of the pool, and it's far faster than anything else. The downside is that I'd need to increase the size of the smallest allowable pool block so a free block can hold the stack link, but I think that's okay.

Anyway, I was planning to post this new idea soon. The patches that motivated my wanting a faster dma pool are still in the works, though, so I'm sorting those out before returning to this one.
