Message-ID: <20181106234633.GA11429@Asurada-Nvidia.nvidia.com>
Date:   Tue, 6 Nov 2018 15:46:35 -0800
From:   Nicolin Chen <nicoleotsuka@...il.com>
To:     Christoph Hellwig <hch@...radead.org>
Cc:     joro@...tes.org, iommu@...ts.linux-foundation.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] iommu/dma: Zero pages manually in a length of scatterlist

Hi Christoph,

On Sun, Nov 04, 2018 at 07:50:01AM -0800, Christoph Hellwig wrote:
> On Thu, Nov 01, 2018 at 02:35:00PM -0700, Nicolin Chen wrote:
> > The __GFP_ZERO flag is passed down to the generic page allocation
> > routine, which zeros everything page by page. This is safe as a
> > generic approach, but it is not efficient for an IOMMU allocation
> > that organizes contiguous pages using a scatterlist.
> > 
> > So this change drops __GFP_ZERO from the flags and adds a manual
> > memset() after the page/sg allocations, using the scatterlist
> > segment lengths.
> > 
> > In my test of a 2.5MB allocation, iommu_dma_alloc() takes 46%
> > less time, reduced from an average of 925 usec to 500 usec.
> 
> And in what case does dma_alloc_* performance even matter?

Honestly, the difference was amplified by a local iommu benchmark
test. In practice dma_alloc/free() should not be on a hot path,
but we cannot say the performance doesn't matter at all, right?
Many device drivers pre-allocate their DMA memory, but it could
matter for a driver that dynamically allocates and releases its
buffers.
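
For reference, the core of the idea looks roughly like this (a
simplified sketch rather than the literal patch; the helper name is
made up, and it assumes the pages sit in lowmem so that sg_virt()
returns a usable kernel address):

    /*
     * Zero each contiguous chunk with a single memset(), using the
     * scatterlist segment lengths, instead of letting the page
     * allocator clear everything page by page via __GFP_ZERO.
     */
    static void zero_sg_pages(struct scatterlist *sgl, int nents)
    {
            struct scatterlist *sg;
            int i;

            for_each_sg(sgl, sg, nents, i)
                    memset(sg_virt(sg), 0, sg->length);
    }

One memset() per segment touches each cacheline once in a linear
sweep, which is where the saving over per-page clearing comes from.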

And actually I have a related question for you: I saw that
dma_direct_alloc() clears the __GFP_ZERO flag and does a manual
memset() after the allocation. Might that also be related to a
performance concern? I don't see any performance-related comment
in that part of the code, and the memset() seems to have been
there from the beginning.
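
The pattern I mean is roughly this (paraphrased from memory rather
than quoted from the actual code, with error handling elided):

    gfp &= ~__GFP_ZERO;
    page = alloc_pages(gfp, get_order(size));
    if (!page)
            return NULL;
    ret = page_address(page);
    /* one memset() over the whole virtually contiguous buffer */
    memset(ret, 0, size);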

Thanks
Nicolin
