linux-kernel - Re: [PATCH] iommu/dma: Zero pages manually in a length of scatterlist

Date:   Tue, 6 Nov 2018 15:46:35 -0800
From:   Nicolin Chen <nicoleotsuka@...il.com>
To:     Christoph Hellwig <hch@...radead.org>
Cc:     joro@...tes.org, iommu@...ts.linux-foundation.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] iommu/dma: Zero pages manually in a length of scatterlist

Hi Christoph,

On Sun, Nov 04, 2018 at 07:50:01AM -0800, Christoph Hellwig wrote:
> On Thu, Nov 01, 2018 at 02:35:00PM -0700, Nicolin Chen wrote:
> > The __GFP_ZERO will be passed down to the generic page allocation
> > routine which zeros everything page by page. This is safe to be a
> > generic way but not efficient for iommu allocation that organizes
> > contiguous pages using scatterlist.
> > 
> > So this changes drops __GFP_ZERO from the flag, and adds a manual
> > memset after page/sg allocations, using the length of scatterlist.
> > 
> > My test result of a 2.5MB size allocation shows iommu_dma_alloc()
> > takes 46% less time, reduced from averagely 925 usec to 500 usec.
> 
> And in what case does dma_alloc_* performance even matter?

Honestly, this was amplified by running a local iommu benchmark
test. In practice dma_alloc/free() should not be stressed that
heavily, but we cannot say the performance doesn't matter at all,
right? Though many device drivers pre-allocate memory for DMA
usage, it could matter where a driver dynamically allocates and
releases buffers.

And actually I have a related question for you: I saw that
dma_direct_alloc() clears the __GFP_ZERO flag and does a manual
memset() after allocation. Might that be related to a performance
concern? I don't see any mention of performance in that part of
the code, though, and it seems the memset() was there from the
beginning.
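
To make the two zeroing strategies concrete, here is a minimal
userspace sketch. It is not the kernel code: struct sketch_seg,
SKETCH_PAGE_SIZE and both functions are hypothetical stand-ins
for a scatterlist entry, PAGE_SIZE and the zeroing paths. It only
illustrates the difference between clearing page by page (roughly
what __GFP_ZERO amounts to) and issuing one memset() per
contiguous segment length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed page size for this sketch; the real value is arch-specific. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for one scatterlist entry: a contiguous run. */
struct sketch_seg {
	unsigned char *addr;	/* start of the contiguous run */
	size_t length;		/* bytes, a multiple of SKETCH_PAGE_SIZE */
};

/* Page-by-page clearing: one memset() call per page in every segment. */
static size_t sketch_zero_per_page(const struct sketch_seg *segs, size_t nsegs)
{
	size_t calls = 0;

	for (size_t i = 0; i < nsegs; i++) {
		assert(segs[i].length % SKETCH_PAGE_SIZE == 0);
		for (size_t off = 0; off < segs[i].length;
		     off += SKETCH_PAGE_SIZE) {
			memset(segs[i].addr + off, 0, SKETCH_PAGE_SIZE);
			calls++;
		}
	}
	return calls;	/* number of memset() calls issued */
}

/* Per-segment clearing: one memset() call covering each segment's length. */
static size_t sketch_zero_per_segment(const struct sketch_seg *segs, size_t nsegs)
{
	size_t calls = 0;

	for (size_t i = 0; i < nsegs; i++) {
		memset(segs[i].addr, 0, segs[i].length);
		calls++;
	}
	return calls;	/* number of memset() calls issued */
}
```

Both functions leave the buffer fully zeroed; they differ only in
how many clearing calls are issued. Whether the second form is
faster in practice depends on the platform, and the timing quoted
above comes from the iommu_dma_alloc() test, not from this sketch.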

Thanks
Nicolin