Date:   Tue, 10 May 2022 09:03:56 +0200
From:   Christoph Hellwig <hch@....de>
To:     Thomas Weißschuh <linux@...ssschuh.net>
Cc:     Christoph Hellwig <hch@....de>, Keith Busch <kbusch@...nel.org>,
        Jens Axboe <axboe@...com>, Sagi Grimberg <sagi@...mberg.me>,
        linux-kernel@...r.kernel.org, linux-nvme@...ts.infradead.org
Subject: Re: [PATCH] nvme-pci: fix host memory buffer allocation size

On Thu, Apr 28, 2022 at 06:09:11PM +0200, Thomas Weißschuh wrote:
> > > On my hardware we start with a chunk_size of 4MiB and just allocate
> > > 8 (hmmaxd) * 4 = 32 MiB which is worse than 1 * 200MiB.
> > 
> > And that is because the hardware only has a limited set of descriptors.
> 
> Wouldn't it make more sense, then, to allocate as much memory as possible
> for each of the available descriptors?
> 
> The comment in nvme_alloc_host_mem() tries to "start big".
> But it actually starts with at most 4MiB.

Compared to what other operating systems offer, that is quite large.
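
For reference, the allocation loop in question currently looks roughly
like this (paraphrased from nvme_alloc_host_mem() in
drivers/nvme/host/pci.c, trimmed a bit, so don't take it as verbatim):

static int nvme_alloc_host_mem(struct nvme_dev *dev, u64 min, u64 preferred)
{
	/* "start big", but capped at MAX_ORDER_NR_PAGES pages (4 MiB on x86) */
	u64 min_chunk = min_t(u64, preferred, PAGE_SIZE * MAX_ORDER_NR_PAGES);
	/* never go below the controller's minimum descriptor size */
	u64 hmminds = max_t(u32, dev->ctrl.hmminds * 4096, PAGE_SIZE * 2);
	u64 chunk_size;

	/* halve the chunk size until an allocation succeeds */
	for (chunk_size = min_chunk; chunk_size >= hmminds; chunk_size /= 2) {
		if (!__nvme_alloc_host_mem(dev, preferred, chunk_size)) {
			if (!min || dev->host_mem_size >= min)
				return 0;
			nvme_free_host_mem(dev);
		}
	}
	return -ENOMEM;
}

With the chunk size capped at 4 MiB and __nvme_alloc_host_mem() limited
to hmmaxd descriptors, the buffer tops out at hmmaxd * 4 MiB, which is
where the 32 MiB figure above comes from.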

> And on devices that have hmminds > 4MiB the loop condition will never succeed
> at all and HMB will not be used.
> My fairly boring hardware is already at a hmminds of 3.3MiB.
> 
> > Is there any real problem you are fixing with this?  Do you actually
> > see a performance difference on a relevant workload?
> 
> I don't have a concrete problem or performance issue.
> During some debugging I stumbled upon
> "nvme nvme0: allocated 32 MiB host memory buffer"
> in my kernel logs and investigated why it was so low.

Until recently we could not support these large sizes at all on
typical x86 configs.  With my fairly recent change to allow
vmap-remapped iommu allocations on x86 we can do that now.  But if we
enabled it unconditionally I'd be a little worried about using too
much memory very easily.

We could look into removing the min with
PAGE_SIZE * MAX_ORDER_NR_PAGES to try larger segments for
"segment-challenged" controllers, now that it could work on a lot of
iommu-enabled setups.  But I'd rather have a very good reason for
that.
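
Just to make it concrete, the minimal form of that change (untested
and purely illustrative, ignoring the memory usage concern above)
would be to drop the cap and start from the preferred size:

-	u64 min_chunk = min_t(u64, preferred, PAGE_SIZE * MAX_ORDER_NR_PAGES);
+	/* try the full preferred size first, halve from there on failure */
+	u64 min_chunk = preferred;

That would let a hmmaxd-limited controller cover much more memory per
descriptor than 4 MiB each.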
