Message-ID: <1520027052.4592.60.camel@kernel.crashing.org>
Date:   Sat, 03 Mar 2018 08:44:12 +1100
From:   Benjamin Herrenschmidt <benh@...nel.crashing.org>
To:     Logan Gunthorpe <logang@...tatee.com>,
        Dan Williams <dan.j.williams@...el.com>
Cc:     Jens Axboe <axboe@...nel.dk>, Keith Busch <keith.busch@...el.com>,
        Oliver OHalloran <oliveroh@....ibm.com>,
        Alex Williamson <alex.williamson@...hat.com>,
        linux-nvdimm <linux-nvdimm@...ts.01.org>,
        linux-rdma <linux-rdma@...r.kernel.org>,
        linux-pci@...r.kernel.org,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-nvme@...ts.infradead.org, linux-block@...r.kernel.org,
        Jérôme Glisse <jglisse@...hat.com>,
        Jason Gunthorpe <jgg@...lanox.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Max Gurtovoy <maxg@...lanox.com>,
        Christoph Hellwig <hch@....de>
Subject: Re: [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory

On Fri, 2018-03-02 at 10:25 +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2018-03-01 at 16:19 -0700, Logan Gunthorpe wrote:
> > 
> > On 01/03/18 04:00 PM, Benjamin Herrenschmidt wrote:
> > > We use only 52 in practice but yes.
> > > 
> > > >   That's 64PB. If you need
> > > > a sparse vmemmap for the entire space it will take 16TB which leaves you
> > > > with 63.98PB of address space left. (Similar calculations for other
> > > > numbers of address bits.)
> > > 
> > > We only have 52 bits of virtual space for the kernel with the radix
> > > MMU.
> > 
> > Ok, assuming you only have 52 bits of physical address space: the sparse 
> > vmemmap takes 1TB and you're left with 3.9PB of address space for other 
> > things. So, again, why doesn't that work? Is my math wrong?
> 
> The big problem is not the vmemmap, it's the linear mapping

All right, I think I have a plan to fix this, but it will take a
little bit of time.

Basically, the idea is to have firmware pass Linux a region that is
known to be empty, which Linux can then use for the vmalloc space,
rather than having Linux arbitrarily cut the address space in half.

I'm pretty sure I can always find large enough "holes" in the physical
address space that are outside of RAM, OpenCAPI/NVLink and PCIe/MMIO
space. If nothing else, there are the ranges of unused chip IDs. But I
don't want Linux to have to know about the intimate HW details, so I'll
have FW pass it in.
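To make the hole-finding idea concrete, here is a minimal sketch of the kind of search firmware might perform: given the occupied physical ranges (RAM, OpenCAPI/NVLink, PCIe/MMIO) as half-open intervals, return the first suitably aligned gap of at least a requested size below the 52-bit limit. The `find_hole` name, the 1 TiB alignment, and the example ranges are hypothetical; this is not how any particular firmware actually implements it.

```python
from typing import Iterable, Optional, Tuple

TiB = 1 << 40


def find_hole(occupied: Iterable[Tuple[int, int]], size: int,
              limit: int = 1 << 52, align: int = TiB) -> Optional[int]:
    """Return the base of the first `align`-aligned gap of at least `size`
    bytes below `limit` that overlaps none of the half-open [start, end)
    ranges in `occupied`, or None if no such gap exists.

    Hypothetical helper for illustration only."""
    cursor = 0  # highest end address seen so far
    for start, end in sorted(occupied):
        base = -(-cursor // align) * align  # round cursor up to alignment
        if base + size <= min(start, limit):
            return base  # the gap [cursor, start) is big enough
        cursor = max(cursor, end)  # max() also copes with overlapping ranges
    # Nothing fit between ranges; try the space above the last one.
    base = -(-cursor // align) * align
    return base if base + size <= limit else None
```

For example, `find_hole([(0, 2 * TiB), (4 * TiB, 6 * TiB)], 8 * TiB)` skips the 2 TiB gap between the two occupied ranges and returns `6 * TiB`.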

It will take some time to adjust Linux and get updated FW out,
though.

Once that's done, I'll be able to have the linear mapping span the
entire 52-bit space (minus that hole). Of course, the hole needs to be
large enough to hold a vmemmap for a 52-bit space, which is about 4TB,
so I probably need a hole that's at least 8TB.
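The ~4TB figure can be reproduced with back-of-the-envelope arithmetic, assuming 64 KiB base pages and a 64-byte `struct page` (both typical for ppc64, but neither is stated in the thread, so treat them as assumptions). The vmemmap holds one `struct page` per page frame, so for 52 bits it needs 2^52 / 2^16 × 64 = 2^42 bytes = 4 TiB. The sketch below computes this for two page sizes; `vmemmap_bytes` is a hypothetical helper name.

```python
TiB = 1 << 40
PiB = 1 << 50


def vmemmap_bytes(phys_bits: int, page_shift: int,
                  struct_page_size: int = 64) -> int:
    """Bytes of vmemmap needed for one struct page per page frame in a
    2**phys_bits address space (struct_page_size is an assumed value)."""
    frames = 1 << (phys_bits - page_shift)  # number of page frames covered
    return frames * struct_page_size


for page_shift, label in ((16, "64K pages"), (12, "4K pages")):
    size = vmemmap_bytes(52, page_shift)
    print(f"{label}: vmemmap {size // TiB} TiB of a {(1 << 52) // PiB} PiB space")
```

Under these assumptions, 64K pages give 4 TiB, so an 8 TiB hole leaves 2x headroom; with 4K pages the same 52-bit space would need 64 TiB, so the required hole size depends on the page size the kernel is built with.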

As for the mapping attributes, it should be easy for my linear mapping
code to ensure that anything that isn't actual RAM is mapped NC
(non-cacheable).
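As a rough illustration of that attribute rule, the sketch below splits a window of the linear mapping into segments tagged "cached" where it overlaps RAM and "NC" everywhere else. The function name, the half-open range representation, and the string labels are hypothetical stand-ins for real page-protection flags; this is not the powerpc radix MMU code.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open [start, end)


def linear_map_segments(start: int, end: int,
                        ram: List[Range]) -> List[Tuple[int, int, str]]:
    """Split [start, end) into segments tagged "cached" (inside RAM) or
    "NC" (everything else). `ram` must be sorted and non-overlapping.

    Hypothetical helper for illustration only."""
    segments = []
    cursor = start
    for r_start, r_end in ram:
        if r_end <= cursor:
            continue  # RAM range lies entirely before the cursor
        if r_start >= end:
            break  # RAM range lies entirely after the window
        if r_start > cursor:
            segments.append((cursor, r_start, "NC"))  # non-RAM gap
        seg_end = min(r_end, end)
        segments.append((max(cursor, r_start), seg_end, "cached"))
        cursor = seg_end
    if cursor < end:
        segments.append((cursor, end, "NC"))  # non-RAM tail of the window
    return segments
```

For RAM at `[(0, 4), (8, 12)]` and a window `[0, 16)`, this yields alternating cached and NC segments, so MMIO and other non-RAM ranges never end up with a cacheable mapping.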

Cheers,
Ben.
