Message-ID: <CA+CK2bDVvdYuyuoHf==6KxYQqJBWcxQr0OC6BBk0UANuP4raGg@mail.gmail.com>
Date: Thu, 28 Jan 2021 21:06:23 -0500
From: Pavel Tatashin <pasha.tatashin@...een.com>
To: David Hildenbrand <david@...hat.com>
Cc: linux-mm <linux-mm@...ck.org>, LKML <linux-kernel@...r.kernel.org>,
Sasha Levin <sashal@...nel.org>,
Tyler Hicks <tyhicks@...ux.microsoft.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Dan Williams <dan.j.williams@...el.com>,
Michal Hocko <mhocko@...e.com>,
Oscar Salvador <osalvador@...e.de>,
Vlastimil Babka <vbabka@...e.cz>,
Joonsoo Kim <iamjoonsoo.kim@....com>,
Jason Gunthorpe <jgg@...pe.ca>, Marc Zyngier <maz@...nel.org>,
Linux ARM <linux-arm-kernel@...ts.infradead.org>,
Will Deacon <will.deacon@....com>,
James Morse <james.morse@....com>,
James Morris <jmorris@...ei.org>
Subject: Re: dax alignment problem on arm64 (and other architectures)
> > Might be related to the broken custom pfn_valid() implementation for
> > ZONE_DEVICE.
> >
> > https://lkml.kernel.org/r/1608621144-4001-1-git-send-email-anshuman.khandual@arm.com
> >
> > And essentially ignoring sub-section data in there for now as well (but
> > might not be that relevant yet). In addition, this might also be related to
> >
> > https://lkml.kernel.org/r/161058499000.1840162.702316708443239771.stgit@dwillia2-desk3.amr.corp.intel.com
>
> I will check it, and see what I find. I saw that panic almost a year
> ago, things might have changed since then.
Hi David,

There is no panic anymore, but I also can't offset by 2M anymore: the
minimum offset that works now is 16M, and if the alignment is less than
16M, creating the devdax device fails.
So, I tried the new ARM64 patch that reduces the section size, and
tested two alignments for pmem: the regular 2G alignment, and a 2G+16M
alignment (the pmem start address lowered by 16M, taking that 16M from
the top of RAM).
***** 4K page, 6G RAM, 2G PRAM *****
BOOT:
40000000-1bfffffff : System RAM
1c0000000-23fffffff : namespace0.0
DEVDAX:
40000000-1bfffffff : System RAM
1c0000000-1c21fffff : namespace0.0
1c2200000-23fffffff : dax0.0
HOTPLUG:
40000000-1bfffffff : System RAM
1c0000000-1c21fffff : namespace0.0
1c8000000-23fffffff : dax0.0
1c8000000-23fffffff : System RAM (kmem) 128M Wasted (Expected)
***** 4K page, 6G-16M RAM, 2G+16M PRAM *****
BOOT:
40000000-1beffffff : System RAM
1bf000000-23fffffff : namespace0.0
DEVDAX:
40000000-1beffffff : System RAM
1bf000000-1c11fffff : namespace0.0
1c1200000-23fffffff : dax0.0
HOTPLUG:
40000000-1beffffff : System RAM
1bf000000-1c11fffff : namespace0.0
1c8000000-23fffffff : dax0.0
1c8000000-23fffffff : System RAM (kmem) 144M Wasted (????)
***** 64K page, 6G RAM, 2G PRAM *****
BOOT:
40000000-1bfffffff : System RAM
1c0000000-23fffffff : namespace0.0
DEVDAX:
40000000-1bfffffff : System RAM
1c0000000-1dfffffff : namespace0.0
1e0000000-23fffffff : dax0.0
HOTPLUG:
40000000-1bfffffff : System RAM
1c0000000-1dfffffff : namespace0.0
1e0000000-23fffffff : dax0.0
1e0000000-23fffffff : System RAM (kmem) 512M Wasted (Expected)
***** 64K page, 6G-16M RAM, 2G+16M PRAM *****
BOOT:
40000000-1beffffff : System RAM
1bf000000-23fffffff : namespace0.0
DEVDAX:
40000000-1beffffff : System RAM
1bf000000-1bf3fffff : namespace0.0
1bf400000-23fffffff : dax0.0
HOTPLUG:
40000000-1beffffff : System RAM
1bf000000-1bf3fffff : namespace0.0
1c0000000-23fffffff : dax0.0
1c0000000-23fffffff : System RAM (kmem) 16M Wasted (Optimal)
In all three listings (BOOT, DEVDAX, HOTPLUG), only the System RAM,
namespace0.0, and dax0.0 entries from /proc/iomem are shown.

BOOT:    content of /proc/iomem right after boot
DEVDAX:  content of /proc/iomem after the devdax device is created:
           ndctl create-namespace --mode devdax -e namespace0.0
HOTPLUG: content of /proc/iomem after dax0.0 is hotplugged:
           echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
           echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
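
For reference (not from the logs above, just the standard memory-hotplug
sysfs interface), the granularity that hotplugged kmem memory gets
rounded to can be checked like this; the value shown is only
illustrative for the 4K-page run with the section-size patch applied:

  # memory can only be hotplugged in multiples of this block size,
  # which is where the "wasted" ranges above come from
  cat /sys/devices/system/memory/block_size_bytes
  8000000          # 0x8000000 = 128M (illustrative value)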
The most surprising part is why, with 4K pages and the 16M offset, 144M
is wasted. For whatever reason, when the devdax device is created, 34M
goes wasted to the label? Something is wrong here. However, I am happy
with the 64K pages result, where only 16M is wasted; of course,
optimally we should not be wasting any memory here, but it is still
much better than what we have now.
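
For what it's worth, a quick sanity check of those numbers against the
ranges above (end addresses in /proc/iomem are inclusive, hence the +1);
the shell arithmetic below is just my own illustration:

  # size of namespace0.0 after devdax creation (4K pages, 2G+16M pmem)
  echo $(( (0x1c11fffff + 1 - 0x1bf000000) >> 20 ))   # prints 34 (MiB)
  # RAM lost between the start of pmem and the start of the hotplugged
  # System RAM (kmem) range
  echo $(( (0x1c8000000 - 0x1bf000000) >> 20 ))       # prints 144 (MiB)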
Pasha