lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Sat, 23 Feb 2008 23:25:13 +0800 From: Matt Mackall <mpm@...enic.com> To: Andrew Morton <akpm@...ux-foundation.org> Cc: Hans Rosenfeld <hans.rosenfeld@....com>, linux-kernel@...r.kernel.org Subject: Re: [RFC][PATCH] make /proc/pid/pagemap work with huge pages and return page size On Sat, 2008-02-23 at 00:06 -0800, Andrew Morton wrote: > On Wed, 20 Feb 2008 14:57:43 +0100 "Hans Rosenfeld" <hans.rosenfeld@....com> wrote: > > > The current code for /proc/pid/pagemap does not work with huge pages (on > > x86). The code will make no difference between a normal pmd and a huge > > page pmd, trying to parse the contents of the huge page as ptes. Another > > problem is that there is no way to get information about the page size a > > specific mapping uses. > > > > Also, the current way the "not present" and "swap" bits are encoded in > > the returned pfn isn't very clean, especially not if this interface is > > going to be extended. > > > > I propose to change /proc/pid/pagemap to return a pseudo-pte instead of > > just a raw pfn. The pseudo-pte will contain: > > > > - 58 bits for the physical address of the first byte in the page, even > > less bits would probably be sufficient for quite a while > > > > - 4 bits for the page size, with 0 meaning native page size (4k on x86, > > 8k on alpha, ...) and values 1-15 being specific to the architecture > > (I used 1 for 2M, 2 for 4M and 3 for 1G for x86) > > > > - a "swap" bit indicating that a not present page is paged out, with the > > physical address field containing page file number and block number > > just like before > > > > - a "present" bit just like in a real pte > > > > By shortening the field for the physical address, some more interesting > > information could be included, like read/write permissions and the like. > > The page size could also be returned directly, 6 bits could be used to > > express any page shift in a 64 bit system, but I found the encoded page > > size more useful for my specific use case. > > > > > > The attached patch changes the /proc/pid/pagemap code to use such a > > pseudo-pte. The huge page handling is currently limited to 2M/4M pages > > on x86, 1G pages will need some more work. To keep the simple mapping of > > virtual addresses to file index intact, any huge page pseudo-pte is > > replicated in the user buffer to map the equivalent range of small > > pages. > > > > Note that I had to move the pmd_pfn() macro from asm-x86/pgtable_64.h to > > asm-x86/pgtable.h, it applies to both 32 bit and 64 bit x86. > > > > Other architectures will probably need other changes to support huge > > pages and return the page size. > > > > I think that the definition of the pseudo-pte structure and the page > > size codes should be made available through a header file, but I didn't > > do this for now. > > > > If we're going to do this, we need to do it *fast*. Once 2.6.25 goes out > our hands are tied. > > That means talking with the maintainers of other hugepage-capable > architectures. > > > +struct ppte { > > + uint64_t paddr:58; > > + uint64_t psize:4; > > + uint64_t swap:1; > > + uint64_t present:1; > > +}; > > This is part of the exported kernel interface and hence should be in a > header somewhere, shouldn't it? The old stuff should have been too. I think we're better off not using bitfields here. > u64 is a bit more conventional than uint64_t, and if we move this to a > userspace-visible header then __u64 is the type to use, I think. Although > one would expect uint64_t to be OK as well. > > > +#ifdef CONFIG_X86 > > +#define PM_PSIZE_1G 3 > > +#define PM_PSIZE_4M 2 > > +#define PM_PSIZE_2M 1 > > +#endif > > No, we should factor this correctly and get the CONFIG_X86 stuff out of here. Perhaps my "continuation bit" idea. > Matt? Help? Did my previous message make it out? This is probably my last message for 24+ hours. -- Mathematics is the supreme nostalgia of our time. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@...r.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists