Date:	Fri, 15 Jan 2016 11:41:50 -0800
From:	"Darrick J. Wong" <darrick.wong@...cle.com>
To:	Matthew Wilcox <matthew.r.wilcox@...el.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Matthew Wilcox <willy@...ux.intel.com>, linux-mm@...ck.org,
	linux-nvdimm@...1.01.org, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org, x86@...nel.org
Subject: Re: [PATCH v3 0/8] Support for transparent PUD pages for DAX files

On Fri, Jan 08, 2016 at 02:49:44PM -0500, Matthew Wilcox wrote:
> From: Matthew Wilcox <willy@...ux.intel.com>
> 
> Andrew, I think this is ready for a spin in -mm.
> 
> v3: Rebased against current mmotm
> v2: Reduced churn in filesystems by switching to ->huge_fault interface
>     Addressed concerns from Kirill
> 
> We have customer demand to use 1GB pages to map DAX files.  Unlike the 2MB
> page support, the Linux MM does not currently support PUD pages, so I have
> attempted to add support for the necessary pieces for DAX huge PUD pages.
> 
> Filesystems still need work to allocate 1GB pages.  With ext4, I can
> only get 16MB of contiguous space, although it is aligned.  With XFS,
> I can get 80MB less than 1GB, and it's not aligned.  The XFS problem
> may be due to the small amount of RAM in my test machine.

"It's not aligned"... I don't know the details of what you're trying to do, but
are you trying to create a file where each GB of logical address space maps to
a contiguous GB of physical space, and both logical and physical offsets align
to a 1GB boundary?

If the XFS filesystem is formatted with a stripe unit/width of 1G, an extent
size hint of 1G is put on the file, and the whole file is allocated in 1G
chunks, I think you're supposed to be able to make the above happen:

# mkfs.xfs /dev/mapper/moo -f -d su=1g,sw=1
meta-data=/dev/mapper/moo        isize=512    agcount=34, agsize=8126464 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
data     =                       bsize=4096   blocks=268435456, imaxpct=5
         =                       sunit=262144 swidth=262144 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=131072, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
# mount /dev/mapper/moo /mnt
# xfs_io -f -c 'extsize 1g' -c 'falloc 0 200g' /mnt/urk
# filefrag -v /mnt/urk
Filesystem type is: 58465342
File size of /mnt/urk is 214748364800 (52428800 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0.. 7340031:     524288..   7864319: 7340032:             unwritten
   1:  7340032..14680063:    8388608..  15728639: 7340032:    7864320: unwritten
   2: 14680064..22020095:   16515072..  23855103: 7340032:   15728640: unwritten
   3: 22020096..29360127:   24641536..  31981567: 7340032:   23855104: unwritten
   4: 29360128..36700159:   32768000..  40108031: 7340032:   31981568: unwritten
   5: 36700160..40370175:   40894464..  44564479: 3670016:   40108032: unwritten
   6: 40370176..44040191:   44826624..  48496639: 3670016:   44564480: unwritten
   7: 44040192..51380223:   49020928..  56360959: 7340032:   48496640: unwritten
   8: 51380224..52428799:   57147392..  58195967: 1048576:   56360960: last,unwritten,eof
/mnt/urk: 9 extents found

AFAICT each extent's logical and physical offsets are aligned to a 1G boundary
(every offset in the listing is a multiple of 262144 blocks of 4096 bytes).
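
For anyone who wants to double-check the alignment rather than eyeball it,
here is a minimal sketch in Python. The extent numbers are copied by hand from
the filefrag output above; the helper names (misaligned, total_blocks) are
made up for illustration and are not part of any tool. It assumes 4096-byte
blocks, as reported by mkfs.xfs (bsize=4096), so 1G is 262144 blocks, which
matches sunit=262144.

```python
# Values copied from the mkfs.xfs and filefrag -v output above.
BLOCK_SIZE = 4096                      # bsize=4096
GIB = 1 << 30
BLOCKS_PER_GIB = GIB // BLOCK_SIZE     # 262144, same as sunit=262144 blks

# (logical start, physical start, length), all in 4096-byte blocks.
EXTENTS = [
    (0,        524288,   7340032),
    (7340032,  8388608,  7340032),
    (14680064, 16515072, 7340032),
    (22020096, 24641536, 7340032),
    (29360128, 32768000, 7340032),
    (36700160, 40894464, 3670016),
    (40370176, 44826624, 3670016),
    (44040192, 49020928, 7340032),
    (51380224, 57147392, 1048576),
]


def misaligned(extents, unit=BLOCKS_PER_GIB):
    """Return extents whose logical start, physical start, or length
    is not a whole multiple of `unit` blocks."""
    return [e for e in extents if any(v % unit for v in e)]


def total_blocks(extents):
    """Sum of the extent lengths, in blocks."""
    return sum(length for _, _, length in extents)


if __name__ == "__main__":
    print("misaligned extents:", misaligned(EXTENTS))
    print("total GiB:", total_blocks(EXTENTS) // BLOCKS_PER_GIB)
```

An empty list from misaligned() means every extent starts on a 1G boundary in
both the file and the device and covers a whole number of 1G chunks. This only
checks the numbers in the listing; it says nothing about whether DAX will
actually map them with PUD pages.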

<shrug> Just a shot in the dark.

(This VM has 2G of memory and 1T of fake disk.)

--D

> 
> This patch set is against something approximately current -mm.  I'd like
> to thank Dave Chinner & Kirill Shutemov for their reviews of v1.
> The conversion of pmd_fault & pud_fault to huge_fault is thanks to
> Dave's poking, and Kirill spotted a couple of problems in the MM code.
> Version 2 of the patch set is about 200 lines smaller (1016 insertions,
> 23 deletions in v1).
> 
> I've done some light testing using a program to mmap a block device
> with DAX enabled, calling mincore() and examining /proc/smaps and
> /proc/pagemap.
> 
> Matthew Wilcox (8):
>   mm: Convert an open-coded VM_BUG_ON_VMA
>   mm,fs,dax: Change ->pmd_fault to ->huge_fault
>   mm: Add support for PUD-sized transparent hugepages
>   mincore: Add support for PUDs
>   procfs: Add support for PUDs to smaps, clear_refs and pagemap
>   x86: Add support for PUD-sized transparent hugepages
>   dax: Support for transparent PUD pages
>   ext4: Support for PUD-sized transparent huge pages
> 
>  Documentation/filesystems/dax.txt     |  12 +-
>  arch/Kconfig                          |   3 +
>  arch/x86/Kconfig                      |   1 +
>  arch/x86/include/asm/paravirt.h       |  11 ++
>  arch/x86/include/asm/paravirt_types.h |   2 +
>  arch/x86/include/asm/pgtable.h        |  94 ++++++++++++
>  arch/x86/include/asm/pgtable_64.h     |  13 ++
>  arch/x86/kernel/paravirt.c            |   1 +
>  arch/x86/mm/pgtable.c                 |  31 ++++
>  fs/block_dev.c                        |  10 +-
>  fs/dax.c                              | 272 +++++++++++++++++++++++++---------
>  fs/ext2/file.c                        |  27 +---
>  fs/ext4/file.c                        |  60 +++-----
>  fs/proc/task_mmu.c                    | 109 ++++++++++++++
>  fs/xfs/xfs_file.c                     |  25 ++--
>  fs/xfs/xfs_trace.h                    |   2 +-
>  include/asm-generic/pgtable.h         |  62 +++++++-
>  include/asm-generic/tlb.h             |  14 ++
>  include/linux/dax.h                   |  17 ---
>  include/linux/huge_mm.h               |  50 +++++++
>  include/linux/mm.h                    |  43 +++++-
>  include/linux/mmu_notifier.h          |  13 ++
>  include/linux/pfn_t.h                 |   8 +
>  mm/huge_memory.c                      | 151 +++++++++++++++++++
>  mm/memory.c                           | 101 +++++++++++--
>  mm/mincore.c                          |  13 ++
>  mm/pagewalk.c                         |  19 ++-
>  mm/pgtable-generic.c                  |  14 ++
>  28 files changed, 980 insertions(+), 198 deletions(-)
> 
> -- 
> 2.6.4
