linux-kernel - Re: [00/17] [RFC] Virtual Compound Page Support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20070919103400.7bc8ffd9.dada1@cosmosbay.com>
Date:	Wed, 19 Sep 2007 10:34:00 +0200
From:	Eric Dumazet <dada1@...mosbay.com>
To:	Anton Altaparmakov <aia21@....ac.uk>
Cc:	Christoph Lameter <clameter@....com>,
	Christoph Hellwig <hch@....de>, Mel Gorman <mel@...net.ie>,
	David Chinner <dgc@....com>,
	Jens Axboe <jens.axboe@...cle.com>,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [00/17] [RFC] Virtual Compound Page Support

On Wed, 19 Sep 2007 08:34:47 +0100
Anton Altaparmakov <aia21@....ac.uk> wrote:

> Hi Christoph,
> 
> On 19 Sep 2007, at 04:36, Christoph Lameter wrote:
> > Currently there is a strong tendency to avoid larger page  
> > allocations in
> > the kernel because of past fragmentation issues and the current
> > defragmentation methods are still evolving. It is not clear to what  
> > extend
> > they can provide reliable allocations for higher order pages (plus the
> > definition of "reliable" seems to be in the eye of the beholder).
> >
> > Currently we use vmalloc allocations in many locations to provide a  
> > safe
> > way to allocate larger arrays. That is due to the danger of higher  
> > order
> > allocations failing. Virtual Compound pages allow the use of regular
> > page allocator allocations that will fall back only if there is an  
> > actual
> > problem with acquiring a higher order page.
> >
> > This patch set provides a way for a higher page allocation to fall  
> > back.
> > Instead of a physically contiguous page a virtually contiguous page
> > is provided. The functionality of the vmalloc layer is used to provide
> > the necessary page tables and control structures to establish a  
> > virtually
> > contiguous area.
> 
> I like this a lot.  It will get rid of all the silly games we have to  
> play when needing both large allocations and efficient allocations  
> where possible.  In NTFS I can then just allocated higher order pages  
> instead of having to mess about with the allocation size and  
> allocating a single page if the requested size is <= PAGE_SIZE or  
> using vmalloc() if the size is bigger.  And it will make it faster  
> because a lot of the time a higher order page allocation will succeed  
> with your patchset without resorting to vmalloc() so that will be a  
> lot faster.
> 
> So where I currently have fs/ntfs/malloc.h the below mess I could get  
> rid of it completely and just use the normal page allocator/ 
> deallocator instead...
> 
> static inline void *__ntfs_malloc(unsigned long size, gfp_t gfp_mask)
> {
>          if (likely(size <= PAGE_SIZE)) {
>                  BUG_ON(!size);
>                  /* kmalloc() has per-CPU caches so is faster for  
> now. */
>                  return kmalloc(PAGE_SIZE, gfp_mask & ~__GFP_HIGHMEM);
>                  /* return (void *)__get_free_page(gfp_mask); */
>          }
>          if (likely(size >> PAGE_SHIFT < num_physpages))
>                  return __vmalloc(size, gfp_mask, PAGE_KERNEL);
>          return NULL;
> }
> 
> And other places in the kernel can make use of the same.  I think XFS  
> does very similar things to NTFS in terms of larger allocations at  
> least and there are probably more places I don't know about off the  
> top of my head...
> 
> I am looking forward to your patchset going into mainline.  (-:

Sure, it sounds *really* good. But...

1) Only power of two allocations are good candidates, or we waste RAM

2) On i386 machines, we have a small vmalloc window. (128 MB default value)
  Many servers with >4GB memory (PAE) like to boot with vmalloc=32M option to get 992MB of LOWMEM.
  If we allow some slub caches to fallback to vmalloc land, we'll have problems to tune this.

3) A fallback to vmalloc means an allocation of one vm_struct per compound page.

4) vmalloc() currently uses a linked list of vm_struct. Might need something more scalable.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/