Message-ID: <866658.37093.qm@web32510.mail.mud.yahoo.com>
Date:	Wed, 13 Feb 2008 15:43:17 -0800 (PST)
From:	Kanoj Sarcar <kanojsarcar@...oo.com>
To:	Christoph Lameter <clameter@....com>
Cc:	Christian Bell <christian.bell@...gic.com>,
	Jason Gunthorpe <jgunthorpe@...idianresearch.com>,
	Rik van Riel <riel@...hat.com>,
	Andrea Arcangeli <andrea@...ranet.com>, a.p.zijlstra@...llo.nl,
	izike@...ranet.com, Roland Dreier <rdreier@...co.com>,
	steiner@....com, linux-kernel@...r.kernel.org, avi@...ranet.com,
	linux-mm@...ck.org, daniel.blueman@...drics.com,
	Robin Holt <holt@....com>, general@...ts.openfabrics.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	kvm-devel@...ts.sourceforge.net
Subject: Re: [ofa-general] Re: Demand paging for memory regions


--- Christoph Lameter <clameter@....com> wrote:

> On Wed, 13 Feb 2008, Kanoj Sarcar wrote:
> 
> > It seems that the need is to solve potential memory
> > shortage and overcommit issues by being able to
> > reclaim pages pinned by rdma driver/hardware. Is my
> > understanding correct?
> 
> Correct.
> 
> > If I do understand correctly, then why is rdma page
> > pinning any different than eg mlock pinning? I imagine
> > Oracle pins lots of memory (using mlock), how come
> > they do not run into vm overcommit issues?
> 
> Mlocked pages are not pinned. They are movable by, e.g., page migration
> and will potentially be moved by future memory defrag approaches.
> Currently we have the same issues with mlocked pages as with pinned
> pages. There is work in progress to put mlocked pages onto a different
> LRU so that reclaim exempts these pages, and more work on limiting the
> percentage of memory that can be mlocked.
> 
> > Are we up against some kind of breaking c-o-w issue
> > here that is different between mlock and rdma pinning?
> 
> Not that I know.
> 
> > Asked another way, why should effort be spent on a
> > notifier scheme, rather than on fixing any memory
> > accounting problems and unifying how pages that get
> > pinned via mlock() or rdma drivers are accounted for?
> 
> There are efforts underway to account for and limit mlocked pages as
> described above. Page pinning the way it is done by Infiniband, through
> increasing the page refcount, is treated by the VM as a temporary
> condition, not as a permanent pin. The VM will continually try to
> reclaim these pages, thinking that the temporary usage of the page must
> cease soon. This is why the use of large amounts of pinned pages can
> lead to livelock situations.
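
[Illustration: a toy model of the livelock described above. This is not
kernel code. The `Page` class, the `reclaim_pass` function, and the
convention that a refcount above 1 means "someone else holds a reference"
are all assumptions made for this sketch. It only shows why a reclaim scan
that treats an elevated refcount as temporary can keep rescanning the same
pages without ever freeing any.]

```python
from dataclasses import dataclass


@dataclass
class Page:
    pfn: int
    # Toy convention (an assumption): 1 means only the baseline owner holds
    # the page; anything higher means an extra reference, e.g. a driver pin.
    refcount: int = 1


def reclaim_pass(pages, want):
    """One scan over `pages`, trying to free `want` of them.

    Pages with an elevated refcount are skipped on the assumption that the
    extra reference is temporary, so they stay on the list and are scanned
    again on the next pass. Returns (freed, skipped).
    """
    freed = []
    skipped = 0
    for page in pages:
        if len(freed) == want:
            break
        if page.refcount > 1:
            skipped += 1  # "try again later"
            continue
        freed.append(page)
    for page in freed:
        pages.remove(page)
    return len(freed), skipped


def demo():
    # Every page carries an extra reference that is never dropped.
    pages = [Page(pfn, refcount=2) for pfn in range(4)]
    # Three passes in a row: each one scans everything and frees nothing.
    return [reclaim_pass(pages, want=2) for _ in range(3)]
```

[In this sketch each pass does a full scan and makes no progress, which is
the shape of the problem; the real VM's scanning and accounting are far
more involved.]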

Oh ok, yes, I did see the discussion on this; sorry I
missed it. I do see what notifiers bring to the table
now (without endorsing them :-)).

An orthogonal question is this: is IB/rdma the only
"culprit" that elevates page refcounts? Are there no
other subsystems which do a similar thing?

The example I am thinking about is rawio (Oracle's
mlock'ed SHM regions are handed to rawio, aren't they?).
My understanding of how rawio works in Linux is quite
dated though ...

Kanoj

> 
> If we want to have pinning behavior then we could mark pinned pages
> specially so that the VM will not continually try to evict these pages.
> We could manage them similar to mlocked pages but just not allow page
> migration, memory unplug and defrag to occur on pinned memory. All of
> these would have to fail. With the notifier scheme the device driver
> could be told to get rid of the pinned memory. This would make these 3
> techniques work despite having an RDMA memory section.
> 
> > Startup benefits are well understood with the notifier
> > scheme (ie, not all pages need to be faulted in at
> > memory region creation time), especially when most of
> > the memory region is not accessed at all. I would
> > imagine most of HPC does not work this way though.
> 
> No, for optimal performance you would want to prefault all pages, as
> is done now. The notifier scheme would only become relevant in memory
> shortage situations.
> 
> > Then again, as rdma hardware is applied (increasingly?) towards apps
> > with short lived connections, the notifier scheme will help with
> > startup times.
> 
> The main use of the notifier scheme is for stability and reliability.
> The "pinned" pages become unpinnable on request by the VM. So the VM
> can work itself out of memory shortage situations in cooperation with
> the RDMA logic instead of simply failing.
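
[Illustration: a toy model of the notifier idea described above. This is
not the proposed kernel interface. `ToyRdmaDriver`, its `invalidate`
callback, and the `reclaim` function are hypothetical names invented for
this sketch. It shows only the cooperation pattern: when reclaim meets a
page with an extra reference, it asks the registered driver to drop its
pin, after which the page can be freed.]

```python
from dataclasses import dataclass


@dataclass
class Page:
    pfn: int
    refcount: int = 1  # toy convention: 1 = no extra references held


class ToyRdmaDriver:
    """Hypothetical driver that pins pages by raising their refcount."""

    def __init__(self):
        self.pinned = {}  # pfn -> Page

    def pin(self, page):
        page.refcount += 1
        self.pinned[page.pfn] = page

    def invalidate(self, page):
        # Callback from the (toy) VM: give up the pin if we hold one.
        if self.pinned.pop(page.pfn, None) is not None:
            page.refcount -= 1


def reclaim(pages, want, notifiers=()):
    """Try to free `want` pages, asking notifiers to unpin when needed."""
    freed = 0
    for page in list(pages):  # iterate over a copy; we mutate `pages`
        if freed == want:
            break
        if page.refcount > 1:
            for notifier in notifiers:
                notifier.invalidate(page)
        if page.refcount == 1:
            pages.remove(page)
            freed += 1
    return freed


def demo():
    driver = ToyRdmaDriver()
    pages = [Page(pfn) for pfn in range(4)]
    for page in pages:
        driver.pin(page)
    without_notifier = reclaim(list(pages), want=2)
    with_notifier = reclaim(pages, want=2, notifiers=[driver])
    return without_notifier, with_notifier
```

[Without the callback the toy reclaim frees nothing; with it, the driver
releases pins on request and reclaim makes progress. A real driver would
also have to stop the hardware from using the page and fault it back in
later, which this sketch leaves out.]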


