Message-ID: <20130411080243.GA12626@blaptop>
Date:	Thu, 11 Apr 2013 17:02:43 +0900
From:	Minchan Kim <minchan@...nel.org>
To:	KOSAKI Motohiro <kosaki.motohiro@...il.com>
Cc:	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	Michael Kerrisk <mtk.manpages@...il.com>,
	Arun Sharma <asharma@...com>,
	John Stultz <john.stultz@...aro.org>,
	Mel Gorman <mel@....ul.ie>, Hugh Dickins <hughd@...gle.com>,
	Dave Hansen <dave@...ux.vnet.ibm.com>,
	Rik van Riel <riel@...hat.com>, Neil Brown <neilb@...e.de>,
	Mike Hommey <mh@...ndium.org>, Taras Glek <tglek@...illa.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Jason Evans <je@...com>, sanjay@...gle.com,
	Paul Turner <pjt@...gle.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Michel Lespinasse <walken@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [RFC v7 00/11] Support vrange for anonymous page

On Thu, Apr 11, 2013 at 03:20:30AM -0400, KOSAKI Motohiro wrote:
> >>>   DONTNEED guarantees that the user always sees zero-filled pages after
> >>>   calling madvise, while with vrange the user can see data or encounter SIGBUS.
> >>
> >> To replace DONTNEED, users want zero-filled pages, as with DONTNEED,
> >> instead of SIGBUS. So a new flag option would be nice.
> > 
> > If userspace people want it, I can do it. 
> > But I'm not sure they want it at the moment because vrange is a rather
> > different concept from madvise(DONTNEED) from a usage point of view.
> > 
> > As you know well, in the case of DONTNEED, the user calls madvise _once_ and
> > the VM releases the memory as soon as the system call is made.
> > But vrange is like a delayed free that happens under system memory
> > pressure, so the user can't know when the OS frees the pages.
> > It means the user should call the system call in pairs, both VRANGE_VOLATILE
> > and VRANGE_NOVOLATILE, for correct use of a volatile range
> > (for simplicity, I won't go into the SIGBUS fault recovery method).
> > If he makes a mistake (i.e., does NOT call VRANGE_NOVOLATILE) on a range
> > that the current process is using, pages used by some process could
> > disappear suddenly.
> > 
> > In summary, I don't think vrange is a replacement for madvise(DONTNEED),
> > but it could be a useful friend of madvise(DONTNEED). For example, we can
> > make vrange(VRANGE_VOLATILE) return 1 if memory pressure was already
> 
> Do you mean vrange(VRANGE_UNVOLATILE)?

I meant VRANGE_VOLATILE. It seems my explanation was poor, so let me try again.
Currently vrange returns 0 if the system call succeeds and an error otherwise.
But we could change it as follows:

1. return 0 if the system call is successful and memory pressure isn't severe
2. return 1 if the system call is successful and memory pressure is severe
3. return -ERRXXX if the system call fails for some reason

That way the process can learn about system-wide memory pressure without peeking
at vmstat and can call madvise(DONTNEED) right after the vrange call. The benefit
is that the system can zap all of those pages instantly.
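
To make the idea concrete, here is a rough userspace sketch of the return
convention above. Much of it is assumption: vrange_stub() is only a stand-in
for the proposed system call (it fakes the return values and touches no
memory), the numeric values of VRANGE_VOLATILE/VRANGE_NOVOLATILE are made up,
and fake_pressure_severe and release_range() are hypothetical names for
illustration. Only madvise(MADV_DONTNEED) is a real call here.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical mode values: names follow this thread, numbers are made up. */
enum { VRANGE_VOLATILE = 0, VRANGE_NOVOLATILE = 1 };

/* Fake "memory pressure is severe" state so the sketch can be exercised. */
int fake_pressure_severe;

/*
 * Stand-in for the proposed vrange() system call (not a real API).
 * It only models the proposed return convention:
 *    0  success, memory pressure is not severe
 *    1  success, memory pressure is severe
 *   <0  failure, kernel-style negative errno
 */
int vrange_stub(void *addr, size_t len, int mode)
{
    assert(mode == VRANGE_VOLATILE || mode == VRANGE_NOVOLATILE);
    if (addr == NULL || len == 0)
        return -EINVAL;
    if (mode == VRANGE_VOLATILE && fake_pressure_severe)
        return 1;
    return 0;
}

/*
 * Give up a range the caller no longer needs.
 * Returns 0 if the range was only marked volatile, 1 if it was also
 * dropped immediately with madvise(MADV_DONTNEED), negative on error.
 */
int release_range(void *addr, size_t len)
{
    int ret = vrange_stub(addr, len, VRANGE_VOLATILE);

    if (ret < 0)
        return ret;
    /* Pressure is already severe: let the kernel zap the pages right now. */
    if (ret == 1 && madvise(addr, len, MADV_DONTNEED) != 0)
        return -errno;
    return ret;
}
```

With return value 1 the caller drops the pages at once; with 0 the range just
stays volatile and the hinting call itself stays cheap.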

> btw, assigning a new error number in asm-generic/errno.h is better than a strange '1'.

I can, and I admit "1" is rather weird.
But it's not an error, either.

> 
> 
> > severe, so the user can notice memory pressure from the return value and call
> > madvise(DONTNEED) if memory pressure was already severe. Of course, we
> > could handle it in the vrange system call itself (e.g., turn the vrange system
> > call into madvise(DONTNEED)), but I don't want that because I want to keep the
> > vrange hinting system call very light at all times so the user can predict its latency.
> 
> For allocator usage, vrange(UNVOLATILE) is annoying and not needed at all.
> When data has already been purged, just return a new zero-filled page. So
> maybe adding a new flag is worthwhile, because malloc is definitely a fast path

I really want it, and it's exactly the same as madvise(MADV_FREE).
But to implement it, we need some page-granularity work over the address range
in system call context, like zap_pte_range (e.g., clear page table bits and
mark something in the page flags so the reclaimer can detect it).
It means the vrange system call would still be heavier, although we would be
able to remove the lazy page fault.

Do you have any idea how to remove it? If so, I'm very open to implementing it.


> and adding a new syscall invocation is unwelcome.

Sure. But one more system call could be cheaper than a page-granularity
operation on the purged range.

> 
> 
> >> # of     # of   # of
> >> thread   iter   iter (patched glibc)
> > 
> > What's the workload?
> 
> Ahh, sorry. I forgot to write it. I use ebizzy, your favorite workload.

-- 
Kind regards,
Minchan Kim
