linux-kernel - RE: swap on eMMC and other flash

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <26E7A31274623843B0E8CF86148BFE326FB66E94@NTXAVZMBX04.azit.micron.com>
Date:	Fri, 27 Apr 2012 07:34:09 +0000
From:	"Luca Porzio (lporzio)" <lporzio@...ron.com>
To:	Stephan Uphoff <ups@...gle.com>, Arnd Bergmann <arnd@...db.de>
CC:	Minchan Kim <minchan@...nel.org>,
	"linaro-kernel@...ts.linaro.org" <linaro-kernel@...ts.linaro.org>,
	"android-kernel@...glegroups.com" <android-kernel@...glegroups.com>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	Alex Lemberg <alex.lemberg@...disk.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Saugata Das <saugata.das@...aro.org>,
	Venkatraman S <venkat@...aro.org>,
	Yejin Moon <yejin.moon@...sung.com>,
	Hyojin Jeong <syr.jeong@...sung.com>,
	"linux-mmc@...r.kernel.org" <linux-mmc@...r.kernel.org>
Subject: RE: swap on eMMC and other flash

Stephan,

Good ideas. Some comments of mine below.

> -----Original Message-----
> From: linux-mmc-owner@...r.kernel.org [mailto:linux-mmc-owner@...r.kernel.org]
> On Behalf Of Stephan Uphoff
> Sent: Tuesday, April 17, 2012 3:22 AM
> To: Arnd Bergmann
> Cc: Minchan Kim; linaro-kernel@...ts.linaro.org; android-
> kernel@...glegroups.com; linux-mm@...ck.org; Luca Porzio (lporzio); Alex
> Lemberg; linux-kernel@...r.kernel.org; Saugata Das; Venkatraman S; Yejin Moon;
> Hyojin Jeong; linux-mmc@...r.kernel.org
> Subject: Re: swap on eMMC and other flash
> 
> I really like where this is going and would like to use the
> opportunity to plant a few ideas.
> 
> In contrast to rotational disks read/write operation overhead and
> costs are not symmetric.
> While random reads are much faster on flash - the number of write
> operations is limited by wearout and garbage collection overhead.
> To further improve swapping on eMMC or similar flash media I believe
> that the following issues need to be addressed:
> 
> 1) Limit average write bandwidth to eMMC to a configurable level to
> guarantee a minimum device lifetime
> 2) Aim for a low write amplification factor to maximize useable write
> bandwidth
> 3) Strongly favor read over write operations
> 
> Lowering write amplification (2) has been discussed in this email
> thread - and the only observation I would like to add is that
> over-provisioning the internal swap space compared to the exported
> swap space significantly can guarantee a lower write amplification
> factor with the indirection and GC techniques discussed.
> 
> I believe the swap functionality is currently optimized for storage
> media where read and write costs are nearly identical.
> As this is not the case on flash I propose splitting the anonymous
> inactive queue (at least conceptually) - keeping clean anonymous pages
> with swap slots on a separate queue as the cost of swapping them
> out/in is only an inexpensive read operation. A variable similar to
> swapiness (or a more dynamic algorithmn) could determine the
> preference for swapping out clean pages or dirty pages. ( A similar
> argument could be made for splitting up the file inactive queue )
> 

I totally agree. Read are inexpensive on flash based devices and as such a good swap algorithm (as well as a flash oriented FS) should take this into account.

> The problem of limiting the average write bandwidth reminds me of
> enforcing cpu utilization limits on interactive workloads.
> Just as with cpu workloads - using the resources to the limit produces
> poor interactivity.

I don't quite get your definition of interactive workload and I am not sure here which is the technique for limiting resource utilization you have in mind.
CGroups, for example, have proven not to be much reliable through time. 
Also in my experience it has always been very difficult to correlate resources utilization stats with user interactivity.
The only technique which has been proven reliable through time is to do something while the system is idle, which is what, to my understanding, is already done.

> When interactivity suffers too much I believe the only sane response
> for an interactive device is to limit usage of the swap device and
> transition into a low memory situation - and if needed - either
> allowing userspace to reduce memory usage or invoking the OOM killer.
> As a result low memory situations could not only be encountered on new
> memory allocations but also on workload changes that increase the
> number of dirty pages.
> 

I agree with your comments about the OOM killer (what is the point of swapping out a page if that process is going to be killed soon? That is only increasing the WAF factor on MMCs). In fact one proposal here could be to somewhat mix OOM index with page age.
I would suggest to first optimize swap traffic for an MMC device and then start thinking about this.

> A wild idea to avoid some writes altogether is to see if
> de-duplication techniques can be used to (partially?) match pages
> previously written so swap.

If you have such a situation, I think this is where KSM may help. It is my personal belief that with a bit of work, the KSM algorithm can be extended to swapped out pages too with little effort (at the expense of few increase of read traffic, which is ok for flash based storage devices). 

> In case of unencrypted swap  (or encrypted swap with a static key)
> swap pages on eMMC could even be re-used across multiple reboots.
> A simple version would just compare dirty pages with data in their
> swap slots as I suspect (but really don't know) that some user space
> algorithms (garbage collection?) dirty a page just temporarily -
> eventually reverting it to the previous content.
> 

This goes in contrast with discarding or trimming a page and as such the advantages of this technique needs to be proven vs the performance gain of using the discard command.

> Stephan
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Cheers,
    Luca
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/