linux-kernel - Re: [PATCH 7/7] mm: compaction: Introduce sync-light migration for use by compaction

Message-ID: <20111122191302.GF8058@quack.suse.cz>
Date:	Tue, 22 Nov 2011 20:13:02 +0100
From:	Jan Kara <jack@...e.cz>
To:	Nai Xia <nai.xia@...il.com>
Cc:	Jan Kara <jack@...e.cz>, Mel Gorman <mgorman@...e.de>,
	Shaohua Li <shaohua.li@...el.com>,
	Linux-MM <linux-mm@...ck.org>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Minchan Kim <minchan.kim@...il.com>,
	Andy Isaacson <adi@...apodia.org>,
	Johannes Weiner <jweiner@...hat.com>,
	Rik van Riel <riel@...hat.com>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 7/7] mm: compaction: Introduce sync-light migration for
 use by compaction

On Tue 22-11-11 21:59:24, Nai Xia wrote:
> On Tuesday 22 November 2011 19:54:27 Jan Kara wrote:
> > On Tue 22-11-11 10:14:51, Mel Gorman wrote:
> > > On Tue, Nov 22, 2011 at 02:56:51PM +0800, Shaohua Li wrote:
> > > > On Tue, 2011-11-22 at 02:36 +0800, Mel Gorman wrote:
> > > > On the other hand, MIGRATE_SYNC_LIGHT now waits for the page lock and
> > > > buffer lock, so it could wait on a page read. Page read and page out have
> > > > the same latency, so why treat them differently?
> > > > 
> > > 
> > > That's a very reasonable question.
> > > 
> > > To date, the stalls that were reported to be a problem were related to
> > > heavy writing workloads. Workloads are naturally throttled on reads
> > > but not necessarily on writes and the IO scheduler prioritises sync
> > > reads over writes which contributes to keeping stalls due to page
> > > reads low.  In my own tests, there have been no significant stalls
> > > due to waiting on page reads. I accept this may be because the stall
> > > threshold I record is too low.
> > > 
> > > Still, I double checked an old USB copy based test to see what the
> > > compaction-related stalls really were.
> > > 
> > > 58 seconds	waiting on PageWriteback
> > > 22 seconds	waiting on generic_make_request calling ->writepage
> > > 
> > > These are total times and very rough estimates; each stall was about
> > > 2-5 seconds. There were no other sources of stalls that had compaction
> > > in the stacktrace. I'm rerunning to gather more accurate stall times
> > > for a workload similar to Andrea's and will see if page reads
> > > crop up as a major source of stalls.
> >   OK, but the fact that reads do not stall may pretty much depend on the
> > behavior of the underlying IO scheduler and we probably don't want to rely
> > on its behavior too closely. So if you are going to treat reads in a
> > special way, check with NOOP or DEADLINE io schedulers that read-stalls
> > are not a problem with them as well.
> 
> Rather than the IO scheduler, I actually expect this behavior has more to do
> with these two facts:
> 
> 1) Due to the IO direction, most pages to be read are still on disk,
> while most pages to be written are in memory.
> 
> 2) And as Mel explained, reads tend to be sync and writes tend to be async,
> so decent IO schedulers, however they differ from each other,
> should mostly agree on favoring reads over writes.
  This is not true. CFQ heavily prefers read IO over write IO. The deadline
scheduler slightly prefers reads and the noop IO scheduler has no preference.
As a result, a page which is read from disk is going to be locked for a shorter
time with the CFQ scheduler than with the NOOP scheduler on average.
 
> So that amounts to the following calculation that is important to the 
> statistical stall time for the compaction:
> 
>      page_nr *  average_stall_window_time
> 
> where average_stall_window_time is the window for a page between 
> NotUptoDate ---> UptoDate or Dirty --> Clean. And page_nr is the
> number of pages in stall window for read or write.
> 
> So for general cases, 
> Fact 1) may ensure that the page_nr is smaller for read, while
> fact 2) may ensure the same for average_stall_window_time.
  Well, page_nr really depends on the load. If the workload is only reads,
clearly the number of pages being read is going to be higher than the number
of pages being written. Once the workload does heavy writing, I agree the
number of pages under writeback is likely going to be higher.
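
To make the page_nr * average_stall_window_time estimate quoted above concrete, here is a minimal sketch in Python. The class, the helper and every number in it are made up for illustration; none of them are measurements from any test in this thread. It only shows that which side dominates flips with the workload.

```python
from dataclasses import dataclass


@dataclass
class StallClass:
    """One class of pages compaction may have to wait on (hypothetical model)."""
    name: str
    page_nr: int          # pages currently inside the stall window
    avg_window_ms: float  # mean time a page spends inside that window

    def expected_stall_ms(self) -> float:
        # The estimate from the quoted mail: page_nr * average_stall_window_time
        return self.page_nr * self.avg_window_ms


def dominant(read: StallClass, write: StallClass) -> str:
    """Return the name of the class with the larger estimated stall."""
    return max((read, write), key=StallClass.expected_stall_ms).name


# Illustrative numbers only, not measurements.
read_heavy = dominant(StallClass("read", 400, 5.0), StallClass("write", 10, 20.0))
write_heavy = dominant(StallClass("read", 20, 5.0), StallClass("write", 300, 20.0))
print(read_heavy, write_heavy)
```

With these invented inputs the read side dominates in the first case and the write side in the second, even though the per-page window is held fixed; only page_nr changed.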
 
> I am not sure this will be the case for all workloads. I
> don't know if Mel has tested large readahead workloads, which
> have more async read IOs and fewer writebacks.
> 
> But theoretically I expect things are not that bad even for large
> readahead, because readahead is triggered by the readahead TAG in
> linear order, which means that for a process generating readahead IO,
> its speed is still somewhat governed by the read IO speed. A process
> writing to a file-mapped memory area, on the other hand, may well
> exceed the write speed of its backing store.
> 
> 
> Aside from that, I think the relation between page locking and
> page read is not 1-to-1. In other words, there may be quite a lot of
> transient page locking caused by mmap followed by page faults into
> pages that are already in a good state and require no IO at all. For these
> transient page locks I think it's reasonable to have light
> waiting.
  Definitely, pages get locked for reasons other than reads. E.g. to write a
page, we lock it first, submit the IO (which can actually block waiting for a
request to get freed), set PageWriteback, and unlock the page. And there are
more transient ones like you mention above...
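
The write sequence described above can be sketched as a userspace toy in Python. This is not kernel code: Page, write_page and the submit callback are hypothetical names, and the step order simply follows the description in this mail. The point it illustrates is that the page lock is held across a submit step that may itself block, so anyone doing a trylock in that window fails.

```python
import threading


class Page:
    """Toy stand-in for a page: a lock plus a writeback flag (hypothetical)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.writeback = False


def write_page(page, submit_io, trace):
    """Follow the order described above: lock, submit IO, set writeback, unlock."""
    page.lock.acquire()
    trace.append("lock")
    try:
        submit_io(page)            # may block, e.g. waiting for a free request
        trace.append("submit_io")
        page.writeback = True
        trace.append("set_writeback")
    finally:
        page.lock.release()
        trace.append("unlock")


def demo():
    page, trace, seen = Page(), [], {}

    def submit(p):
        # What a non-blocking locker would observe while the IO is submitted.
        seen["trylock"] = p.lock.acquire(blocking=False)

    write_page(page, submit, trace)
    return trace, seen, page


trace, seen, page = demo()
print(trace, seen)
```

After write_page returns, the lock is free again but the writeback flag stays set, which is the state a waiter on PageWriteback would then see.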

								Honza
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR