lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Mon, 30 Aug 2010 15:19:29 +0200 From: Johannes Weiner <hannes@...xchg.org> To: Mel Gorman <mel@....ul.ie> Cc: linux-mm@...ck.org, linux-fsdevel@...r.kernel.org, Andrew Morton <akpm@...ux-foundation.org>, Christian Ehrhardt <ehrhardt@...ux.vnet.ibm.com>, Wu Fengguang <fengguang.wu@...el.com>, Jan Kara <jack@...e.cz>, linux-kernel@...r.kernel.org Subject: Re: [PATCH 2/3] writeback: Record if the congestion was unnecessary On Fri, Aug 27, 2010 at 10:24:16AM +0100, Mel Gorman wrote: > On Fri, Aug 27, 2010 at 10:16:48AM +0200, Johannes Weiner wrote: > > On Thu, Aug 26, 2010 at 09:31:30PM +0100, Mel Gorman wrote: > > > On Thu, Aug 26, 2010 at 08:29:04PM +0200, Johannes Weiner wrote: > > > > On Thu, Aug 26, 2010 at 04:14:15PM +0100, Mel Gorman wrote: > > > > > If congestion_wait() is called when there is no congestion, the caller > > > > > will wait for the full timeout. This can cause unreasonable and > > > > > unnecessary stalls. There are a number of potential modifications that > > > > > could be made to wake sleepers but this patch measures how serious the > > > > > problem is. It keeps count of how many congested BDIs there are. If > > > > > congestion_wait() is called with no BDIs congested, the tracepoint will > > > > > record that the wait was unnecessary. > > > > > > > > I am not convinced that unnecessary is the right word. On a workload > > > > without any IO (i.e. no congestion_wait() necessary, ever), I noticed > > > > the VM regressing both in time and in reclaiming the right pages when > > > > simply removing congestion_wait() from the direct reclaim paths (the > > > > one in __alloc_pages_slowpath and the other one in > > > > do_try_to_free_pages). > > > > > > > > So just being stupid and waiting for the timeout in direct reclaim > > > > while kswapd can make progress seemed to do a better job for that > > > > load. > > > > > > > > I can not exactly pinpoint the reason for that behaviour, it would be > > > > nice if somebody had an idea. > > > > > > > > > > There is a possibility that the behaviour in that case was due to flusher > > > threads doing the writes rather than direct reclaim queueing pages for IO > > > in an inefficient manner. So the stall is stupid but happens to work out > > > well because flusher threads get the chance to do work. > > > > The workload was accessing a large sparse-file through mmap, so there > > wasn't much IO in the first place. > > > > Then waiting on congestion was the totally wrong thing to do. We were > effectively calling sleep(HZ/10) and magically this was helping in some > undefined manner. Do you know *which* called of congestion_wait() was > the most important to you? Removing congestion_wait() in do_try_to_free_pages() definitely worsens reclaim behaviour for this workload: 1. wallclock time of the testrun increases by 11% 2. the scanners do a worse job and go for the wrong zone: -pgalloc_dma 79597 -pgalloc_dma32 134465902 +pgalloc_dma 297089 +pgalloc_dma32 134247237 -pgsteal_dma 77501 -pgsteal_dma32 133939446 +pgsteal_dma 294998 +pgsteal_dma32 133722312 -pgscan_kswapd_dma 145897 -pgscan_kswapd_dma32 266141381 +pgscan_kswapd_dma 287981 +pgscan_kswapd_dma32 186647637 -pgscan_direct_dma 9666 -pgscan_direct_dma32 1758655 +pgscan_direct_dma 302495 +pgscan_direct_dma32 80947179 -pageoutrun 1768531 -allocstall 614 +pageoutrun 1927451 +allocstall 8566 I attached the full vmstat contents below. Also the test program, which I ran in this case as: ./mapped-file-stream 1 $((512 << 30)) > > > > So personally I think it's a good idea to get an insight on the use of > > > > congestion_wait() [patch 1] but I don't agree with changing its > > > > behaviour just yet, or judging its usefulness solely on whether it > > > > correctly waits for bdi congestion. > > > > > > > > > > Unfortunately, I strongly suspect that some of the desktop stalls seen during > > > IO (one of which involved no writes) were due to calling congestion_wait > > > and waiting the full timeout where no writes are going on. > > > > Oh, I am in full agreement here! Removing those congestion_wait() as > > described above showed a reduction in peak latency. The dilemma is > > only that it increased the overall walltime of the load. > > > > Do you know why because leaving in random sleeps() hardly seems to be > the right approach? I am still trying to find out what's going wrong. > > And the scanning behaviour deteriorated, as in having increased > > scanning pressure on other zones than the unpatched kernel did. > > > > Probably because it was scanning more but not finding what it needed. > There is a condition other than congestion it is having trouble with. In > some respects, I think if we change congestion_wait() as I propose, > we may see a case where CPU usage is higher because it's now > encountering the unspecified reclaim problem we have. Exactly. > > So I think very much that we need a fix. congestion_wait() causes > > stalls and relying on random sleeps for the current reclaim behaviour > > can not be the solution, at all. > > > > I just don't think we can remove it based on the argument that it > > doesn't do what it is supposed to do, when it does other things right > > that it is not supposed to do ;-) > > > > We are not removing it, we are just stopping it going to sleep for > stupid reasons. If we find that wall time is increasing as a result, we > have a path to figuring out what the real underlying problem is instead > of sweeping it under the rug. Well, for that testcase it is in effect the same as a removal as there's never congestion. But again: I agree with your changes per-se, I just don't think they should get merged as long as they knowingly catalyze a problem that has yet to be identified. View attachment "mapped-file-stream.c" of type "text/plain" (1885 bytes) Download attachment "vmstat.a.2" of type "application/x-troff-man" (1709 bytes) Download attachment "vmstat.b.2" of type "application/x-troff-man" (1719 bytes)
Powered by blists - more mailing lists