Message-Id: <E1MfuU8-00088Q-Ta@pomaz-ex.szeredi.hu>
Date:	Tue, 25 Aug 2009 13:46:44 +0200
From:	Miklos Szeredi <miklos@...redi.hu>
To:	a.p.zijlstra@...llo.nl
CC:	richard@....demon.co.uk, akpm@...ux-foundation.org,
	chris.mason@...cle.com, linux-kernel@...r.kernel.org,
	jens.axboe@...cle.com, miklos@...redi.hu
Subject: Re: [RFC PATCH] mm: balance_dirty_pages. reduce calls to
 global_page_state to reduce cache references

On Fri, 21 Aug 2009, Peter Zijlstra wrote:
> > I have tried to retain the existing behavior as much as possible, but
> > have added NR_WRITEBACK_TEMP to nr_writeback. This counter was used in
> > clip_bdi_dirty_limit but not in balance_dirty_pages, grep suggests this
> > is only used by FUSE but I haven't done any testing on that. It does
> > seem logical to count all the WRITEBACK pages when making the throttling
> > decisions so this change should be more correct ;)
> 
> Right, the NR_WRITEBACK_TEMP thing is a FUSE feature, it's used in
> writable mmap() support for FUSE things.
> 
> I must admit to forgetting the exact semantics of the things, maybe
> Miklos can remind us.

I'll try: fuse is special because it needs writeback to be "very
asynchronous".  What I mean by this is that writing a dirty page may
block indefinitely and that shouldn't hold up unrelated filesystem or
memory operations.

To satisfy this, fuse copies contents of dirty pages over to
"temporary pages" and queues the write request with this temporary
page, not the original page-cache page.

This has two effects:

 - the page-cache page does not remain in "writeback" state but is
   cleaned immediately

 - the NR_WRITEBACK counter is not incremented for the duration of the
   writeback
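
As an illustration only, the two effects above can be modelled in a few
lines of user-space Python. This is a toy sketch, not kernel code: the
names Page, Counters, FuseWriteQueue, writepage and complete are
hypothetical, and the only behavior taken from the description is that
the write request carries a copy, the original page is cleaned at once,
and only a separate "temp" counter is incremented.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare pages by identity, like distinct struct pages
class Page:
    data: bytes
    dirty: bool = False
    writeback: bool = False


@dataclass
class Counters:
    nr_writeback: int = 0       # stands in for NR_WRITEBACK
    nr_writeback_temp: int = 0  # stands in for NR_WRITEBACK_TEMP


@dataclass
class FuseWriteQueue:
    counters: Counters
    pending: list = field(default_factory=list)

    def writepage(self, page: Page) -> Page:
        """Queue a write using a temporary copy of the dirty page."""
        temp = Page(data=bytes(page.data))
        self.pending.append(temp)
        # Only the temp counter moves; nr_writeback is left untouched.
        self.counters.nr_writeback_temp += 1
        # The page-cache page is cleaned immediately and is never left
        # in the writeback state, so nothing can block waiting on it.
        page.dirty = False
        page.writeback = False
        return temp

    def complete(self, temp: Page) -> None:
        """Called when the (possibly very slow) userspace write finishes."""
        self.pending.remove(temp)
        self.counters.nr_writeback_temp -= 1
```

In this model a write that never completes leaves one entry in pending
and one count in nr_writeback_temp, while the original page is already
clean and nr_writeback stays at zero.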

The first one is important because vmscan and page migration do
wait_on_page_writeback() in some circumstances, which would block on
fuse writebacks.

The second one is important because vmscan will throttle writeout if
the NR_WRITEBACK counter goes over the dirty threshold
(throttle_vm_writeout).  There were long discussions about this one,
but in the end no one could say for sure how this works or why it is
important.  But NR_WRITEBACK_TEMP must not be counted there, otherwise
the page scanning can deadlock with fuse filesystems.
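
A toy Python sketch of that check, under the assumption that the
decision reduces to comparing in-flight writeback against the dirty
threshold (the real throttle_vm_writeout() may consider more than this).
The function name and the count_temp flag are hypothetical and exist
only to show what would change if NR_WRITEBACK_TEMP were counted.

```python
def vm_writeout_must_wait(nr_writeback, nr_writeback_temp, dirty_thresh,
                          count_temp=False):
    """Return True if page scanning would be made to wait.

    count_temp=False models the behavior described above, where
    temporary fuse pages are ignored. count_temp=True models the
    variant that must be avoided.
    """
    in_flight = nr_writeback
    if count_temp:
        in_flight += nr_writeback_temp
    return in_flight > dirty_thresh


# A stalled fuse filesystem holding 500 temporary pages in flight:
stalled = dict(nr_writeback=10, nr_writeback_temp=500, dirty_thresh=100)
ignoring_temp = vm_writeout_must_wait(**stalled)
counting_temp = vm_writeout_must_wait(**stalled, count_temp=True)
```

With these made-up numbers, ignoring_temp is False and counting_temp is
True: only the second variant lets a stalled fuse filesystem hold up
page scanning.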

About balance_dirty_pages() I'm not quite sure.  By a similar logic we
don't want NR_WRITEBACK_TEMP pages to contribute to throttling
unrelated filesystem writebacks.  That might (through recursion via a
userspace filesystem) lead to a deadlock.
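
The recursion can be sketched with a small Python simulation. It rests
on assumptions that are not stated above: that the userspace filesystem
daemon completes each temporary page by writing to some other
filesystem, and that this write is subject to the same throttling
check. The function name, parameters and numbers are all hypothetical.

```python
def steps_to_drain(include_temp, temp_pages=200, dirty_thresh=100,
                   max_steps=1000):
    """Return how many steps the daemon needs to finish all temporary
    pages, or None if it can make no progress.

    include_temp=True models counting NR_WRITEBACK_TEMP pages when
    throttling writers; include_temp=False models leaving them out.
    """
    nr_writeback_temp = temp_pages
    for step in range(max_steps):
        if nr_writeback_temp == 0:
            return step
        counted = nr_writeback_temp if include_temp else 0
        if counted > dirty_thresh:
            # The daemon's own write is throttled. The count can only
            # drop when the daemon makes progress, so nothing will
            # ever change from here: report a deadlock.
            return None
        # The daemon completes one temporary page.
        nr_writeback_temp -= 1
    return None
```

In this model steps_to_drain(False) finishes, while steps_to_drain(True)
returns None as soon as the temporary pages alone exceed the threshold.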

So my recommendation is that we should retain the old behavior.

Thanks,
Miklos
