lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <18388.50188.552322.780524@notabene.brown>
Date:	Mon, 10 Mar 2008 16:15:56 +1100
From:	Neil Brown <neilb@...e.de>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	netdev@...r.kernel.org, trond.myklebust@....uio.no,
	Pekka Enberg <penberg@...helsinki.fi>
Subject: Re: [PATCH 00/28] Swap over NFS -v16

On Friday March 7, a.p.zijlstra@...llo.nl wrote:
> Hi Neil,
> 
> I'm so glad you are working with me on this and writing this in human
> English. It seems to be my eternal short-comming to communicate my ideas
> clearly :-/. Thanks for your effort!

:-)
It always helps to have a second brain with a different perspective.


> 
> On Fri, 2008-03-07 at 14:33 +1100, Neil Brown wrote:
> > 
> > [I don't find the above wholly satisfying.  There seems to be too much
> >  hand-waving.  If someone can provide better text explaining why
> >  swapout is a special case, that would be great.]
> 
> Anonymous pages are dirty by definition (except the zero page, but I
> think we recently ditched it). So shrinking of the anonymous pool will
> require swapping.

Well, there is the swap cache.  That's probably what I was thinking of
when I said "clean anonymous pages".  I suspect they are the first to
go!

> 
> It is indeed the last refuge for those with GFP_NOFS. Allong with the
> strict limit on the amount of dirty file pages it also ensures writing
> those out will never deadlock the machine as there are always clean file
> pages and or anonymous pages to launder.

The difficulty I have is justifying exactly why page-cache writeout
will not deadlock.  What if all the memory that is not dirty-pagecache
is anonymous, and if swap isn't enabled?
Maybe the number returned by "determine_dirtyable_memory" in
page-writeback.c excludes anonymous pages?  I wonder if the meaning of
NR_FREE_PAGES, NR_INACTIVE, etc is documented anywhere....

...
> 
> Right. I've had a long conversation on PG_emergency with Pekka. And I
> think the conclusion was that PG_emergency will create more head-aches
> than it solves. I probably have the conversation in my IRC logs and
> could email it if you're interested (and Pekka doesn't object).

Maybe that depends on the exact semantic of PG_emergency ??
I remember you being concerned that PG_emergency never changes between
allocation and freeing, and that wouldn't work well with slub.
My envisioned semantic has it possibly changing quite often.
What it means is:
   The last allocation done from this page was in a low-memory
   condition.

You really need some way to tell if the result of kmalloc/kmemalloc
should be treated as reserved.
I think you had code which first tried the allocation without
GFP_MEMALLOC and then if that failed, tried again *with*
GFP_MEMALLOC.  If that then succeeded, it is assumed to be an
allocation from reserves.  That seemed rather ugly, though I guess you
could wrap it in a function to hide the ugliness:

void *kmalloc_reserve(size_t size, int *reserve, gfp_t gfp_flags)
{
	void *result = kmalloc(size, gfp_flags & ~GFP_MEMALLOC);
	if (result) {
		*reserve = 0;
		return result;
	}
	result = kmalloc(size, gfp_flags | GFP_MEMALLOC);
	if (result) {
		*reserve = 1;
		return result;
	}
	return NULL;
}
???

> 
> I've already heard interest from other people to use these hooks to
> provide swap on other non-block filesystems such as jffs2, logfs and the
> like.

I'm interested in the swap_in/swap_out interface for external
write-intent bitmaps for md/raid arrays.
You can have a write-intent bitmap which records which blocks might be
dirty if the host crashes, so that resync is much faster.
It can be stored in a file in a separate filesystem, but that is
currently implemented by using bmap to enumerate the blocks and then
reading/writing directly to the device (like swap).  Your interface
would be much nicer for that (not that I think having a
write-intent-bitmap on an NFS filesystem would be a clever idea ;-)

I'll look forward to your next patch set....

One thing I had thought odd while reading the patches, but haven't
found an opportunity to mention before, is the "IS_SWAPFILE" test in
nfs-swapper.patch.
This seems like a layering violation.  It would be better if the test
was based on whether  ->swapfile had been called on the file.  That way
my write-intent-bitmaps would get the same benefit.

NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ