Message-ID: <alpine.DEB.2.00.0906251257040.3086@chino.kir.corp.google.com>
Date:	Thu, 25 Jun 2009 13:18:59 -0700 (PDT)
From:	David Rientjes <rientjes@...gle.com>
To:	Theodore Tso <tytso@....edu>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	penberg@...helsinki.fi, arjan@...radead.org,
	linux-kernel@...r.kernel.org, cl@...ux-foundation.org,
	npiggin@...e.de
Subject: Re: upcoming kerneloops.org item: get_page_from_freelist

On Thu, 25 Jun 2009, Theodore Tso wrote:

> On Thu, Jun 25, 2009 at 03:38:06PM -0400, Theodore Tso wrote:
> > Hmm, is there a reason to avoid using GFP_ATOMIC on the first
> > allocation, and only adding GFP_ATOMIC after the first failure?
> 
> Never mind, stupid question; I hit the send button before thinking
> about this enough.  Obviously we should try without GFP_ATOMIC so the
> allocator can try to release some memory.

The allocator can't actually release much memory itself; it must rely on
pdflush to do writeback, and the slab shrinkers are mostly no-ops for
~__GFP_FS allocations.  How much pdflush manages to free will depend on
the caller's context.

> So maybe the answer for
> filesystem code where the alternative to allocator failure is
> remounting the root filesystem read-only or panic(), should be:
> 
> 1)  Try to do the allocation GFP_NOFS.
> 
> 2)  Then try GFP_ATOMIC
> 
> 3) Then retry the allocator with GFP_NOFS in a loop (possibly with a
> timeout that then panics the system and allows the system to reboot,
> although arguably a watchdog timer should really perform that
> function).
> 

This is similar to how __getblk() will repeatedly loop until it gets 
sufficient memory to create buffers for the block page, which also relies 
heavily on pdflush.  If the GFP_ATOMIC allocation failed, then it's 
unlikely that the subsequent GFP_NOFS allocation will succeed any time 
soon without the oom killer, which we're not allowed to call, so it would 
probably be better to loop in step #2 with congestion_wait().
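
The ordering suggested above can be illustrated with a minimal userspace
sketch.  This is not kernel code: sim_alloc(), sim_congestion_wait(),
the SIM_GFP_* values, and the max_retries bound are all hypothetical
stand-ins introduced only for illustration, and the retry bound in
particular is an assumption (the discussion above leaves open whether
the loop should ever give up).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's GFP_NOFS / GFP_ATOMIC flags. */
enum sim_gfp { SIM_GFP_NOFS, SIM_GFP_ATOMIC };

/* Simulation state: how many upcoming allocations will fail. */
static int sim_failures_left;
/* Number of times the caller backed off, kept for inspection. */
static int sim_waits;

/* Stand-in for an allocator call: fails while sim_failures_left > 0. */
static void *sim_alloc(size_t size, enum sim_gfp flags)
{
	(void)flags;	/* the simulation treats both flag values alike */
	if (sim_failures_left > 0) {
		sim_failures_left--;
		return NULL;
	}
	return malloc(size);
}

/* Stand-in for congestion_wait(): only records that a back-off happened. */
static void sim_congestion_wait(void)
{
	sim_waits++;
}

/*
 * Try the NOFS-style allocation once, then loop on the ATOMIC-style
 * allocation, backing off between attempts.  Returns NULL once
 * max_retries ATOMIC attempts have failed (an assumed policy).
 */
static void *alloc_with_fallback(size_t size, int max_retries)
{
	void *p;
	int i;

	assert(size > 0);

	p = sim_alloc(size, SIM_GFP_NOFS);
	if (p)
		return p;

	for (i = 0; i < max_retries; i++) {
		p = sim_alloc(size, SIM_GFP_ATOMIC);
		if (p)
			return p;
		sim_congestion_wait();
	}
	return NULL;
}
```

In a real caller the back-off would sleep rather than count, and whether
to bound the loop at all is the policy question raised in the quoted
proposal.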

> Obviously if we can rework the filesystem code to avoid this as much
> as possible, this would be desirable, but if there are some cases left
> over where we really have no choice, that's probably what we should
> do.
> 

Isn't there also a problem in jbd2_journal_write_metadata_buffer(), 
though?

		char *tmp;

		jbd_unlock_bh_state(bh_in);
		tmp = jbd2_alloc(bh_in->b_size, GFP_NOFS);
		jbd_lock_bh_state(bh_in);
		if (jh_in->b_frozen_data) {
			jbd2_free(tmp, bh_in->b_size);
			goto repeat;
		}

		jh_in->b_frozen_data = tmp;
		mapped_data = kmap_atomic(new_page, KM_USER0);
		memcpy(tmp, mapped_data + new_offset, jh2bh(jh_in)->b_size);

jbd2_alloc() is just a wrapper around __get_free_pages(), and if it
fails, it appears as though the memcpy() would dereference a NULL pointer.
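
The kind of check that seems to be missing can be sketched in userspace
as follows.  This is only an illustration under stated assumptions:
sim_frozen_alloc(), sim_fail_next, and freeze_copy() are hypothetical
names, the locking and the "goto repeat" path from the excerpt are left
out, and returning -ENOMEM is one possible policy rather than what jbd2
does or should do.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Set to nonzero to make the next sim_frozen_alloc() call fail. */
static int sim_fail_next;

/* Hypothetical stand-in for an allocator that may return NULL. */
static void *sim_frozen_alloc(size_t size)
{
	if (sim_fail_next) {
		sim_fail_next = 0;
		return NULL;
	}
	return malloc(size);
}

/*
 * Copy size bytes from src into a newly allocated buffer and store it
 * in *frozen.  The allocation result is checked before the memcpy(),
 * so a failed allocation is reported instead of being dereferenced.
 * Returns 0 on success, -ENOMEM on allocation failure (*frozen is
 * left untouched in that case).
 */
static int freeze_copy(char **frozen, const char *src, size_t size)
{
	char *tmp;

	assert(frozen != NULL && src != NULL);

	tmp = sim_frozen_alloc(size);
	if (!tmp)
		return -ENOMEM;

	memcpy(tmp, src, size);
	*frozen = tmp;
	return 0;
}
```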