Message-ID: <alpine.DEB.2.02.1402061537180.3441@chino.kir.corp.google.com>
Date:	Thu, 6 Feb 2014 15:48:22 -0800 (PST)
From:	David Rientjes <rientjes@...gle.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
cc:	Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>,
	Fengguang Wu <fengguang.wu@...el.com>,
	David Cohen <david.a.cohen@...ux.intel.com>,
	Al Viro <viro@...iv.linux.org.uk>,
	Damien Ramonda <damien.ramonda@...el.com>,
	Jan Kara <jack@...e.cz>, Linus <torvalds@...ux-foundation.org>,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH V5] mm readahead: Fix readahead fail for no local
 memory and limit readahead pages

On Thu, 6 Feb 2014, Andrew Morton wrote:

> On Thu, 6 Feb 2014 14:58:21 -0800 (PST) David Rientjes <rientjes@...gle.com> wrote:
> 
> > > > +#define MAX_REMOTE_READAHEAD   4096UL
> > > >  /*
> > > >   * Given a desired number of PAGE_CACHE_SIZE readahead pages, return a
> > > >   * sensible upper limit.
> > > >   */
> > > >  unsigned long max_sane_readahead(unsigned long nr)
> > > >  {
> > > > -	return min(nr, (node_page_state(numa_node_id(), NR_INACTIVE_FILE)
> > > > -		+ node_page_state(numa_node_id(), NR_FREE_PAGES)) / 2);
> > > > +	unsigned long local_free_page;
> > > > +	int nid;
> > > > +
> > > > +	nid = numa_node_id();
> > 
> > If you're intending this to be cached for your calls into 
> > node_page_state() you need nid = ACCESS_ONCE(numa_node_id()).
> 
> ugh.  That's too subtle and we didn't even document it.
> 
> We could put the ACCESS_ONCE inside numa_node_id() I assume but we
> still have the same problem as smp_processor_id(): the numa_node_id()
> return value is wrong as soon as you obtain it if running preemptibly. 
> 
> We could plaster Big Fat Warnings all over the place or we could treat
> numa_node_id() and derivatives in the same way as smp_processor_id()
> (which is a huge pain).  Or something else, but we've left a big hand
> grenade here and Raghavendra won't be the last one to pull the pin?
> 

Normally it wouldn't matter because there's no significant downside to it 
racing: things like mempolicies, which use numa_node_id() extensively, would 
result in, oops, a page allocation on the wrong node.

This stands out to me, though, because you're expecting the calculation to 
be correct for a specific node.
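
To make the concern concrete, here is a minimal userspace sketch, not kernel 
code. Everything prefixed fake_ or FAKE_ is a hypothetical stand-in invented 
for illustration, including the counter values. The fake node lookup 
pretends the task migrates from node 0 to node 1 after the first call, so 
looking the node up twice mixes counters from two nodes, while looking it up 
once keeps both terms on the same node.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's counters; values are made up. */
enum { FAKE_INACTIVE_FILE, FAKE_FREE_PAGES, FAKE_NR_ITEMS };
enum { FAKE_NR_NODES = 2 };

static const unsigned long fake_state[FAKE_NR_NODES][FAKE_NR_ITEMS] = {
	{ 1000, 3000 },	/* node 0 */
	{   10,   30 },	/* node 1 */
};

static int fake_calls;

/* Pretend the task moves from node 0 to node 1 after the first lookup. */
static int fake_numa_node_id(void)
{
	return fake_calls++ == 0 ? 0 : 1;
}

static unsigned long fake_node_page_state(int nid, int item)
{
	assert(nid >= 0 && nid < FAKE_NR_NODES);
	return fake_state[nid][item];
}

/* Two lookups: each term may be read from a different node. */
static unsigned long fake_limit_two_lookups(void)
{
	unsigned long inactive, free_pages;

	fake_calls = 0;
	inactive = fake_node_page_state(fake_numa_node_id(), FAKE_INACTIVE_FILE);
	free_pages = fake_node_page_state(fake_numa_node_id(), FAKE_FREE_PAGES);
	return (inactive + free_pages) / 2;
}

/* One lookup: both terms are read from the same node. */
static unsigned long fake_limit_one_lookup(void)
{
	int nid;

	fake_calls = 0;
	nid = fake_numa_node_id();
	return (fake_node_page_state(nid, FAKE_INACTIVE_FILE) +
		fake_node_page_state(nid, FAKE_FREE_PAGES)) / 2;
}
```

With these made-up numbers, the two-lookup version combines node 0's 
inactive count with node 1's free count, which describes neither node. The 
single-lookup version may still describe a node the task has since left, 
but at least the calculation is consistent for one node.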

The patch is still wrong, though; it should just do

	int node = ACCESS_ONCE(numa_mem_id());
	return min(nr, (node_page_state(node, NR_INACTIVE_FILE) +
		        node_page_state(node, NR_FREE_PAGES)) / 2);

since we want to read ahead based on the cpu's local node. The comment 
saying we're reading ahead onto "remote memory" is wrong, since a 
memoryless node has local affinity to numa_mem_id().
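
As a rough userspace illustration of why the choice of node matters on a 
memoryless node (again not kernel code: every model_ and MODEL_ name, the 
two-node topology, and the counter values are assumptions made up for this 
sketch), the same limit calculation can be run against the CPU's own node 
and against the nearest node that has memory:

```c
#include <assert.h>

/* Hypothetical model of two nodes; node 1 has CPUs but no memory. */
enum { MODEL_INACTIVE_FILE, MODEL_FREE_PAGES, MODEL_NR_ITEMS };
enum { MODEL_NR_NODES = 2 };

static const unsigned long model_state[MODEL_NR_NODES][MODEL_NR_ITEMS] = {
	{ 2000, 6000 },	/* node 0: has memory */
	{    0,    0 },	/* node 1: memoryless, all counters zero */
};

/* Assumed topology: the nearest node with memory, indexed by node. */
static const int model_nearest_mem[MODEL_NR_NODES] = { 0, 0 };

/* The CPU we pretend to be running on sits on the memoryless node. */
static int model_cpu_node = 1;

static int model_numa_node_id(void)
{
	return model_cpu_node;
}

static int model_numa_mem_id(void)
{
	return model_nearest_mem[model_cpu_node];
}

static unsigned long model_node_page_state(int nid, int item)
{
	assert(nid >= 0 && nid < MODEL_NR_NODES);
	return model_state[nid][item];
}

/* Same shape as the limit calculation in the thread, for a given node. */
static unsigned long model_limit(unsigned long nr, int node)
{
	unsigned long cap = (model_node_page_state(node, MODEL_INACTIVE_FILE) +
			     model_node_page_state(node, MODEL_FREE_PAGES)) / 2;

	return nr < cap ? nr : cap;
}
```

In this model, model_limit(512, model_numa_node_id()) comes out as 0, so no 
readahead would happen at all, while model_limit(512, model_numa_mem_id()) 
allows the full 512 pages because the counters come from node 0.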
