lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 16 Jan 2007 09:13:19 -0800 (PST)
From:	Christoph Lameter <clameter@....com>
To:	Paul Jackson <pj@....com>
cc:	akpm@...l.org, menage@...gle.com, linux-kernel@...r.kernel.org,
	nickpiggin@...oo.com.au, linux-mm@...ck.org, ak@...e.de,
	dgc@....com
Subject: Re: [RFC 0/8] Cpuset aware writeback

On Tue, 16 Jan 2007, Paul Jackson wrote:

> > 1. The nodemask expands the inode structure significantly if the
> > architecture allows a high number of nodes. This is only an issue
> > for IA64. 
> 
> Should that logic be disabled if HOTPLUG is configured on?  Or is
> nr_node_ids a valid upper limit on what could be plugged in, even on a
> mythical HOTPLUG system?

nr_node_ids is a valid upper limit on what could be plugged in. We could
modify nodemasks to only use nr_node_ids bits and the kernel would still
be functioning correctly.

> > 2. The calculation of the per cpuset limits can require looping
> > over a number of nodes which may bring the performance of get_dirty_limits
> > near pre 2.6.18 performance
> 
> Could we cache these limits?  Perhaps they only need to be recalculated if
> a tasks mems_allowed changes?

No they change dynamically. In particular writeout reduces the number of 
dirty / unstable pages.

> Separate question - what happens if a tasks mems_allowed changes while it is
> dirtying pages?  We could easily end up with dirty pages on nodes that are
> no longer allowed to the task.  Is there anyway that such a miscalculation
> could cause us to do harmful things?

The dirty_map on an inode is independent of a cpuset. The cpuset only 
comes into effect when we decide to do writeout and are scanning for files 
with pages on the nodes of interest.

> In patch 2/8:
> > The dirty map is cleared when the inode is cleared. There is no
> > synchronization (except for atomic nature of node_set) for the dirty_map. The
> > only problem that could be done is that we do not write out an inode if a
> > node bit is not set.
> 
> Does this mean that a dirty page could be left 'forever' in memory, unwritten,
> exposing us to risk of data corruption on disk, from some write done weeks ago,
> but unwritten, in the event of say a power loss?

No it will age and be written out anyways. Note that there are usually 
multiple dirty pages which reduces the chance of the race. These are node
bits that help to decide when to start writeout of all dirty pages of an 
inode regardless of where the other pages are.

> Also in patch 2/8:
> > +static inline void cpuset_update_dirty_nodes(struct inode *i,
> > +		struct page *page) {}
> 
> Is an incomplete 'struct inode;' declaration needed here in cpuset.h,
> to avoid a warning if compiling with CPUSETS not configured?

Correct.

> 
> In patch 4/8:
> > We now add per node information which I think is equal or less effort
> > since there are less nodes than processors.
> 
> Not so on Paul Menage's fake NUMA nodes - he can have say 64 fake nodes on
> a system with 2 or 4 CPUs and one real node.  But I guess that's ok ...

True but then its fake.

> In patch 4/8:
> > +#ifdef CONFIG_CPUSETS
> > +	/*
> > +	 * Calculate the limits relative to the current cpuset if necessary.
> > +	 */
> > +	if (unlikely(nodes &&
> > +			!nodes_subset(node_online_map, *nodes))) {
> > +		int node;
> > +
> > +		is_subset = 1;
> > +		...
> > +#ifdef CONFIG_HIGHMEM
> > +			high_memory += NODE_DATA(node)
> > +				->node_zones[ZONE_HIGHMEM]->present_pages;
> > +#endif
> > +			nr_mapped += node_page_state(node, NR_FILE_MAPPED) +
> > +					node_page_state(node, NR_ANON_PAGES);
> > +		}
> > +	} else
> > +#endif
> > +	{
> 
> I'm wishing there was a clearer way to write the above code.  Nested
> ifdef's and an ifdef block ending in an open 'else' and perhaps the first
> #ifdef CONFIG_CPUSETS ever, outside of fs/proc/base.c ...

I have tried to replicate the structure for global dirty_limits 
calculation which has the same ifdef.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ