Message-ID: <alpine.DEB.0.9999.0710282130320.32474@chino.kir.corp.google.com>
Date: Sun, 28 Oct 2007 21:47:58 -0700 (PDT)
From: David Rientjes <rientjes@...gle.com>
To: Paul Jackson <pj@....com>
cc: clameter@....com, Lee.Schermerhorn@...com, akpm@...ux-foundation.org,
    ak@...e.de, linux-kernel@...r.kernel.org
Subject: Re: [patch 2/2] cpusets: add interleave_over_allowed option

On Sun, 28 Oct 2007, Paul Jackson wrote:

> And, unless someone in the know tells us otherwise, I have to assume
> that this could break them.  Now, the odds are that they simply don't
> run that solution stack on any system making active use of cpusets,
> so the odds are this would be no problem for them.  But I don't
> presently have enough knowledge of their situation to take that risk.
>

If we can't identify any applications that would be broken by this, why
not simply implement Choice B and then, if we hear complaints, add your
hack to revert to Choice A behavior based on the get_mempolicy() call
that you said is always part of libnuma?

The problem that I see with immediately offering both choices is that we
won't know whether anybody actually depends on Choice A behavior, because
libnuma would use it by default.  That's going to make it very painful
to remove later because we will have supported both options and made the
libnuma and {get,set}_mempolicy() arguments ambiguous.

We should only support both choices if they will both be used, and
there's no hard evidence to suggest that at this point.

> But dual support is pretty easy so far as the kernel code is concerned.
> It's just a few nodes_remap() calls optionally invoked at a few key
> spots in mm/mempolicy.c.  Consequently there won't be a big hurry to
> remove Choice A.
>

You earlier insisted on ease of documentation for the MPOL_INTERLEAVE
case, and now the dual support that you're proposing is going to make the
documentation very difficult to understand for anyone who simply wants
to use mempolicies.  Others even in this thread have had a hard enough
time understanding the difference between the two choices, and you
explained them very thoroughly.

It's going to be much more trouble than it's worth, I predict.

> There is no "_then_ attach the task to a cpuset."  On systems with
> kernels configured with CONFIG_CPUSETS=y, all tasks are in a cpuset
> all the time.  Moreover, from a practical point of view, on large
> systems managed with cpuset based mechanisms, almost all tasks are in
> cpusets that do not include all nodes, for the entire life of the task.
>

And if it truly uses Choice A behavior, that application would need to
know the nodes that it has access to before it issues its
set_mempolicy(MPOL_PREFERRED) call anyway.  So unless these tasks either
parse Mems_allowed from /proc/pid/status and specify one of those nodes
as preferred, or are always guaranteed a certain set of nodes in the
cpuset they are attached to, so that they know in advance which node to
prefer, Choice A can't possibly be what they want.

> > Yet the 'mems' file would still be system-wide; otherwise it would be
> > impossible to expand the memory your cpuset has access to.
>
> I had to read that a couple of times to make sense of it.  I take that
> it means that the node numbering used in each cpuset's 'mems' file has
> to be system-wide.  Yes, agreed.
>
> (Well, actually, the node numbering of each cpusets 'mems' file could
> be relative to its parent cpusets 'mem' numbers, but let's not go
> there, as this discussion is already sufficiently complicated ;)
>

I appreciate that very much.

> Would it meet the need that prompted your initial patch set if we
> added Choice B memory policy node numbering, but left Choice A as the
> kernel default, with a per-task option (perhaps invokable by a new
> option to one of the {get,set}_mempolicy() calls) to choose Choice B?
>

The need I was addressing with my initial patchset was that when a
cpuset is expanded, any MPOL_INTERLEAVE memory policy of its attached
tasks automatically gets expanded as well.  This discussion has somewhat
diverged from that, but I hope you still support what we talked about
earlier: adding a field to struct mempolicy to remember the intended
nodemask that the application asked to interleave over.

> This lets us get Choice B out there, and lets the two main libraries,
> libnuma and libcpuset, dynamically adapt to whichever Choice is active
> for the current task.
>
> Unchanged applications and existing binaries would simply continue with
> Choice A.  With one additional line of code, a user application could
> get Choice B, with its ability for example to request MPOL_INTERLEAVE
> over all cpuset allowed nodes, where the kernel automatically adapts
> that to changing cpuset changes from larger 'mems' to smaller 'mems'
> and back to larger 'mems' again.
>

You don't actually need to choose between the two choices to adapt
MPOL_INTERLEAVE over _all_ allowed cpuset nodes.  I thought what we
agreed upon, and what you were going to implement, was adding a
nodemask_t to struct mempolicy for the intended nodemask of the memory
policy and then ANDing it with pol->cpuset_mems_allowed.

That completely satisfies my needs and those of my applications that
want to allocate over all available nodes (by simply passing
numa_all_nodes to set_mempolicy(MPOL_INTERLEAVE)).  If I wanted to
interleave only over a subset, the choices would matter.

		David
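
For readers following the thread, the "remember the intended nodemask and
AND it with pol->cpuset_mems_allowed" idea can be modelled in userspace.
What follows is only a sketch in Python, not kernel code and not the patch
under discussion: node sets are plain integers used as bitmasks, and the
names Mempolicy, rebind, mask, nodes_of and ALL_NODES are hypothetical
stand-ins chosen for illustration. The 8-node machine is likewise an
assumption.

```python
# Userspace model only. Bit n set in an int means "node n is in the set".
# All names here are hypothetical; none of them are kernel or libnuma APIs.

def mask(*nodes):
    """Build a bitmask from a list of node numbers."""
    m = 0
    for n in nodes:
        m |= 1 << n
    return m


def nodes_of(m):
    """Return the sorted node numbers present in a bitmask."""
    return [n for n in range(m.bit_length()) if (m >> n) & 1]


class Mempolicy:
    """Interleave policy that remembers the nodemask the caller asked for."""

    def __init__(self, intended, cpuset_mems_allowed):
        self.intended = intended  # never modified after creation
        self.rebind(cpuset_mems_allowed)

    def rebind(self, cpuset_mems_allowed):
        """Recompute the effective mask after the cpuset's 'mems' changes."""
        self.cpuset_mems_allowed = cpuset_mems_allowed
        self.effective = self.intended & cpuset_mems_allowed


# Stand-in for numa_all_nodes on an assumed 8-node machine.
ALL_NODES = mask(*range(8))

pol = Mempolicy(ALL_NODES, mask(0, 1))  # cpuset starts with nodes 0-1
print(nodes_of(pol.effective))          # interleave over 0-1
pol.rebind(mask(0, 1, 2, 3))            # cpuset grows to nodes 0-3
print(nodes_of(pol.effective))          # interleave grows with it
pol.rebind(mask(2))                     # cpuset shrinks to node 2
print(nodes_of(pol.effective))
pol.rebind(mask(0, 1, 2, 3))            # and grows back again
print(nodes_of(pol.effective))
```

Because the intended mask is kept separately and never overwritten, the
effective mask recovers when the cpuset grows back. In this model a request
for a subset, say mask(1, 2) inside a cpuset holding only nodes 4 and 5,
yields an empty effective mask, which is the case where the numbering
question (Choice A versus Choice B) starts to matter.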