Date: Tue, 1 Aug 2006 12:00:02 -0700
From: "Siddha, Suresh B" <suresh.b.siddha@...el.com>
To: Paul Jackson <pj@....com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@...el.com>, mingo@...e.hu,
	nickpiggin@...oo.com.au, vatsa@...ibm.com, Simon.Derr@...l.net,
	steiner@....com, linux-kernel@...r.kernel.org, akpm@...l.org
Subject: Re: [BUG] sched: big numa dynamic sched domain memory corruption

On Tue, Aug 01, 2006 at 01:25:33AM -0700, Paul Jackson wrote:
> I wish you well on any further code improvements you have planned for
> this code. It's tough to understand, with such issues as many #ifdef's,
> an interesting memory layout of the key sched domain arrays that I
> didn't see described much in the comments, and a variety of memory
> allocation calls that are tough to unravel on error. Portions of
> the code could use some more comments, explaining what is going on.
> For example, I still haven't figured exactly what 'cpu_power' means.

I will add some info to Documentation/sched-domains.txt as well as some
comments to the code where appropriate.

I did some cleanup of the code, but unfortunately that got dropped
because of some issues. I will repost that cleanup patch as well.

>
> The allocations of sched_group_allnodes, sched_group_phys and
> sched_group_core are -big- on our ia64 SN2 systems (1024 CPUs),
> and could fail once a system has been up for a while and is
> getting memory tight and fragmented.

I have to agree with you. I have an idea (basically passing cpu_map
info to the functions which determine the group) to solve this issue.
Let me work on it and post a fix.

> It is not obvious to me from the code or comments just how sched
> domains are arranged on various large systems with hyper-threading
> (SMT) and/or multiple cores (MC) and/or multiple processor packages
> per node, and how scheduling is affected by all this.

Enabling SCHED_DOMAIN_DEBUG should at least show how the sched domains
and groups are arranged. Adding an example to Documentation might be a
good idea.

>
> This was about the third bug that has come by in it -- which I
> in particular notice when it is someone playing with cpu_exclusive
> cpusets who hits the bug. Any kernel backtrace with 'cpuset' buried in
> it tends to migrate to my inbox. This latest bug was particularly
> nasty, as is usually the case with random memory corruption bugs,
> costing us a bunch of hours.
>
> Good luck.
>
> If you are aware of any other fixes/patches besides the above that us
> big honkin numa iron SLES10 users need for reliable operation, let me
> know.

Will keep you in the loop.

thanks,
suresh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
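
A note for readers of this archive, since the thread never answers
Paul's 'cpu_power' question directly: in 2.6-era schedulers, cpu_power
is a sched group's compute capacity expressed relative to
SCHED_LOAD_SCALE (one "full" CPU), and the load balancer divides a
group's raw load by it so that, for example, two SMT siblings sharing
one core are not treated as two independent CPUs. The sketch below is
illustrative only; the struct, function names, and numbers (including
the ~1.1x power for an SMT pair) are assumptions made for the example,
not the kernel's actual code.

    /*
     * Illustrative sketch only -- not kernel code.  Shows how dividing
     * a group's raw load by its cpu_power makes the same load look
     * heavier on an SMT pair than on two full cores.
     */
    #include <stdio.h>

    #define SCHED_LOAD_SCALE 128UL  /* load units for one full CPU */

    struct sched_group_sketch {
            const char   *name;
            unsigned long raw_load;   /* sum of runqueue loads */
            unsigned long cpu_power;  /* capacity, in SCHED_LOAD_SCALE units */
    };

    static unsigned long normalized_load(const struct sched_group_sketch *sg)
    {
            /* Scale raw load by group capacity before comparing groups. */
            return sg->raw_load * SCHED_LOAD_SCALE / sg->cpu_power;
    }

    int main(void)
    {
            /* Two SMT siblings: assume ~1.1x one CPU's power, not 2x. */
            struct sched_group_sketch smt_pair = {
                    "SMT pair", 2 * SCHED_LOAD_SCALE,
                    SCHED_LOAD_SCALE * 11 / 10,
            };
            /* Two full cores: 2x one CPU's power. */
            struct sched_group_sketch two_cores = {
                    "two cores", 2 * SCHED_LOAD_SCALE,
                    2 * SCHED_LOAD_SCALE,
            };

            /* The same raw load looks much heavier on the SMT pair. */
            printf("%s: normalized load %lu\n", smt_pair.name,
                   normalized_load(&smt_pair));
            printf("%s: normalized load %lu\n", two_cores.name,
                   normalized_load(&two_cores));
            return 0;
    }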
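
Suresh's proposed fix for the big allocations is only sketched in prose
above ("passing cpu_map info to the functions which determine the
group"). A minimal sketch of that general idea follows, with
hypothetical names throughout (cpumask_sketch, alloc_groups_for_map);
it is not the patch that was eventually posted. The point is to size
the group allocations by the CPUs actually present in the cpu_map being
built (e.g. a cpuset's partition), instead of by the 1024 possible CPUs.

    /* Hypothetical sketch -- not the actual kernel patch. */
    #include <stdio.h>
    #include <stdlib.h>

    #define NR_CPUS 1024

    struct cpumask_sketch { unsigned long bits[NR_CPUS / 64]; };
    struct sched_group_sketch { int first_cpu; };

    static int cpus_in_map(const struct cpumask_sketch *map)
    {
            int cpu, n = 0;

            for (cpu = 0; cpu < NR_CPUS; cpu++)
                    if (map->bits[cpu / 64] & (1UL << (cpu % 64)))
                            n++;
            return n;
    }

    /*
     * Size the allocation by the cpu_map handed in by the caller,
     * not by the NR_CPUS possible CPUs in the machine.
     */
    static struct sched_group_sketch *
    alloc_groups_for_map(const struct cpumask_sketch *map)
    {
            return calloc(cpus_in_map(map),
                          sizeof(struct sched_group_sketch));
    }

    int main(void)
    {
            struct cpumask_sketch map = { { 0 } };
            int cpu;

            for (cpu = 0; cpu < 8; cpu++)   /* an 8-CPU partition */
                    map.bits[cpu / 64] |= 1UL << (cpu % 64);

            struct sched_group_sketch *groups = alloc_groups_for_map(&map);

            printf("allocated %d groups instead of %d\n",
                   cpus_in_map(&map), NR_CPUS);
            free(groups);
            return 0;
    }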