Date:	Thu, 11 Nov 2010 11:28:04 -0700
From:	Myron Stowe <myron.stowe@...com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	Bjorn Helgaas <bjorn.helgaas@...com>, Ingo Molnar <mingo@...e.hu>,
	Peter Zijlstra <peterz@...radead.org>,
	Venkatesh Pallipadi <venki@...gle.com>,
	Nikhil Rao <ncrao@...gle.com>,
	Takuya Yoshikawa <yoshikawa.takuya@....ntt.co.jp>,
	linux-kernel@...r.kernel.org, knikanth@...e.de, rjenties@...gle.com
Subject: Re: divide error in select_task_rq_fair()

On Fri, 2010-11-05 at 07:17 +0100, Eric Dumazet wrote:
> On Thursday, 04 November 2010 at 20:00 -0600, Bjorn Helgaas wrote:
> 
> > Is that going to help you debug the problem?  The solution is not going
> > to be something like "set NR_CPUS=x".  If NR_CPUS is too small, the
> > machine should still *boot*, even if we can't use all the CPUs in the
> > box.
> > 
> 
> Yes, it will help to understand the layout of cpu / domains and make
> appropriate changes.
> 
> Alternative is you send me such a machine :=)

I opened a BZ on this issue, as it seems to be a regression:
https://bugzilla.kernel.org/show_bug.cgi?id=22662

As indicated in the BZ, I also bisected the kernel, which pointed to the
commit shown below.  Reverting 50f2d7f682f9c0ed58191d0982fe77888d59d162
did re-enable booting on the box in question (an HP dl980g7).  Let me
know what further info you need, or which patches to test, for debugging
this.

Thanks,

commit 50f2d7f682f9c0ed58191d0982fe77888d59d162
Author: Nikanth Karthikesan <knikanth@...e.de>
Date:   Thu Sep 30 17:34:10 2010 +0530

    x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA

    commit d9c2d5ac6af87b4491bff107113aaf16f6c2b2d9 "x86, numa: Use near(er)
    online node instead of roundrobin for NUMA" changed NUMA initialization on
    Intel to choose the nearest online node or first node.  Fake NUMA would be
    better off with round-robin initialization, instead of all the CPUs on the
    first node.  Change the choice of first node back to round-robin.

    For testing NUMA kernel behaviour without cpusets and NUMA aware
    applications, it would be better to have cpus in different nodes, rather
    than all in a single node.  With cpusets, migration of tasks scenarios
    cannot be tested.

    I guess having it round-robin shouldn't affect the use cases for all cpus
    on the first node.

    The code comments in arch/x86/mm/numa_64.c:759 indicate that this used to
    be the case, which was changed by commit d9c2d5ac6.  It changed from
    roundrobin to nearer or first node.  And I couldn't find any reason for
    this change in its changelog.

    Signed-off-by: Nikanth Karthikesan <knikanth@...e.de>
    Cc: David Rientjes <rientjes@...gle.com>
    Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
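
For readers unfamiliar with the two placement policies the commit message
contrasts, here is a minimal user-space sketch in C.  It is not the kernel's
code from arch/x86/mm/numa_64.c: the function names (assign_round_robin,
assign_first_node, cpus_on_node) and the flat cpu_to_node array are
assumptions made purely for illustration, and the sketch says nothing about
the root cause of the divide error discussed in this thread.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a kernel function): spread ncpus CPUs over
 * nnodes nodes in round-robin order, so CPU i lands on node i % nnodes.
 * Returns 0 on success, or -1 if the arguments are unusable; checking
 * nnodes first keeps the modulo from ever dividing by zero.
 */
int assign_round_robin(int *cpu_to_node, size_t ncpus, int nnodes)
{
    if (cpu_to_node == NULL || nnodes <= 0)
        return -1;
    for (size_t cpu = 0; cpu < ncpus; cpu++)
        cpu_to_node[cpu] = (int)(cpu % (size_t)nnodes);
    return 0;
}

/*
 * Hypothetical helper: the "all CPUs on the first node" placement that
 * the commit message argues against for fake NUMA.
 */
int assign_first_node(int *cpu_to_node, size_t ncpus, int nnodes)
{
    if (cpu_to_node == NULL || nnodes <= 0)
        return -1;
    for (size_t cpu = 0; cpu < ncpus; cpu++)
        cpu_to_node[cpu] = 0;
    return 0;
}

/* Count how many CPUs were placed on the given node. */
size_t cpus_on_node(const int *cpu_to_node, size_t ncpus, int node)
{
    size_t count = 0;
    for (size_t cpu = 0; cpu < ncpus; cpu++)
        if (cpu_to_node[cpu] == node)
            count++;
    return count;
}
```

With 8 CPUs and 4 fake nodes, the round-robin variant leaves 2 CPUs on each
node, while the first-node variant leaves all 8 on node 0 and none on the
others.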
> 
> Thanks
>  
> 


-- 
Myron Stowe                             Linux Kernel Developer
Fort Collins, CO                        Office of Corporate Strategy and Technology
