Message-Id: <201006281957.34403.rusty@rustcorp.com.au>
Date: Mon, 28 Jun 2010 19:57:33 +0930
From: Rusty Russell <rusty@...tcorp.com.au>
To: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
Cc: Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org,
Arnd Bergmann <arnd@...db.de>, anton@...ba.org,
Mike Travis <travis@....com>
Subject: Re: [PATCH 5/5] cpumask: reduce cpumask_size
On Mon, 28 Jun 2010 12:42:23 pm KOSAKI Motohiro wrote:
> > Now we're sure no one is using old cpumask operators, nor *cpumask, we can
> > allocate less bits safely. This reduces the memory usage of off-stack
> > cpumasks when CONFIG_CPUMASK_OFFSTACK=y but we don't have NR_CPUS actual
> > cpus.
>
> I have to say I'm sorry. I have probably broken your assumption.
> If this patch is applied, we reintroduce the issue of exposing nr_cpu_ids
> and break libnuma again. I think the following change is necessary too.
>
> Or am I missing something?
I cc'd you because I remembered you being involved in that libnuma issue
and couldn't remember the details.
Unfortunately, this solution doesn't work:
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 18faf4d..c14acad 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -4823,7 +4823,9 @@ SYSCALL_DEFINE3(sched_getaffinity, pid_t, pid, unsigned int, len,
>
>  	ret = sched_getaffinity(pid, mask);
>  	if (ret == 0) {
> -		size_t retlen = min_t(size_t, len, cpumask_size());
> +		size_t retlen = min_t(size_t, len,
> +				      BITS_TO_LONGS(NR_CPUS) * sizeof(long));
>
Since mask is a cpumask_var_t, only cpumask_size() is allocated. We can't
copy NR_CPUS bits.
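
To make the size mismatch concrete, here is a rough userspace sketch of the arithmetic. It is not kernel code: bits_to_longs() is a stand-in for the kernel's BITS_TO_LONGS(), and mask_bytes() and overread_bytes() are hypothetical helpers invented for this illustration. It assumes the off-stack mask is allocated with room for nr_cpumask_bits bits, rounded up to whole longs.

```c
#include <stddef.h>
#include <limits.h>

/* Userspace stand-in for BITS_TO_LONGS(): longs needed to hold nbits bits. */
static size_t bits_to_longs(size_t nbits)
{
	size_t bits_per_long = sizeof(long) * CHAR_BIT;

	return (nbits + bits_per_long - 1) / bits_per_long;
}

/* Bytes occupied by a bitmap of nbits bits, rounded up to whole longs. */
static size_t mask_bytes(size_t nbits)
{
	return bits_to_longs(nbits) * sizeof(long);
}

/*
 * Hypothetical helper: how many bytes a copy sized for nr_cpus bits would
 * read past a buffer that only holds nr_cpumask_bits bits.
 */
static size_t overread_bytes(size_t nr_cpus, size_t nr_cpumask_bits)
{
	size_t want = mask_bytes(nr_cpus);
	size_t have = mask_bytes(nr_cpumask_bits);

	return want > have ? want - have : 0;
}
```

For example, with NR_CPUS=4096 and 64 possible cpus, overread_bytes(4096, 64) is 512 - 8 = 504 bytes read past the end of the allocation.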
But I think it's OK, anyway. libnuma is broken because it gets upset if the
number of cpus it reads from /sys/.../cpumap is more than the cpumask size
returned from sys_sched_getaffinity.
Currently, getaffinity returns cpumask_size() (i.e. based on NR_CPUS), and
the printing routines use nr_cpumask_bits (i.e. NR_CPUS for
!CPUMASK_OFFSTACK, nr_cpu_ids for CPUMASK_OFFSTACK).
(libnuma is OK on CONFIG_CPUMASK_OFFSTACK=y because the sysfs output is
*shorter* than expected. I checked the code).
With this patch, cpumask_size() becomes based on nr_cpumask_bits, so both
getaffinity and sysfs are using the same basis.
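
As a sanity check on that reasoning, here is a simplified model of the comparison described above. It is not libnuma's actual code: bytes_for_bits() and sysfs_fits_affinity() are hypothetical names, and the check is reduced to "the number of bits printed in sysfs must not exceed the number of bits returned by getaffinity".

```c
#include <stddef.h>
#include <limits.h>
#include <stdbool.h>

/* Bytes needed for nbits bits, rounded up to whole longs. */
static size_t bytes_for_bits(size_t nbits)
{
	size_t bits_per_long = sizeof(long) * CHAR_BIT;

	return ((nbits + bits_per_long - 1) / bits_per_long) * sizeof(long);
}

/*
 * Simplified model (assumption): userspace is happy as long as sysfs does
 * not print more bits than the getaffinity result can hold.
 */
static bool sysfs_fits_affinity(size_t sysfs_bits, size_t affinity_bits)
{
	return sysfs_bits <= bytes_for_bits(affinity_bits) * CHAR_BIT;
}
```

Taking NR_CPUS=4096 and nr_cpu_ids=16 as example values: sysfs_fits_affinity(4096, 4096) covers !CPUMASK_OFFSTACK, sysfs_fits_affinity(16, 4096) covers CPUMASK_OFFSTACK today, and sysfs_fits_affinity(16, 16) covers CPUMASK_OFFSTACK with this patch. All three hold. Only the reverse combination, sysfs_fits_affinity(4096, 16), fails, and no configuration produces it.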
Do you agree?
Rusty.