Message-ID: <20100705131835.GN10072@secunet.com>
Date: Mon, 5 Jul 2010 15:18:35 +0200
From: Steffen Klassert <steffen.klassert@...unet.com>
To: Dan Kruchinin <dkruchinin@....org>
Cc: LKML <linux-kernel@...r.kernel.org>,
Herbert Xu <herbert@...dor.apana.org.au>
Subject: Re: [PATCH] Fixed division by zero bug in kernel/padata.c
On Fri, Jul 02, 2010 at 05:24:13PM +0400, Dan Kruchinin wrote:
> No problem. Here is the fixed patch:
> --
> When the boot CPU (typically CPU #0) is excluded from the padata cpumask
> and the user enters the halt command from the console, the kernel faults
> on a division by zero. This occurs because during halt the kernel shuts
> down each non-boot CPU one by one. After it shuts down the last CPU that
> is set in the padata cpumask, the only working CPU in the system is the
> boot CPU (#0), and it is the only CPU set in cpu_active_mask. Hence, when
> padata_cpu_callback calls __padata_remove_cpu (and thus padata_alloc_pd),
> the padata cpumask and cpu_active_mask do not intersect, and the
> following code in padata_alloc_pd raises a divide-by-zero exception:
> cpumask_and(pd->cpumask, cpumask, cpu_active_mask); // pd->cpumask will be empty
> ...
> num_cpus = cpumask_weight(pd->cpumask); // num_cpus = 0
> pd->max_seq_nr = (MAX_SEQ_NR / num_cpus) * num_cpus - 1; // division by zero!
>
>
I'm still thinking about how to handle an empty cpumask here.
While your patch would be fine for the shutdown case you
noticed, the problem becomes more complex as soon as we are
able to change the cpumasks from userspace with your patches.
Essentially, we can end up with an empty cpumask here because
of two reasons:
1. A user removed the last cpu that belongs to both the padata
cpumask and the active cpumask.
2. The last cpu that belongs to both the padata cpumask and the
active cpumask goes offline.
In the first case it would be ok to tell the user that this is
an invalid operation by returning an error. In the second case
we can't just return an error from the cpu hotplug callback function,
because the callback returns NOTIFY_BAD on error, which vetoes the
cpu offline operation. This means that whether a cpu can go offline
or not would depend on the padata user's configuration.
This is certainly not what we want to have.
Both cases should be handled in the same way. So we could just
stop the instance if the cpumasks do not intersect, and restart
it as soon as they intersect again. The padata instance would
refuse to do anything as long as the cpumasks do not intersect,
but it would remain in a consistent state. Let me add the
infrastructure to handle this; then you can use it with your patches.
Thanks,
Steffen