lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <57FED0B1.3010506@zoho.com>
Date:   Thu, 13 Oct 2016 08:09:21 +0800
From:   zijun_hu <zijun_hu@...o.com>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org, zijun_hu@....com,
        tj@...nel.org, cl@...ux.com
Subject: Re: [RFC v2 PATCH] mm/percpu.c: fix panic triggered by BUG_ON()
 falsely

On 10/13/2016 05:41 AM, Andrew Morton wrote:
> On Tue, 11 Oct 2016 22:00:28 +0800 zijun_hu <zijun_hu@...o.com> wrote:
> 
>> as shown by pcpu_build_alloc_info(), the number of units within a percpu
>> group is educed by rounding up the number of CPUs within the group to
>> @upa boundary, therefore, the number of CPUs isn't equal to the units's
>> if it isn't aligned to @upa normally. however, pcpu_page_first_chunk()
>> uses BUG_ON() to assert one number is equal the other roughly, so a panic
>> is maybe triggered by the BUG_ON() falsely.
>>
>> in order to fix this issue, the number of CPUs is rounded up then compared
>> with units's, the BUG_ON() is replaced by warning and returning error code
>> as well to keep system alive as much as possible.
> 
> Under what circumstances is the triggered?  In other words, what are
> the end-user visible effects of the fix?
> 

the BUG_ON() takes effect when the number of CPUs isn't aligned @upa,
the BUG_ON() should not be triggered under this normal circumstances.
the aim of this fixing is prevent the BUG_ON() which is triggered under
the case.

see below original code segments for reason.
pcpu_build_alloc_info(){
...

	for_each_possible_cpu(cpu)
		if (group_map[cpu] == group)
			gi->cpu_map[gi->nr_units++] = cpu;
	gi->nr_units = roundup(gi->nr_units, upa);

calculate the number of CPUs belonging to a group into relevant @gi->nr_units
then roundup @gi->nr_units up to @upa for itself

	unit += gi->nr_units;
...
}

pcpu_page_first_chunk() {
...
	ai = pcpu_build_alloc_info(reserved_size, 0, PAGE_SIZE, NULL);
	if (IS_ERR(ai))
		return PTR_ERR(ai);
	BUG_ON(ai->nr_groups != 1);
	BUG_ON(ai->groups[0].nr_units != num_possible_cpus());

it seems there is only one group and all CPUs belong to the group
but compare the number of CPUs with the number of units directly.
...
}

as shown by comments in above function. ai->groups[0].nr_units
should equal to roundup(num_possible_cpus(), @upa) other than
num_possible_cpus() directly.


> I mean, this is pretty old code (isn't it?) so what are you doing that
> triggers this?
> 
> 
i am learning memory management source and find the inconsistency and think
the BUG_ON() maybe be triggered under this special normal but possible case
it maybe a logic error 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ