lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 13 Oct 2016 08:05:33 +0800
From:   zijun_hu <zijun_hu@...o.com>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org, zijun_hu@....com,
        tj@...nel.org, cl@...ux.com
Subject: Re: [RFC v2 PATCH] mm/percpu.c: fix panic triggered by BUG_ON()
 falsely

On 10/13/2016 05:41 AM, Andrew Morton wrote:
> On Tue, 11 Oct 2016 22:00:28 +0800 zijun_hu <zijun_hu@...o.com> wrote:
> 
>> as shown by pcpu_build_alloc_info(), the number of units within a percpu
>> group is educed by rounding up the number of CPUs within the group to
>> @upa boundary, therefore, the number of CPUs isn't equal to the units's
>> if it isn't aligned to @upa normally. however, pcpu_page_first_chunk()
>> uses BUG_ON() to assert one number is equal the other roughly, so a panic
>> is maybe triggered by the BUG_ON() falsely.
>>
>> in order to fix this issue, the number of CPUs is rounded up then compared
>> with units's, the BUG_ON() is replaced by warning and returning error code
>> as well to keep system alive as much as possible.
> 
> Under what circumstances is the triggered?  In other words, what are
> the end-user visible effects of the fix?
> 
the BUG_ON() takes effect when the number isn't aligned @upa, the BUG_ON()
should not be triggered under this normal circumstances.
the aim of this fixing is prevent the BUG_ON() which is triggered under
the case.

see below original code segments for reason.
pcpu_build_alloc_info(){
...

	for_each_possible_cpu(cpu)
		if (group_map[cpu] == group)
			gi->cpu_map[gi->nr_units++] = cpu;
	gi->nr_units = roundup(gi->nr_units, upa);

calculate the number of CPUs belonging to a group into relevant @gi->nr_units
then roundup @gi->nr_units up to @upa for itself

unit += gi->nr_units;
...
}

pcpu_page_first_chunk() {
...
	ai = pcpu_build_alloc_info(reserved_size, 0, PAGE_SIZE, NULL);
	if (IS_ERR(ai))
		return PTR_ERR(ai);
	BUG_ON(ai->nr_groups != 1);
	BUG_ON(ai->groups[0].nr_units != num_possible_cpus());

it seems there is only one group and all CPUs belong to the group
but compare the number of CPUs with the number of units directly.

as shown by comments in above function. ai->groups[0].nr_units
should equal to roundup(num_possible_cpus(), @upa) other than
num_possible_cpus() directly.
...
}

> I mean, this is pretty old code (isn't it?) so what are you doing that
> triggers this?
> 
> 
i am learning memory source and find the inconsistency and think
the BUG_ON() maybe be triggered under this special normal but possible case

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ