linux-kernel - Re: [patch 2/2] bootmem: Node-setup agnostic free

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Tue, 15 Apr 2008 13:51:08 +0200
From:	Johannes Weiner <hannes@...urebad.de>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	"Yinghai Lu" <yhlu.kernel@...il.com>,
	"Andi Kleen" <andi@...stfloor.org>, linux-kernel@...r.kernel.org,
	"Ingo Molnar" <mingo@...e.hu>,
	"Yasunori Goto" <y-goto@...fujitsu.com>,
	"KAMEZAWA Hiroyuki" <kamezawa.hiroyu@...fujitsu.com>,
	"Christoph Lameter" <clameter@....com>
Subject: Re: [patch 2/2] bootmem: Node-setup agnostic free_bootmem()

Hi,

Andrew Morton <akpm@...ux-foundation.org> writes:

> On Tue, 15 Apr 2008 00:28:34 -0700 "Yinghai Lu" <yhlu.kernel@...il.com> wrote:
>
>> On Tue, Apr 15, 2008 at 12:15 AM, Andrew Morton
>> <akpm@...ux-foundation.org> wrote:
>> >
>> > On Tue, 15 Apr 2008 00:04:03 -0700 "Yinghai Lu" <yhlu.kernel@...il.com> wrote:
>> >
>> >  > On Mon, Apr 14, 2008 at 11:23 PM, Andrew Morton
>> >  > <akpm@...ux-foundation.org> wrote:
>> >  > >
>> >  > > On Sun, 13 Apr 2008 18:56:57 +0200 Andi Kleen <andi@...stfloor.org> wrote:
>> >  > >
>> >  > >  > Johannes Weiner <hannes@...urebad.de> writes:
>> >  > >  >
>> >  > >  > > Make free_bootmem() look up the node holding the specified address
>> >  > >  > > range which lets it work transparently on single-node and multi-node
>> >  > >  > > configurations.
>> >  > >  >
>> >  > >  > Acked-by: Andi Kleen <andi@...stfloor.org>
>> >  > >  >
>> >  > >  > This is far better than the original change it replaces and which
>> >  > >  > I also objected to in review.
>> >  > >  >
>> >  > >
>> >  > >  So...  do we think these two patches are sufficiently safe and important for
>> >  > >  2.6.25?
>> >  >
>> >  > the patch is wrong
>> >  >
>> >
>> >  The last I saw was this:
>> >
>> >
>> >  On Sun, 13 Apr 2008 12:57:22 +0200 Johannes Weiner <hannes@...urebad.de> wrote:
>> >
>> >  > Hi,
>> >  >
>> >  > "Yinghai Lu" <yhlu.kernel@...il.com> writes:
>> >  >
>> >  > > On Sat, Apr 12, 2008 at 3:33 PM, Johannes Weiner <hannes@...urebad.de> wrote:
>> >  > > ...
>> >
>> > > >
>> >  > > could have chance that bootmem with reserved_early that is crossing
>> >  > > the nodes.
>> >  >
>> >  > Upstream reserve_bootmem_core() would BUG() on a caller trying to cross
>> >  > nodes, so I don't see where this chance could come from.
>> >
>> >  Is that what you're referring to?
>> >
>> >  Was Johannes observation incorrect?  If so, why?
>> 
>> my patch with free_bootmem will make sure free_bootmem_core only free
>> bootmem in the bdata scope.
>> so free_bootmem can handle the cross_node bootmem that is done by
>> reserve_early ( done in another patch, is dropped by you because took
>> Jonannes).
>> 
>> in setup_arch for x86_64 there is one free_bootmem that is used when
>> ramdisk is falled out of ram map. that could be crossed by bootloader
>> and kexec, and kernel or second kernel is memmap=NN@SS to execlue some
>> memory.
>> 
>> anyway that is extrem case, but my patch could handle that.

Has this case ever occured?  If this could become real, I have no
objections to implement a way to handle it (why would I?), but until now
you just said that in some time in the future, this could be useful.

>> 
>> I wonder if any regression caused by my previous patch? maybe on other platform?
>> 
>
> Not that I'm aware of.

It papers over buggy usage of free_bootmem().  If its arguments are
bogus, it will just return again where it BUG()ed out before.  The pages
might be never marked free and therefor never reach the buddy allocator.

> I restored mm-make-reserve_bootmem-can-crossed-the-nodes.patch.  Johannes,
> can you please check 2.6.28-rc8-mm2, see if it looks OK?

I object to the way it is implemented.  If it is really needed, that
should be done properly:

	- remove the double loop over the area on the likely succeeding
          path and unroll the reserving on the unlikely path as it was
          done before.  Better to punish exceptional branches than
	  the working paths.
	- make reserve_bootmem_core be strict with its arguments.  If
	  you want to iterate over the bdata list, you should not just
	  throw every item at the _core functions and let them work it
	  out for themselves.  The correct parameters should be
          calculated in advance and then passed to a strict
          _bootmem_core() function that BUG()s on failure.

But still, Yinghai, you never brought in practical reasons for this
whole thing.  You talked about extreme and theoretical cases and I don't
think that this justifies breaking API or pessimizing code at all.

	Hannes
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/