Date:	Wed, 13 Apr 2011 17:10:58 -0700
From:	Yinghai Lu <yinghai@...nel.org>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
CC:	Joerg Roedel <joro@...tes.org>, Ingo Molnar <mingo@...e.hu>,
	Alex Deucher <alexdeucher@...il.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	dri-devel@...ts.freedesktop.org, "H. Peter Anvin" <hpa@...or.com>,
	Thomas Gleixner <tglx@...utronix.de>, Tejun Heo <tj@...nel.org>
Subject: Re: Linux 2.6.39-rc3

On 04/13/2011 04:39 PM, Linus Torvalds wrote:
> On Wed, Apr 13, 2011 at 2:23 PM, Yinghai Lu <yinghai@...nel.org> wrote:
>>>
>>> What are all the magic numbers, and why would 0x80000000 be special?
>>
>> that is the old value when kernel was doing bottom-up bootmem allocation.
> 
> I understand, BUT THAT IS STILL A TOTALLY MAGIC NUMBER!
> 
> It makes it come out the same ON THAT ONE MACHINE.  So no, it's not
> "the old value". It's a random value that gets the old value in one
> specific case.

Alexandre's system works on 2.6.38.2, where the kernel allocates the GART from 0xa4000000.
Joerg's system works on 2.6.39-rc3 with the top-down bootmem patch
	1a4a678b12c84db9ae5dce424e0e97f0559bb57c
reverted, and the kernel then allocates the GART at 0x80000000.
Alexandre's system also works when the alignment is increased to 1G, which makes the
kernel allocate the GART at 0x80000000 (a small sketch of that arithmetic appears
further below).

Both systems fail when the kernel allocates the GART from 0xa0000000.

That 0xa0000000 looks like the same value as the base of the radeon GTT aperture.


[    4.250159] radeon 0000:01:05.0: VRAM: 320M 0x00000000C0000000 - 0x00000000D3FFFFFF (320M used)
[    4.258830] radeon 0000:01:05.0: GTT: 512M 0x00000000A0000000 - 0x00000000BFFFFFFF
[    4.266742] [drm] Detected VRAM RAM=320M, BAR=256M
[    4.271549] [drm] RAM width 32bits DDR
[    4.275435] [TTM] Zone  kernel: Available graphics memory: 1896526 kiB.
[    4.282066] [TTM] Initializing pool allocator.
[    4.282085] usb 7-2: new full speed USB device number 2 using ohci_hcd
[    4.293076] [drm] radeon: 320M of VRAM memory ready
[    4.298277] [drm] radeon: 512M of GTT memory ready.
[    4.303218] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[    4.309854] [drm] Driver supports precise vblank timestamp query.
[    4.315970] [drm] radeon: irq initialized.
[    4.320094] [drm] GART: num cpu pages 131072, num gpu pages 131072

Alex explained that 0xa0000000 is fine there because it is a GPU-internal address:
---
The VRAM and GTT addresses in the dmesg are internal GPU addresses not
system addresses.  The GPU has its own internal address space for
on-chip memory clients (texture samplers, render buffers, display
controllers, etc.).  The GPU sets up two apertures in its internal
address space and on-chip client requests are forwarded to the
appropriate place by the GPU's memory controller.  Addresses in the
GPU's VRAM aperture go to local vram on discrete cards, or to the
stolen memory at the top of system memory for IGP cards.  Addresses in
the GPU's GTT aperture hit a page table and get forwarded to the
appropriate dma pages.
---
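
To make that concrete, here is a minimal user-space sketch (illustration
only, not kernel code): the GPU-internal windows are copied from the
dmesg above, and the candidate system-side GART bases are the ones
mentioned in this thread.

/* addr_check.c - illustration only, not kernel code */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* GPU-internal apertures (half-open ranges, from the dmesg above) */
	const uint64_t gtt_start  = 0xA0000000ULL, gtt_end  = 0xC0000000ULL;
	const uint64_t vram_start = 0xC0000000ULL, vram_end = 0xD4000000ULL;
	/* system-side GART bases mentioned in this thread */
	const uint64_t bases[] = { 0x80000000ULL, 0xA0000000ULL, 0xA4000000ULL };

	for (unsigned i = 0; i < sizeof(bases) / sizeof(bases[0]); i++) {
		uint64_t b = bases[i];
		const char *where = "outside both GPU windows";

		if (b >= gtt_start && b < gtt_end)
			where = "inside the GPU GTT window";
		else if (b >= vram_start && b < vram_end)
			where = "inside the GPU VRAM window";

		printf("GART base 0x%09llx: %s%s\n",
		       (unsigned long long)b, where,
		       b == gtt_start ? " (exactly the GTT base)" : "");
	}
	return 0;
}

Note that 0xa4000000 (the working 2.6.38.2 case on Alexandre's system)
also lands inside the GPU GTT window, so the numeric coincidence alone
does not prove a conflict; per Alex's explanation these are separate
address spaces, and any real interaction would have to come from how
the GART/GTT actually gets programmed.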

> 
>>> Why don't we write code that just works?
>>>
>>> Or absent a "just works" set of patches, why don't we revert to code
>>> that has years of testing?
>>>
>>> This kind of "I broke things, so now I will jiggle things randomly
>>> until they unbreak" is not acceptable.
>>>
>>> Either explain why that fixes a real BUG (and why the magic constants
>>> need to be what they are), or just revert the patch that caused the
> problem, and go back to the allocation patterns that have years of
>>> experience.
>>>
>>> Guys, we've had this discussion before, in PCI allocation. We don't do
>>> this. We tried switching the PCI region allocations to top-down, and
>>> IT WAS A FAILURE. We reverted it to what we had years of testing with.
>>>
>>> Don't just make random changes. There really are only two acceptable
>>> models of development: "think and analyze" or "years and years of
>>> testing on thousands of machines". Those two really do work.
>>
>> We did do the analyzing, and only difference seems to be:
> 
> No.
> 
> Yinghai, we have had this discussion before, and dammit, you need to
> understand the difference between "understanding the problem" and "put
> in random values until it works on one machine".
> 
> There was absolutely _zero_ analysis done. You do not actually
> understand WHY the numbers matter. You just look at two random
> numbers, and one works, the other does not. That's not "analyzing".
> That's just "random number games".
> 
> If you cannot see and understand the difference between an actual
> analytical solution where you _understand_ what the code is doing and
> why, and "random numbers that happen to work on one machine", I don't
> know what to tell you.
> 
>> good one is using 0x80000000
>> and bad one is using 0xa0000000.
>>
>> We try to figure out if it needs low address and it happen to work
>> because kernel was doing bottom up allocation.
> 
> No.
> 
> Let me repeat my point one more time.
> 
> You have TWO choices. Not more, not less:
> 
>  - choice #1: go back to the old allocation model. It's tested. It
> doesn't regress. Admittedly we may not know exactly _why_ it works,
> and it might not work on all machines, but it doesn't cause
> regressions (ie the machines it doesn't work on it _never_ worked on).
> 
>    And this doesn't mean "old value for that _one_ machine". It means
> "old value for _every_ machine". So it means we revert the whole
> bottom-down thing entirely. Not just "change one random number so that
> the totally different allocation pattern happens to give the same
> result on one particular machine".
> 
>    Quite frankly, I don't see the point of doing top-to-bottom anyway,
> so I think we should do this regardless. Just revert the whole
> "allocate from top". It didn't work for PCI, it's not working for this
> case either. Stop doing it.

We did add some code to prevent bootmem/memblock from using the low range.
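
The alignment observation above can also be made concrete with a minimal
user-space sketch of a top-down fit (illustration only; the search window
and the 512M aperture size are assumed values chosen so the arithmetic
reproduces the addresses seen in this thread, not the exact bounds used
by the kernel's GART placement code):

/* topdown_align.c - illustration only, assumed window/size values */
#include <stdio.h>
#include <stdint.h>

/* highest address <= (end - size) that is aligned to 'align' */
static uint64_t top_down_fit(uint64_t start, uint64_t end,
			     uint64_t size, uint64_t align)
{
	uint64_t addr = (end - size) & ~(align - 1);
	return addr >= start ? addr : 0;	/* 0 == no fit */
}

int main(void)
{
	const uint64_t start = 0x20000000ULL;	/* assumed window start  */
	const uint64_t end   = 0xC0000000ULL;	/* assumed window end    */
	const uint64_t size  = 512ULL << 20;	/* assumed aperture size */

	printf("512M align -> 0x%09llx\n", (unsigned long long)
	       top_down_fit(start, end, size, 512ULL << 20));
	printf("  1G align -> 0x%09llx\n", (unsigned long long)
	       top_down_fit(start, end, size, 1024ULL << 20));
	return 0;
}

Under these assumed bounds the top-down search lands exactly on
0xa0000000 with 512M alignment, and only the 1G alignment pushes it
down to the old 0x80000000 - which is why bumping the alignment on
Alexandre's machine merely recreates the old value rather than
explaining anything, as Linus points out above.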

> 
>  - Choice #2: understand exactly _what_ goes wrong, and fix it
> analytically (ie by _understanding_ the problem, and being able to
> solve it exactly, and in a way you can argue about without having to
> resort to "magic happens").
> 
> Now, the whole analytic approach (aka "computer sciency" approach),
> where you can actually think about the problem without having any
> pesky "reality" impact the solution is obviously the one we tend to
> prefer. Sadly, it's seldom the one we can use in reality when it comes
> to things like resource allocation, since we end up starting off with
> often buggy approximations of what the actual hardware is all about
> (ie broken firmware tables).
> 
> So I'd love to know exactly why one random number works, and why
> another one doesn't. But as long as we do _not_ know the "Why" of it,
> we will have to revert.
> 
> It really is that simple. It's _always_ that simple.
> 
> So the numbers shouldn't be "magic", they should have real
> explanations. And in the absence of real explanation, the model that
> works is "this is what we've always done". Including, very much, the
> whole allocation order. Not just one random number on one random
> machine.

OK, let's try to figure out why 0xa0000000 cannot be used.

If we cannot figure it out, we can revert

1a4a678b12c84db9ae5dce424e0e97f0559bb57c

Thanks,

Yinghai 
