linux-kernel - Re: [PPC64/Power7 - 2.6.35-rc5] Bad relocation warnings while Building a CONFIG_RELOCATABLE kernel with CONFIG_ISERIES enabled


Date:	Tue, 20 Jul 2010 08:29:18 -0500
From:	Milton Miller <miltonm@....com>
To:	Alexander Graf <agraf@...e.de>
Cc:	Subrata Modak <subrata@...ux.vnet.ibm.com>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	linuxppc-dev@...ts.ozlabs.org, linux-kernel@...r.kernel.org
Subject: Re: [PPC64/Power7 - 2.6.35-rc5] Bad relocation warnings while Building
 a CONFIG_RELOCATABLE kernel with CONFIG_ISERIES enabled



On Tue, 20 Jul 2010 about 01:37:27 -0600 Alexander Graf wrote:
> On 20.07.2010, at 09:27, Milton Miller wrote:
>> On Mon, 19 Jul 2010 about 14:00:56 +0200, Alexander Graf wrote:
>>> Milton Miller wrote:
>>>> I wrote:
>>>>
>>>> Oh yeah, and for Book3S, the code copies from 0x100 to __end_interrupts
>>>> in arch/powerpc/kernel/exceptions-64s.S down to real 0, but the rest
>>>> of the kernel is at some disjoint address.  The interrupt will go to
>>>> the copy at real 0.  Any reference to code outside that region
>>>> must be made via a full indirect branch (not a relative one), similar to
>>>> the secondary startup (which follows the function pointer in a descriptor
>>>> set in very low memory), or to syscall entry and the exception vectors,
>>>> which go via the paca.
>>>>
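A toy C model of the point above (not kernel code; END_INTERRUPTS, the offsets and both helper names are made up for illustration). It shows why a relative branch out of the copied vector region goes astray, while an indirect branch through a relocated 64-bit pointer reaches the real target:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative value only; the real __end_interrupts comes from the link. */
#define END_INTERRUPTS 0x7000ULL

/*
 * A relative branch encodes (target - site) when the kernel is linked.
 * Executed from the copy at real 0, the CPU adds that displacement to the
 * copy's address, so the kernel's actual load base never enters into it.
 */
static uint64_t relative_branch_lands_at(uint64_t site_off, uint64_t target_off)
{
    assert(site_off < END_INTERRUPTS);        /* site is in the copied vectors */
    int64_t disp = (int64_t)(target_off - site_off);
    return site_off + (uint64_t)disp;         /* the copy lives at real 0 */
}

/*
 * An indirect branch (mtctr/bctr) loads a full 64-bit address from a .quad
 * that the relocation code has already rewritten to kernel_base + target_off.
 */
static uint64_t indirect_branch_lands_at(uint64_t kernel_base, uint64_t target_off)
{
    return kernel_base + target_off;
}
```

With the kernel loaded at, say, 32 MB (0x2000000), a relative branch from offset 0x500 to a handler at 0x8000 ends up at real 0x8000, while the indirect branch reaches 0x2008000. Targets below END_INTERRUPTS are unaffected, since both ends of the branch were copied together.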
>>>
>>> That would still break on normal PPC boxes, as any address accessed in
>>> real mode has to be inside the RMA. And the #include for
>>> kvm/book3s_rmhandlers.S happens after __end_interrupts. So I'd end up
>>> with code that gets executed outside of the RMA after a relocation, right?
>>>
>>> Alex
>>>
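Alex's concern can be put as a simple predicate. A sketch, assuming a flat RMA starting at real 0 and using made-up sizes; in_rma() and real_addr_of() are hypothetical helpers, not kernel functions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: in real mode only [0, rma_size) is reachable. */
static bool in_rma(uint64_t real_addr, uint64_t len, uint64_t rma_size)
{
    assert(rma_size > 0);
    return len <= rma_size && real_addr <= rma_size - len;
}

/*
 * Code before __end_interrupts is copied to real 0; anything linked after
 * it (e.g. the #included kvm/book3s_rmhandlers.S) stays at kernel_base + off.
 */
static uint64_t real_addr_of(uint64_t kernel_base, uint64_t off,
                             uint64_t end_interrupts)
{
    return off < end_interrupts ? off : kernel_base + off;
}
```

With a 128 MB RMA and the kernel relocated to 256 MB, offset 0x300 maps to real 0x300 (inside the RMA), but offset 0x7100 maps to 0x10007100 (outside it). With an unrelocated kernel (base 0) both are inside.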
>>
>> Whether it's outside the RMA or not, DO_KVM is creating a branch out
>> of the code copied to low memory.
>>
>> This is BROKEN.
>> 
>> We have a hard limit that we can't extend __end_interrupts
>> past 0x7000, and a soft limit that we can't exceed 0x6000.
>> If there is space, we could move the real mode handler
>> extensions before __end_interrupts in exceptions-64s.S,
>> and store the full address in a .quad so it gets relocated
>> properly.  Don't subtract the start; we have designed the
>> kernel to run with start at a VA that can be used as an EA
>> in real mode.
>> 
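A sketch of why "store the full address in a .quad" works: a 64-bit slot gets a relative relocation, which the boot-time relocation code resolves as base + addend. The struct mirrors the layout of Elf64_Rela, and 22 is the ELF value of R_PPC64_RELATIVE; apply_relative() and check_end_interrupts() are hypothetical helpers, with the 0x6000/0x7000 limits taken from the text above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define R_PPC64_RELATIVE_TYPE 22u   /* ELF R_PPC64_RELATIVE */

/* Same layout as Elf64_Rela from <elf.h>. */
struct rela64 {
    uint64_t r_offset;   /* where the 64-bit slot sits in the image */
    uint64_t r_info;     /* low 32 bits hold the relocation type */
    int64_t  r_addend;   /* link-time offset of the target */
};

/* Rewrite each slot to base + addend; refuse any other relocation type. */
static bool apply_relative(uint8_t *image, uint64_t base,
                           const struct rela64 *r, size_t n)
{
    assert(image != 0 || n == 0);
    for (size_t i = 0; i < n; i++) {
        if ((uint32_t)r[i].r_info != R_PPC64_RELATIVE_TYPE)
            return false;
        uint64_t v = base + (uint64_t)r[i].r_addend;
        memcpy(image + r[i].r_offset, &v, sizeof v);
    }
    return true;
}

/* The hard (0x7000) and soft (0x6000) limits on __end_interrupts. */
enum end_check { END_OK, END_OVER_SOFT, END_OVER_HARD };

static enum end_check check_end_interrupts(uint64_t end)
{
    if (end > 0x7000)
        return END_OVER_HARD;
    if (end > 0x6000)
        return END_OVER_SOFT;
    return END_OK;
}
```

The slot keeps the full target address, so nothing is subtracted from the start. A kernel relocated to 0x2000000 with a .quad pointing at link offset 0x7100 ends up holding 0x2007100.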
>
> Moving everything to exceptions-64s.S sounds like the best thing to
> do. All the code in real mode really is there so it stays inside the
> RMA. I don't think we can guarantee that for any code that is not
> copied, right?

I agree it's the right approach.  We aren't that strict today in
general, but that is the design point (we would have to modify
kexec-tools purgatory to skip checksums of said areas).  The big
rule is no direct branching in or out of code before __end_interrupts
with CONFIG_RELOCATABLE defined, which this code does.  The second
rule is that only 64-bit relative symbols are allowed.
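The two rules could in principle be checked mechanically. A sketch under my own reading of them: struct ref and ref_ok() are hypothetical (no such checker exists in the tree as far as this sketch assumes), and "64-bit relative symbols" is taken to mean that absolute references must be full 64-bit slots rather than narrower fields such as @ha/@l halves:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of one reference found in the vector code. */
struct ref {
    uint64_t site;       /* offset of the instruction or data slot */
    uint64_t target;     /* offset it refers to */
    bool     relative;   /* PC-relative branch or displacement */
    bool     is_quad;    /* 64-bit absolute (.quad) slot */
};

/*
 * Rule 1: a PC-relative reference must not cross __end_interrupts in
 *         either direction.
 * Rule 2: an absolute reference must be a full 64-bit slot, so the
 *         relocation code can rewrite it.
 */
static bool ref_ok(const struct ref *r, uint64_t end_interrupts)
{
    assert(r);
    bool site_low = r->site < end_interrupts;
    bool target_low = r->target < end_interrupts;

    if (r->relative)
        return site_low == target_low;
    return r->is_quad;
}
```

A relative branch from 0x500 to 0x900 passes, the same branch to 0x8000 fails, and a .quad holding 0x8000 passes.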


>> Otherwise we need to mark KVM_BOOK3S_64 depends on (!RELOCATABLE ||
>> BROKEN) for 2.6.35 until we get fixes.
>
> Well - it's only broken when really getting relocated. But I agree,
> the current state doesn't cope with Linux's relocation logic.

If one were to try to relocate a kernel today, it would crash even
without activating KVM, so I think the BROKEN marking is deserved.

>> I took a read through the Book3S code as of 2.6.34.   A few things
>> I noticed:
>>
>> (1) The code is using the SLB large-page (L) bit to control the segment
>> size.   It should be using the SLB B field (or just implement 256M
>> segments only).
>
> I'm not sure I understand this part? We only use 256MB segments for now.

see for instance kvmppc_mmu_book3s_64_find_slbe, which is clearly
doing 1TB compares.
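A sketch of what deriving the segment size from B rather than L might look like. The constant values are as I understand them from Linux's asm/mmu-hash64.h and should be double-checked; struct guest_slbe, seg_shift() and slbe_matches() are hypothetical and not the KVM data structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* VSID-half SLB bits, values as in Linux's asm/mmu-hash64.h. */
#define SLB_VSID_B       0xc000000000000000ULL   /* segment size field */
#define SLB_VSID_B_256M  0x0000000000000000ULL
#define SLB_VSID_B_1T    0x4000000000000000ULL
#define SLB_VSID_L       0x0000000000000100ULL   /* large page, not segment size */

/* Hypothetical guest SLB entry: ESID (already shifted) plus the VSID word. */
struct guest_slbe {
    uint64_t esid;
    uint64_t vsid_word;
    bool     valid;
};

/* Segment size comes from B; reserved B values are treated as 256M here. */
static unsigned seg_shift(const struct guest_slbe *e)
{
    assert(e);
    return (e->vsid_word & SLB_VSID_B) == SLB_VSID_B_1T ? 40 : 28;
}

static bool slbe_matches(const struct guest_slbe *e, uint64_t ea)
{
    return e->valid && (ea >> seg_shift(e)) == e->esid;
}
```

An entry with L set but B = 256M still compares on EA >> 28, so a guest using large pages in ordinary 256M segments is not mistaken for a 1TB segment.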

>
>> (2) It appears that the mtspr and mfspr code is using the same storage for
>> BATs 4-7 as for 0-3 ... I would have expected a "+ 4" in a few places.
>
> Yes, that one is fixed in more recent versions already.
>
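A sketch of the SPR-to-BAT mapping with the "+ 4" in place. The SPR numbers are those of Linux's asm/reg.h (SPRN_IBAT0U = 0x210, SPRN_IBAT4U = 0x230, each bank being IBATnU/L followed by DBATnU/L); struct bat_slot and bat_slot() are hypothetical, not the KVM emulation code:

```c
#include <assert.h>
#include <stdbool.h>

#define SPRN_IBAT0U 0x210   /* IBAT0U..IBAT3L, then DBAT0U..DBAT3L */
#define SPRN_IBAT4U 0x230   /* IBAT4U..IBAT7L, then DBAT4U..DBAT7L */

struct bat_slot {
    int  index;   /* 0-7 */
    bool data;    /* DBAT rather than IBAT */
    bool upper;   /* BATxU rather than BATxL */
};

/* Map a BAT SPR to one of 8 instruction or 8 data BATs. */
static bool bat_slot(int spr, struct bat_slot *out)
{
    int rel, first;   /* offset within the bank, first BAT number in it */

    assert(out);
    if (spr >= SPRN_IBAT0U && spr < SPRN_IBAT0U + 16) {
        rel = spr - SPRN_IBAT0U;
        first = 0;
    } else if (spr >= SPRN_IBAT4U && spr < SPRN_IBAT4U + 16) {
        rel = spr - SPRN_IBAT4U;
        first = 4;    /* the "+ 4": don't alias BATs 0-3 */
    } else {
        return false;
    }
    out->data  = rel >= 8;
    out->index = first + (rel % 8) / 2;
    out->upper = (rel % 2) == 0;
    return true;
}
```

Dropping `first = 4` reproduces the aliasing described above: IBAT4L (0x231) would then land in the same slot as IBAT0L.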
>> (3) It's not clear to me that you clear RI when transitioning to the guest,
>> but it's obviously required because you place state in SRR0 and SRR1.
>
> Uh - do I have to clear RI? I'm not prepared to take an interrupt
> anyway, and RI is just a soft flag for Linux's handlers, right?

No, it's a flag managed with hardware assistance, so that we know when
a machine check or system reset interrupt is not recoverable.  A
machine check can happen at any time.

Such a machine check would be especially unrecoverable because
the code is abusing the machine check save area with KVM state.
You are already extending the paca to hold the SLB, so why not have
a dedicated save area that can hold the registers you want?
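A toy model of why RI must be cleared before SRR0/SRR1 carry live state. MSR_RI = 0x2 matches Linux's asm/reg.h (MSR_RI_LG = 1); struct cpu, prepare_guest_entry() and machine_check_recoverable() are hypothetical simplifications, not the real entry path or handler:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_RI 0x2ULL   /* as in Linux's asm/reg.h (MSR_RI_LG = 1) */

struct cpu { uint64_t msr, srr0, srr1; };

/*
 * A machine check overwrites SRR0/SRR1.  The interrupted context can only
 * be resumed if MSR[RI] was set, i.e. SRR0/SRR1 held nothing live.
 */
static bool machine_check_recoverable(struct cpu *c, uint64_t fault_pc)
{
    assert(c);
    bool recoverable = (c->msr & MSR_RI) != 0;
    c->srr1 = c->msr;      /* hardware saves the MSR ... */
    c->srr0 = fault_pc;    /* ... and the interrupted address */
    return recoverable;
}

/* Hypothetical guest-entry prologue: clear RI before SRR0/SRR1 go live. */
static void prepare_guest_entry(struct cpu *c, uint64_t guest_pc,
                                uint64_t guest_msr)
{
    c->msr &= ~MSR_RI;
    c->srr0 = guest_pc;
    c->srr1 = guest_msr;
    /* an rfid into the guest would follow here */
}
```

Without the `c->msr &= ~MSR_RI` line, a machine check between setting SRR0/SRR1 and the rfid would report itself as recoverable even though the guest entry state it just clobbered is gone.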

Also, the split of the code between interrupts, rmhandlers, and slb
(especially the last two) makes it hard to follow for anyone not
already familiar with the code.


>
>> (4) I don't understand why __kvmppc_vcpu_run turns on interrupts so that
>> __kvmppc_vcpu_entry can turn them back off.   Something to do with
>> irq trace annotations?
>
> __kvmppc_vcpu_run turns on soft interrupts while __kvmppc_vcpu_entry
> turns them off in MSR. This is so that when enabling interrupts again
> on guest exit, we have the soft enable bit set.

That at least needs some commenting, and perhaps could use a more
direct fixup.
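As I read Alex's explanation, the sequence amounts to the following toy model of ppc64 lazy interrupt disabling (soft enable flag versus hardware MSR[EE]). The struct and function names are hypothetical stand-ins loosely named after the routines above, not the real assembly:

```c
#include <assert.h>
#include <stdbool.h>

struct irq_state {
    bool soft_enabled;   /* Linux's logical "interrupts on" flag */
    bool msr_ee;         /* hardware MSR[EE] */
};

static void vcpu_run(struct irq_state *s)     /* ~ __kvmppc_vcpu_run */
{
    s->soft_enabled = true;   /* soft-enable up front ... */
}

static void vcpu_entry(struct irq_state *s)   /* ~ __kvmppc_vcpu_entry */
{
    s->msr_ee = false;        /* ... while EE is hard-off across guest entry */
}

static void guest_exit(struct irq_state *s)
{
    assert(!s->msr_ee);       /* we come back with EE hard-disabled */
    s->msr_ee = true;         /* turning EE on now gives a fully enabled state */
}

static bool irqs_fully_enabled(const struct irq_state *s)
{
    return s->soft_enabled && s->msr_ee;
}
```

Written out like this, the dependency is that guest_exit() only ends up fully enabled because vcpu_run() set the soft flag earlier, which is the implicit coupling a comment (or a more direct fixup at exit) would make explicit.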

milton