linux-kernel - Re: [PATCH] powerpc/mm: make NULL pointer deferences explicit on bad page faults.


Message-ID: <276439ce-aad9-e72f-7bc9-c57fb4a59339@c-s.fr>
Date:   Fri, 14 Dec 2018 09:01:52 +0100
From:   Christophe Leroy <christophe.leroy@....fr>
To:     Michael Ellerman <mpe@...erman.id.au>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Paul Mackerras <paulus@...ba.org>
Cc:     linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org
Subject: Re: [PATCH] powerpc/mm: make NULL pointer deferences explicit on bad
 page faults.

Hi Michael,

On 14/12/2018 at 01:57, Michael Ellerman wrote:
> Hi Christophe,
> 
> You know it's the trivial patches that are going to get lots of review
> comments :)

I'm so happy to get comments.

> 
> Christophe Leroy <christophe.leroy@....fr> writes:
>> As several other arches including x86, this patch makes it explicit
>> that a bad page fault is a NULL pointer dereference when the fault
>> address is lower than PAGE_SIZE
> 
> I'm being pedantic, but it's not necessarily a NULL pointer dereference.
> It might just be a direct access to a low address, eg:
> 
>   char *p = 0x100;
>   *p = 0;
> 
> That's not a NULL pointer dereference.
> 
> But other arches do print this so I guess it's OK to add, and in most
> cases it will be an actual NULL pointer dereference.
> 
> I wonder though if we should use 4096 rather than PAGE_SIZE, given
> that's the actual value other arches are using. We support 256K pages on
> some systems, which is getting quite large.

Those invalid accesses are caught because the first page is marked not
present or not accessible in the page table, so I think using PAGE_SIZE
here is valid regardless of the page size.

It looks like most arches have PAGE_SHIFT ranging from 12 to 16.
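
To illustrate the page size point, here is a minimal user-space sketch, not
kernel code. The helper is_null_page_access() and its page_shift parameter
are hypothetical names used only for illustration; the patch itself compares
directly against PAGE_SIZE. It shows how the range reported as a NULL pointer
dereference grows with the page size, up to 256K when the shift is 18.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not from the patch): returns true when addr lies
 * inside the first page, i.e. the range the patch would report as a
 * NULL pointer dereference.
 */
static bool is_null_page_access(unsigned long addr, unsigned int page_shift)
{
	unsigned long page_size = 1UL << page_shift;

	return addr < page_size;
}
```

With a shift of 12 the address 0x1000 is outside the first page, while with
a shift of 16 or 18 it is still inside it.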

> 
>> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
>> index d51cf5f4e45e..501a1eadb3e9 100644
>> --- a/arch/powerpc/mm/fault.c
>> +++ b/arch/powerpc/mm/fault.c
>> @@ -631,13 +631,16 @@ void bad_page_fault(struct pt_regs *regs, unsigned long address, int sig)
>>   	switch (TRAP(regs)) {
>>   	case 0x300:
>>   	case 0x380:
>> -		printk(KERN_ALERT "Unable to handle kernel paging request for "
>> -			"data at address 0x%08lx\n", regs->dar);
>> +		pr_alert("Unable to handle kernel %s for data at address 0x%08lx\n",
>> +			 regs->dar < PAGE_SIZE ? "NULL pointer dereference" :
>> +						 "paging request",
>> +			 regs->dar);
> 
> This is now too long I think, with printk time you get:
> 
> [ 1096.450711] Unable to handle kernel NULL pointer dereference for data at address 0x00000000
> 
> Which is 93 columns. It's true on many systems it doesn't really matter
> any more, but it would still be good if it was shorter.
> 
> I like that on x86 they prefix it with "BUG:", just to avoid any confusion.
> 
> What if we had for the NULL pointer case:
> 
>    BUG: Kernel NULL pointer dereference at 0x00000000
> 
> And for the normal case:
> 
>    BUG: Unable to handle kernel data access at 0x00000000
> 
> Note on the very next line we print:
>    Faulting instruction address: 0xc000000000795cc8
> 
> So there should be no confusion about whether "at" refers to the data
> address or the instruction address.

Agreed
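
As a rough user-space sketch of the two data access messages proposed above
(an assumption about how it could look, not the final patch):
format_data_fault() and SKETCH_PAGE_SIZE are hypothetical names, snprintf()
stands in for pr_alert(), and the 4096 value is only an assumption for the
example.

```c
#include <stdio.h>

/* Assumed page size for this sketch only; the kernel would use PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Hypothetical helper: formats the proposed message for a data fault at
 * address dar into buf, and returns what snprintf() returns.
 */
static int format_data_fault(char *buf, size_t len, unsigned long dar)
{
	const char *msg = dar < SKETCH_PAGE_SIZE ?
			  "Kernel NULL pointer dereference" :
			  "Unable to handle kernel data access";

	return snprintf(buf, len, "BUG: %s at 0x%08lx\n", msg, dar);
}
```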

> 
>>   	case 0x400:
>>   	case 0x480:
>> -		printk(KERN_ALERT "Unable to handle kernel paging request for "
>> -			"instruction fetch\n");
>> +		pr_alert("Unable to handle kernel %s for instruction fetch\n",
>> +			 regs->nip < PAGE_SIZE ? "NULL pointer dereference" :
>> +						 "paging request");
> 
> I don't really like using "NULL pointer dereference" here, that
> terminology makes me think of a load/store, I think it confuses things
> rather than making it clearer.
> 
> What about:
> 
>    BUG: Unable to handle kernel instruction fetch at 0x00000000

I think we still need to make it explicit that we jumped there due to a
NULL function pointer, although I don't have a good idea for the wording
yet.

> 
> 
>>   		break;
>>   	case 0x600:
>>   		printk(KERN_ALERT "Unable to handle kernel paging request for "
> 
> It would be good to clean up these other cases as well. They seem to be
> trying to use the "paging request for" terminology which leads to them
> being very wordy. I assume that was done to help people grepping kernel
> logs for errors, but I think we should not worry about that if we have
> the "BUG:" prefix.
> 
> So we have:
> 	printk(KERN_ALERT "Unable to handle kernel paging request for "
> 		"unaligned access at address 0x%08lx\n", regs->dar);
> 
> What about:
> 
>    BUG: Unable to handle kernel unaligned access at 0x00000000
> 
> And:
> 	printk(KERN_ALERT "Unable to handle kernel paging request for "
> 		"unknown fault\n");
> 
> What about:
> 
>    BUG: Unable to handle unknown paging fault at 0x00000000
> 
> 
> Thoughts?

Those look like good ideas, I'll carry on with them.
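
To summarise the wording suggested in this thread, here is a small
user-space sketch mapping the trap values from the quoted switch to the
proposed message prefixes. fault_message() is a hypothetical name, the
association of 0x600 with the unaligned access message is my reading of the
quoted code, and the instruction fetch wording is still open as noted above.

```c
/*
 * Hypothetical helper: returns the proposed message prefix for a trap
 * value. The NULL pointer special case for low addresses is left out.
 */
static const char *fault_message(unsigned long trap)
{
	switch (trap) {
	case 0x300:
	case 0x380:
		return "BUG: Unable to handle kernel data access";
	case 0x400:
	case 0x480:
		/* Wording not settled yet for the NULL function pointer case. */
		return "BUG: Unable to handle kernel instruction fetch";
	case 0x600:
		return "BUG: Unable to handle kernel unaligned access";
	default:
		return "BUG: Unable to handle unknown paging fault";
	}
}
```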

Christophe