linux-kernel - Re: [RFC] kcore:change kcore_read to make sure the kernel read is safe

Message-Id: <D913ABBF-359A-4C36-8471-C3E5F6EA6BF8@gmail.com>
Date:	Wed, 5 Aug 2015 11:37:40 +0800
From:	yalin wang <yalin.wang2010@...il.com>
To:	Dave Hansen <dave@...1.net>
Cc:	Ingo Molnar <mingo@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	David Rientjes <rientjes@...gle.com>, fabf@...net.be,
	bhe@...hat.com, open list <linux-kernel@...r.kernel.org>
Subject: Re: [RFC] kcore:change kcore_read to make sure the kernel read is safe


> On Aug 5, 2015, at 05:18, Dave Hansen <dave@...1.net> wrote:
> 
> On 08/03/2015 08:37 PM, yalin wang wrote:
>> This changes kcore_read() to use __copy_from_user_inatomic() to
>> copy data from kernel addresses, because kern_addr_valid() only makes
>> sure the page table is valid at the moment it is called; once it
>> returns, the page table may change (for example, set_fixmap() modifies
>> the kernel page table), which could crash the kernel if we are unlucky.
> 
> I don't see any cases at the moment that will crash.  set_fixmap()
> doesn't ever clear out any ptes, right?
> 
> I guess the root problem here is that we don't have any good (generic)
> locking of kernel page tables inside the linear map.  Can you come up
> with a case where this will _actually_ crash?
> 
Thanks for your comments.
I haven't hit a crash from this; I noticed while reading the code that this
part is not safe, so I made this patch :).

>> fs/proc/kcore.c | 30 ++++++++++++++++++++++++------
>> 1 file changed, 24 insertions(+), 6 deletions(-)
>> 
>> diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
>> index 92e6726..b085fde 100644
>> --- a/fs/proc/kcore.c
>> +++ b/fs/proc/kcore.c
>> @@ -86,8 +86,8 @@ static size_t get_kcore_size(int *nphdr, size_t *elf_buflen)
>> 			size = try;
>> 		*nphdr = *nphdr + 1;
>> 	}
>> -	*elf_buflen =	sizeof(struct elfhdr) + 
>> -			(*nphdr + 2)*sizeof(struct elf_phdr) + 
>> +	*elf_buflen =	sizeof(struct elfhdr) +
>> +			(*nphdr + 2)*sizeof(struct elf_phdr) +
> 
> I'm having a hard time spotting the change here.  Whitespace?
I will separate the formatting fixes into another patch.
> 
>> 			3 * ((sizeof(struct elf_note)) +
>> 			     roundup(sizeof(CORE_STR), 4)) +
>> 			roundup(sizeof(struct elf_prstatus), 4) +
>> @@ -435,6 +435,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>> 	size_t elf_buflen;
>> 	int nphdr;
>> 	unsigned long start;
>> +	unsigned long page = 0;
>> 
>> 	read_lock(&kclist_lock);
>> 	size = get_kcore_size(&nphdr, &elf_buflen);
>> @@ -485,7 +486,7 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>> 	start = kc_offset_to_vaddr(*fpos - elf_buflen);
>> 	if ((tsz = (PAGE_SIZE - (start & ~PAGE_MASK))) > buflen)
>> 		tsz = buflen;
>> -		
>> +
> 
> Please keep the unnecessary whitespace changes for another patch.
> 
>> 	while (buflen) {
>> 		struct kcore_list *m;
>> 
>> @@ -515,15 +516,32 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>> 		} else {
>> 			if (kern_addr_valid(start)) {
>> 				unsigned long n;
>> +				mm_segment_t old_fs = get_fs();
>> +
>> +				if (page == 0) {
>> +					page = __get_free_page(GFP_KERNEL);
>> +					if (page == 0)
>> +						return -ENOMEM;
> 
> FWIW, we usually code this as "!page" instead of "page == 0".  I also
> wouldn't call it 'page'.
> 
> Also, why is this using a raw __get_free_page() while the code above it
> uses a kmalloc()?
> 
Because I am using a page-sized buffer, __get_free_page() is more efficient
than kmalloc() here.

>> -				n = copy_to_user(buffer, (char *)start, tsz);
>> +				}
>> +				set_fs(KERNEL_DS);
>> +				pagefault_disable();
>> +				n = __copy_from_user_inatomic((void *)page,
>> +					(__force const void __user *)start,
>> +					tsz);
>> +				pagefault_enable();
>> +				set_fs(old_fs);
>> +				if (n)
>> +					memset((void *)page + tsz - n, 0, n);
>> +
>> +				n = copy_to_user(buffer, (char *)page, tsz);
> 
> So, first of all, we are using __copy_from_user_inatomic() to copy to
> and from *kernel* addresses, and it doesn't even get a comment? :)
> 
I will add a comment.
> Fundamentally, we're trying to be able to safely survive faults in the
> kernel linear map here.  I think we've got to get a better handle on
> when that happens rather than just paper over it when it does.  (Aside:
> There might actually be a missing use of get_online_mems() here.)
> 
OK.

> Maybe we should just be walking the kernel page tables ourselves and do
> a kmap().  We might have a stale pte but we don't have to worry about
> actual racy updates while we are doing the copy.
> 
So if we do it that way, we can drop the kern_addr_valid() call: I would just
walk the page tables to the pte and use get_page_unless_zero() to grab the valid page?


>> 				/*
>> 				 * We cannot distinguish between fault on source
>> 				 * and fault on destination. When this happens
>> 				 * we clear too and hope it will trigger the
>> 				 * EFAULT again.
>> 				 */
> 
> This comment seems wrong after the patch.
OK.