Message-ID: <51DC3242.30802@suse.de>
Date:	Tue, 09 Jul 2013 17:54:42 +0200
From:	Alexander Graf <agraf@...e.de>
To:	Benjamin Herrenschmidt <benh@...nel.crashing.org>
Cc:	Alexey Kardashevskiy <aik@...abs.ru>,
	linuxppc-dev@...ts.ozlabs.org,
	David Gibson <david@...son.dropbear.id.au>,
	Paul Mackerras <paulus@...ba.org>,
	Alex Williamson <alex.williamson@...hat.com>,
	kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
	kvm-ppc@...r.kernel.org, Andrew Morton <akpm@...ux-foundation.org>,
	Yasuaki Ishimatsu <isimatu.yasuaki@...fujitsu.com>,
	Christoffer Dall <cdall@...columbia.edu>,
	Martin Schwidefsky <schwidefsky@...ibm.com>,
	Andrea Arcangeli <aarcange@...hat.com>, linux-mm@...ck.org
Subject: Re: [PATCH 4/8] powerpc: Prepare to support kernel handling of IOMMU
 map/unmap

On 07/08/2013 03:33 AM, Benjamin Herrenschmidt wrote:
> On Sun, 2013-07-07 at 01:07 +1000, Alexey Kardashevskiy wrote:
>> The current VFIO-on-POWER implementation supports only user-mode-driven
>> mapping, i.e. QEMU sends requests to map/unmap pages.
>> However, this approach is really slow, so we want to move it into KVM.
>> Since H_PUT_TCE can be extremely performance sensitive (especially with
>> network adapters, where every packet needs to be mapped/unmapped), we chose
>> to implement it as a "fast" hypercall directly in "real
>> mode" (processor still in the guest context but MMU off).
>>
>> To be able to do that, we need to provide some facilities to
>> access the struct page count within that real-mode environment, as things
>> like the sparsemem vmemmap mappings aren't accessible there.
>>
>> This adds an API to increment/decrement the page counter, as the
>> get_user_pages API used for user-mode mapping does not work
>> in real mode.
>>
>> CONFIG_SPARSEMEM_VMEMMAP and CONFIG_FLATMEM are supported.
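To make the intended flow concrete, a real-mode caller would presumably use
the new helpers along these lines (a sketch only; the handler name and the
H_TOO_HARD/H_SUCCESS return convention are assumptions drawn from the
description above, not code from this series):

	/* Sketch only -- hypothetical real-mode handler built on the new
	 * API. A NULL or -EAGAIN result means "too hard for real mode":
	 * return H_TOO_HARD so the hypercall is retried in virtual mode.
	 */
	static long example_rm_map_page(unsigned long pfn)
	{
		struct page *page;

		page = realmode_pfn_to_page(pfn);
		if (!page)
			return H_TOO_HARD;	/* page struct split between blocks */

		if (realmode_get_page(page))
			return H_TOO_HARD;	/* compound page or zero refcount */

		/* ... update the TCE entry for this pfn here ... */

		return H_SUCCESS;
	}
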
> This patch will need an ack from "mm" people to make sure they are ok
> with our approach and ack the change to the generic header.
>
> (Added linux-mm).
>
> Cheers,
> Ben.
>
>> Reviewed-by: Paul Mackerras <paulus@...ba.org>
>> Signed-off-by: Paul Mackerras <paulus@...ba.org>
>> Signed-off-by: Alexey Kardashevskiy <aik@...abs.ru>
>>
>> ---
>>
>> Changes:
>> 2013/06/27:
>> * realmode_get_page() fixed to use get_page_unless_zero(). If it fails,
>> the call will be passed from real to virtual mode and safely handled.
>> * added comment to PageCompound() in include/linux/page-flags.h.
>>
>> 2013/05/20:
>> * PageTail() is replaced by PageCompound() in order to have the same checks
>> for whether the page is huge in realmode_get_page() and realmode_put_page()
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@...abs.ru>
>> ---
>>   arch/powerpc/include/asm/pgtable-ppc64.h |  4 ++
>>   arch/powerpc/mm/init_64.c                | 78 +++++++++++++++++++++++++++++++-
>>   include/linux/page-flags.h               |  4 +-
>>   3 files changed, 84 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
>> index e3d55f6f..7b46e5f 100644
>> --- a/arch/powerpc/include/asm/pgtable-ppc64.h
>> +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
>> @@ -376,6 +376,10 @@ static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
>>   }
>>   #endif /* !CONFIG_HUGETLB_PAGE */
>>
>> +struct page *realmode_pfn_to_page(unsigned long pfn);
>> +int realmode_get_page(struct page *page);
>> +int realmode_put_page(struct page *page);
>> +
>>   #endif /* __ASSEMBLY__ */
>>
>>   #endif /* _ASM_POWERPC_PGTABLE_PPC64_H_ */
>> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
>> index a90b9c4..7031be3 100644
>> --- a/arch/powerpc/mm/init_64.c
>> +++ b/arch/powerpc/mm/init_64.c
>> @@ -297,5 +297,81 @@ void vmemmap_free(unsigned long start, unsigned long end)
>>   {
>>   }
>>
>> -#endif /* CONFIG_SPARSEMEM_VMEMMAP */
>> +/*
>> + * We do not have access to the sparsemem vmemmap, so we fall back to
>> + * walking the list of sparsemem blocks which we already maintain for
>> + * the sake of crashdump. In the long run, we might want to maintain
>> + * a tree if performance of that linear walk becomes a problem.
>> + *
>> + * Any of the realmode_XXXX functions can fail due to:
>> + * 1) As real sparsemem blocks do not lie contiguously in RAM (they
>> + * are in virtual address space which is not available in real mode),
>> + * the requested page struct can be split between blocks so get_page/put_page
>> + * may fail.
>> + * 2) When huge pages are used, the get_page/put_page API will fail
>> + * in real mode as the linked addresses in the page struct are virtual
>> + * too.
>> + * When 1) or 2) takes place, the API returns an error code to cause
>> + * an exit to kernel virtual mode where the operation will be completed.

I don't see where these functions enter kernel virtual mode. I think 
it's best to just remove the last sentence. It doesn't belong here.


Alex

>> + */
>> +struct page *realmode_pfn_to_page(unsigned long pfn)
>> +{
>> +	struct vmemmap_backing *vmem_back;
>> +	struct page *page;
>> +	unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;
>> +	unsigned long pg_va = (unsigned long) pfn_to_page(pfn);
>>
>> +	for (vmem_back = vmemmap_list; vmem_back; vmem_back = vmem_back->list) {
>> +		if (pg_va < vmem_back->virt_addr)
>> +			continue;
>> +
>> +		/* Check that page struct is not split between real pages */
>> +		if ((pg_va + sizeof(struct page)) >
>> +				(vmem_back->virt_addr + page_size))
>> +			return NULL;
>> +
>> +		page = (struct page *) (vmem_back->phys + pg_va -
>> +				vmem_back->virt_addr);
>> +		return page;
>> +	}
>> +
>> +	return NULL;
>> +}
>> +EXPORT_SYMBOL_GPL(realmode_pfn_to_page);
>> +
>> +#elif defined(CONFIG_FLATMEM)
>> +
>> +struct page *realmode_pfn_to_page(unsigned long pfn)
>> +{
>> +	struct page *page = pfn_to_page(pfn);
>> +	return page;
>> +}
>> +EXPORT_SYMBOL_GPL(realmode_pfn_to_page);
>> +
>> +#endif /* CONFIG_SPARSEMEM_VMEMMAP/CONFIG_FLATMEM */
>> +
>> +#if defined(CONFIG_SPARSEMEM_VMEMMAP) || defined(CONFIG_FLATMEM)
>> +int realmode_get_page(struct page *page)
>> +{
>> +	if (PageCompound(page))
>> +		return -EAGAIN;
>> +
>> +	if (!get_page_unless_zero(page))
>> +		return -EAGAIN;
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(realmode_get_page);
>> +
>> +int realmode_put_page(struct page *page)
>> +{
>> +	if (PageCompound(page))
>> +		return -EAGAIN;
>> +
>> +	if (!atomic_add_unless(&page->_count, -1, 1))
>> +		return -EAGAIN;
>> +
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(realmode_put_page);
>> +#endif
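
As a side note, realmode_put_page() uses atomic_add_unless(&page->_count,
-1, 1) rather than put_page() so it can never drop the last reference:
freeing a page is not safe with the MMU off. The matching virtual-mode slow
path for the -EAGAIN case would look something like this (sketch only; the
function name is hypothetical):

	/* Sketch only -- virtual-mode fallback for the -EAGAIN case,
	 * where the ordinary page APIs are available and put_page()
	 * may legally free the page.
	 */
	static void example_virtmode_put(unsigned long pfn)
	{
		struct page *page = pfn_to_page(pfn);

		put_page(page);
	}
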
>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>> index 6d53675..98ada58 100644
>> --- a/include/linux/page-flags.h
>> +++ b/include/linux/page-flags.h
>> @@ -329,7 +329,9 @@ static inline void set_page_writeback(struct page *page)
>>    * System with lots of page flags available. This allows separate
>>    * flags for PageHead() and PageTail() checks of compound pages so that bit
>>    * tests can be used in performance sensitive paths. PageCompound is
>> - * generally not used in hot code paths.
>> + * generally not used in hot code paths except arch/powerpc/mm/init_64.c
>> + * and arch/powerpc/kvm/book3s_64_vio_hv.c which use it to detect huge pages
>> + * and avoid handling those in real mode.
>>    */
>>   __PAGEFLAG(Head, head) CLEARPAGEFLAG(Head, head)
>>   __PAGEFLAG(Tail, tail)
>
