Message-ID: <4cc8284a-6162-8f9d-50b3-e594d46751f8@linux.alibaba.com>
Date:   Fri, 8 Jul 2022 20:13:15 +0800
From:   "guanghui.fgh" <guanghuifeng@...ux.alibaba.com>
To:     Catalin Marinas <catalin.marinas@....com>
Cc:     baolin.wang@...ux.alibaba.com, will@...nel.org,
        akpm@...ux-foundation.org, david@...hat.com, jianyong.wu@....com,
        james.morse@....com, quic_qiancai@...cinc.com,
        christophe.leroy@...roup.eu, jonathan@...ek.ca,
        mark.rutland@....com, thunder.leizhen@...wei.com,
        anshuman.khandual@....com, linux-arm-kernel@...ts.infradead.org,
        linux-kernel@...r.kernel.org, rppt@...nel.org,
        geert+renesas@...der.be, ardb@...nel.org, linux-mm@...ck.org,
        yaohongbo@...ux.alibaba.com, alikernel-developer@...ux.alibaba.com
Subject: Re: [PATCH v3] arm64: mm: fix linear mapping mem access performance
 degradation

Thanks.

On 2022/7/2 1:24, Catalin Marinas wrote:
> On Thu, Jun 30, 2022 at 06:50:22PM +0800, Guanghui Feng wrote:
>> +static void init_pmd_remap(pud_t *pudp, unsigned long addr, unsigned long end,
>> +			   phys_addr_t phys, pgprot_t prot,
>> +			   phys_addr_t (*pgtable_alloc)(int), int flags)
>> +{
>> +	unsigned long next;
>> +	pmd_t *pmdp;
>> +	phys_addr_t map_offset;
>> +	pmdval_t pmdval;
>> +
>> +	pmdp = pmd_offset(pudp, addr);
>> +	do {
>> +		next = pmd_addr_end(addr, end);
>> +
>> +		if (!pmd_none(*pmdp) && pmd_sect(*pmdp)) {
>> +			phys_addr_t pte_phys = pgtable_alloc(PAGE_SHIFT);
>> +			pmd_clear(pmdp);
>> +			pmdval = PMD_TYPE_TABLE | PMD_TABLE_UXN;
>> +			if (flags & NO_EXEC_MAPPINGS)
>> +				pmdval |= PMD_TABLE_PXN;
>> +			__pmd_populate(pmdp, pte_phys, pmdval);
>> +			flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> 
> This doesn't follow the architecture requirements for "break before
> make" when changing live page tables. While it may work, it risks
> triggering a TLB conflict abort. The correct sequence normally is:
> 
> 	pmd_clear();
> 	flush_tlb_kernel_range();
> 	__pmd_populate();
> 
> However, do we have any guarantees that the kernel doesn't access the
> pmd range being unmapped temporarily? The page table itself might live
> in one of these sections, so set_pmd() etc. can get a translation fault.
> 
Thanks.

The CPU can generate a TLB conflict abort if it detects that the address
being looked up in the TLB hits multiple entries.

(1) I think that when merging small pages into a block/section mapping,
there may be a TLB conflict if BBM is not followed.

Namely:
a. Map a 4KB page (address X).
   Touch that page so that the translation gets cached in the TLB.

b. Modify the translation tables,
   replacing the mapping for address X with a 2MB mapping. DO NOT
   INVALIDATE the TLB.

c. Touch "X + 4KB".
   This will/should miss in the TLB, causing a new walk that returns the
   2MB mapping.

d. Touch X.
   Assuming neither entry has been evicted, you will hit both the 4KB and
   the 2MB mapping, as both cover address X.

This is a TLB conflict.
(link: 
https://community.arm.com/support-forums/f/dev-platforms-forum/13583/tlb-conflict-abort)
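
To make steps a-d concrete, here is a minimal toy model of a TLB in plain C. It is only a sketch: the struct and function names (toy_tlb, touch, merge_4k_into_2m) are made up for illustration and are not kernel APIs, and the model assumes a TLB that only walks on a miss and never evicts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_SLOTS 8
#define SZ_4K (1ULL << 12)
#define SZ_2M (1ULL << 21)

/* Hypothetical toy TLB: each entry covers [base, base + size). */
struct tlb_entry { uint64_t base; uint64_t size; int valid; };
struct toy_tlb { struct tlb_entry slot[TLB_SLOTS]; };

/* Count how many valid entries cover va; more than one means a conflict. */
static int tlb_hits(const struct toy_tlb *t, uint64_t va)
{
	int hits = 0;

	for (size_t i = 0; i < TLB_SLOTS; i++) {
		const struct tlb_entry *e = &t->slot[i];

		if (e->valid && va >= e->base && va - e->base < e->size)
			hits++;
	}
	return hits;
}

/* Load an entry of the given mapping size into the first free slot. */
static void tlb_fill(struct toy_tlb *t, uint64_t va, uint64_t size)
{
	size_t i = 0;

	while (i < TLB_SLOTS && t->slot[i].valid)
		i++;
	assert(i < TLB_SLOTS);	/* the toy model never evicts */
	t->slot[i].base = va & ~(size - 1);
	t->slot[i].size = size;
	t->slot[i].valid = 1;
}

static void tlb_invalidate_all(struct toy_tlb *t)
{
	for (size_t i = 0; i < TLB_SLOTS; i++)
		t->slot[i].valid = 0;
}

/* Access va: on a miss, "walk" and cache a mapping of mapped_size. */
static int touch(struct toy_tlb *t, uint64_t va, uint64_t mapped_size)
{
	if (tlb_hits(t, va) == 0)
		tlb_fill(t, va, mapped_size);
	return tlb_hits(t, va);
}

/* Steps a-d; returns the number of entries matching X at step d. */
static int merge_4k_into_2m(int invalidate)
{
	struct toy_tlb t = {0};
	uint64_t x = 0x40000000ULL;	/* 2MB-aligned address X */

	touch(&t, x, SZ_4K);			/* a: cache the 4KB entry */
	if (invalidate)				/* b: tables now map 2MB */
		tlb_invalidate_all(&t);
	touch(&t, x + SZ_4K, SZ_2M);		/* c: miss, load 2MB entry */
	return touch(&t, x, SZ_2M);		/* d: look up X again */
}
```

In this model merge_4k_into_2m(0) ends with two entries covering X (the conflict), while merge_4k_into_2m(1) ends with one.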



(2) But when splitting a large block/section mapping into a smaller
granularity, there may be no TLB conflict.

Namely:
a. Rebuild the pte-level mapping without any change to the original page
   table (the relation between virtual and physical addresses stays the
   same).

b. Modify the 1G mapping in memory to use the new pte-level mapping,
   without a TLB flush.

c. When the CPU accesses the 1G region (anywhere):
   If the 1G TLB entry is already cached in the TLB, every access to the
   1G region succeeds (no TLB entry is loaded, so no conflict).

   If the 1G TLB entry has been evicted, the CPU walks the page table in
   memory. Whether it sees the old (1G) or the new (4K) page table, every
   access to the 1G region succeeds (a new TLB entry is loaded, no
   conflict).

d. Afterwards, we flush the TLB and force the CPU to use the new page
   table (no conflict).

It seems that there are never two TLB entries for the same virtual
address in the TLB when splitting a large block/section mapping.
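
The reasoning in (2) can be sketched with a small toy model in C. This is only an illustration under a strong assumption: the model TLB walks the page table only on a miss. It does not capture speculative walks or multi-level TLBs, so it cannot show that real hardware is free of conflicts. All names here (struct entry, touch, split_1g_into_4k) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SLOTS 4
#define SZ_4K (1ULL << 12)
#define SZ_1G (1ULL << 30)

/* Hypothetical toy TLB entry covering [base, base + size). */
struct entry { uint64_t base; uint64_t size; int valid; };

/* Number of valid entries that cover va. */
static int hits(const struct entry *tlb, uint64_t va)
{
	int n = 0;

	for (int i = 0; i < SLOTS; i++)
		if (tlb[i].valid && va >= tlb[i].base &&
		    va - tlb[i].base < tlb[i].size)
			n++;
	return n;
}

/* Access va; only on a miss, walk and cache a mapping of walk_size. */
static int touch(struct entry *tlb, uint64_t va, uint64_t walk_size)
{
	if (hits(tlb, va) == 0) {
		int i = 0;

		while (i < SLOTS && tlb[i].valid)
			i++;
		assert(i < SLOTS);	/* the toy model never evicts */
		tlb[i] = (struct entry){ va & ~(walk_size - 1), walk_size, 1 };
	}
	return hits(tlb, va);
}

/*
 * Split a 1G block into 4K pages without a TLB flush and return the
 * largest number of entries that matched any probed address.
 */
static int split_1g_into_4k(int block_still_cached)
{
	struct entry tlb[SLOTS] = {0};
	uint64_t base = 0x40000000ULL;	/* 1G-aligned start of the block */
	uint64_t probes[] = { base, base + SZ_4K, base + SZ_1G - SZ_4K };
	int worst = 0;

	if (block_still_cached)
		touch(tlb, base, SZ_1G);	/* 1G entry loaded before split */

	/* Step b has happened: walks now return 4K mappings. */
	for (int i = 0; i < 3; i++) {
		int n = touch(tlb, probes[i], SZ_4K);

		if (n > worst)
			worst = n;
	}
	return worst;
}
```

Under that assumption both split_1g_into_4k(1) and split_1g_into_4k(0) never see more than one matching entry per address.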



(3) At the same time, I think we can use another approach.
As the system linear mapping is built with init_pg_dir, we can also
reuse init_pg_dir when we later need to split the block/section mapping.
As init_pg_dir contains all kernel text/data mappings, we can comply
with the BBM requirement.

a. Rebuild the new pte-level mapping without any change to the old
   mapping (the CPU cannot walk the new page tables yet; they are
   isolated).

b. Switch to init_pg_dir.

c. Clear the old 1G block mapping and flush the TLB.

d. While running on init_pg_dir, modify the linear mapping to use the new
   pte-level page tables (TLB BBM).

e. Switch back to swapper_pg_dir.
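
The ordering of steps a-e can be written down as a small state model in C. This is only a sketch of the proposed sequence, not kernel code: every type and helper below is hypothetical, and the asserts encode my assumptions that the old block is cleared only while running on init_pg_dir and that the new table is installed only after the clear and flush.

```c
#include <assert.h>

enum pgd { SWAPPER_PG_DIR, INIT_PG_DIR };
enum lm_state { LM_BLOCK_1G, LM_INVALID, LM_PTE_TABLE };

/* Hypothetical model state; none of this is a real kernel structure. */
struct sim {
	enum pgd active;	/* page table the CPU is running on */
	enum lm_state linear;	/* state of the linear-map entry */
	int new_ptes_ready;	/* step a done */
	int stale_tlb;		/* old 1G entry may still be cached */
};

/* a: build the pte-level tables off to the side, not yet reachable. */
static void build_new_ptes(struct sim *s)
{
	s->new_ptes_ready = 1;
}

/* b and e: change the active page table. */
static void switch_pgd(struct sim *s, enum pgd to)
{
	s->active = to;
}

/* c: "break": clear the old block entry and flush the TLB. */
static void clear_block_and_flush(struct sim *s)
{
	assert(s->active == INIT_PG_DIR);	/* assumed precondition */
	s->linear = LM_INVALID;
	s->stale_tlb = 0;
}

/* d: "make": point the entry at the new pte-level table. */
static void install_pte_table(struct sim *s)
{
	assert(s->new_ptes_ready);
	assert(s->linear == LM_INVALID && !s->stale_tlb);	/* BBM order */
	s->linear = LM_PTE_TABLE;
}

/* Run steps a-e in order and return the final state. */
static struct sim split_via_init_pg_dir(void)
{
	struct sim s = { SWAPPER_PG_DIR, LM_BLOCK_1G, 0, 1 };

	build_new_ptes(&s);			/* a */
	switch_pgd(&s, INIT_PG_DIR);		/* b */
	clear_block_and_flush(&s);		/* c */
	install_pte_table(&s);			/* d */
	switch_pgd(&s, SWAPPER_PG_DIR);		/* e */
	return s;
}
```

Reordering c and d, or skipping b, trips one of the asserts, which is the constraint the sequence is meant to respect.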


Could you give me some advice?

Thanks.
