Message-ID: <5B2CD33B.9020702@hisilicon.com>
Date: Fri, 22 Jun 2018 18:45:15 +0800
From: Wei Xu <xuwei5@...ilicon.com>
To: Will Deacon <will.deacon@....com>
CC: James Morse <james.morse@....com>, <catalin.marinas@....com>,
<suzuki.poulose@....com>, <dave.martin@....com>,
<mark.rutland@....com>, <marc.zyngier@....com>,
<linux-arm-kernel@...ts.infradead.org>,
<linux-kernel@...r.kernel.org>, Linuxarm <linuxarm@...wei.com>,
Hanjun Guo <guohanjun@...wei.com>, <xiexiuqi@...wei.com>,
huangdaode <huangdaode@...ilicon.com>,
"Chenxin (Charles)" <charles.chenxin@...wei.com>,
"Xiongfanggou (James)" <james.xiong@...wei.com>,
"Liguozhu (Kenneth)" <liguozhu@...ilicon.com>,
Zhangyi ac <zhangyi.ac@...wei.com>,
<jonathan.cameron@...wei.com>,
Shameerali Kolothum Thodi
<shameerali.kolothum.thodi@...wei.com>,
John Garry <john.garry@...wei.com>,
Salil Mehta <salil.mehta@...wei.com>,
Shiju Jose <shiju.jose@...wei.com>,
"Zhuangyuzeng (Yisen)" <yisen.zhuang@...wei.com>,
"Wangzhou (B)" <wangzhou1@...ilicon.com>,
"kongxinwei (A)" <kong.kongxinwei@...ilicon.com>,
"Liyuan (Larry, Turing Solution)" <Larry.T@...wei.com>,
<libeijian@...ilicon.com>, <zhangbin011@...ilicon.com>
Subject: Re: KVM guest sometimes failed to boot because of kernel stack
overflow if KPTI is enabled on a hisilicon ARM64 platform.
Hi Will,
On 2018/6/22 17:23, Will Deacon wrote:
> Hi Wei,
>
> On Fri, Jun 22, 2018 at 09:33:04AM +0100, Wei Xu wrote:
>> On 2018/6/21 11:54, Will Deacon wrote:
>>> On Thu, Jun 21, 2018 at 11:14:28AM +0100, Wei Xu wrote:
>>>> On 2018/6/21 10:18, Will Deacon wrote:
>>>>> Wei -- does the diff below help at all? Make sure you disable CONFIG_KASAN,
>>>>> otherwise your kernel will take an age to boot.
>>>> Yes, amazing! This patch resolved the issue.
>>> Great...
>>>
>>>> I have tested it 50 times and cannot reproduce the issue any more.
>>>> Could you please explain in more detail why this patch works?
>>> You might need to ask your CPU design team ;)
>>>
>>> Without this patch, the code in idmap_kpti_install_ng_mappings() sets
>>> bit 11 in table descriptors so that we can keep track of which parts of
>>> the page table we've visited. With this patch, we don't bother tracking
>>> and potentially rewalk parts of the page table (which takes a very long
>>> time if KASAN is enabled).
>> Got it. Thanks!
>>
>>> The architecture documents I've looked at are clear that bit 11 is IGNORED
>>> by the CPU, which:
>>>
>>> "Indicates that the architecture guarantees that the bit or field is not
>>> interpreted or modified by hardware."
>>>
>>> Please can you double-check that your CPU is indeed ignoring bit 11 in
>>> non-leaf (table) descriptors?
>> By non-leaf (table) descriptors, do you mean the table descriptors
>> in section D4.3.1 "VMSAv8-64 translation table level 0, level 1, and level 2 descriptor formats"
>> of the ARM Architecture Reference Manual ARMv8, for ARMv8-A (DDI0487C_a_armv8_arm.pdf)?
>>
>> If so, our hardware does ignore it (it neither interprets nor modifies the bit).
> Ok, thanks for checking.
>
>> Is there any other possible reason that could cause this?
> Perhaps just writing back the table entries is enough to cause the issue,
> although I really can't understand why that would be the case. Can you try
> the diff below (without my previous change), please?
Thanks!
But it does not resolve the issue (I applied only this patch, on top of 4.17.0).
The log is below:
estuary:/$ ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 \
  -cpu host -enable-kvm -smp 1 -m 1024 -kernel ./Image-4.17-joyx \
  -initrd ../mini-rootfs-arm64.cpio.gz -nographic \
  -append "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
[ 0.000000] Linux version 4.17.0-45865-gc58dc48 (joyx@...ing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #14 SMP PREEMPT Fri Jun 22 18:26:01 CST 2018
[ 0.000000] Machine model: linux,dummy-virt
[ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
[ 0.000000] bootconsole [pl11] enabled
[ 0.000000] efi: Getting EFI parameters from FDT:
[ 0.000000] efi: UEFI not found.
[ 0.000000] cma: Reserved 16 MiB at 0x000000007f000000
[ 0.000000] NUMA: No NUMA configuration found
[ 0.000000] NUMA: Faking a node at [mem 0x0000000000000000-0x000000007fffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x7efeb300-0x7efecdff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA32 [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] psci: probing for conduit method from DT.
[ 0.000000] psci: PSCIv1.0 detected in firmware.
[ 0.000000] psci: Using standard PSCI v0.2 function IDs
[ 0.000000] psci: Trusted OS migration not required
[ 0.000000] psci: SMC Calling Convention v1.1
[ 0.000000] random: get_random_bytes called from start_kernel+0xa8/0x418 with crng_init=0
[ 0.000000] percpu: Embedded 24 pages/cpu @ (ptrval) s57984 r8192 d32128 u98304
[ 0.000000] Detected VIPT I-cache on CPU0
[ 0.000000] CPU features: detected: Kernel page table isolation (KPTI)
[ 0.000000] CPU features: detected: Hardware dirty bit management
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 258048
[ 0.000000] Policy zone: DMA32
[ 0.000000] Kernel command line: rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000
[ 0.000000] Memory: 968436K/1048576K available (10044K kernel code, 1328K rwdata, 4840K rodata, 1216K init, 409K bss, 63756K reserved, 16384K cma-reserved)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] Preemptible hierarchical RCU implementation.
[ 0.000000] RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=1.
[ 0.000000] Tasks RCU enabled.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[ 0.000000] GICv3: Distributor has no Range Selector support
[ 0.000000] GICv3: no VLPI support, no direct LPI support
[ 0.000000] ITS [mem 0x08080000-0x0809ffff]
[ 0.000000] ITS@...000000008080000: allocated 8192 Devices @7d830000 (indirect, esz 8, psz 64K, shr 1)
[ 0.000000] ITS@...000000008080000: allocated 8192 Interrupt Collections @7d840000 (flat, esz 8, psz 64K, shr 1)
[ 0.000000] GIC: using LPI property table @0x000000007d850000
[ 0.000000] ITS: Allocated 1792 chunks for LPIs
[ 0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000080a0000
[ 0.000000] CPU0: using LPI pending table @0x000000007d860000
[ 0.000000] GIC: PPI11 is secure or misconfigured
[ 0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, assuming level low
[ 0.000000] arch_timer: WARNING: Please fix your firmware
[ 0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
[ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
[ 0.000002] sched_clock: 56 bits at 100MHz, resolution 10ns, wraps every 4398046511100ns
[ 0.000844] Console: colour dummy device 80x25
[ 0.001406] Calibrating delay loop (skipped), value calculated using timer frequency.. 200.00 BogoMIPS (lpj=400000)
[ 0.002458] pid_max: default: 32768 minimum: 301
[ 0.002944] Security Framework initialized
[ 0.003521] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
[ 0.004322] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
[ 0.005022] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.005797] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.025904] ASID allocator initialised with 32768 entries
[ 0.029913] Hierarchical SRCU implementation.
[ 0.034285] Platform MSI: its domain created
[ 0.034740] PCI/MSI: /intc/its domain created
[ 0.035318] EFI services will not be available.
[ 0.037943] smp: Bringing up secondary CPUs ...
[ 0.038410] smp: Brought up 1 node, 1 CPU
[ 0.038815] SMP: Total of 1 processors activated.
[ 0.039300] CPU features: detected: GIC system register CPU interface
[ 0.039946] CPU features: detected: Privileged Access Never
[ 0.040506] CPU features: detected: User Access Override
[ 0.042439] Insufficient stack space to handle exception!
[ 0.042441] ESR: 0x96000046 -- DABT (current EL)
[ 0.043752] FAR: 0xffff0000093a80e0
[ 0.044207] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
[ 0.046511] IRQ stack: [0xffff000008000000..0xffff000008004000]
[ 0.052899] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
[ 0.059396] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45865-gc58dc48 #14
[ 0.067018] Hardware name: linux,dummy-virt (DT)
[ 0.071710] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
[ 0.076532] pc : el1_sync+0x0/0xb0
[ 0.080028] lr : kpti_install_ng_mappings+0x120/0x214
[ 0.085197] sp : ffff0000093a80e0
[ 0.088566] x29: ffff0000093abce0 x28: ffff000008ea9000
[ 0.093979] x27: ffff000008ea9000 x26: ffff0000091f7000
[ 0.099293] x25: ffff00000906d000 x24: ffff000009191000
[ 0.104706] x23: ffff000008ea9000 x22: 0000000041190000
[ 0.110015] x21: ffff0000091f7000 x20: 0000000000000000
[ 0.115428] x19: ffff000009190000 x18: 000000003455d99d
[ 0.120842] x17: 0000000000000001 x16: 00f8000040ffff13
[ 0.126255] x15: 000000007eff6000 x14: 000000007eff6000
[ 0.131566] x13: 00f800007fe00f11 x12: 000000007eff8000
[ 0.136983] x11: 000000007eff8000 x10: 0000000000000000
[ 0.142396] x9 : 000000007eff9000 x8 : 000000007eff9000
[ 0.147704] x7 : 0000000000000000 x6 : 00000000411f8000
[ 0.153116] x5 : 00000000411f8000 x4 : 0000000040a443d4
[ 0.158530] x3 : 00000000411f7000 x2 : 00000000411f7000
[ 0.163943] x1 : ffff00000906d7b0 x0 : ffff80003da61c00
[ 0.169251] Kernel panic - not syncing: kernel stack overflow
[ 0.175140] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45865-gc58dc48 #14
[ 0.182732] Hardware name: linux,dummy-virt (DT)
[ 0.187424] Call trace:
[ 0.189948] dump_backtrace+0x0/0x180
[ 0.193678] show_stack+0x14/0x1c
[ 0.197051] dump_stack+0x90/0xb0
[ 0.200423] panic+0x138/0x2a0
[ 0.203549] __stack_chk_fail+0x0/0x18
[ 0.207398] handle_bad_stack+0x118/0x124
[ 0.211489] __bad_stack+0x88/0x8c
[ 0.214870] el1_sync+0x0/0xb0
[ 0.217998] Unable to handle kernel paging request at virtual address ffff0000093abce0
[ 0.226061] Mem abort info:
[ 0.228839] ESR = 0x96000006
[ 0.231965] Exception class = DABT (current EL), IL = 32 bits
[ 0.237980] SET = 0, FnV = 0
[ 0.241105] EA = 0, S1PTW = 0
[ 0.244346] Data abort info:
[ 0.247239] ISV = 0, ISS = 0x00000006
[ 0.251199] CM = 0, WnR = 0
[ 0.254209] swapper pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
[ 0.261191] [ffff0000093abce0] pgd=00000000411f8003, pud=00000000411f9003, pmd=0000000000000000
[ 0.269982] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[ 0.275538] Modules linked in:
[ 0.278664] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45865-gc58dc48 #14
[ 0.286361] Hardware name: linux,dummy-virt (DT)
[ 0.291053] pstate: 204003c5 (nzCv DAIF +PAN -UAO)
[ 0.295874] pc : unwind_frame+0x28/0xc8
[ 0.299836] lr : dump_backtrace+0x12c/0x180
[ 0.304055] sp : ffff80003efcf000
[ 0.307429] x29: ffff80003efcf000 x28: ffff80003da61c00
[ 0.312841] x27: ffff000008ea9000 x26: ffff0000091f7000
[ 0.318255] x25: ffff00000906d000 x24: ffff0000093a80e0
[ 0.323563] x23: 0000000000000000 x22: ffff000008dbada0
[ 0.328975] x21: 0000000000000000 x20: ffff000009049000
[ 0.334388] x19: ffff80003da61c00 x18: 000000003455d99d
[ 0.339698] x17: 0000000000000001 x16: 00f8000040ffff13
[ 0.345111] x15: 000000007eff6000 x14: 3431232038346364
[ 0.350523] x13: 0000000000000000 x12: cc26f77952f87e00
[ 0.355832] x11: ffffffffffffffff x10: 0000000000000075
[ 0.361245] x9 : ffff0000085ae9e8 x8 : 78302f3078302b63
[ 0.366666] x7 : 6e79735f316c6520 x6 : ffff0000091befe1
[ 0.371976] x5 : 0000000000000000 x4 : ffff0000093ac000
[ 0.377389] x3 : ffff0000093a8000 x2 : ffff0000093abce0
[ 0.382801] x1 : ffff80003efcf048 x0 : ffff80003da61c00
[ 0.388214] Process migration/0 (pid: 12, stack limit = 0x(ptrval))
[ 0.395204] Call trace:
[ 0.397726] unwind_frame+0x28/0xc8
[ 0.401224] show_stack+0x14/0x1c
[ 0.404699] dump_stack+0x90/0xb0
[ 0.408070] panic+0x138/0x2a0
[ 0.411198] __stack_chk_fail+0x0/0x18
[ 0.414944] handle_bad_stack+0x118/0x124
[ 0.419035] __bad_stack+0x88/0x8c
[ 0.422520] el1_sync+0x0/0xb0
[ 0.425648] Unable to handle kernel paging request at virtual address ffff0000093abce0
[ 0.433601] Mem abort info:
[ 0.436486] ESR = 0x96000006
[ 0.439611] Exception class = DABT (current EL), IL = 32 bits
[ 0.445626] SET = 0, FnV = 0
[ 0.448754] EA = 0, S1PTW = 0
[ 0.451995] Data abort info:
[ 0.454888] ISV = 0, ISS = 0x00000006
[ 0.458849] CM = 0, WnR = 0
[ 0.461860] swapper pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
[ 0.468843] [ffff0000093abce0] pgd=00000000411f8003, pud=00000000411f9003, pmd=0000000000000000
> Will
>
> --->8
>
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 5f9a73a4452c..e2a8e88f95a0 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -216,7 +216,7 @@ ENDPROC(idmap_cpu_replace_ttbr1)
> .endm
>
> .macro __idmap_kpti_put_pgtable_ent_ng, type
> - orr \type, \type, #PTE_NG // Same bit for blocks and pages
> + eor \type, \type, #PTE_NG // Same bit for blocks and pages
> str \type, [cur_\()\type\()p] // Update the entry and ensure it
> dc civac, cur_\()\type\()p // is visible to all CPUs.
> .endm
> @@ -298,6 +298,7 @@ skip_pgd:
> /* PUD */
> walk_puds:
> .if CONFIG_PGTABLE_LEVELS > 3
> + eor pgd, pgd, #PTE_NG
> pte_to_phys cur_pudp, pgd
> add end_pudp, cur_pudp, #(PTRS_PER_PUD * 8)
> do_pud: __idmap_kpti_get_pgtable_ent pud
> @@ -319,6 +320,7 @@ next_pud:
> /* PMD */
> walk_pmds:
> .if CONFIG_PGTABLE_LEVELS > 2
> + eor pud, pud, #PTE_NG
> pte_to_phys cur_pmdp, pud
> add end_pmdp, cur_pmdp, #(PTRS_PER_PMD * 8)
> do_pmd: __idmap_kpti_get_pgtable_ent pmd
> @@ -339,6 +341,7 @@ next_pmd:
>
> /* PTE */
> walk_ptes:
> + eor pmd, pmd, #PTE_NG
> pte_to_phys cur_ptep, pmd
> add end_ptep, cur_ptep, #(PTRS_PER_PTE * 8)
> do_pte: __idmap_kpti_get_pgtable_ent pte
>
> .
>