lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <tencent_2B7B5B8DBE69330DA041AEE36C1029826905@qq.com>
Date:   Sun, 12 Jun 2022 22:59:30 +0800
From:   "zhangfei.gao@...mail.com" <zhangfei.gao@...mail.com>
To:     paulmck@...nel.org
Cc:     Zhangfei Gao <zhangfei.gao@...aro.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        rcu@...r.kernel.org, Lai Jiangshan <jiangshanlai@...il.com>,
        Josh Triplett <josh@...htriplett.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Matthew Wilcox <willy@...radead.org>,
        Shameerali Kolothum Thodi 
        <shameerali.kolothum.thodi@...wei.com>
Subject: Re: Commit 282d8998e997 (srcu: Prevent expedited GPs and blocking
 readers from consuming CPU) cause qemu boot slow

Hi, Paul

On 2022/6/12 下午9:36, Paul E. McKenney wrote:
> On Sun, Jun 12, 2022 at 03:40:30PM +0800, zhangfei.gao@...mail.com wrote:
>> Hi, Paul
>>
>> On 2022/6/12 上午12:59, Paul E. McKenney wrote:
>>> On Sun, Jun 12, 2022 at 12:32:59AM +0800, Zhangfei Gao wrote:
>>>> Hi, Paul
>>>>
>>>> When verifying qemu with acpi rmr feature on v5.19-rc1, the guest kernel
>>>> stuck for several minutes.
>>> Stuck for several minutes but then continues normally?  Or stuck for
>>> several minutes before you kill qemu?
>> qemu boot stuck for several minutes, then guest can bootup normally, just
>> slower.
>>> And I have to ask...  What happened without the ACPI RMR feature?
>> If no ACPI, qemu boot quickly without stuck.
>> build/aarch64-softmmu/qemu-system-aarch64 -machine
>> virt,gic-version=3,iommu=smmuv3 \
>> -enable-kvm -cpu host -m 1024 \
>> -kernel Image -initrd mini-rootfs.cpio.gz -nographic -append \
>> "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000 kpti=off"
>>
>> Adding acpi=force & -bios QEMU_EFI.fd, qemu boot stuck for several minutes.
>>
>> By the way, my hardware platform is aarch64.
> Thank you for the information!  The problem is excessive delay rather
> than a hang, and it is configuration-dependent.  Good to know!
>
>> Only change this can solve the stuck issue.
>>
>> --- a/kernel/rcu/srcutree.c
>> +++ b/kernel/rcu/srcutree.c
>> @@ -524,6 +524,10 @@ static unsigned long srcu_get_delay(struct srcu_struct
>> *ssp)
>>   {
>>          unsigned long jbase = SRCU_INTERVAL;
>>
>> +       if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq),
>> READ_ONCE(ssp->srcu_gp_seq_needed_exp)))
>> +               return 0;
>> +       return SRCU_INTERVAL;
> I am glad that you have a workaround for this problem, but this change
> would re-introduce the problem that commit 282d8998e997 ("srcu: Prevent
> expedited GPs and blocking readers from consuming CPU") was intended
> to fix.  For one example, your change can prevent kernel live patching
> from applying a patch.  So something else is needed.
Understand, it is just debug where has issue.
>
> Does changing the value of SRCU_MAX_INTERVAL to (say) 3 decrease the delay
> significantly?  (This is not a fix, either, but instead a debug check.)
No use.
>
> Your change always returns zero if another SRCU grace period is needed.
> Let's look at the callers of srcu_get_delay():
>
> o	cleanup_srcu_struct() uses it to check whether there is an
> 	expedited grace period pending, leaking the srcu_struct if so.
> 	This should not affect boot delay.  (Unless you are invoking
> 	init_srcu_struct() and cleanup_srcu_struct() really really
> 	often.)
>
> o	srcu_gp_end() uses it to determine whether or not to allow
> 	a one-jiffy delay before invoking callbacks at the end of
> 	a grace period.
>
> o	srcu_funnel_gp_start() uses it to determine whether or not to
> 	allow a one-jiffy delay before starting the process of checking
> 	for the end of an SRCU grace period.
>
> o	try_check_zero() uses it to add an additional short delay
> 	(instead of a long delay) between checks of reader state.
>
> o	process_srcu() uses it to calculate the long delay between
> 	checks of reader state.
>
> These add one-jiffy delays, except for process_srcu(), which adds a delay
> of up to 10 jiffies.  Even given HZ=100 (as opposed to the HZ=1000 that
> I normally use), this requires thousands of such delays to add up to the
> several minutes that you are seeing.  (In theory, the delays could also
> be due to SRCU readers, except that in that case adjusting timeouts in
> the grace-period processing would not make things go faster.)
>
> So, does acpi=force & -bios QEMU_EFI.fd add SRCU grace periods?  If so,
> it would be very good make sure that this code is using SRCU efficiently.
> One way to check would be to put a printk() into synchronize_srcu(),
> though maintaining a counter and printing (say) every 1000th invocation
> might be easier on the console output.
good idea.

>>>> And on 5.18, there is no such problem.
>>>>
>>>> After revert this patch, the issue solved.
>>>> Commit 282d8998e997 (srcu: Prevent expedited GPs and blocking readers from
>>>> consuming CPU)
>>>>
>>>>
>>>> qemu cmd:
>>>> build/aarch64-softmmu/qemu-system-aarch64 -machine
>>>> virt,gic-version=3,iommu=smmuv3 \
>>>> -enable-kvm -cpu host -m 1024 \
>>>> -kernel Image -initrd mini-rootfs.cpio.gz -nographic -append \
>>>> "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000 kpti=off acpi=force" \
>>>> -bios QEMU_EFI.fd
>>>>
>>>> log:
>>>> InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 7AA4D040
>>>> add-symbol-file /home/linaro/work/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC48/AARCH64/NetworkPkg/IScsiDxe/IScsiDxe/DEBUG/IScsiDxe.dll
>>>> 0x75459000
>>>> Loading driver at 0x00075458000 EntryPoint=0x00075459058 IScsiDxe.efi
>>>> InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 7AA4DE98
>>>> ProtectUefiImageCommon - 0x7AA4D040
>>>>     - 0x0000000075458000 - 0x000000000003F000
>>>> SetUefiImageMemoryAttributes - 0x0000000075458000 - 0x0000000000001000
>>>> (0x0000000000004008)
>>>> SetUefiImageMemoryAttributes - 0x0000000075459000 - 0x000000000003B000
>>>> (0x0000000000020008)
>>>> SetUefiImageMemoryAttributes - 0x0000000075494000 - 0x0000000000003000
>>>> (0x0000000000004008)
>>>> InstallProtocolInterface: 18A031AB-B443-4D1A-A5C0-0C09261E9F71 754952C8
>>>> InstallProtocolInterface: 107A772C-D5E1-11D4-9A46-0090273FC14D 75495358
>>>> InstallProtocolInterface: 6A7A5CFF-E8D9-4F70-BADA-75AB3025CE14 75495370
>>>> InstallProtocolInterface: 18A031AB-B443-4D1A-A5C0-0C09261E9F71 754952F8
>>>> InstallProtocolInterface: 107A772C-D5E1-11D4-9A46-0090273FC14D 75495358
>>>> InstallProtocolInterface: 6A7A5CFF-E8D9-4F70-BADA-75AB3025CE14 75495370
>>>> InstallProtocolInterface: 59324945-EC44-4C0D-B1CD-9DB139DF070C 75495348
>>>> InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 754953E8
>>>> InstallProtocolInterface: 330D4706-F2A0-4E4F-A369-B66FA8D54385 7AA4D728
>>>>
>>>>
>>>> Not sure it is either reported or solved.
>>> This is the first I have heard of it, so thank you for reporting it.
>>>
>>> Do you have a way of collecting something sysrq-t output?
>> Do you mean "echo t > /proc/sysrq-trigger",
>> There are too much output and kernel dump can not stop.
> OK.  What other tools do you have to work out what is happening during
> temporary hangs such as this one?
>
> The question to be answered: "Is there usually at least one task waiting
> in synchronize_srcu() during these hangs, and if so, which srcu_struct
> is passed to those synchronize_srcu() calls?"

As you suggested, add print in __synchronize_srcu, 1000 times print once.

With acpi=force & -bios QEMU_EFI.fd

When qemu stuck in
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 7AA4D040
add-symbol-file 
/home/linaro/work/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC48/AARCH64/NetworkPkg/IScsiDxe/IScsiDxe/DEBUG/IScsiDxe.dll 
0x75459000

The print in  __synchronize_srcu is print from 0 t0 9001.

[   94.271350] gzf __synchronize_srcu loop=1001
....

[  222.621659]  __synchronize_srcu loop=9001
[  222.621664] CPU: 96 PID: 2294 Comm: qemu-system-aar Not tainted 
5.19.0-rc1-15071-g697f40b5235f-dirty #615
[  222.621666] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDD, 
BIOS 2280-V2 CS V5.B133.01 03/25/2021
[  222.621667] Call trace:
[  222.621668]  dump_backtrace+0xe4/0xf0
[  222.621670] show_stack+0x20/0x70
[  222.621672] dump_stack_lvl+0x8c/0xb8
[  222.621674]  dump_stack+0x18/0x34
[  222.621676] __synchronize_srcu+0x120/0x128
[  222.621678] synchronize_srcu_expedited+0x2c/0x40
[  222.621680] kvm_swap_active_memslots+0x130/0x198
[  222.621683] kvm_activate_memslot+0x40/0x68
[  222.621684]  kvm_set_memslot+0x184/0x3b0
[  222.621686]  __kvm_set_memory_region+0x288/0x438
[  222.621688] kvm_set_memory_region+0x3c/0x60
[  222.621689]  kvm_vm_ioctl+0x5a0/0x13e0
[  222.621691]  __arm64_sys_ioctl+0xb0/0xf8
[  222.621693]  invoke_syscall+0x4c/0x110
[  222.621696] el0_svc_common.constprop.0+0x68/0x128
[  222.621698]  do_el0_svc+0x34/0xc0
[  222.621701]  el0_svc+0x30/0x98
[  222.621704]  el0t_64_sync_handler+0xb8/0xc0
[  222.621706]  el0t_64_sync+0x18c/0x190


If no acpi=force, no print at all, 1000 times one print.

Thanks

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ