Message-ID: <3cba148a-7077-7b6b-f131-dc65045aa348@arm.com>
Date: Fri, 5 Nov 2021 16:26:32 +0000
From: Lukasz Luba <lukasz.luba@....com>
To: Steev Klimaszewski <steev@...i.org>
Cc: linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org,
linux-arm-msm@...r.kernel.org, sudeep.holla@....com,
will@...nel.org, catalin.marinas@....com, linux@...linux.org.uk,
gregkh@...uxfoundation.org, rafael@...nel.org,
viresh.kumar@...aro.org, amitk@...nel.org,
daniel.lezcano@...aro.org, amit.kachhap@...il.com,
thara.gopinath@...aro.org, bjorn.andersson@...aro.org,
agross@...nel.org
Subject: Re: [PATCH v3 0/5] Refactor thermal pressure update to avoid code
duplication
Hi Steev,
On 11/5/21 3:39 PM, Steev Klimaszewski wrote:
> Hi Lukasz,
>
[snip]
> I've been testing this patchset on the Lenovo Yoga C630, and today while
> compiling alacritty and running an apt-get full-upgrade, I found the
> following in dmesg output:
Thank you for testing and sending feedback!
Are you using a mainline kernel, or did you apply this patch set on top
of a vendor production kernel? I need to rule out a different code base,
especially differences in the arch_topology.c init code.
>
> [ 194.343903] ------------[ cut here ]------------
> [ 194.343912] WARNING: CPU: 4 PID: 192 at
> drivers/base/arch_topology.c:188
> topology_update_thermal_pressure+0xe4/0x100
> [ 194.343928] Modules linked in: aes_ce_ccm snd_seq_dummy snd_hrtimer
> snd_seq snd_seq_device algif_hash algif_skcipher af_alg bnep
> cpufreq_ondemand cpufreq_conservative cpufreq_powersave
> cpufreq_userspace lz4 lz4_compress zram zsmalloc q6asm_dai q6routing
> q6afe_dai q6adm q6asm q6afe q6dsp_common snd_soc_wsa881x q6core
> regmap_sdw snd_soc_wcd934x gpio_wcd934x soundwire_qcom snd_soc_wcd_mbhc
> wcd934x regmap_slimbus uvcvideo videobuf2_vmalloc venus_enc venus_dec
> videobuf2_dma_contig videobuf2_memops qrtr_smd fastrpc apr binfmt_misc
> nls_ascii nls_cp437 vfat fat snd_soc_sdm845 snd_soc_rt5663
> snd_soc_qcom_common pm8941_pwrkey joydev snd_soc_rl6231 aes_ce_blk
> qcom_spmi_adc5 soundwire_bus qcom_vadc_common crypto_simd
> qcom_spmi_temp_alarm snd_soc_core cryptd hci_uart snd_compress btqca
> industrialio aes_ce_cipher snd_pcm_dmaengine btrtl crct10dif_ce btbcm
> ghash_ce snd_pcm btintel gf128mul sha2_ce snd_timer venus_core bluetooth
> snd v4l2_mem2mem sha256_arm64 videobuf2_v4l2 videobuf2_common soundcore
> [ 194.344007] sha1_ce videodev ecdh_generic ecc mc ath10k_snoc
> ath10k_core hid_multitouch ath mac80211 qcom_rng libarc4 qcom_q6v5_mss
> cfg80211 sg rfkill qcom_q6v5_pas qcom_pil_info slim_qcom_ngd_ctrl
> qcom_wdt pdr_interface qcom_q6v5 evdev rmtfs_mem slimbus qcom_sysmon
> fuse configfs qrtr ip_tables x_tables autofs4 ext4 mbcache jbd2
> panel_simple rtc_pm8xxx msm llcc_qcom ocmem gpu_sched ti_sn65dsi86
> drm_dp_aux_bus drm_kms_helper drm camcc_sdm845 ipa qcom_common
> qmi_helpers mdt_loader gpio_keys pwm_bl
> [ 194.344056] CPU: 4 PID: 192 Comm: kworker/4:1H Not tainted 5.15.0 #9
> [ 194.344060] Hardware name: LENOVO 81JL/LNVNB161216, BIOS
> 9UCN33WW(V2.06) 06/ 4/2019
> [ 194.344062] Workqueue: events_highpri qcom_lmh_dcvs_poll
> [ 194.344068] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS
> BTYPE=--)
> [ 194.344070] pc : topology_update_thermal_pressure+0xe4/0x100
> [ 194.344073] lr : topology_update_thermal_pressure+0x30/0x100
> [ 194.344075] sp : ffff800014043d10
> [ 194.344076] x29: ffff800014043d10 x28: 0000000000000000 x27:
> 0000000000000000
> [ 194.344080] x26: ffff759c40b66974 x25: ffff759db37a2405 x24:
> ffff759c40e83358
> [ 194.344084] x23: 0000000000000000 x22: ffffb2699eafe1d8 x21:
> 00000000002d1e00
> [ 194.344087] x20: ffff759c49f5bc20 x19: 000000000000b8cc x18:
> 0000000000000000
> [ 194.344090] x17: 2f756e672d78756e x16: ffffb2699d7d83d0 x15:
> 0000000000000000
> [ 194.344093] x14: 0000000000000000 x13: 0000000000000030 x12:
> 0101010101010101
> [ 194.344096] x11: 7f7f7f7f7f7f7f7f x10: feff68716f676668 x9 :
> ffffb2699dd66a58
> [ 194.344099] x8 : fefefefefefefeff x7 : 000000000000000f x6 :
> 0000000000000002
> [ 194.344102] x5 : ffffc33415124000 x4 : 0000000000000400 x3 :
> 0000000000000b19
> [ 194.344105] x2 : 0000000000000b8c x1 : ffffb2699e678f40 x0 :
> ffffb2699e678f48
> [ 194.344108] Call trace:
> [ 194.344110] topology_update_thermal_pressure+0xe4/0x100
> [ 194.344113] qcom_lmh_dcvs_notify+0xc8/0x160
> [ 194.344115] qcom_lmh_dcvs_poll+0x20/0x2c
> [ 194.344116] process_one_work+0x1f4/0x490
> [ 194.344120] worker_thread+0x188/0x504
> [ 194.344121] kthread+0x12c/0x140
> [ 194.344125] ret_from_fork+0x10/0x20
> [ 194.344128] ---[ end trace bd0039c4fb892d5b ]---
[snip]
It's interesting that we hit this. I should have printed the two
values that are compared.
Could you apply the change below and try again, please?
Then we would know which values triggered the warning.
---------------------8<-----------------------------------
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index db18d79065fe..0d8db0927041 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -185,8 +185,11 @@ void topology_update_thermal_pressure(const struct cpumask *cpus,
 	/* Convert to MHz scale which is used in 'freq_factor' */
 	capped_freq /= 1000;
-	if (WARN_ON(max_freq < capped_freq))
+	if (max_freq < capped_freq) {
+		pr_warn("THERMAL_PRESSURE: max_freq (%lu) < capped_freq (%lu) for CPUs [%*pbl]\n",
+			max_freq, capped_freq, cpumask_pr_args(cpus));
 		return;
+	}
 	capacity = mult_frac(capped_freq, max_capacity, max_freq);
------------------------------>8---------------------------
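For context, here is a minimal user-space sketch of the calculation that this check guards. It is not the kernel code: mult_frac_ul() is a stand-in for the kernel's mult_frac() macro, thermal_pressure_from_cap() is a hypothetical helper name, and the final subtraction (pressure = max capacity minus capped capacity) is an assumption about the surrounding function, since it is not visible in the hunk above. The point is only to show why a capped frequency above max_freq has to be rejected: the scaled capacity would exceed max_capacity and the unsigned subtraction would wrap.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * User-space stand-in for the kernel's mult_frac(x, numer, denom):
 * computes x * numer / denom while keeping the intermediate product small.
 */
static unsigned long mult_frac_ul(unsigned long x, unsigned long numer,
                                  unsigned long denom)
{
    unsigned long quot = x / denom;
    unsigned long rem = x % denom;

    return quot * numer + (rem * numer) / denom;
}

/*
 * Hypothetical helper mirroring the logic in the hunk above.
 * capped_freq_khz: frequency limit reported by the thermal driver (kHz)
 * max_freq_mhz:    maximum frequency known to the topology code (MHz)
 * max_capacity:    CPU capacity at max_freq_mhz (e.g. 1024)
 * Returns false when the cap is above the known maximum, which is the
 * case the pr_warn() in the patch reports.
 */
static bool thermal_pressure_from_cap(unsigned long capped_freq_khz,
                                      unsigned long max_freq_mhz,
                                      unsigned long max_capacity,
                                      unsigned long *pressure)
{
    /* Convert to the MHz scale, as the kernel code does. */
    unsigned long capped_freq_mhz = capped_freq_khz / 1000;
    unsigned long capacity;

    assert(max_freq_mhz != 0);

    if (max_freq_mhz < capped_freq_mhz)
        return false;

    capacity = mult_frac_ul(capped_freq_mhz, max_capacity, max_freq_mhz);

    /* Assumption: pressure is the capacity lost to the frequency cap. */
    *pressure = max_capacity - capacity;
    return true;
}
```

With made-up numbers, a CPU with max_freq 3000 MHz and capacity 1024 that is capped to 1500000 kHz ends up with a pressure of 512, while a cap of 3100000 kHz is rejected.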
Could you also dump the cpufreq and capacity sysfs contents for me?
$ grep . /sys/devices/system/cpu/cpu*/cpufreq/*
$ grep . /sys/devices/system/cpu/cpu*/cpu_capacity
Regards,
Lukasz