Message-ID: <2a615ecf-c26f-4710-87df-c02f1e489d19@linux.alibaba.com>
Date: Sun, 11 Feb 2024 23:35:42 +0800
From: Bitao Hu <yaoma@...ux.alibaba.com>
To: Petr Mladek <pmladek@...e.com>
Cc: dianders@...omium.org, akpm@...ux-foundation.org, kernelfans@...il.com,
 liusong@...ux.alibaba.com, linux-kernel@...r.kernel.org,
 yaoma@...ux.alibaba.com
Subject: Re: [PATCHv6 1/2] watchdog/softlockup: low-overhead detection of
 interrupt

Hi,

On 2024/2/9 21:35, Petr Mladek wrote:
> Hi,
> 
> I am sorry for joining this game so late. But honestly, it went
> forward too quickly. A good practice is to wait a week before
> sending a new version so that you give more people a chance
> to provide some feedback.
> 
> The only exception might be when you know exactly who could
> review it because the area is not interesting for anyone else.
> But this is typically not the case for kernel core code.
Thanks for your reminder; I will be mindful of the pace.
> 
> 
> On Thu 2024-02-08 20:54:25, Bitao Hu wrote:
>> The following softlockup is caused by an interrupt storm, but the storm
>> cannot be identified from the call tree, because the call tree is only a
>> snapshot and doesn't fully capture the behavior of the CPU during the
>> soft lockup.
>>    watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921]
>>    ...
>>    Call trace:
>>      __do_softirq+0xa0/0x37c
>>      __irq_exit_rcu+0x108/0x140
>>      irq_exit+0x14/0x20
>>      __handle_domain_irq+0x84/0xe0
>>      gic_handle_irq+0x80/0x108
>>      el0_irq_naked+0x50/0x58
>>
>> Therefore, I think it is necessary to report CPU utilization during the
>> softlockup_thresh period (reported once every sample_period, for a total
>> of 5 reports), like this:
>>    watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921]
>>    CPU#28 Utilization every 4s during lockup:
>>      #1: 0% system, 0% softirq, 100% hardirq, 0% idle
>>      #2: 0% system, 0% softirq, 100% hardirq, 0% idle
>>      #3: 0% system, 0% softirq, 100% hardirq, 0% idle
>>      #4: 0% system, 0% softirq, 100% hardirq, 0% idle
>>      #5: 0% system, 0% softirq, 100% hardirq, 0% idle
> 
> I like this. IMHO, it might be really useful.
> 
>> --- a/kernel/watchdog.c
>> +++ b/kernel/watchdog.c
>> @@ -333,6 +335,92 @@ __setup("watchdog_thresh=", watchdog_thresh_setup);
>>   
>>   static void __lockup_detector_cleanup(void);
>>   
>> +#ifdef CONFIG_SOFTLOCKUP_DETECTOR_INTR_STORM
>> +#define NUM_STATS_GROUPS	5
> 
> It would be nice to synchronize this with the hardcoded 5 in:
> 
> static void set_sample_period(void)
> {
> 	/*
> 	 * convert watchdog_thresh from seconds to ns
> 	 * the divide by 5 is to give hrtimer several chances (two
> 	 * or three with the current relation between the soft
> 	 * and hard thresholds) to increment before the
> 	 * hardlockup detector generates a warning
> 	 */
> 	sample_period = get_softlockup_thresh() * ((u64)NSEC_PER_SEC / 5);
OK, I've had the same thought.
> 
> For example, define and use the following in both situations:
> 
> #define NUM_SAMPLE_PERIODS	5
>
>> +enum stats_per_group {
>> +	STATS_SYSTEM,
>> +	STATS_SOFTIRQ,
>> +	STATS_HARDIRQ,
>> +	STATS_IDLE,
>> +	NUM_STATS_PER_GROUP,
>> +};
>> +
>> +static const enum cpu_usage_stat tracked_stats[NUM_STATS_PER_GROUP] = {
>> +	CPUTIME_SYSTEM,
>> +	CPUTIME_SOFTIRQ,
>> +	CPUTIME_IRQ,
>> +	CPUTIME_IDLE,
>> +};
>> +
>> +static DEFINE_PER_CPU(u16, cpustat_old[NUM_STATS_PER_GROUP]);
>> +static DEFINE_PER_CPU(u8, cpustat_util[NUM_STATS_GROUPS][NUM_STATS_PER_GROUP]);
>> +static DEFINE_PER_CPU(u8, cpustat_tail);
>> +
>> +/*
>> + * We don't need nanosecond resolution. A granularity of 16ms is
>> + * sufficient for our precision, allowing us to use u16 to store
>> + * cpustats, which will roll over roughly every ~1000 seconds.
>> + * 2^24 ~= 16 * 10^6
>> + */
>> +static u16 get_16bit_precision(u64 data_ns)
>> +{
>> +	return data_ns >> 24LL; /* 2^24ns ~= 16.8ms */
> 
> I would personally use
> 
>      delta_ns >> 20  /* 2^20ns ~= 1ms */
> 
> to make it easier for debugging by a human. It would support
> the sample period up to 65s which might be enough.
> 
> But I do not insist on it. ">> 24" provides coarser granularity
> but supports longer sample periods.
I considered using ">>20" as it provides more intuitive granularity,
but I wanted to support longer sample periods. After weighing the
options, I chose ">>24".
> 
>> +static void print_cpustat(void)
>> +{
>> +	int i, group;
>> +	u8 tail = __this_cpu_read(cpustat_tail);
>> +	u64 sample_period_second = sample_period;
>> +
>> +	do_div(sample_period_second, NSEC_PER_SEC);
>> +	/*
>> +	 * We do not want the "watchdog: " prefix on every line,
>> +	 * hence we use "printk" instead of "pr_crit".
>> +	 */
>> +	printk(KERN_CRIT "CPU#%d Utilization every %llus during lockup:\n",
>> +	       smp_processor_id(), sample_period_second);
>> +	for (i = 0; i < NUM_STATS_GROUPS; i++) {
> 
> This starts with the 1st group in the array. Is it the oldest one?
> It should take into account cpustat_tail.
Yes, it starts with the oldest one. After update_cpustat() completes,
cpustat_tail points to the oldest group, so the loop starts reading at
the entry that cpustat_tail points to.
> 
> 
>> +		group = (tail + i) % NUM_STATS_GROUPS;
>> +		printk(KERN_CRIT "\t#%d: %3u%% system,\t%3u%% softirq,\t"
>> +			"%3u%% hardirq,\t%3u%% idle\n", i + 1,
>> +			__this_cpu_read(cpustat_util[group][STATS_SYSTEM]),
>> +			__this_cpu_read(cpustat_util[group][STATS_SOFTIRQ]),
>> +			__this_cpu_read(cpustat_util[group][STATS_HARDIRQ]),
>> +			__this_cpu_read(cpustat_util[group][STATS_IDLE]));
>> +	}
>> +}
>> +
> 

Best Regards,
Bitao
