Open Source and information security mailing list archives
 
Message-ID: <Y70z2/0w05jnUiK7@alley>
Date:   Tue, 10 Jan 2023 10:46:03 +0100
From:   Petr Mladek <pmladek@...e.com>
To:     "Leizhen (ThunderTown)" <thunder.leizhen@...wei.com>
Cc:     linux-kernel@...r.kernel.org, Luis Chamberlain <mcgrof@...nel.org>,
        linux-modules@...r.kernel.org,
        Anders Roxell <anders.roxell@...aro.org>
Subject: Re: [PATCH v2] kallsyms: Fix sleeping function called from invalid
 context when CONFIG_KALLSYMS_SELFTEST=y

On Tue 2023-01-10 12:05:20, Leizhen (ThunderTown) wrote:
> 
> 
> On 2023/1/9 21:40, Petr Mladek wrote:
> > On Wed 2022-12-28 09:45:11, Zhen Lei wrote:
> >> [T58] BUG: sleeping function called from invalid context at kernel/kallsyms.c:305
> >> The execution time of kallsyms_on_each_match_symbol() is very
> >> short, about ten microseconds, so the probability of this process
> >> being interrupted is very small. And even if it happens, we just
> >> try again.
> >>
> >> The execution time of kallsyms_on_each_symbol() is much longer,
> >> tens of milliseconds, so context switches are likely to occur
> >> during this period. If the time obtained by task_cputime() is
> >> accurate, it is preferred. Otherwise, use local_clock() directly;
> >> the time taken by irqs and high-priority tasks is not deducted,
> >> because they only ever run for a short time.
> >>
> >> --- a/kernel/kallsyms_selftest.c
> >> +++ b/kernel/kallsyms_selftest.c
> >> +	/*
> >> +	 * The test thread has been bound to a fixed CPU in advance. If the
> >> +	 * number of irqs does not change, no new scheduling request will be
> >> +	 * generated. That is, the performance test process is atomic.
> >> +	 */
> >> +	do {
> >> +		nr_irqs = kstat_cpu_irqs_sum(cpu);
> >> +		cond_resched();
> >> +		t0 = local_clock();
> >> +		kallsyms_on_each_match_symbol(match_symbol, stat.name, &stat);
> >> +		t1 = local_clock();
> >> +	} while (nr_irqs != kstat_cpu_irqs_sum(cpu));
> > 
> > Huh, is this guaranteed to ever finish?
> > 
> > What if there is a regression and kallsyms_on_each_match_symbol()
> > never finishes without rescheduling?
> 
> The execution time of kallsyms_on_each_match_symbol() does not exceed 10 us.
> Assume that an interrupt is generated every 100 us (10,000 interrupts per
> second, which is very high). Even then, interrupts and high-priority tasks
> would need to run for more than 90 us each time to keep the loop from
> exiting. That would mean 90+% CPU usage, which is unreasonable.
> 
> Function kallsyms_on_each_symbol() takes a long time, about 20 milliseconds, and
> I'm already using task_cputime().

IMHO, this is the wrong mindset.

After all, this tests a function that was optimized because it took too
long. The function even contains cond_resched() because it caused
problems. I know that kallsyms_on_each_match_symbol() is
faster because it searches only one module, but still.

The cond_resched() called before taking t0 is just a horrible,
unreliable hack.

I have seen many problematic, unreliable tests that relied
on timing. They just hoped for the best. I am sure that we can
do better.

> > This is yet another unreliable hack.
> > 
> > 
> > Use standard solution:
> > 
> > I did the homework for you and checked how the "time" command
> > measures the time spent in the system. It actually uses several methods.
> > 
> > One is times() syscall. It uses thread_group_cputime_adjusted(), see
> > do_sys_times() in kernel/sys.c
> > 
> > Or it uses the wait4() syscall to get a struct rusage that provides
> > this information.
> > 
> > Please, stop inventing crazy hacks, and use these standard methods.
> > If the "time" tool is enough for userspace performance tests
> > then it must be enough in this case as well.
> 
> Okay, I'll study it in depth; I was just worried about the different
> precision requirements.

If the problem is timer precision, then the solution would be
to repeat the operation more times or to use a better clock source.
It is actually good practice to repeat the action several times
in performance tests because it helps reduce noise.

Well, thread_group_cputime_adjusted() works with
struct task_cputime. It seems to store the time in nanoseconds,
see include/linux/sched/types.h. I might be wrong, but I would expect
it to be accurate and precise enough.

> By the way, we still have to actively embrace new things.
> 
> In fact, I've thought of another way: measure nine times,
> sort in ascending order, and take the middle one. Based on probability
> and statistics, the middle result is reliable.
> > 
> > Or remove this test:
> > 
> > Seriously, what is the value of this test?
> > Is anyone going to use it at all in the future?
> 
> There's really someone interested in performance data.
> https://lkml.org/lkml/2022/12/15/134

I know. But this code has had very bad performance for years
and nobody cared. I am not sure anyone would care
about the info printed by this selftest.

I am not sure it is worth the effort. We have already
spent quite some time trying to make it usable,
and we are still not there.

Best Regards,
Petr
