Message-ID: <87zfsohuok.ffs@tglx>
Date: Sat, 18 May 2024 01:42:35 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Costa Shulyupin <costa.shul@...hat.com>, longman@...hat.com,
 pauld@...hat.com, juri.lelli@...hat.com, prarit@...hat.com,
 vschneid@...hat.com, Anna-Maria Behnsen <anna-maria@...utronix.de>,
 Frederic Weisbecker <frederic@...nel.org>, Zefan Li
 <lizefan.x@...edance.com>, Tejun Heo <tj@...nel.org>, Johannes Weiner
 <hannes@...xchg.org>, Ingo Molnar <mingo@...hat.com>, Peter Zijlstra
 <peterz@...radead.org>, Vincent Guittot <vincent.guittot@...aro.org>,
 Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt
 <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman
 <mgorman@...e.de>, Daniel Bristot de Oliveira <bristot@...hat.com>, Petr
 Mladek <pmladek@...e.com>, Andrew Morton <akpm@...ux-foundation.org>,
 Masahiro Yamada <masahiroy@...nel.org>, Randy Dunlap
 <rdunlap@...radead.org>, Yoann Congal <yoann.congal@...le.fr>, "Gustavo A.
 R. Silva" <gustavoars@...nel.org>, Nhat Pham <nphamcs@...il.com>, Costa
 Shulyupin <costa.shul@...hat.com>, linux-kernel@...r.kernel.org,
 cgroups@...r.kernel.org
Subject: Re: [PATCH v1 3/7] sched/isolation: Adjust affinity of hrtimers
 according to change of housekeeping cpumask

On Thu, May 16 2024 at 22:04, Costa Shulyupin wrote:
> Adjust affinity of watchdog_cpumask, hrtimers according to
> change of housekeeping.cpumasks[HK_TYPE_TIMER].
>
> Function migrate_hrtimer_list_except() is prototyped from
> migrate_hrtimer_list() and is more generic.
>
> Potentially it can be used instead of migrate_hrtimer_list.
>
> Function hrtimers_resettle_from_cpu() is blindly prototyped
> from hrtimers_cpu_dying(). local_irq_disable() is used because
> cpuhp_thread_fun() uses it before cpuhp_invoke_callback().

I'm again impressed by the depth of analysis.

Q: What the heck has cpuhp_thread_fun() to do with hrtimers_cpu_dying()?

A: Absolutely nothing.

> Core test snippets without infrastructure:
>
> 1. Create hrtimer on specific cpu with:
>
>         set_cpus_allowed_ptr(current, cpumask_of(test_cpu));
>         hrtimer_init(&test_hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
>         test_hrtimer.function = test_hrtimer_cb;
>         hrtimer_start(&test_hrtimer, -1,  HRTIMER_MODE_REL);
>
> 2. Call housekeeping_update()
>
> 3. Assure that there is only tick_nohz_handler on specified cpu
> in /proc/timer_list manually or with script:
>
> grep -E 'cpu| #[0-9]' /proc/timer_list | \
> 	awk "/cpu:/{y=0};/cpu: $test_cpu\$/{y=1};y"
>
> Another alternative solution to migrate hrtimers:
> 1. Use cpuhp to set sched_timer offline
> 2. Resettle all hrtimers likewise migrate_hrtimer_list
> 3. Use cpuhp to set sched_timer online

Another pile of incomprehensible word salad which pretends to be a
change log.

> --- a/kernel/sched/isolation.c
> +++ b/kernel/sched/isolation.c
> @@ -126,10 +126,12 @@ static void resettle_all_timers(cpumask_var_t enable_mask, cpumask_var_t disable
>  
>  	for_each_cpu(cpu, enable_mask)	{
>  		timers_prepare_cpu(cpu);
> +		hrtimers_prepare_cpu(cpu);

And yet another lockless re-initialization of an active, in-use data
structure ...
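For context, hrtimers_prepare_cpu() is the bringup-time initializer for the
per-CPU hrtimer base. Roughly paraphrased (illustrative sketch, not the exact
upstream code), it boils down to:

	/* Illustrative paraphrase of the bringup-time init, not the exact code */
	struct hrtimer_cpu_base *cpu_base = &per_cpu(hrtimer_bases, cpu);
	int i;

	for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++)
		timerqueue_init_head(&cpu_base->clock_base[i].active);
	cpu_base->cpu = cpu;

Doing that to a CPU whose queues are live, without holding the base lock,
simply discards whatever is enqueued there.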

> +/*
> + * migrate_hrtimer_list_except - migrates hrtimers from one base to another,
> + * except specified one.
> + */
> +static void migrate_hrtimer_list_except(struct hrtimer_clock_base *old_base,
> +				struct hrtimer_clock_base *new_base, struct hrtimer *except)
> +{
> +	struct hrtimer *timer;
> +	struct timerqueue_node *node;

https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#variable-declarations
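The rule there is reverse fir tree order, longest declaration line first,
i.e. in this case:

	struct timerqueue_node *node;
	struct hrtimer *timer;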

> +	node = timerqueue_getnext(&old_base->active);
> +	while (node) {
> +		timer = container_of(node, struct hrtimer, node);
> +		node = timerqueue_iterate_next(node);
> +		if (timer == except)
> +			continue;

What's the rationale that there is only a single timer to except from
the migration?

> +		BUG_ON(hrtimer_callback_running(timer));

Q: What prevents a hrtimer callback from running on the CPU which was
   not isolated before?

A: Nothing. Ergo this is prone to kill a perfectly correct system just
   because of blindly copying something.

   At least your attempt at a changelog is clear about that...
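   To illustrate (a sketch only, not a real trace): hrtimer callbacks run
   with the base lock dropped, so holding old_base->lock on the updating
   CPU does not exclude a callback on the CPU being isolated:

	caller of hrtimers_resettle_from_cpu()     isol_cpu (still online)
	--------------------------------------     -----------------------
	                                           hrtimer_interrupt()
	                                             __run_hrtimer()
	                                               base lock dropped
	                                               timer->function() runs
	raw_spin_lock(&old_base->lock)             ... callback still running ...
	BUG_ON(hrtimer_callback_running(timer))    -> splat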

> +		debug_deactivate(timer);
> +
> +		/*
> +		 * Mark it as ENQUEUED not INACTIVE otherwise the
> +		 * timer could be seen as !active and just vanish away
> +		 * under us on another CPU
> +		 */
> +		__remove_hrtimer(timer, old_base, HRTIMER_STATE_ENQUEUED, 0);
> +		timer->base = new_base;
> +		/*
> +		 * Enqueue the timers on the new cpu. This does not
> +		 * reprogram the event device in case the timer
> +		 * expires before the earliest on this CPU, but we run
> +		 * hrtimer_interrupt after we migrated everything to
> +		 * sort out already expired timers and reprogram the
> +		 * event device.
> +		 */
> +		enqueue_hrtimer(timer, new_base, HRTIMER_MODE_ABS);
> +	}
> +}
> +
> +/**
> + * hrtimers_resettle_from_cpu - resettles hrtimers from
> + * specified cpu to housekeeping cpus.
> + */
> +void hrtimers_resettle_from_cpu(unsigned int isol_cpu)
> +{
> +	int ncpu, i;
> +	struct tick_sched *ts = tick_get_tick_sched(isol_cpu);
> +	struct hrtimer_cpu_base *old_base, *new_base;
> +
> +	local_irq_disable();
> +	ncpu = cpumask_any_and(cpu_active_mask, housekeeping_cpumask(HK_TYPE_TIMER));
> +
> +	old_base = &per_cpu(hrtimer_bases, isol_cpu);
> +	new_base = &per_cpu(hrtimer_bases, ncpu);
> +
> +	/*
> +	 * The caller is globally serialized and nobody else
> +	 * takes two locks at once, deadlock is not possible.
> +	 */
> +	raw_spin_lock(&old_base->lock);
> +	raw_spin_lock_nested(&new_base->lock, SINGLE_DEPTH_NESTING);
> +	for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++) {
> +		migrate_hrtimer_list_except(&old_base->clock_base[i],
> +				     &new_base->clock_base[i],
> +				     &ts->sched_timer);
> +	}
> +
> +	/*
> +	 * The migration might have changed the first expiring softirq
> +	 * timer on this CPU. Update it.
> +	 */
> +	__hrtimer_get_next_event(new_base, HRTIMER_ACTIVE_SOFT);
> +
> +	raw_spin_unlock(&new_base->lock);
> +	raw_spin_unlock(&old_base->lock);
> +	local_irq_enable();
> +
> +	/* Tell the other CPU to retrigger the next event */
> +	smp_call_function_single(ncpu, retrigger_next_event, NULL, 0);
> +}

We clearly need another copy of hrtimers_cpu_dying() and
migrate_hrtimer_list() to add a local_irq_dis/enable() pair and a
completely ill-defined exclusion mechanism which assumes that the tick
hrtimer is the only one which has to be excluded from migration.

Even if the exclusion mechanism were correct, there is ZERO
justification for this copy and pasta orgy, even if you marked this
series RFC. RFC != POC hackery.

Thanks,

        tglx
