linux-kernel - Re: [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut down on idle)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.DEB.2.20.1601271211460.14987@east.gentwo.org>
Date:	Wed, 27 Jan 2016 12:26:16 -0600 (CST)
From:	Christoph Lameter <cl@...ux.com>
To:	Michal Hocko <mhocko@...nel.org>
cc:	Mike Galbraith <umgwanakikbuti@...il.com>,
	Peter Zijlstra <peterz@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm, vmstat: make quiet_vmstat lighter (was: Re: fast
 path cycle muncher (vmstat: make vmstat_updater deferrable) again and shut
 down on idle)

On Wed, 27 Jan 2016, Michal Hocko wrote:

> This is caused by 0eb77e988032 ("vmstat: make vmstat_updater deferrable
> again and shut down on idle") which has placed quiet_vmstat into
> cpu_idle_loop. The main reason here seems to be that the idle entry has
> to get over all zones and perform atomic operations for each vmstat
> entry even though there might be no per cpu diffs. This is a pointless
> overhead for _each_ idle entry.
>
> Make sure that quiet_vmstat is as light as possible.
>
>  First of all it doesn't make any sense to do any local sync if the
> current cpu is already set in oncpu_stat_off because vmstat_update puts
> itself there only if there is nothing to do.

But there might be something to do that happened in the meantime because
counters were incremented. It needs to be checked. The shepherd can do
that but it will delay the folding of diffs for awhile.

> +void quiet_vmstat(void)
> +{
> +	if (system_state != SYSTEM_RUNNING)
> +		return;
> +
> +	/*
> +	 * If we are already in hands of the shepherd then there
> +	 * is nothing for us to do here.
> +	 */
> +	if (cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
> +		return;
> +
> +	if (!need_update(smp_processor_id()))
> +		return;
> +
> +	/*
> +	 * Just refresh counters and do not care about the pending delayed
> +	 * vmstat_update. It doesn't fire that often to matter and canceling
> +	 * it would be too expensive from this path.
> +	 * vmstat_shepherd will take care about that for us.
> +	 */
> +	refresh_cpu_vm_stats(false);
> +}

The problem here is that there will be an additional tick generated on
idle. This is an issue for power because now the processor has to
needlessly wake up again, do tick processing etc  just to effectively do a
cancel_delayed_work().

The cancelling of the work request is required to avoid this additonal
tick.

> @@ -1470,11 +1487,14 @@ static void vmstat_shepherd(struct work_struct *w)
>  	get_online_cpus();
>  	/* Check processors whose vmstat worker threads have been disabled */
>  	for_each_cpu(cpu, cpu_stat_off)
> -		if (need_update(cpu) &&
> -			cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
> -
> -			queue_delayed_work_on(cpu, vmstat_wq,
> -				&per_cpu(vmstat_work, cpu), 0);
> +		if (need_update(cpu)) {
> +			if (cpumask_test_and_clear_cpu(cpu, cpu_stat_off))
> +				queue_delayed_work_on(cpu, vmstat_wq,
> +					&per_cpu(vmstat_work, cpu), 0);
> +		} else {
> +			cpumask_set_cpu(cpu, cpu_stat_off);

Umm the flag is already set right? This just causes a bouncing cacheline
since we are accessing a global set of cpus,. Drop this line?

> +			cancel_delayed_work(this_cpu_ptr(&vmstat_work));

Ok so this is to move the cancel into the shepherd.

Aside from the two issues mentioned this looks good.