lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 25 Oct 2021 12:48:28 -0700
From:   Jakub Kicinski <kuba@...nel.org>
To:     Seth Forshee <seth@...shee.me>
Cc:     "David S. Miller" <davem@...emloft.net>,
        Jamal Hadi Salim <jhs@...atatu.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Jiri Pirko <jiri@...nulli.us>,
        "Paul E. McKenney" <paulmck@...nel.org>, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] net: sch: eliminate unnecessary RCU waits in
 mini_qdisc_pair_swap()

On Fri, 22 Oct 2021 11:17:46 -0500 Seth Forshee wrote:
> From: Seth Forshee <sforshee@...italocean.com>
> 
> Currently rcu_barrier() is used to ensure that no readers of the
> inactive mini_Qdisc buffer remain before it is reused. This waits for
> any pending RCU callbacks to complete, when all that is actually
> required is to wait for one RCU grace period to elapse after the buffer
> was made inactive. This means that using rcu_barrier() may result in
> unnecessary waits.
> 
> To improve this, store the current RCU state when a buffer is made
> inactive and use poll_state_synchronize_rcu() to check whether a full
> grace period has elapsed before reusing it. If a full grace period has
> not elapsed, wait for a grace period to elapse, and in the non-RT case
> use synchronize_rcu_expedited() to hasten it.
> 
> Since this approach eliminates the RCU callback it is no longer
> necessary to synchronize_rcu() in the tp_head==NULL case. However, the
> RCU state should still be saved for the previously active buffer.
> 
> Before this change I would typically see mini_qdisc_pair_swap() take
> tens of milliseconds to complete. After this change it typcially
> finishes in less than 1 ms, and often it takes just a few microseconds.
> 
> Thanks to Paul for walking me through the options for improving this.
> 
> Cc: "Paul E. McKenney" <paulmck@...nel.org>
> Signed-off-by: Seth Forshee <sforshee@...italocean.com>

LGTM, but please rebase and retest on top of latest net-next.

>  void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
>  			  struct tcf_proto *tp_head)
>  {
> @@ -1423,28 +1419,30 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
>  
>  	if (!tp_head) {
>  		RCU_INIT_POINTER(*miniqp->p_miniq, NULL);
> -		/* Wait for flying RCU callback before it is freed. */
> -		rcu_barrier();
> -		return;
> -	}
> +	} else {
> +		miniq = !miniq_old || miniq_old == &miniqp->miniq2 ?
> +			&miniqp->miniq1 : &miniqp->miniq2;
>  
> -	miniq = !miniq_old || miniq_old == &miniqp->miniq2 ?
> -		&miniqp->miniq1 : &miniqp->miniq2;

nit: any reason this doesn't read:

	miniq = miniq_old != &miniqp->miniq1 ? 
		&miniqp->miniq1 : &miniqp->miniq2;

Surely it's not equal to miniq1 or miniq2 if it's NULL.

> +		/* We need to make sure that readers won't see the miniq
> +		 * we are about to modify. So ensure that at least one RCU
> +		 * grace period has elapsed since the miniq was made
> +		 * inactive.
> +		 */
> +		if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +			cond_synchronize_rcu(miniq->rcu_state);
> +		else if (!poll_state_synchronize_rcu(miniq->rcu_state))
> +			synchronize_rcu_expedited();
>  
> -	/* We need to make sure that readers won't see the miniq
> -	 * we are about to modify. So wait until previous call_rcu callback
> -	 * is done.
> -	 */
> -	rcu_barrier();
> -	miniq->filter_list = tp_head;
> -	rcu_assign_pointer(*miniqp->p_miniq, miniq);
> +		miniq->filter_list = tp_head;
> +		rcu_assign_pointer(*miniqp->p_miniq, miniq);
> +	}
>  
>  	if (miniq_old)
> -		/* This is counterpart of the rcu barriers above. We need to
> +		/* This is counterpart of the rcu sync above. We need to
>  		 * block potential new user of miniq_old until all readers
>  		 * are not seeing it.
>  		 */
> -		call_rcu(&miniq_old->rcu, mini_qdisc_rcu_func);
> +		miniq_old->rcu_state = start_poll_synchronize_rcu();
>  }
>  EXPORT_SYMBOL(mini_qdisc_pair_swap);
>  
> @@ -1463,6 +1461,8 @@ void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc,
>  	miniqp->miniq1.cpu_qstats = qdisc->cpu_qstats;
>  	miniqp->miniq2.cpu_bstats = qdisc->cpu_bstats;
>  	miniqp->miniq2.cpu_qstats = qdisc->cpu_qstats;
> +	miniqp->miniq1.rcu_state = get_state_synchronize_rcu();
> +	miniqp->miniq2.rcu_state = miniqp->miniq1.rcu_state;
>  	miniqp->p_miniq = p_miniq;
>  }
>  EXPORT_SYMBOL(mini_qdisc_pair_init);

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ