linux-kernel - Re: [PATCH v2] sched/eevdf: Prevent vlag from going out of bounds when reweight

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZicOSiEWHJJcahi/@yujie-X299>
Date: Tue, 23 Apr 2024 09:26:34 +0800
From: Yujie Liu <yujie.liu@...el.com>
To: Chen Yu <yu.c.chen@...el.com>
CC: Xuewen Yan <xuewen.yan@...soc.com>, <peterz@...radead.org>,
	<mingo@...hat.com>, <juri.lelli@...hat.com>, <vincent.guittot@...aro.org>,
	<dietmar.eggemann@....com>, <rostedt@...dmis.org>, <bsegall@...gle.com>,
	<mgorman@...e.de>, <bristot@...hat.com>, <vschneid@...hat.com>,
	<ke.wang@...soc.com>, <xuewen.yan94@...il.com>, <di.shen@...soc.com>,
	<linux-kernel@...r.kernel.org>, Sergei Trofimovich <slyich@...il.com>, "Breno
 Leitao" <leitao@...ian.org>, Igor Raits <igor@...ddata.com>, Tim Chen
	<tim.c.chen@...el.com>
Subject: Re: [PATCH v2] sched/eevdf: Prevent vlag from going out of bounds
 when reweight_eevdf

On Mon, Apr 22, 2024 at 04:47:31PM +0800, Chen Yu wrote:
> On 2024-04-22 at 16:22:38 +0800, Xuewen Yan wrote:
> > kernel encounters the following error when running workload:
> > 
> > BUG: kernel NULL pointer dereference, address: 0000002c
> > EIP: set_next_entity (fair.c:?)
> > 
> > which was caused by NULL pointer returned by pick_eevdf().
> > 
> > Further investigation has shown that, the entity_eligible() has an
> > false-negative issue when the entity's vruntime is far behind the
> > cfs_rq.min_vruntime that, the (vruntime - cfs_rq->min_vruntime) * load
> > caused a s64 overflow, thus every entity on the rb-tree is not
> > eligible, which results in a NULL candidate.
> > 
> > The reason why entity's vruntime is far behind the cfs_rq.min_vruntime
> > is because during a on_rq task group's update_cfs_group()->reweight_eevdf(),
> > there is no limit on the new entity's vlag. If the new weight is much
> > smaller than the old one,
> > 
> > vlag = div_s64(vlag * old_weight, weight)
> > 
> > generates a huge vlag, and results in very small(negative) vruntime.
> > 
> > Thus limit the range of vlag accordingly.
> >
> 
> Thanks for the fix.
> 
> Might also add comments from Tim suggested here:
> https://lore.kernel.org/lkml/ec479251e6245148b89b226f734192f6d5343bbc.camel@linux.intel.com/
> 
> A fix tag might be needed.
> Fixes: eab03c23c2a1 ("sched/eevdf: Fix vruntime adjustment on reweight")
>  
> > Reported-by: Sergei Trofimovich <slyich@...il.com>
> > Closes: https://lore.kernel.org/all/ZhuYyrh3mweP_Kd8@nz.home/
> > Reported-by: Igor Raits <igor@...ddata.com>
> > Closes: https://lore.kernel.org/all/CA+9S74ih+45M_2TPUY_mPPVDhNvyYfy1J1ftSix+KjiTVxg8nw@mail.gmail.com/
> > Reported-by: Breno Leitao <leitao@...ian.org>
> > Reported-by: kernel test robot <oliver.sang@...el.com>
> > Closes: https://lore.kernel.org/lkml/202401301012.2ed95df0-oliver.sang@intel.com/
> > Reported-by: Yujie Liu <yujie.liu@...el.com>
> > Signed-off-by: Xuewen Yan <xuewen.yan@...soc.com>
> > ---
> 
> Cced Sergei, Igor, Breno who have encountered this NULL pointer issue before.
> 
> From my testing, with this applied I did not see the NULL pointer exception
> after running trinity for 100 cycles, so
> 
> Reviewed-and-tested-by: Chen Yu <yu.c.chen@...el.com>
> 
> thanks,
> Chenyu
> 

>From 0-Day testing, with this patch applied on v6.9-rc5, we no longer
see the NULL pointer issue in 999 cycles of trinity test.

Tested-by: Yujie Liu <yujie.liu@...el.com>

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/runtime/group/nr_groups:
  vm-snb/trinity/debian-11.1-i386-20220923.cgz/i386-randconfig-004-20240122/clang-17/300s/group-03/5

commit:
  v6.9-rc5
  v6.9-rc5+patch ("sched/eevdf: Prevent vlag from going out of bounds when reweight_eevdf")

        v6.9-rc5              v6.9-rc5+patch
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
         41:999         -4%            :999   dmesg.BUG:kernel_NULL_pointer_dereference,address
         24:999         -2%            :999   dmesg.EIP:pick_next_task_fair
         17:999         -2%            :999   dmesg.EIP:set_next_entity
         41:999         -4%            :999   dmesg.Kernel_panic-not_syncing:Fatal_exception
         41:999         -4%            :999   dmesg.Oops:#[##]

> > changes of v2:
> > -add reported-by (suggested by <yu.c.chen@...el.com>)
> > -remork the changelog (<yu.c.chen@...el.com>)
> > -remove the judgement of fork (Peter)
> > -remove the !on_rq case. (Peter)
> > ---
> > Previous discussion link:
> > https://lore.kernel.org/lkml/20240226082349.302363-1-yu.c.chen@intel.com/
> > https://lore.kernel.org/all/20240130080643.1828-1-xuewen.yan@unisoc.com/
> > ---
> > ---
> >  kernel/sched/fair.c | 18 ++++++++++++------
> >  1 file changed, 12 insertions(+), 6 deletions(-)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 03be0d1330a6..64826f406d6d 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -696,15 +696,21 @@ u64 avg_vruntime(struct cfs_rq *cfs_rq)
> >   *
> >   * XXX could add max_slice to the augmented data to track this.
> >   */
> > -static void update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > +static s64 entity_lag(u64 avruntime, struct sched_entity *se)
> >  {
> > -	s64 lag, limit;
> > +	s64 vlag, limit;
> > +
> > +	vlag = avruntime - se->vruntime;
> > +	limit = calc_delta_fair(max_t(u64, 2*se->slice, TICK_NSEC), se);
> > +
> > +	return clamp(vlag, -limit, limit);
> > +}
> >  
> > +static void update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > +{
> >  	SCHED_WARN_ON(!se->on_rq);
> > -	lag = avg_vruntime(cfs_rq) - se->vruntime;
> >  
> > -	limit = calc_delta_fair(max_t(u64, 2*se->slice, TICK_NSEC), se);
> > -	se->vlag = clamp(lag, -limit, limit);
> > +	se->vlag = entity_lag(avg_vruntime(cfs_rq), se);
> >  }
> >  
> >  /*
> > @@ -3761,7 +3767,7 @@ static void reweight_eevdf(struct cfs_rq *cfs_rq, struct sched_entity *se,
> >  	 *	   = V  - vl'
> >  	 */
> >  	if (avruntime != se->vruntime) {
> > -		vlag = (s64)(avruntime - se->vruntime);
> > +		vlag = entity_lag(avruntime, se);
> >  		vlag = div_s64(vlag * old_weight, weight);
> >  		se->vruntime = avruntime - vlag;
> >  	}
> > -- 
> > 2.25.1
> >