lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 9 Sep 2014 13:14:50 -0700 (PDT)
From:	Hugh Dickins <hughd@...gle.com>
To:	Peter Zijlstra <peterz@...radead.org>
cc:	Hugh Dickins <hughd@...gle.com>,
	Chintan Pandya <cpandya@...eaurora.org>,
	akpm@...ux-foundation.org, linux-mm@...ck.org,
	linux-arm-msm@...r.kernel.org, linux-kernel@...r.kernel.org,
	Thomas Gleixner <tglx@...utronix.de>,
	John Stultz <john.stultz@...aro.org>,
	Ingo Molnar <mingo@...hat.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Paul McKenney <paulmck@...ux.vnet.ibm.com>
Subject: Re: [PATCH v4 2/2] ksm: provide support to use deferrable timers
 for scanner thread

On Mon, 8 Sep 2014, Peter Zijlstra wrote:
> On Mon, Sep 08, 2014 at 01:25:36AM -0700, Hugh Dickins wrote:
> > 
> > --- 3.17-rc4/include/linux/ksm.h	2014-03-30 20:40:15.000000000 -0700
> > +++ linux/include/linux/ksm.h	2014-09-07 11:54:41.528003316 -0700
> 
> > @@ -87,6 +96,11 @@ static inline void ksm_exit(struct mm_st
> >  {
> >  }
> >  
> > +static inline wait_queue_head_t *ksm_switch(struct mm_struct *mm)
> 
> s/ksm_switch/__&/

I don't think so (and I did try building with CONFIG_KSM off too).

> 
> > +{
> > +	return NULL;
> > +}
> > +
> >  static inline int PageKsm(struct page *page)
> >  {
> >  	return 0;
> 
> > --- 3.17-rc4/kernel/sched/core.c	2014-08-16 16:00:54.062189063 -0700
> > +++ linux/kernel/sched/core.c	2014-09-07 11:54:41.528003316 -0700
> 
> > @@ -2304,6 +2305,7 @@ context_switch(struct rq *rq, struct tas
> >  	       struct task_struct *next)
> >  {
> >  	struct mm_struct *mm, *oldmm;
> > +	wait_queue_head_t *wake_ksm = NULL;
> >  
> >  	prepare_task_switch(rq, prev, next);
> >  
> > @@ -2320,8 +2322,10 @@ context_switch(struct rq *rq, struct tas
> >  		next->active_mm = oldmm;
> >  		atomic_inc(&oldmm->mm_count);
> >  		enter_lazy_tlb(oldmm, next);
> > -	} else
> > +	} else {
> >  		switch_mm(oldmm, mm, next);
> > +		wake_ksm = ksm_switch(mm);
> 
> Is this the right mm?

It's next->mm, that's the one I intended (though the patch might
be equally workable using prev->mm instead: given free rein, I'd
have opted for hooking into both prev and next, but free rein is
definitely not what should be granted around here!).

> We've just switched the stack,

I thought that came in switch_to() a few lines further down,
but don't think it matters for this.

> so we're looing at next->mm when we switched away from current.
> That might not exist anymore.

I fail to see how that can be.  Looking at the x86 switch_mm(),
I can see it referencing (unsurprisingly!) both old and new mms
at this point, and no reference to an mm is dropped before the
ksm_switch().  oldmm (there called mm) is mmdropped later in
finish_task_switch().

> 
> > +	}
> >  
> >  	if (!prev->mm) {
> >  		prev->active_mm = NULL;
> > @@ -2348,6 +2352,9 @@ context_switch(struct rq *rq, struct tas
> >  	 * frame will be invalid.
> >  	 */
> >  	finish_task_switch(this_rq(), prev);
> > +
> > +	if (wake_ksm)
> > +		wake_up_interruptible(wake_ksm);
> >  }
> 
> Quite horrible for sure. I really hate seeing KSM cruft all the way down

Yes, I expected that, and I would certainly feel the same way.

And even worse, imagine if this were successful, we might come along
and ask to do something similar for khugepaged.  Though if it comes to
that, I'm sure we would generalize into one hook which does not say
"ksm" or "khugepaged" on it, but would still a present a single unlikely
flag to be tested at this level.  Maybe you would even prefer the
generalized version, but I don't want to complicate the prototype yet.

If it weren't for the "we already have the mm cachelines here" argument,
I by now feel fairly sure that I would be going for hooking into timer
tick instead (where Thomas could then express his horror!).

Do you think I should just forget about cacheline micro-optimizations
and go in that direction instead?

> here. Can't we create a new (timer) infrastructure that does the right
> thing? Surely this isn't the only such case.

A sleep-walking timer, that goes to sleep in one bed, but may wake in
another; and defers while beds are empty?  I'd be happy to try using
that for KSM if it already existed, and no doubt Chintan would too

But I don't think KSM presents a very good case for developing it.
I think KSM's use of a sleep_millisecs timer is really just an apology
for the amount of often wasted work that it does, and dates from before
we niced it down 5.  I prefer the idea of a KSM which waits on activity
amongst the restricted set of tasks it is tracking: as this patch tries.

But my preference may be naive: doing lots of unnecessary work doesn't
matter as much as waking cpus from deep sleep.

> 
> I know both RCU and some NOHZ_FULL muck already track when the system is
> completely idle. This is yet another case of that.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ