linux-kernel - Re: Question about sched

Message-ID: <20190430105129.GA3923@linux.ibm.com>
Date:   Tue, 30 Apr 2019 03:51:30 -0700
From:   "Paul E. McKenney" <paulmck@...ux.ibm.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, andrea.parri@...rulasolutions.com
Subject: Re: Question about sched_setaffinity()

On Tue, Apr 30, 2019 at 12:03:18PM +0200, Peter Zijlstra wrote:
> On Sat, Apr 27, 2019 at 11:02:46AM -0700, Paul E. McKenney wrote:
> 
> > This actually passes rcutorture.  But, as Andrea noted, not klitmus.
> > After some investigation, it turned out that klitmus was creating kthreads
> > with PF_NO_SETAFFINITY, hence the failures.  But that prompted me to
> > put checks into my code: After all, rcutorture can be fooled.
> > 
> > 	void synchronize_rcu(void)
> > 	{
> > 		int cpu;
> > 
> > 		for_each_online_cpu(cpu) {
> > 			sched_setaffinity(current->pid, cpumask_of(cpu));
> > 			WARN_ON_ONCE(raw_smp_processor_id() != cpu);
> > 		}
> > 	}
> > 
> > This triggers fairly quickly, usually in less than a minute of rcutorture
> > testing.
> >
> > And further investigation shows that sched_setaffinity()
> > always returned 0.
> >
> > Is this expected behavior?  Is there some configuration or setup that I
> > might be missing?
> 
> ISTR there is hotplug involved in RCU torture? In that case, it may be
> that sched_setaffinity() succeeds in placing us on a CPU, which CPU
> hotplug then takes away. So when we run the WARN thingy, we'll be
> running on a different CPU than expected.

There can be CPU hotplug involved in rcutorture, but it was disabled
during this run.

> If OTOH, your loop is written like (as it really should be):
> 
> 	void synchronize_rcu(void)
> 	{
> 		int cpu;
> 
> 		cpus_read_lock();
> 		for_each_online_cpu(cpu) {
> 			sched_setaffinity(current->pid, cpumask_of(cpu));
> 			WARN_ON_ONCE(raw_smp_processor_id() != cpu);
> 		}
> 		cpus_read_unlock();
> 	}
> 
> Then I'm not entirely sure how we can return 0 and not run on the
> expected CPU. If we look at __set_cpus_allowed_ptr(), the only paths out
> to 0 are:
> 
>  - if the mask didn't change
>  - if we already run inside the new mask
>  - if we migrated ourselves with the stop-task
>  - if we're not in fact running
> 
> That last case should never trigger in your circumstances, since @p ==
> current and current is obviously running. But for completeness, the
> wakeup of @p would do the task placement in that case.

Are there some diagnostics I could add that would help track this down,
be it my bug or yours?

							Thanx, Paul
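
The pin-then-verify loop discussed above can also be tried from userspace
as a rough cross-check. The following is only a sketch and not part of the
original message: it uses the glibc sched_setaffinity(2) wrapper (which
takes a pid, a mask size, and a cpu_set_t, unlike the in-kernel call quoted
in the thread) together with sched_getcpu(3), and the function name
count_affinity_mismatches() is hypothetical. It assumes a Linux system with
glibc, and it says nothing about kthreads or PF_NO_SETAFFINITY.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for the CPU_* macros and sched_getcpu() */
#endif
#include <sched.h>
#include <stdio.h>

/*
 * Hypothetical helper: pin the calling thread to each CPU it is allowed
 * to run on, one at a time, and check that it really is running there
 * once sched_setaffinity() has returned.
 *
 * Returns the number of mismatches seen, or -1 if a call failed (for
 * example because a CPU went offline in the meantime).
 */
int count_affinity_mismatches(void)
{
	cpu_set_t allowed, one;
	int cpu, now, mismatches = 0;

	/* Start from the CPUs this thread may currently use. */
	if (sched_getaffinity(0, sizeof(allowed), &allowed))
		return -1;

	for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (!CPU_ISSET(cpu, &allowed))
			continue;

		CPU_ZERO(&one);
		CPU_SET(cpu, &one);
		if (sched_setaffinity(0, sizeof(one), &one)) {
			mismatches = -1;
			break;
		}

		/* Userspace counterpart of the WARN_ON_ONCE() check. */
		now = sched_getcpu();
		if (now != cpu) {
			fprintf(stderr, "asked for CPU %d, running on CPU %d\n",
				cpu, now);
			mismatches++;
		}
	}

	/* Restore the original mask so the caller is not left pinned. */
	sched_setaffinity(0, sizeof(allowed), &allowed);
	return mismatches;
}
```

Calling count_affinity_mismatches() from a small main() and printing the
result is enough to use it. A nonzero count with CPU hotplug idle would be
the userspace analogue of the WARN_ON_ONCE() firing, though the syscall
path differs from a direct in-kernel call, so a clean run here does not
rule out the behavior reported above.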