Date:	Wed, 28 Oct 2015 05:27:16 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Josh Cartwright <joshc@...com>
Cc:	Eric Dumazet <eric.dumazet@...il.com>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	tglx@...utronix.de, bigeasy@...utronix.de,
	linux-rt-users@...r.kernel.org, linux-kernel@...r.kernel.org,
	"David S. Miller" <davem@...emloft.net>,
	Clark Williams <williams@...hat.com>
Subject: Re: [PATCH -rt] Revert "net: use synchronize_rcu_expedited()"

On Wed, Oct 28, 2015 at 03:34:00AM -0500, Josh Cartwright wrote:
> On Tue, Oct 27, 2015 at 04:15:59PM -0700, Paul E. McKenney wrote:
> > On Tue, Oct 27, 2015 at 08:27:53AM -0700, Eric Dumazet wrote:
> > > On Tue, 2015-10-27 at 12:02 -0300, Arnaldo Carvalho de Melo wrote:
> [..]
> > > > The first suggestion, with it disabled by default, seems to be the most
> > > > flexible though, i.e., Paul's original message plus the boot parameter line:
> > > > 
> > > > Alternatively, a boot-time option could be used:
> > > > 
> > > > int some_rt_boot_parameter = CONFIG_SYNC_NET_DEFAULT;
> > > > 
> > > >         if (rtnl_is_locked() && !some_rt_boot_parameter)
> > > >                 synchronize_rcu_expedited();
> > > >         else
> > > >                 synchronize_rcu();
> > 
> > This could be OK, but why not start with something very simple and automatic?
> > We can always add more knobs when and if they actually prove necessary.
> 
> I suppose the question is whether, for acme's use cases, the answer to
> "when it's proven necessary" is "now".
> 
> > In contrast, unnecessary knobs can cause confusion and might at the same time
> > get locked into some misbegotten userspace application, which would make the
> > unnecessary knob really hard to get rid of.
> 
> I think I would make a stronger statement: the proposed
> CONFIG_SYNC_NET_DEFAULT option would be a boot/compile-time parameter
> which says "I require networking (and network configuration) in my
> critical path".  Why don't we have these flags for other I/O subsystems?
> What's special about networking?
> 
> We don't because applications can make use of thread priorities to
> express exactly which tasks should be more important than others.  So
> perhaps the failure here is that RCU (and networking, by implication)
> doesn't (can't?) take into consideration the calling thread's priority?
> (And there may be a cascade of other problems as well, like deferred
> work pushed to a waitqueue, and thus losing the caller's priority, etc.)
> 
> (I will admit that RCU is a black box to me, so it is entirely possible
> it's already capable of this, or it's fundamentally impossible, or
> somewhere in between :)

CONFIG_RCU_KTHREAD_PRIO=nn, where 0 says SCHED_OTHER and 0 < nn <= 99
says SCHED_FIFO with RT priority nn.
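
As a rough userspace illustration of that mapping only (this is not kernel
code; the struct and the helper name rcu_kthread_sched_for() are hypothetical
and exist purely for this sketch), the rule above could be written as:

```c
#include <sched.h>

/* Hypothetical helper, not a kernel API: it only restates the rule
 * "0 means SCHED_OTHER, 0 < nn <= 99 means SCHED_FIFO at priority nn". */
struct rcu_kthread_sched {
	int policy;	/* SCHED_OTHER or SCHED_FIFO */
	int priority;	/* 0 for SCHED_OTHER, otherwise the RT priority */
};

/* Returns 0 on success, -1 if nn lies outside the 0..99 range. */
static int rcu_kthread_sched_for(int nn, struct rcu_kthread_sched *out)
{
	if (nn < 0 || nn > 99)
		return -1;

	out->policy = nn ? SCHED_FIFO : SCHED_OTHER;
	out->priority = nn;
	return 0;
}
```

So a value of 0 leaves the RCU kthreads at normal priority, while any nonzero
value up to 99 requests real-time FIFO scheduling at that priority.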

> > > > Then RT oriented kernel .config files would have CONFIG_SYNC_NET_DEFAULT
> > > > set to 1, while upstream would have this default to 0.
> > > > 
> > > > RT oriented kernel users could try using this in some scenarios where
> > > > networking is not the critical path.
> > > 
> > > Well, if synchronize_rcu_expedited() is such a problem on RT, then maybe
> > > a generic solution would make synchronize_rcu_expedited() fall back to
> > > synchronize_rcu() after boot time on RT.
> > > 
> > > Not sure why networking use of synchronize_rcu_expedited() would be
> > > problematic, and not the others.
> > 
> > From what I can see, their testing just happened to run into this one.
> > Perhaps further testing will run into others, or perhaps the others are
> > off in code paths that should not be exercised while running RT apps.
> 
> I accidentally ran into this issue when I was doing testing with an
> ethernet cable w/ a broken RJ-45 connector (without the tab, that I was
> just too lazy to replace), and I kept accidentally knocking it out.  :)
> 
> Regardless, industrial automation environments aren't known for having
> the most stable networks; there may be deployed systems doing
> high-priority motion control tasks, and we'd want to ensure that the
> poor network technician sent in to repair a defective network switch
> wouldn't end up being mangled.
> 
> > > scripts/checkpatch.pl has this comment about this :
> 
> Also, Documentation/RCU/checklist.txt mentions:
> 
> 	Use of the expedited primitives should be restricted to rare
> 	configuration-change operations that would not normally be
> 	undertaken while a real-time..
> 
> I think it could have been argued at the time that operations under
> rtnl_lock() were "configuration-change" operations.  However, for our
> use cases they are not, as link changes are external events beyond our
> control.

Certainly the variety of operations that people are willing to run
concurrently with real-time applications seems to be steadily growing
over time...  But much depends on the RT deadlines.

							Thanx, Paul
