netdev - Re: [RFC PATCH net-next 0/9] Userspace spinning on net-sysfs access

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20211029084527.57d38f09@hermes.local>
Date:   Fri, 29 Oct 2021 08:45:27 -0700
From:   Stephen Hemminger <stephen@...workplumber.org>
To:     Antoine Tenart <atenart@...nel.org>
Cc:     davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com,
        gregkh@...uxfoundation.org, ebiederm@...ssion.com,
        herbert@...dor.apana.org.au, juri.lelli@...hat.com,
        netdev@...r.kernel.org, mhocko@...e.com
Subject: Re: [RFC PATCH net-next 0/9] Userspace spinning on net-sysfs access

On Fri, 29 Oct 2021 16:33:05 +0200
Antoine Tenart <atenart@...nel.org> wrote:

> With the approach taken in this thread not going too well[1], what next?
> I think we should discuss other possibilities and gather some ideas.
> Below are some early thoughts, that might not be acceptable.
> 
> 1) Making an rtnl_lock helper that returns when needed
> 
> The idea would be to replace rtnl_trylock/restart_syscall by this
> helper, which would try to grab the rtnl lock and return when needed.
> Something like the following:
> 
>   static rtnl_lock_ifalive(const struct net_device *dev)
>   {
>           while (!rtnl_trylock()) {
>                   if (!dev_isalive(dev))
>                           return -EINVAL;
> 
>                   /* Special case for queue files */
>                   if (dev->drain_sysfs_queues)
>                           return restart_syscall();
> 
>                   /* something not to have the CPU spinning */
>           }
>   }
> 
> One way not to have the CPU spinning is to sleep, let's say with
> `usleep_range(500, 1000);` (range to be defined properly). The
> disadvantage is on net device unregistration as we might need to wait
> for all those loops to return first. (It's a trade-off though, we're not
> restarting syscalls over and over in other situations). Or would there
> be something better?
> 
> Possible improvements:
> - Add an overall timeout and restart the syscall if we hit it, to have
>   an upper bound.
> - Make it interruptible, check for need_resched, etc.
> 
> Note that this approach could work for sysctl files as well; looping
> over all devices in a netns to make the checks.
> 
> 2) Interrupt rtnl_lock when in the unregistration paths
> 
> I'm wondering if using mutex_lock_interruptible in problematic areas
> (sysfs, sysctl), keeping track of their tasks and interrupting them in
> the unregistration paths would work and be acceptable. On paper this
> looks like a solution with not much overhead and not too invasive to
> implement. But is it acceptable? (Are there some side effects we really
> don't want?).
> 
> Note that this would need some thinking to make it safe against sysfs
> accesses between the tasks interruption and the sysfs files draining
> (refcount? another lock?).
> 
> 3) Other ideas?
> 
> Also, I'm not sure about the -rt implications of all the above.
> 
> Thanks,
> Antoine
> 
> [1] https://lore.kernel.org/netdev/163549826664.3523.4140191764737040064@kwain/

As the originator of net-sysfs and the trylock, I appreciate your attempts
to do something better. The best solution may actually not be just in the
networking code but go all the way back up to restart_syscall() itself.


Spinning a CPU itself is not a bad thing, see spin locks.
The problem is if the spin causes the active CPU to not make progress.