netdev - Re: Question about synchronize_net() in AF

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAN8CM3wpZ236h3Okfi0AS+9+0P05fd1JVwfUOXfbp_LV2PqLLQ@mail.gmail.com>
Date:	Wed, 17 Sep 2014 10:04:04 -0700
From:	Martin Kelly <martin@...tingkelly.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	David Miller <davem@...emloft.net>, netdev@...r.kernel.org,
	Paul McKenney <paulmck@...ux.vnet.ibm.com>,
	Stephen Hemminger <stephen@...workplumber.org>
Subject: Re: Question about synchronize_net() in AF_PACKET close()

On Wed, Sep 17, 2014 at 7:54 AM, Eric Dumazet <eric.dumazet@...il.com> wrote:
>
> Can you describe the ... ?
>

Sure. The patch I've been looking at is functionally identical to the
one referenced in http://patchwork.ozlabs.org/patch/181605/ with the
exception that sock_orphan() and sock->sk = NULL is moved before the
synchronize_net(). This further improves the performance relative to
http://patchwork.ozlabs.org/patch/181605/ because it never calls
synchronize_net(), whereas http://patchwork.ozlabs.org/patch/181605/
calls synchronize_net() in the po->running case. However, it looks
like making that optimization is not safe, for the reasons you
mentioned below.

>
> What problem do you want to solve exactly ?
>
> I believe its not safe, you missed sk_data_ready() call
> (sock_def_readable())
>

The problem I I'm trying to solve is the same one mentioned in
http://patchwork.ozlabs.org/patch/181605/: On systems with a lot of
RCU contention, raw socket close() can take up to 300 ms or so, which
really adds up for processes that open lots of raw sockets. What you
see is a many-second hang when the process exits as it waits for many
synchronize_rcu calls (all in serial) to finish. Although it would be
better for close() not to be in a critical path, it's sometimes hard
to get around for some userspace processes, so this kind of change
could yield a significant performance improvement for such processes.

You're right, I missed sk_def_readable(), which looks like it would
cause a crash if it occurred before the call_rcu() was called. I see
that sk_def_readable() acceses sk->sk_wq through rcu_read_lock(),
rcu_dereference(), rcu_read_unlock(), while sock_orphan() sets
sk->sk_wq = NULL without an RCU function. You could modify
sock_orphan() and sock->sk to use RCU too, but you would need to patch
all access sites, and the patch would quickly become complex and
error-prone, which is not worth it.

Does my analysis sound correct to you? If so, can you think of a more
natural way to improve raw socket close() performance? I agree that
optimally, userspace should not put close() in a critical path, but it
still seems like a bug for close() to take 300 ms to complete.

Thanks,
Martin
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html