lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+55aFwWi-RveiuFEManbED3DcA76Cf3CKVXrrs5d5FvNgedOg@mail.gmail.com>
Date:	Mon, 30 Sep 2013 18:44:58 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	Al Viro <viro@...iv.linux.org.uk>, Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Waiman Long <Waiman.Long@...com>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	"Chandramouleeswaran, Aswin" <aswin@...com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: spinlock contention of files->file_lock

On Mon, Sep 30, 2013 at 6:05 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> Speaking of spinlock contention, the files->file_lock is a good example.
>
> Multi threaded programs hit this spinlock three times per fd :

.. do you actually have programs that see contention?

> alloc_fd() / fd_install() / close()
>
> I think fd_install() could be done without this spinlock.

Yes, I haven't thought much about it, but from a quick look it should
be trivial to do fd_install(). We already free the fdtable using rcu
(because we look things up without locking), so we could just get the
rcu read-lock, do fd_install() without any locking at all, and then
verify that the fd-table is still the same. Rinse and repeat if not.

> - Add a seqcount, and cmpxchg() to synchronize fd_install() with the
> potential resizer. (Might need additional RCU locking)

I don't think you even need that. alloc_fd() has already reserved the
spot, so I think you really can just assign without any fancy cmpxchg
at all. If you write to the old fdtable, who cares? Probably just
something like

   do {
      fdt = files_fdtable(files);
      rcu_assign_pointer(fdt->fd[fd], file);
      smp_mb();
   } while (fdt != files_fdtable(files));

or similar. Under the RCU lock to guarantee the allocations don't
disappear from under you, but with no other locking.

Maybe I'm missing something, but it really doesn't look that bad.

Now, that only gets rid of fd_install(), but I suspect you could do
something similar for put_unused_fd() (that one does need cmpxchg for
the "next_fd" thing, though). We'd have to replace the non-atomic
bitops on open_fds[] with atomic ones, just to make sure adjacent bit
clearings don't screw up concurrent adjacent bit values, but that
looks fairly straightforward too.

Now, getting rid of the spinlock for alloc_fd() would be more work,
and you'd likely want to keep it for actually resizing (just to keep
that case more straightforward), but you could probably do the
non-resizing cases fairly easily there too without holding the lock.
Scan for a free bit optimistically and without any locking, and then
be somewhat careful with setting that open_fds[] bit - not only using
an atomic test_and_set() for it, but also do the same "let's check
that the fdtable base pointers haven't changed in the meantime and
repeat".

On the whole the fd table looks like if it really is a contention
point, it would be fairly straightforward to fix without any huge
contortions. Sure, lots of care and small details to look out for, but
nothing looks really all that fundamentally difficult.

But is it really a contention point on any real load?

                 Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ