Message-ID: <1429224740.7346.225.camel@edumazet-glaptop2.roam.corp.google.com>
Date: Thu, 16 Apr 2015 15:52:20 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: Mateusz Guzik <mguzik@...hat.com>
Cc: Al Viro <viro@...IV.linux.org.uk>,
Andrew Morton <akpm@...ux-foundation.org>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Yann Droneaud <ydroneaud@...eya.com>,
Konstantin Khlebnikov <khlebnikov@...dex-team.ru>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] fs: use a sequence counter instead of file_lock in fd_install
On Fri, 2015-04-17 at 00:00 +0200, Mateusz Guzik wrote:
> On Thu, Apr 16, 2015 at 01:55:39PM -0700, Eric Dumazet wrote:
> > On Thu, 2015-04-16 at 13:42 -0700, Eric Dumazet wrote:
> > > On Thu, 2015-04-16 at 19:09 +0100, Al Viro wrote:
> > > > On Thu, Apr 16, 2015 at 02:16:31PM +0200, Mateusz Guzik wrote:
> > > > > @@ -165,8 +165,10 @@ static int expand_fdtable(struct files_struct *files, int nr)
> > > > > cur_fdt = files_fdtable(files);
> > > > > if (nr >= cur_fdt->max_fds) {
> > > > > /* Continue as planned */
> > > > > + write_seqcount_begin(&files->fdt_seqcount);
> > > > > copy_fdtable(new_fdt, cur_fdt);
> > > > > rcu_assign_pointer(files->fdt, new_fdt);
> > > > > + write_seqcount_end(&files->fdt_seqcount);
> > > > > if (cur_fdt != &files->fdtab)
> > > > > call_rcu(&cur_fdt->rcu, free_fdtable_rcu);
> > > >
> > > > Interesting. AFAICS, your test doesn't step anywhere near that path,
> > > > does it? So basically you never hit the retries during that...
> > >
> > > Right, but the table is almost never changed for a given process,
> > > as we only grow it in power-of-two steps.
> > >
> > > (So scratch my initial comment: fdt_seqcount is really mostly read.)
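The quoted hunk only shows the write side. Below is a minimal userspace sketch of the whole sequence-counter pattern, assuming C11 atomics as stand-ins for the kernel's write_seqcount_begin()/write_seqcount_end() and read_seqcount_begin()/read_seqcount_retry(). The names struct table, publish_table() and read_max_fds() are hypothetical and are not code from the patch. Because the table only grows in power-of-two steps, the reader's retry loop should almost never run more than once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct fdtable; immutable once published. */
struct table {
    int max_fds;
};

static struct table initial_table = { .max_fds = 32 };

/* Even value: no update in progress. Odd value: a writer is mid-update. */
static atomic_uint seq;
/* The pointer being swapped, playing the role of files->fdt. */
static _Atomic(struct table *) cur = &initial_table;

/* Writer: assumes writers are already serialized (single writer here). */
static void publish_table(struct table *new_tab)
{
    assert(new_tab != NULL);
    unsigned s = atomic_load_explicit(&seq, memory_order_relaxed);

    atomic_store_explicit(&seq, s + 1, memory_order_relaxed);   /* ~write_seqcount_begin() */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&cur, new_tab, memory_order_release); /* ~rcu_assign_pointer() */
    atomic_store_explicit(&seq, s + 2, memory_order_release);   /* ~write_seqcount_end() */
}

/* Lockless reader: retry until it sees an even, unchanged sequence value. */
static int read_max_fds(void)
{
    unsigned s1, s2;
    int max;

    do {
        s1 = atomic_load_explicit(&seq, memory_order_acquire);
        struct table *t = atomic_load_explicit(&cur, memory_order_acquire);
        max = t->max_fds;
        atomic_thread_fence(memory_order_acquire);
        s2 = atomic_load_explicit(&seq, memory_order_relaxed);
    } while ((s1 & 1u) || s1 != s2);
    return max;
}
```

Readers take no lock at all; they only pay a retry if they raced with a table swap, which is why a mostly-read counter is cheap.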
> >
> > I tested Mateusz's patch with my opensock program, which mimics a bit
> > more closely what a server does (having lots of sockets).
> >
> > 24 threads running, doing close(randomfd())/socket() calls like crazy.
> >
> > Before patch :
> >
> > # time ./opensock
> >
> > real 0m10.863s
> > user 0m0.954s
> > sys 2m43.659s
> >
> >
> > After patch :
> >
> > # time ./opensock
> >
> > real 0m9.750s
> > user 0m0.804s
> > sys 2m18.034s
> >
> > So this is an improvement for sure, but not massive.
> >
> > # perf record ./opensock ; perf report
> >
> > 87.80% opensock [kernel.kallsyms] [k] _raw_spin_lock
> > |--52.70%-- __close_fd
> > |--46.41%-- __alloc_fd
>
> My crap benchmark is here: http://people.redhat.com/~mguzik/pipebench.c
> (compile with -pthread; run with -s 10 -n 16 for a 10-second test with
> 16 threads)
>
> As noted earlier, it tends to go from roughly 300k ops/s to 400k.
>
> The fundamental problem here seems to be this pesky POSIX requirement of
> providing the lowest possible fd on each allocation (as a side note,
> Linux breaks this with parallel fd allocations, where one of them backs
> off its reservation; not that I believe this causes trouble).
Note that this POSIX requirement was never written with multiple threads
in mind. It came from traditional Unix stdin/stdout/stderr handling and
legacy programs, before dup2() even existed.
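The lowest-fd rule is what lets the classic close(0) followed by open() idiom redirect stdin. A minimal sketch of lowest-free-descriptor allocation over a bitmap follows; struct fd_bitmap, alloc_lowest_fd() and release_fd() are hypothetical names. The kernel's __alloc_fd() performs a comparable scan of the fdtable's open_fds bitmap while holding files->file_lock, which is the lock showing up in the perf profile above.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical model sizes; MODEL_MAX_FDS must be a multiple of the word size. */
#define MODEL_MAX_FDS 256
#define BITS_PER_WORD ((int)(sizeof(unsigned long) * CHAR_BIT))
#define NWORDS (MODEL_MAX_FDS / BITS_PER_WORD)

/* One bit per descriptor, similar in spirit to open_fds in struct fdtable. */
struct fd_bitmap {
    unsigned long open[NWORDS];
};

/* Return the lowest unused fd >= start and mark it open, or -1 if full. */
static int alloc_lowest_fd(struct fd_bitmap *m, int start)
{
    assert(start >= 0);
    int fd = start;

    while (fd < MODEL_MAX_FDS) {
        unsigned long word = m->open[fd / BITS_PER_WORD];
        if (word == ~0UL) {
            /* Whole word in use: jump to the next word boundary. */
            fd = (fd / BITS_PER_WORD + 1) * BITS_PER_WORD;
            continue;
        }
        unsigned long bit = 1UL << (fd % BITS_PER_WORD);
        if (!(word & bit)) {
            m->open[fd / BITS_PER_WORD] = word | bit;
            return fd;
        }
        fd++;
    }
    return -1;
}

/* close() side: clearing the bit lets the next allocation reuse fd. */
static void release_fd(struct fd_bitmap *m, int fd)
{
    assert(fd >= 0 && fd < MODEL_MAX_FDS);
    m->open[fd / BITS_PER_WORD] &= ~(1UL << (fd % BITS_PER_WORD));
}
```

The start parameter plays the role of the minimum accepted by fcntl(F_DUPFD); ordinary allocations pass 0. Every allocation and every release mutates shared state, so in this form both must be serialized, which is the contention discussed in this thread.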
>
> Ideally, a process-wide switch could be implemented (e.g.
> prctl(SCRATCH_LOWEST_FD_REQ)) that would grant the kernel the freedom
> to return any fd it wants, making it possible to have per-thread fd
> ranges and the like.
I played with a SOCK_FD_FASTALLOC flag months ago ;)
The idea was to use a random starting point instead of 0.
But the bottleneck was really the spinlock, not the bit search, unless
the program used 10 million fds...
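The random-starting-point idea can be sketched as a search that begins at a hint and wraps around once. This is a hypothetical model: alloc_fd_from(), alloc_fd_random(), release_fd() and the fd_used array are illustrative names, SOCK_FD_FASTALLOC is only described in this mail, not shown, and rand() merely stands in for whatever randomness such a flag would use. It shortens the search in a crowded table but gives up the lowest-fd guarantee, and it does nothing about the spinlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_MAX_FDS 128

/* Hypothetical model: one flag per descriptor instead of a packed bitmap. */
static bool fd_used[MODEL_MAX_FDS];

/* Scan for a free fd beginning at 'hint', wrapping around to 0 once. */
static int alloc_fd_from(unsigned hint)
{
    hint %= MODEL_MAX_FDS;
    for (unsigned i = 0; i < MODEL_MAX_FDS; i++) {
        unsigned fd = (hint + i) % MODEL_MAX_FDS;
        if (!fd_used[fd]) {
            fd_used[fd] = true;
            return (int)fd;
        }
    }
    return -1; /* table full */
}

/* Fast-alloc flavor: random start, so the result is NOT the lowest free fd. */
static int alloc_fd_random(void)
{
    return alloc_fd_from((unsigned)rand());
}

static void release_fd(int fd)
{
    assert(fd >= 0 && fd < MODEL_MAX_FDS);
    fd_used[fd] = false;
}
```

Such a mode could only be opt-in, since any program relying on the lowest-fd rule (for example the stdin redirect idiom) would silently break.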
>
> Having only an O_SCRATCH_POSIX flag passed to syscalls would still leave
> close() as a bottleneck.
>
> In the meantime I consider the approach taken in my patch as an ok
> temporary improvement.
Yes, please formally submit this patch.
Note that using atomic bit operations could eventually allow us to avoid
holding the spinlock at close() time.
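One way to read the atomic-bit-operation suggestion: if the open-descriptor bitmap is only updated with atomic read-modify-write operations, the close() side can clear its bit without taking the lock. Below is a minimal C11 sketch, assuming atomic_fetch_or()/atomic_fetch_and() as stand-ins for the kernel's test_and_set_bit()/clear_bit(). The names open_fds, try_claim_fd(), release_fd() and fd_is_open() are hypothetical. Only the bitmap is modeled; a real close() also has to deal with other per-fd state such as the file pointer slot and the close-on-exec bit.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_MAX_FDS 256
#define BITS_PER_WORD ((int)(sizeof(unsigned long) * CHAR_BIT))
#define NWORDS (MODEL_MAX_FDS / BITS_PER_WORD)

/* Static atomics are zero-initialized: every fd starts out free. */
static atomic_ulong open_fds[NWORDS];

/* Try to claim 'fd'; true only for the caller whose fetch_or set the bit. */
static bool try_claim_fd(int fd)
{
    assert(fd >= 0 && fd < MODEL_MAX_FDS);
    unsigned long mask = 1UL << (fd % BITS_PER_WORD);
    unsigned long old = atomic_fetch_or(&open_fds[fd / BITS_PER_WORD], mask);
    return !(old & mask);
}

/* close() side: clear the bit with one atomic op, no lock taken. */
static void release_fd(int fd)
{
    assert(fd >= 0 && fd < MODEL_MAX_FDS);
    unsigned long mask = 1UL << (fd % BITS_PER_WORD);
    atomic_fetch_and(&open_fds[fd / BITS_PER_WORD], ~mask);
}

static bool fd_is_open(int fd)
{
    assert(fd >= 0 && fd < MODEL_MAX_FDS);
    unsigned long mask = 1UL << (fd % BITS_PER_WORD);
    return (atomic_load(&open_fds[fd / BITS_PER_WORD]) & mask) != 0;
}
```

Because a release is a single atomic AND, it cannot corrupt a concurrent claim of a different bit in the same word; the allocation side still has to search for a free bit, which is where the lowest-fd rule keeps it serialized.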