linux-kernel - Re: [RFC PATCH 00/14] nfsd/sunrpc: add support for a workqueue-based nfsd

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHQdGtQzth78BzBo9Gsz8LKe4ippJv6VYtjGCDmES7viByG_gA@mail.gmail.com>
Date:	Wed, 3 Dec 2014 14:08:01 -0500
From:	Trond Myklebust <trond.myklebust@...marydata.com>
To:	Jeff Layton <jeff.layton@...marydata.com>
Cc:	Tejun Heo <tj@...nel.org>, NeilBrown <neilb@...e.de>,
	Linux NFS Mailing List <linux-nfs@...r.kernel.org>,
	Linux Kernel mailing list <linux-kernel@...r.kernel.org>,
	Al Viro <viro@...iv.linux.org.uk>
Subject: Re: [RFC PATCH 00/14] nfsd/sunrpc: add support for a workqueue-based nfsd

On Wed, Dec 3, 2014 at 2:02 PM, Jeff Layton <jeff.layton@...marydata.com> wrote:
> On Wed, 3 Dec 2014 11:04:05 -0500
> Jeff Layton <jlayton@...marydata.com> wrote:
>
>> On Wed, 3 Dec 2014 10:56:49 -0500
>> Tejun Heo <tj@...nel.org> wrote:
>>
>> > Hello, Neil, Jeff.
>> >
>> > On Tue, Dec 02, 2014 at 08:29:46PM -0500, Jeff Layton wrote:
>> > > That's a good point. I had originally thought that max_active on an
>> > > unbound workqueue would be the number of concurrent jobs that could run
>> > > across all the CPUs, but now that I look I'm not sure that's really
>> > > the case.
>> >
>> > @max_active is a per-pool number.  By default, unbound wqs use
>> > per-node pools, so @max_active would be per-node.  Currently,
>> > @max_active is mostly meant as a protection against run-away
>> > workqueues creating crazy number of workers, which has been enough for
>> > the existing wq users.  *Maybe* it makes sense to make it actually
>> > mean maximum concurrency which would prolly involve aggregated per-cpu
>> > distribution mechanism so that we don't end up inc'ing and dec'ing the
>> > same counter from all CPUs on each work item execution.
>> >
>> > However, I do agree with Neil that making it user configurable is
>> > almost always painful.  It's usually a question without a good answer
>> > and the same value may behave differently depending on a lot of
>> > implementation details and a better approach, probably, is to use
>> > @max_active as the last resort protection mechanism while providing
>> > automatic throttling of in-flight work items which is meaningful for
>> > the specific use cases.
>> >
>> > > I've heard random grumblings from various people in the past that
>> > > workqueues have significant latency, but this is the first time I've
>> > > really hit it in practice. If we can get this fixed, then that may be a
>> > > significant perf win for all workqueue users. For instance, rpciod in
>> > > the NFS client is all workqueue-based. Getting that latency down could
>> > > really help things.
>> > >
>> > > I'm currently trying to roll up a kernel module for benchmarking the
>> > > workqueue dispatching code in the hopes that we can use that to help
>> > > nail it down.
>> >
>> > Definitely, there were some reportings but nothing really got tracked
>> > down properly.  It'd be awesome to actually find out where the latency
>> > is coming from.
>> >
>> > Thanks!
>> >
>>
>> I think I might have figured this out (and before I go any farther
>> allow me to say <facepalm>), thanks to the workqueue tracepoints in the
>> code. What I noticed is that when things are fairly idle, the work is
>> picked up quickly, but once things get busy it takes a lot longer.
>>
>> I think that the issue is in the design of the workqueue-based nfsd
>> code. In particular, I attached a work_struct to the svc_xprt which is
>> limiting the code to only process one RPC at a time for a xprt, from
>> beginning to end.
>>
>> So, even if we requeue that work after the receive phase is done, the
>> workqueue won't pick it up again until the thing is processed and the
>> reply is sent.
>>
>> What I think I need to do is to do the receive phase using the
>> work_struct attached to the xprt, and then do the rest of the
>> processing from the context of a different work_struct (possibly one
>> attached to the svc_rqst), which should free up the xprt's work_struct
>> sooner.
>>
>> I'm going to work on changing that today and see if it improves things.
>>
>> Thanks for the help so far!
>
> Yes! That does help. The new workqueue based code is a little (a few
> percent?) slower than the thread-based code across the board. I suspect
> that's due to the fact that I'm having to queue each RPC to the
> workqueue twice (once for the receive and once to do the processing).
>
> I suspect that I can remedy that, but I'll have to think about the best
> way to do it.
>

Which workqueue are you using? Since the receive code is non-blocking,
I'd expect you might be able to use rpciod, for the initial socket
reads, but you wouldn't want to use that for the actual knfsd
processing.

-- 
Trond Myklebust

Linux NFS client maintainer, PrimaryData

trond.myklebust@...marydata.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/