Message-ID: <24E8E2D2-F91B-47F6-91BF-02D02750054F@oracle.com>
Date: Sun, 25 Jun 2023 16:01:38 +0000
From: Chuck Lever III <chuck.lever@...cle.com>
To: Tejun Heo <tj@...nel.org>
CC: open list <linux-kernel@...r.kernel.org>,
Linux NFS Mailing List <linux-nfs@...r.kernel.org>
Subject: Re: contention on pwq->pool->lock under heavy NFS workload
Hi Tejun-
> On Jun 23, 2023, at 9:44 PM, Tejun Heo <tj@...nel.org> wrote:
>
> Hey,
>
> On Fri, Jun 23, 2023 at 02:37:17PM +0000, Chuck Lever III wrote:
>> I'm using NFS/RDMA for my test because I can drive more IOPS with it.
>>
>> I've found that setting the nfsiod and rpciod workqueues to "cpu"
>> scope provides the best benefit for this workload. Changing the
>> xprtiod workqueue to "cpu" scope had no discernible effect.
>>
>> This tracks with the number of queue_work calls for each of these
>> WQs. 59% of queue_work calls during the test are for the rpciod
>> WQ, 21% are for nfsiod, and 2% are for xprtiod.
>>
>> The same tests with TCP (using IP-over-IB on the same physical network)
>> show no improvement at all. That suggests that, with TCP, a bottleneck
>> somewhere else limits throughput.
>
> Yeah, you can make the necessary workqueues default to CPU or SMT scope
> using apply_workqueue_attrs(). The interface is a bit cumbersome and we
> probably wanna add convenience helpers to switch e.g. affinity scopes,
> but it's still just several lines of code.
For reference, the sysfs store method from this series:

static ssize_t wq_affn_scope_store(struct device *dev,
				   struct device_attribute *attr,
				   const char *buf, size_t count)
{
	struct workqueue_struct *wq = dev_to_wq(dev);
	struct workqueue_attrs *attrs;
	int affn, ret = -ENOMEM;

	affn = parse_affn_scope(buf);
	if (affn < 0)
		return affn;

	apply_wqattrs_lock();            <<< takes &wq_pool_mutex
	attrs = wq_sysfs_prep_attrs(wq); <<< copies the wq_attrs
	if (attrs) {
		attrs->affn_scope = affn;
		ret = apply_workqueue_attrs_locked(wq, attrs);
	}
	apply_wqattrs_unlock();
	free_workqueue_attrs(attrs);
	return ret ?: count;
}
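For a workqueue registered with WQ_SYSFS, this store method means the
scope can be switched from userspace with something like (assuming the
attribute name this series uses):

	echo cpu > /sys/devices/virtual/workqueue/rpciod/affinity_scope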
Both wq_pool_mutex and copy_workqueue_attrs() are static, however, so
having only apply_workqueue_attrs() available is not yet enough for
workqueue consumers such as sunrpc.ko to do the same.
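To illustrate, here is roughly what sunrpc would want to write (a
sketch only; the function name is made up, it assumes the WQ_AFFN_CPU
enum from this series, and the commented step is exactly what a module
cannot do today):

static int rpciod_use_cpu_scope(struct workqueue_struct *wq)
{
	struct workqueue_attrs *attrs;
	int ret;

	attrs = alloc_workqueue_attrs();
	if (!attrs)
		return -ENOMEM;

	/*
	 * There is no exported way to seed @attrs from the wq's
	 * current attributes: copy_workqueue_attrs() is static,
	 * as is the wq_pool_mutex that the sysfs path holds.
	 */
	attrs->affn_scope = WQ_AFFN_CPU;

	cpus_read_lock();	/* apply_workqueue_attrs() wants this held */
	ret = apply_workqueue_attrs(wq, attrs);
	cpus_read_unlock();

	free_workqueue_attrs(attrs);
	return ret;
}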
padata_setup_cpumasks(), for example, holds the CPU read lock but does
not take wq_pool_mutex; yet apply_wqattrs_prepare() contains a
"lockdep_assert_held(&wq_pool_mutex);".
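Paraphrasing the padata call site from memory (field names approximate,
not verbatim kernel source), the shape is:

	/* caller holds cpus_read_lock(); wq_pool_mutex never
	 * appears at this call site */
	cpumask_copy(attrs->cpumask, pinst->cpumask.pcpu);
	err = apply_workqueue_attrs(pinst->parallel_wq, attrs);

so it's not clear to me where the wq_pool_mutex requirement is
satisfied on that path.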
I can wait for a v3 of this series so you can construct the public
API the way you prefer.
--
Chuck Lever