Date:	Mon, 08 Aug 2011 14:31:50 +0800
From:	Tao Ma <tm@....ma>
To:	Shaohua Li <shli@...nel.org>
CC:	Jens Axboe <jaxboe@...ionio.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Christoph Hellwig <hch@...radead.org>,
	Roland Dreier <roland@...estorage.com>,
	Dan Williams <dan.j.williams@...el.com>
Subject: Re: [PATCH] block: Make rq_affinity = 1 work as expected.

On 08/08/2011 01:56 PM, Shaohua Li wrote:
> 2011/8/8 Tao Ma <tm@....ma>:
>> On 08/08/2011 12:33 PM, Shaohua Li wrote:
>>> 2011/8/8 Tao Ma <tm@....ma>:
>>>> Hi Shaohua,
>>>> On 08/08/2011 10:58 AM, Shaohua Li wrote:
>>>>> 2011/8/5 Jens Axboe <jaxboe@...ionio.com>:
>>>>>> On 2011-08-05 06:39, Tao Ma wrote:
>>>>>>> From: Tao Ma <boyu.mt@...bao.com>
>>>>>>>
>>>>>>> Commit 5757a6d76c introduced a new rq_affinity = 2 so as to make
>>>>>>> the request complete on the __make_request cpu. But it makes the
>>>>>>> old rq_affinity = 1 no longer work. The root cause is that
>>>>>>> if 'cpu' and 'req->cpu' are in the same group and cpu != req->cpu,
>>>>>>> ccpu will be the same as group_cpu, so the completion will be
>>>>>>> executed on 'cpu', not 'group_cpu'.
>>>>>>>
>>>>>>> This patch fixes the problem by simply removing group_cpu, and the
>>>>>>> code is more explicit now. If ccpu == cpu, we complete on cpu;
>>>>>>> otherwise we raise_blk_irq to ccpu.
>>>>>>
>>>>>> Thanks Tao Ma, much more readable too.
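
A minimal sketch of the decision the patch describes (plain C, not the actual
__blk_complete_request(); cpu_group_leader() and choose_completion() are
hypothetical stand-ins, assuming 8 CPUs per group):

/* Sketch only: models the completion-CPU choice described in the patch. */
enum complete_action { COMPLETE_LOCALLY, IPI_TO_CCPU };

/* Hypothetical helper: "first CPU of the submitting CPU's group". */
int cpu_group_leader(int cpu)
{
	return cpu - (cpu % 8);			/* assume 8 CPUs per group */
}

enum complete_action choose_completion(int req_cpu, int this_cpu,
				       int rq_affinity, int *ccpu)
{
	if (req_cpu == -1 || rq_affinity == 0)
		*ccpu = this_cpu;			/* complete wherever the IRQ landed */
	else if (rq_affinity == 2)
		*ccpu = req_cpu;			/* force the exact submitting CPU */
	else
		*ccpu = cpu_group_leader(req_cpu);	/* rq_affinity = 1: group's first CPU */

	/* "If ccpu == cpu, we complete on cpu; otherwise we raise_blk_irq to ccpu." */
	return (*ccpu == this_cpu) ? COMPLETE_LOCALLY : IPI_TO_CCPU;
}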
>>>>> Hi Jens,
>>>>> I rethought the problem while checking interrupts on my system. I don't
>>>>> think we need Tao's patch, though it makes the code behave as it did before.
>>>>> Let's take an example. My test box has CPUs 0-7 on one socket. Say a request
>>>>> is added on CPU 1 and blk_complete_request occurs on CPU 7. Without Tao's
>>>>> patch, the softirq will be done on CPU 7. With it, an IPI will be directed
>>>>> to CPU 0, and the softirq will be done on CPU 0. In this case, doing the
>>>>> softirq on CPU 0 or CPU 7 makes no difference, and we can avoid an IPI by
>>>>> doing it on CPU 7.
>>>> I totally agree with your analysis, but what I am worried about is that
>>>> this does change the old system behavior.
>>>> And without this patch, '1' and '2' in rq_affinity actually have the same
>>>> effect in your case. If you do prefer the new code and the new behavior,
>>>> then '1' doesn't need to exist any more (since, from your description, it
>>>> seems to only add extra IPI overhead with no benefit), or '2' is totally
>>>> unneeded here.
>>> With rq_affinity 2, CPU 1 will do the softirq in the above case. It's
>>> still different from the rq_affinity 1 case.
>> OK, so let's see what's going on without the patch in the rq_affinity = 1 case.
>> If the complete cpu and the request cpu are in the same group, the
>> complete cpu will call the softirq.
>> If the complete cpu and the request cpu are not in the same group, the
>> group cpu of the request cpu will call the softirq.
>>
>> These behaviors are totally different. How can you tell the user what's
>> going on there? And that's the reason we want 0, 1, 2 for rq_affinity. If
>> the user does care about the extra IPI (in your case), fine, just set
>> rq_affinity = 2.
> rq_affinity=2: finish the request on each CPU.
> rq_affinity=1: finish the request on one CPU per socket.
> Even without your patch, rq_affinity=1 finishes the request on one CPU too.
We always finish a request on one CPU, that's true. The only difference is
which CPU does the softirq work.
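
To make the difference concrete, here is a toy model (plain userspace C, not
kernel code; cpu_group_leader() is an assumed "first CPU of the socket"
mapping with 8 CPUs per socket) that prints which CPU runs the block softirq
for rq_affinity = 1, with and without the patch:

#include <stdio.h>

/* Assumed topology: 8 CPUs per socket, group leader = first CPU of it. */
static int cpu_group_leader(int cpu)
{
	return cpu - (cpu % 8);
}

/*
 * Which CPU ends up running the block softirq for rq_affinity = 1?
 * old = 1 models the pre-patch behavior discussed above,
 * old = 0 models the behavior with the patch applied.
 */
static int softirq_cpu(int req_cpu, int irq_cpu, int old)
{
	int ccpu = cpu_group_leader(req_cpu);

	if (old) {
		/* Pre-patch: same group -> complete locally on irq_cpu,
		 * different group -> IPI the request CPU's group leader. */
		if (ccpu == irq_cpu || ccpu == cpu_group_leader(irq_cpu))
			return irq_cpu;
		return ccpu;
	}

	/* Patched: local only if irq_cpu is the group leader, else IPI it. */
	return (ccpu == irq_cpu) ? irq_cpu : ccpu;
}

int main(void)
{
	/* The example above: request added on CPU 1, completion IRQ on CPU 7. */
	printf("same socket : old -> CPU %d, patched -> CPU %d\n",
	       softirq_cpu(1, 7, 1), softirq_cpu(1, 7, 0));

	/* Cross-socket case: request added on CPU 1, completion IRQ on CPU 9. */
	printf("cross socket: old -> CPU %d, patched -> CPU %d\n",
	       softirq_cpu(1, 9, 1), softirq_cpu(1, 9, 0));
	return 0;
}

With these assumptions it prints CPU 7 (old) vs CPU 0 (patched) for the
same-socket case, and CPU 0 for both in the cross-socket case; that is, the
patch only changes where the softirq runs when the IRQ lands on the submitting
CPU's own socket, at the cost of one extra IPI.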
> Remember the controller only has one interrupt source. The only difference
> is that the request isn't always finished on the first CPU of a socket. I
> don't think this is a behavior change that users even care about.
That is your opinion. Thanks. At least it seemed strange to me when I came
across it, and that's how I found it. I am done with it.

Thanks
Tao
