lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANejiEXBH0uZbXmgYraPT435dUwMV6JR90+mWmUZhzRiQ1Fehg@mail.gmail.com>
Date:	Tue, 16 Oct 2012 15:48:47 +0800
From:	Shaohua Li <shli@...nel.org>
To:	Jianpeng Ma <majianpeng@...il.com>
Cc:	axboe <axboe@...nel.dk>,
	linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: Re: [PATCH] block: Add blk_rq_pos(rq) to sort rq when plushing plug-list.

2012/10/16 Jianpeng Ma <majianpeng@...il.com>:
> On 2012-10-15 21:18 Shaohua Li <shli@...nel.org> Wrote:
>>2012/10/15 Shaohua Li <shli@...ionio.com>:
>>> 2012/10/15 Jianpeng Ma <majianpeng@...il.com>:
>>>> My workload is a raid5 which had 16 disks. And used our filesystem to
>>>> write using direct-io mode.
>>>> I used the blktrace to find those message:
>>>>
>>>> 8,16   0     3570     1.083923979  2519  I   W 144323176 + 24 [md127_raid5]
>>>> 8,16   0        0     1.083926214     0  m   N cfq2519 insert_request
>>>> 8,16   0     3571     1.083926586  2519  I   W 144323072 + 104 [md127_raid5]
>>>> 8,16   0        0     1.083926952     0  m   N cfq2519 insert_request
>>>> 8,16   0     3572     1.083927180  2519  U   N [md127_raid5] 2
>>>> 8,16   0        0     1.083927870     0  m   N cfq2519 Not idling.st->count:1
>>>> 8,16   0        0     1.083928320     0  m   N cfq2519 dispatch_insert
>>>> 8,16   0        0     1.083928951     0  m   N cfq2519 dispatched a request
>>>> 8,16   0        0     1.083929443     0  m   N cfq2519 activate rq,drv=1
>>>> 8,16   0     3573     1.083929530  2519  D   W 144323176 + 24 [md127_raid5]
>>>> 8,16   0        0     1.083933883     0  m   N cfq2519 Not idling.st->count:1
>>>> 8,16   0        0     1.083934189     0  m   N cfq2519 dispatch_insert
>>>> 8,16   0        0     1.083934654     0  m   N cfq2519 dispatched a request
>>>> 8,16   0        0     1.083935014     0  m   N cfq2519 activate rq,drv=2
>>>> 8,16   0     3574     1.083935101  2519  D   W 144323072 + 104 [md127_raid5]
>>>> 8,16   0     3575     1.084196179     0  C   W 144323176 + 24 [0]
>>>> 8,16   0        0     1.084197979     0  m   N cfq2519 complete rqnoidle 0
>>>> 8,16   0     3576     1.084769073     0  C   W 144323072 + 104 [0]
>>>>   ......
>>>> 8,16   1     3596     1.091394357  2519  I   W 144322544 + 16 [md127_raid5]
>>>> 8,16   1        0     1.091396181     0  m   N cfq2519 insert_request
>>>> 8,16   1     3597     1.091396571  2519  I   W 144322520 + 24 [md127_raid5]
>>>> 8,16   1        0     1.091396934     0  m   N cfq2519 insert_request
>>>> 8,16   1     3598     1.091397165  2519  I   W 144322488 + 32 [md127_raid5]
>>>> 8,16   1        0     1.091397477     0  m   N cfq2519 insert_request
>>>> 8,16   1     3599     1.091397708  2519  I   W 144322432 + 56 [md127_raid5]
>>>> 8,16   1        0     1.091398023     0  m   N cfq2519 insert_request
>>>> 8,16   1     3600     1.091398284  2519  U   N [md127_raid5] 4
>>>> 8,16   1        0     1.091398986     0  m   N cfq2519 Not idling. st->count:1
>>>> 8,16   1        0     1.091399511     0  m   N cfq2519 dispatch_insert
>>>> 8,16   1        0     1.091400217     0  m   N cfq2519 dispatched a request
>>>> 8,16   1        0     1.091400688     0  m   N cfq2519 activate rq,drv=1
>>>> 8,16   1     3601     1.091400766  2519  D   W 144322544 + 16 [md127_raid5]
>>>> 8,16   1        0     1.091406151     0  m   N cfq2519 Not idling.st->count:1
>>>> 8,16   1        0     1.091406460     0  m   N cfq2519 dispatch_insert
>>>> 8,16   1        0     1.091406931     0  m   N cfq2519 dispatched a request
>>>> 8,16   1        0     1.091407291     0  m   N cfq2519 activate rq,drv=2
>>>> 8,16   1     3602     1.091407378  2519  D   W 144322520 + 24 [md127_raid5]
>>>> 8,16   1        0     1.091414006     0  m   N cfq2519 Not idling.st->count:1
>>>> 8,16   1        0     1.091414297     0  m   N cfq2519 dispatch_insert
>>>> 8,16   1        0     1.091414702     0  m   N cfq2519 dispatched a request
>>>> 8,16   1        0     1.091415047     0  m   N cfq2519 activate rq, drv=3
>>>> 8,16   1     3603     1.091415125  2519  D   W 144322488 + 32 [md127_raid5]
>>>> 8,16   1        0     1.091416469     0  m   N cfq2519 Not idling.st->count:1
>>>> 8,16   1        0     1.091416754     0  m   N cfq2519 dispatch_insert
>>>> 8,16   1        0     1.091417186     0  m   N cfq2519 dispatched a request
>>>> 8,16   1        0     1.091417535     0  m   N cfq2519 activate rq,drv=4
>>>> 8,16   1     3604     1.091417628  2519  D   W 144322432 + 56 [md127_raid5]
>>>> 8,16   1     3605     1.091857225  4393  C   W 144322544 + 16 [0]
>>>> 8,16   1        0     1.091858753     0  m   N cfq2519 complete rqnoidle 0
>>>> 8,16   1     3606     1.092068456  4393  C   W 144322520 + 24 [0]
>>>> 8,16   1        0     1.092069851     0  m   N cfq2519 complete rqnoidle 0
>>>> 8,16   1     3607     1.092350440  4393  C   W 144322488 + 32 [0]
>>>> 8,16   1        0     1.092351688     0  m   N cfq2519 complete rqnoidle 0
>>>> 8,16   1     3608     1.093629323     0  C   W 144322432 + 56 [0]
>>>> 8,16   1        0     1.093631151     0  m   N cfq2519 complete rqnoidle 0
>>>> 8,16   1        0     1.093631574     0  m   N cfq2519 will busy wait
>>>> 8,16   1        0     1.093631829     0  m   N cfq schedule dispatch
>>>>
>>>> Because in func "elv_attempt_insert_merge", it only to try to
>>>> backmerge.So the four request can't merge in theory.
>>>> I trace ten minutes and count those situation, it can count 25%.
>>>>
>>>> With the patch,i tested and not found situation like above.
>>>>
>>>> Signed-off-by: Jianpeng Ma <majianpeng@...il.com>
>>>> ---
>>>>  block/blk-core.c |    3 ++-
>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>>> index a33870b..3c95c4d 100644
>>>> --- a/block/blk-core.c
>>>> +++ b/block/blk-core.c
>>>> @@ -2868,7 +2868,8 @@ static int plug_rq_cmp(void *priv, struct list_head *a, struct list_head *b)
>>>>         struct request *rqa = container_of(a, struct request, queuelist);
>>>>         struct request *rqb = container_of(b, struct request, queuelist);
>>>>
>>>> -       return !(rqa->q <= rqb->q);
>>>> +       return !(rqa->q < rqb->q ||
>>>> +               (rqa->q == rqb->q && blk_rq_pos(rqa) < blk_rq_pos(rqb)));
>>>>  }
>>>
>>> Does this one help too?
>>> http://marc.info/?l=linux-kernel&m=132399972114668&w=2
>>>
>>> I thought the real problem is we don't do recursive request
>>> merge. I had no objection to the patch itself, but just hope we
>>> can make recursive merge work, which is more generic.
>>
>>Oh, wait, the 4 requests aren't merged completely in your case.
>>And the requests are from one thread and plug context.
>>Not the issue I mentioned. I'm wondering how this could happen.
>>they should be merged in attempt_plug_merge already.
> Hi shaohua,
>         I think i missed some messages about blktrace.So make you confused.
> I trace some message using blktrace:
>  8,16   0     6638     2.453619407  2579  Q   W 7493144 + 8 [md0_raid5]
>   8,16   0     6639     2.453620460  2579  G   W 7493144 + 8 [md0_raid5]
>   8,16   0     6640     2.453639311  2579  Q   W 7493120 + 8 [md0_raid5]
>   8,16   0     6641     2.453639842  2579  G   W 7493120 + 8 [md0_raid5]
>   8,16   0     6642     2.453647796  2579  Q   W 7493128 + 8 [md0_raid5]
>   8,16   0     6643     2.453647940  2579  M   W 7493128 + 8 [md0_raid5]
>   8,16   0     6644     2.453658249  2579  Q   W 7493136 + 8 [md0_raid5]
>   8,16   0     6645     2.453658393  2579  M   W 7493136 + 8 [md0_raid5]
>   8,16   0     6646     2.453665309  2579  Q   W 7493152 + 8 [md0_raid5]
>   8,16   0     6647     2.453665504  2579  M   W 7493152 + 8 [md0_raid5]
>   8,16   0     6648     2.453672411  2579  Q   W 7493160 + 8 [md0_raid5]
>   8,16   0     6649     2.453672606  2579  M   W 7493160 + 8 [md0_raid5]
>   8,16   0     6650     2.453679255  2579  Q   W 7493168 + 8 [md0_raid5]
>   8,16   0     6651     2.453679441  2579  M   W 7493168 + 8 [md0_raid5]
>   8,16   0     6652     2.453685948  2579  Q   W 7493176 + 8 [md0_raid5]
>   8,16   0     6653     2.453686149  2579  M   W 7493176 + 8 [md0_raid5]
>   8,16   0     6654     2.453693074  2579  Q   W 7493184 + 8 [md0_raid5]
>   8,16   0     6655     2.453693254  2579  M   W 7493184 + 8 [md0_raid5]
>   8,16   0     6656     2.453704290  2579  Q   W 7493192 + 8 [md0_raid5]
>   8,16   0     6657     2.453704482  2579  M   W 7493192 + 8 [md0_raid5]
>   8,16   0     6658     2.453715016  2579  Q   W 7493200 + 8 [md0_raid5]
>   8,16   0     6659     2.453715247  2579  M   W 7493200 + 8 [md0_raid5]
>   8,16   0     6660     2.453721730  2579  Q   W 7493208 + 8 [md0_raid5]
>   8,16   0     6661     2.453721974  2579  M   W 7493208 + 8 [md0_raid5]
>   8,16   0     6662     2.453728202  2579  Q   W 7493216 + 8 [md0_raid5]
>   8,16   0     6663     2.453728436  2579  M   W 7493216 + 8 [md0_raid5]
>   8,16   0     6664     2.453734782  2579  Q   W 7493224 + 8 [md0_raid5]
>   8,16   0     6665     2.453735019  2579  M   W 7493224 + 8 [md0_raid5]
>   8,16   0     6666     2.453741401  2579  Q   W 7493232 + 8 [md0_raid5]
>   8,16   0     6667     2.453741632  2579  M   W 7493232 + 8 [md0_raid5]
>   8,16   0     6668     2.453748148  2579  Q   W 7493240 + 8 [md0_raid5]
>   8,16   0     6669     2.453748386  2579  M   W 7493240 + 8 [md0_raid5]
>   8,16   0     6670     2.453851843  2579  I   W 7493144 + 104 [md0_raid5]
>   8,16   0        0     2.453853661     0  m   N cfq2579 insert_request
>   8,16   0     6671     2.453854064  2579  I   W 7493120 + 24 [md0_raid5]
>   8,16   0        0     2.453854439     0  m   N cfq2579 insert_request
>   8,16   0     6672     2.453854793  2579  U   N [md0_raid5] 2
>   8,16   0        0     2.453855513     0  m   N cfq2579 Not idling. st->count:1
>   8,16   0        0     2.453855927     0  m   N cfq2579 dispatch_insert
>   8,16   0        0     2.453861771     0  m   N cfq2579 dispatched a request
>   8,16   0        0     2.453862248     0  m   N cfq2579 activate rq, drv=1
>   8,16   0     6673     2.453862332  2579  D   W 7493120 + 24 [md0_raid5]
>   8,16   0        0     2.453865957     0  m   N cfq2579 Not idling. st->count:1
>   8,16   0        0     2.453866269     0  m   N cfq2579 dispatch_insert
>   8,16   0        0     2.453866707     0  m   N cfq2579 dispatched a request
>   8,16   0        0     2.453867061     0  m   N cfq2579 activate rq, drv=2
>   8,16   0     6674     2.453867145  2579  D   W 7493144 + 104 [md0_raid5]
>   8,16   0     6675     2.454147608     0  C   W 7493120 + 24 [0]
>   8,16   0        0     2.454149357     0  m   N cfq2579 complete rqnoidle 0
>   8,16   0     6676     2.454791505     0  C   W 7493144 + 104 [0]
>   8,16   0        0     2.454794803     0  m   N cfq2579 complete rqnoidle 0
>   8,16   0        0     2.454795160     0  m   N cfq schedule dispatch
>
> From above messages,we can found why rq[W 7493144 + 104] and rq[W 7493120 + 24] do not merge.
> Because the bio order is:
>   8,16   0     6638     2.453619407  2579  Q   W 7493144 + 8 [md0_raid5]
>   8,16   0     6639     2.453620460  2579  G   W 7493144 + 8 [md0_raid5]
>   8,16   0     6640     2.453639311  2579  Q   W 7493120 + 8 [md0_raid5]
>   8,16   0     6641     2.453639842  2579  G   W 7493120 + 8 [md0_raid5]
> Because the bio(7493144) first and bio(7493120) later.So the subsequent bios will be divided into two parts.
> As you mentions, recursive merge doesn't work for this situation.
> Why is can occur?I think because the raid operation.It disorder some bios.
>
> Is it ok? If ok,i'll resend the patch using useful message.

So the real problem is attempt_plug_merge can't do recursive merge
for merged request, because there is no elevator info attached to the
request, so we can only do the merge in elv_attempt_insert_merge().

I'm still hoping elv_attempt_insert_merge() can do frontmerge and
recursive merge, which is more generic and can solve more
problems than your patch. But giving the simplicity of your patch,
maybe we should merge yours at that time.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ