lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 15 Dec 2021 22:39:02 -0800
From:   Song Liu <song@...nel.org>
To:     Yajun Deng <yajun.deng@...ux.dev>
Cc:     stockhausen@...logia.de, open list <linux-kernel@...r.kernel.org>,
        linux-rt-users@...r.kernel.org,
        linux-raid <linux-raid@...r.kernel.org>, masahiroy@...nel.org,
        williams@...hat.com
Subject: Re: [PATCH] lib/raid6: fix abnormally high latency

On Wed, Dec 15, 2021 at 6:15 PM <yajun.deng@...ux.dev> wrote:
>
> December 16, 2021 12:52 AM, "Song Liu" <song@...nel.org> wrote:
>
> > On Tue, Dec 14, 2021 at 6:14 PM <yajun.deng@...ux.dev> wrote:
> >
> >> December 15, 2021 1:27 AM, "Song Liu" <song@...nel.org> wrote:
> >>
> >> On Mon, Dec 13, 2021 at 7:17 PM Yajun Deng <yajun.deng@...ux.dev> wrote:
> >>
> >> We found an abnormally high latency when executing modprobe raid6_pq, the
> >> latency is greater than 1.2s when CONFIG_PREEMPT_VOLUNTARY=y, greater than
> >> 67ms when CONFIG_PREEMPT=y, and greater than 16ms when CONFIG_PREEMPT_RT=y.
> >> This is caused by disable the preemption, this time is too long and
> >> unreasonable. We just need to disable migration. so used migrate_disable()/
> >> migrate_enable() instead of preempt_disable()/preempt_enable(). This is
> >> beneficial for CONFIG_PREEMPT=y or CONFIG_PREEMPT_RT=y, but no effect for
> >> CONFIG_PREEMPT_VOLUNTARY=y.
> >>
> >> Fixes: fe5cbc6e06c7 ("md/raid6 algorithms: delta syndrome functions")
> >> Signed-off-by: Yajun Deng <yajun.deng@...ux.dev>
> >>
> >> We measure the speed of different RAID algorithms.If we don't disable
> >> preempt, the result may be inaccurate, right? IIUC, we only disable preempt
> >> for 16 jiffies. Why do we see 1.2 second delay?
> >>
> >> Here are the command of my test:
> >> Execute "sudo cyclictest -S -p 95 -d 0 -i 1000 -D 24h -m" in one terminal and "sudo modprobe
> >> raid6_pq" in the other terminal.
> >>
> >> Here are the results of my test:
> >> CONFIG_PREEMPT_VOLUNTARY=y,CONFIG_HZ_250=y
> >> T: 0 ( 3092) P:95 I:1000 C: 8514 Min: 1 Act: 2 Avg: 1 Max: 6
> >> T: 1 ( 3093) P:95 I:1000 C: 8511 Min: 1 Act: 2 Avg: 1 Max: 14
> >
> > I am not very familiar with the RT work, so please forgive me for some
> > rookie questions.
> >
> > From the result, I think the CONFIG_PREEMPT_VOLUNTARY=y and the
> > CONFIG_PREEMPT=y cases failed to preempt during the preempt enabled period in
> > raid6_choose_gen(). Is this expected?
> >
>
> No, This is due to disable preemption causing ksoftirqd fail to schedule, we can use bcc tools see that.
>
> > OTOH, the 16ms latency with CONFIG_PREEMPT_RT=y is more or less expected.
> > Is this acceptable? If not, is 1ms latency acceptable?
> >
>
> Here are the test results after adding patch:
> CONFIG_PREEMPT=y  or CONFIG_PREEMPT_RT=y
> T: 0 ( 3167) P:95 I:1000 C:  13958 Min:      1 Act:    2 Avg:    1 Max:       5
> T: 1 ( 3168) P:95 I:1000 C:  13956 Min:      1 Act:    2 Avg:    1 Max:       7
> T: 2 ( 3169) P:95 I:1000 C:  13946 Min:      1 Act:    2 Avg:    1 Max:      12
> T: 3 ( 3170) P:95 I:1000 C:  13951 Min:      1 Act:    2 Avg:    1 Max:       5
> T: 4 ( 3171) P:95 I:1000 C:  13949 Min:      1 Act:    2 Avg:    1 Max:       3
> T: 5 ( 3172) P:95 I:1000 C:  13947 Min:      1 Act:    2 Avg:    1 Max:      16
> T: 6 ( 3173) P:95 I:1000 C:  13945 Min:      1 Act:    2 Avg:    2 Max:       7
> T: 7 ( 3174) P:95 I:1000 C:  13942 Min:      1 Act:    2 Avg:    1 Max:       3
> T: 8 ( 3175) P:95 I:1000 C:  13940 Min:      1 Act:    2 Avg:    1 Max:       3
> T: 9 ( 3176) P:95 I:1000 C:  13938 Min:      1 Act:    1 Avg:    1 Max:       3
> T:10 ( 3177) P:95 I:1000 C:  13936 Min:      1 Act:    2 Avg:    1 Max:       6
> T:11 ( 3178) P:95 I:1000 C:  13933 Min:      1 Act:    2 Avg:    1 Max:       3
> T:12 ( 3179) P:95 I:1000 C:  13931 Min:      1 Act:    2 Avg:    1 Max:       4
> T:13 ( 3180) P:95 I:1000 C:  13929 Min:      1 Act:    2 Avg:    1 Max:       7
> T:14 ( 3181) P:95 I:1000 C:  13927 Min:      1 Act:    2 Avg:    1 Max:       6
> T:15 ( 3182) P:95 I:1000 C:  13925 Min:      1 Act:    2 Avg:    1 Max:       4
> T:16 ( 3183) P:95 I:1000 C:  13923 Min:      1 Act:    2 Avg:    1 Max:       5
> T:17 ( 3184) P:95 I:1000 C:  13921 Min:      1 Act:    2 Avg:    1 Max:       5
> T:18 ( 3185) P:95 I:1000 C:  13919 Min:      1 Act:    2 Avg:    1 Max:       4
> T:19 ( 3186) P:95 I:1000 C:  13916 Min:      1 Act:    2 Avg:    1 Max:       4
> T:20 ( 3187) P:95 I:1000 C:  13914 Min:      1 Act:    2 Avg:    1 Max:       4
> T:21 ( 3188) P:95 I:1000 C:  13912 Min:      1 Act:    2 Avg:    1 Max:      10
> T:22 ( 3189) P:95 I:1000 C:  13910 Min:      1 Act:    2 Avg:    1 Max:       5
> T:23 ( 3190) P:95 I:1000 C:  13908 Min:      1 Act:    2 Avg:    1 Max:       5
> T:24 ( 3191) P:95 I:1000 C:  13906 Min:      1 Act:    2 Avg:    1 Max:      18
> T:25 ( 3192) P:95 I:1000 C:  13904 Min:      1 Act:    2 Avg:    1 Max:       7
> T:26 ( 3193) P:95 I:1000 C:  13902 Min:      1 Act:    2 Avg:    1 Max:       5
> T:27 ( 3194) P:95 I:1000 C:  13900 Min:      1 Act:    1 Avg:    1 Max:      11
> T:28 ( 3195) P:95 I:1000 C:  13898 Min:      1 Act:    2 Avg:    1 Max:       3
> T:29 ( 3196) P:95 I:1000 C:  13896 Min:      1 Act:    2 Avg:    1 Max:       5
> T:30 ( 3197) P:95 I:1000 C:  13894 Min:      1 Act:    2 Avg:    1 Max:       4
> T:31 ( 3198) P:95 I:1000 C:  13892 Min:      1 Act:    2 Avg:    1 Max:       3
>
> we can see the latency will not greater than 100us,so 1ms latency is also too long for CONFIG_PREEMPT=y  or CONFIG_PREEMPT_RT=y.
> use migrate_disable()/migrate_enable() instead of preempt_disable()/preempt_enable() will not affect the speed of different RAID algorithms and the latency can be reduced to a reasonable range.

I think allowing preempt may still affect the speed comparison. But
such discrepancy
should be acceptable. I will apply this to md-next.

Thanks,
Song

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ