lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGWkznGpW=bUxET8yZGu4dNTBfsj7n79yXsTD23fE5-SWkdjfA@mail.gmail.com>
Date: Fri, 26 Jan 2024 15:59:48 +0800
From: Zhaoyang Huang <huangzhaoyang@...il.com>
To: "zhaoyang.huang" <zhaoyang.huang@...soc.com>, Matthew Wilcox <willy@...radead.org>, 
	Alexander Viro <viro@...iv.linux.org.uk>, Christian Brauner <brauner@...nel.org>, 
	linux-fsdevel@...r.kernel.org
Cc: Andrew Morton <akpm@...ux-foundation.org>, Jens Axboe <axboe@...nel.dk>, 
	Yu Zhao <yuzhao@...gle.com>, Damien Le Moal <dlemoal@...nel.org>, 
	Niklas Cassel <niklas.cassel@....com>, "Martin K . Petersen" <martin.petersen@...cle.com>, 
	Hannes Reinecke <hare@...e.de>, Linus Walleij <linus.walleij@...aro.org>, linux-mm@...ck.org, 
	linux-block@...r.kernel.org, linux-kernel@...r.kernel.org, 
	steve.kang@...soc.com
Subject: Re: [PATCHv3 1/1] block: introduce content activity based ioprio

loop more mm and fs guys for more comments

On Thu, Jan 25, 2024 at 3:22 PM zhaoyang.huang
<zhaoyang.huang@...soc.com> wrote:
>
> From: Zhaoyang Huang <zhaoyang.huang@...soc.com>
>
> Currently, request's ioprio are set via task's schedule priority(when no
> blkcg configured), which has high priority tasks possess the privilege on
> both of CPU and IO scheduling.
> This commit works as a hint of original policy by promoting the request ioprio
> based on the page/folio's activity. The original idea comes from LRU_GEN
> which provides more precised folio activity than before. This commit try
> to adjust the request's ioprio when certain part of its folios are hot,
> which indicate that this request carry important contents and need be
> scheduled ealier.
>
> This commit is verified on a v6.6 6GB RAM android14 system via 4 test cases
> by changing the bio_add_page/folio API in ext4 and f2fs.
>
> Case 1:
> script[a] which get significant improved fault time as expected[b]
> where dd's cost also shrink from 55s to 40s.
> (1). fault_latency.bin is an ebpf based test tool which measure all task's
>    iowait latency during page fault when scheduled out/in.
> (2). costmem generate page fault by mmaping a file and access the VA.
> (3). dd generate concurrent vfs io.
>
> [a]
> ./fault_latency.bin 1 5 > /data/dd_costmem &
> costmem -c0 -a2048000 -b128000 -o0 1>/dev/null &
> costmem -c0 -a2048000 -b128000 -o0 1>/dev/null &
> costmem -c0 -a2048000 -b128000 -o0 1>/dev/null &
> costmem -c0 -a2048000 -b128000 -o0 1>/dev/null &
> dd if=/dev/block/sda of=/data/ddtest bs=1024 count=2048000 &
> dd if=/dev/block/sda of=/data/ddtest1 bs=1024 count=2048000 &
> dd if=/dev/block/sda of=/data/ddtest2 bs=1024 count=2048000 &
> dd if=/dev/block/sda of=/data/ddtest3 bs=1024 count=2048000
> [b]
>                        mainline         commit
> io wait                836us            156us
>
> Case 2:
> fio -filename=/dev/block/by-name/userdata -rw=randread -direct=0 -bs=4k -size=2000M -numjobs=8 -group_reporting -name=mytest
> mainline: 513MiB/s
> READ: bw=531MiB/s (557MB/s), 531MiB/s-531MiB/s (557MB/s-557MB/s), io=15.6GiB (16.8GB), run=30137-30137msec
> READ: bw=543MiB/s (569MB/s), 543MiB/s-543MiB/s (569MB/s-569MB/s), io=15.6GiB (16.8GB), run=29469-29469msec
> READ: bw=474MiB/s (497MB/s), 474MiB/s-474MiB/s (497MB/s-497MB/s), io=15.6GiB (16.8GB), run=33724-33724msec
> READ: bw=535MiB/s (561MB/s), 535MiB/s-535MiB/s (561MB/s-561MB/s), io=15.6GiB (16.8GB), run=29928-29928msec
> READ: bw=523MiB/s (548MB/s), 523MiB/s-523MiB/s (548MB/s-548MB/s), io=15.6GiB (16.8GB), run=30617-30617msec
> READ: bw=492MiB/s (516MB/s), 492MiB/s-492MiB/s (516MB/s-516MB/s), io=15.6GiB (16.8GB), run=32518-32518msec
> READ: bw=533MiB/s (559MB/s), 533MiB/s-533MiB/s (559MB/s-559MB/s), io=15.6GiB (16.8GB), run=29993-29993msec
> READ: bw=524MiB/s (550MB/s), 524MiB/s-524MiB/s (550MB/s-550MB/s), io=15.6GiB (16.8GB), run=30526-30526msec
> READ: bw=529MiB/s (554MB/s), 529MiB/s-529MiB/s (554MB/s-554MB/s), io=15.6GiB (16.8GB), run=30269-30269msec
> READ: bw=449MiB/s (471MB/s), 449MiB/s-449MiB/s (471MB/s-471MB/s), io=15.6GiB (16.8GB), run=35629-35629msec
>
> commit: 633MiB/s
> READ: bw=668MiB/s (700MB/s), 668MiB/s-668MiB/s (700MB/s-700MB/s), io=15.6GiB (16.8GB), run=23952-23952msec
> READ: bw=589MiB/s (618MB/s), 589MiB/s-589MiB/s (618MB/s-618MB/s), io=15.6GiB (16.8GB), run=27164-27164msec
> READ: bw=638MiB/s (669MB/s), 638MiB/s-638MiB/s (669MB/s-669MB/s), io=15.6GiB (16.8GB), run=25071-25071msec
> READ: bw=714MiB/s (749MB/s), 714MiB/s-714MiB/s (749MB/s-749MB/s), io=15.6GiB (16.8GB), run=22409-22409msec
> READ: bw=600MiB/s (629MB/s), 600MiB/s-600MiB/s (629MB/s-629MB/s), io=15.6GiB (16.8GB), run=26669-26669msec
> READ: bw=592MiB/s (621MB/s), 592MiB/s-592MiB/s (621MB/s-621MB/s), io=15.6GiB (16.8GB), run=27036-27036msec
> READ: bw=691MiB/s (725MB/s), 691MiB/s-691MiB/s (725MB/s-725MB/s), io=15.6GiB (16.8GB), run=23150-23150msec
> READ: bw=569MiB/s (596MB/s), 569MiB/s-569MiB/s (596MB/s-596MB/s), io=15.6GiB (16.8GB), run=28142-28142msec
> READ: bw=563MiB/s (590MB/s), 563MiB/s-563MiB/s (590MB/s-590MB/s), io=15.6GiB (16.8GB), run=28429-28429msec
> READ: bw=712MiB/s (746MB/s), 712MiB/s-712MiB/s (746MB/s-746MB/s), io=15.6GiB (16.8GB), run=22478-22478msec
>
> Case 3:
> This commit is also verified by the case of launching camera APP which is
> usually considered as heavy working load on both of memory and IO, which
> shows 12%-24% improvement.
>
>                 ttl = 0         ttl = 50        ttl = 100
> mainline        2267ms          2420ms          2316ms
> commit          1992ms          1806ms          1998ms
>
> case 4:
> androbench has no improvment as well as regression which supposed to be
> its test time is short which MGLRU hasn't take effect yet.
>
> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@...soc.com>
> ---
> change of v2: calculate page's activity via helper function
> change of v3: solve layer violation by move API into mm
> change of v4: keep block clean by removing the page related API
> ---
> ---
>  include/linux/act_ioprio.h  | 62 +++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/ioprio.h | 44 +++++++++++++++++++++++---
>  mm/Kconfig                  |  8 +++++
>  3 files changed, 110 insertions(+), 4 deletions(-)
>  create mode 100644 include/linux/act_ioprio.h
>
> diff --git a/include/linux/act_ioprio.h b/include/linux/act_ioprio.h
> new file mode 100644
> index 000000000000..8cfb3df270bd
> --- /dev/null
> +++ b/include/linux/act_ioprio.h
> @@ -0,0 +1,62 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +#ifndef _ACT_IOPRIO_H
> +#define _ACT_IOPRIO_H
> +
> +#include <linux/bio.h>
> +
> +#ifdef CONFIG_CONTENT_ACT_BASED_IOPRIO
> +bool BIO_ADD_FOLIO(struct bio *bio, struct folio *folio, size_t len,
> +               size_t off)
> +{
> +       int class, level, hint, activity;
> +
> +       if (len > UINT_MAX || off > UINT_MAX)
> +               return false;
> +
> +       class = IOPRIO_PRIO_CLASS(bio->bi_ioprio);
> +       level = IOPRIO_PRIO_LEVEL(bio->bi_ioprio);
> +       hint = IOPRIO_PRIO_HINT(bio->bi_ioprio);
> +       activity = IOPRIO_PRIO_ACTIVITY(bio->bi_ioprio);
> +
> +       activity += (bio->bi_vcnt + 1 <= IOPRIO_NR_ACTIVITY &&
> +                       PageWorkingset(&folio->page)) ? 1 : 0;
> +       if (activity >= bio->bi_vcnt / 2)
> +               class = IOPRIO_CLASS_RT;
> +       else if (activity >= bio->bi_vcnt / 4)
> +               class = max(IOPRIO_PRIO_CLASS(get_current_ioprio()), IOPRIO_CLASS_BE);
> +
> +       bio->bi_ioprio = IOPRIO_PRIO_VALUE_ACTIVITY(class, level, hint, activity);
> +
> +       return bio_add_page(bio, &folio->page, len, off) > 0;
> +}
> +
> +int BIO_ADD_PAGE(struct bio *bio, struct page *page,
> +               unsigned int len, unsigned int offset)
> +{
> +       int class, level, hint, activity;
> +
> +       if (bio_add_page(bio, page, len, offset) > 0) {
> +               class = IOPRIO_PRIO_CLASS(bio->bi_ioprio);
> +               level = IOPRIO_PRIO_LEVEL(bio->bi_ioprio);
> +               hint = IOPRIO_PRIO_HINT(bio->bi_ioprio);
> +               activity = IOPRIO_PRIO_ACTIVITY(bio->bi_ioprio);
> +               activity += (bio->bi_vcnt <= IOPRIO_NR_ACTIVITY && PageWorkingset(page)) ? 1 : 0;
> +               bio->bi_ioprio = IOPRIO_PRIO_VALUE_ACTIVITY(class, level, hint, activity);
> +       }
> +
> +       return len;
> +}
> +#else
> +bool BIO_ADD_FOLIO(struct bio *bio, struct folio *folio, size_t len,
> +               size_t off)
> +{
> +       return bio_add_folio(bio, folio, len, off);
> +}
> +
> +int BIO_ADD_PAGE(struct bio *bio, struct page *page,
> +               unsigned int len, unsigned int offset)
> +{
> +       return bio_add_page(bio, page, len, offset);
> +}
> +#endif
> +#endif
> diff --git a/include/uapi/linux/ioprio.h b/include/uapi/linux/ioprio.h
> index bee2bdb0eedb..f933af54d71e 100644
> --- a/include/uapi/linux/ioprio.h
> +++ b/include/uapi/linux/ioprio.h
> @@ -71,12 +71,24 @@ enum {
>   * class and level.
>   */
>  #define IOPRIO_HINT_SHIFT              IOPRIO_LEVEL_NR_BITS
> +#ifdef CONFIG_CONTENT_ACT_BASED_IOPRIO
> +#define IOPRIO_HINT_NR_BITS            3
> +#else
>  #define IOPRIO_HINT_NR_BITS            10
> +#endif
>  #define IOPRIO_NR_HINTS                        (1 << IOPRIO_HINT_NR_BITS)
>  #define IOPRIO_HINT_MASK               (IOPRIO_NR_HINTS - 1)
>  #define IOPRIO_PRIO_HINT(ioprio)       \
>         (((ioprio) >> IOPRIO_HINT_SHIFT) & IOPRIO_HINT_MASK)
>
> +#ifdef CONFIG_CONTENT_ACT_BASED_IOPRIO
> +#define IOPRIO_ACTIVITY_SHIFT          (IOPRIO_HINT_NR_BITS + IOPRIO_LEVEL_NR_BITS)
> +#define IOPRIO_ACTIVITY_NR_BITS                7
> +#define IOPRIO_NR_ACTIVITY             (1 << IOPRIO_ACTIVITY_NR_BITS)
> +#define IOPRIO_ACTIVITY_MASK           (IOPRIO_NR_ACTIVITY - 1)
> +#define IOPRIO_PRIO_ACTIVITY(ioprio)   \
> +       (((ioprio) >> IOPRIO_ACTIVITY_SHIFT) & IOPRIO_ACTIVITY_MASK)
> +#endif
>  /*
>   * I/O hints.
>   */
> @@ -104,24 +116,48 @@ enum {
>
>  #define IOPRIO_BAD_VALUE(val, max) ((val) < 0 || (val) >= (max))
>
> +#ifndef CONFIG_CONTENT_ACT_BASED_IOPRIO
>  /*
>   * Return an I/O priority value based on a class, a level and a hint.
>   */
>  static __always_inline __u16 ioprio_value(int prioclass, int priolevel,
> -                                         int priohint)
> +               int priohint)
>  {
>         if (IOPRIO_BAD_VALUE(prioclass, IOPRIO_NR_CLASSES) ||
> -           IOPRIO_BAD_VALUE(priolevel, IOPRIO_NR_LEVELS) ||
> -           IOPRIO_BAD_VALUE(priohint, IOPRIO_NR_HINTS))
> +                       IOPRIO_BAD_VALUE(priolevel, IOPRIO_NR_LEVELS) ||
> +                       IOPRIO_BAD_VALUE(priohint, IOPRIO_NR_HINTS))
>                 return IOPRIO_CLASS_INVALID << IOPRIO_CLASS_SHIFT;
>
>         return (prioclass << IOPRIO_CLASS_SHIFT) |
>                 (priohint << IOPRIO_HINT_SHIFT) | priolevel;
>  }
> -
>  #define IOPRIO_PRIO_VALUE(prioclass, priolevel)                        \
>         ioprio_value(prioclass, priolevel, IOPRIO_HINT_NONE)
>  #define IOPRIO_PRIO_VALUE_HINT(prioclass, priolevel, priohint) \
>         ioprio_value(prioclass, priolevel, priohint)
> +#else
> +/*
> + * Return an I/O priority value based on a class, a level and a hint.
> + */
> +static __always_inline __u16 ioprio_value(int prioclass, int priolevel,
> +               int priohint, int activity)
> +{
> +       if (IOPRIO_BAD_VALUE(prioclass, IOPRIO_NR_CLASSES) ||
> +                       IOPRIO_BAD_VALUE(priolevel, IOPRIO_NR_LEVELS) ||
> +                       IOPRIO_BAD_VALUE(priohint, IOPRIO_NR_HINTS) ||
> +                       IOPRIO_BAD_VALUE(activity, IOPRIO_NR_ACTIVITY))
> +               return IOPRIO_CLASS_INVALID << IOPRIO_CLASS_SHIFT;
>
> +       return (prioclass << IOPRIO_CLASS_SHIFT) |
> +               (activity << IOPRIO_ACTIVITY_SHIFT) |
> +               (priohint << IOPRIO_HINT_SHIFT) | priolevel;
> +}
> +
> +#define IOPRIO_PRIO_VALUE(prioclass, priolevel)                        \
> +       ioprio_value(prioclass, priolevel, IOPRIO_HINT_NONE, 0)
> +#define IOPRIO_PRIO_VALUE_HINT(prioclass, priolevel, priohint) \
> +       ioprio_value(prioclass, priolevel, priohint, 0)
> +#define IOPRIO_PRIO_VALUE_ACTIVITY(prioclass, priolevel, priohint, activity)   \
> +       ioprio_value(prioclass, priolevel, priohint, activity)
> +#endif
>  #endif /* _UAPI_LINUX_IOPRIO_H */
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 264a2df5ecf5..e0e5a5a44ded 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -1240,6 +1240,14 @@ config LRU_GEN_STATS
>           from evicted generations for debugging purpose.
>
>           This option has a per-memcg and per-node memory overhead.
> +
> +config CONTENT_ACT_BASED_IOPRIO
> +       bool "Enable content activity based ioprio"
> +       depends on LRU_GEN
> +       default n
> +       help
> +         This item enable the feature of adjust bio's priority by
> +         calculating its content's activity.
>  # }
>
>  config ARCH_SUPPORTS_PER_VMA_LOCK
> --
> 2.25.1
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ