Date: Mon, 9 Sep 2019 12:07:01 +0300
From: "Kirill A. Shutemov" <kirill@...temov.name>
To: Alexander Duyck <alexander.duyck@...il.com>
Cc: virtio-dev@...ts.oasis-open.org, kvm@...r.kernel.org,
mst@...hat.com, catalin.marinas@....com, david@...hat.com,
dave.hansen@...el.com, linux-kernel@...r.kernel.org,
willy@...radead.org, mhocko@...nel.org, linux-mm@...ck.org,
akpm@...ux-foundation.org, will@...nel.org,
linux-arm-kernel@...ts.infradead.org, osalvador@...e.de,
yang.zhang.wz@...il.com, pagupta@...hat.com,
konrad.wilk@...cle.com, nitesh@...hat.com, riel@...riel.com,
lcapitulino@...hat.com, wei.w.wang@...el.com, aarcange@...hat.com,
ying.huang@...el.com, pbonzini@...hat.com,
dan.j.williams@...el.com, fengguang.wu@...el.com,
alexander.h.duyck@...ux.intel.com, kirill.shutemov@...ux.intel.com
Subject: Re: [PATCH v9 1/8] mm: Add per-cpu logic to page shuffling
On Sat, Sep 07, 2019 at 10:25:12AM -0700, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@...ux.intel.com>
>
> Change the logic used to generate randomness in the suffle path so that we
Typo: s/suffle/shuffle/
> can avoid cache line bouncing. The previous logic was sharing the offset
> and entropy word between all CPUs. As such this can result in cache line
> bouncing and will ultimately hurt performance when enabled.
>
> To resolve this I have moved to a per-cpu logic for maintaining a unsigned
> long containing some amount of bits, and an offset value for which bit we
> can use for entropy with each call.
>
> Reviewed-by: Dan Williams <dan.j.williams@...el.com>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@...ux.intel.com>
> ---
> mm/shuffle.c | 33 +++++++++++++++++++++++----------
> 1 file changed, 23 insertions(+), 10 deletions(-)
>
> diff --git a/mm/shuffle.c b/mm/shuffle.c
> index 3ce12481b1dc..9ba542ecf335 100644
> --- a/mm/shuffle.c
> +++ b/mm/shuffle.c
> @@ -183,25 +183,38 @@ void __meminit __shuffle_free_memory(pg_data_t *pgdat)
> shuffle_zone(z);
> }
>
> +struct batched_bit_entropy {
> + unsigned long entropy_bool;
> + int position;
> +};
> +
> +static DEFINE_PER_CPU(struct batched_bit_entropy, batched_entropy_bool);
> +
> void add_to_free_area_random(struct page *page, struct free_area *area,
> int migratetype)
> {
> - static u64 rand;
> - static u8 rand_bits;
> + struct batched_bit_entropy *batch;
> + unsigned long entropy;
> + int position;
>
> /*
> - * The lack of locking is deliberate. If 2 threads race to
> - * update the rand state it just adds to the entropy.
> + * We shouldn't need to disable IRQs as the only caller is
> + * __free_one_page and it should only be called with the zone lock
> + * held and either from IRQ context or with local IRQs disabled.
> */
> - if (rand_bits == 0) {
> - rand_bits = 64;
> - rand = get_random_u64();
> + batch = raw_cpu_ptr(&batched_entropy_bool);
> + position = batch->position;
> +
> + if (--position < 0) {
> + batch->entropy_bool = get_random_long();
> + position = BITS_PER_LONG - 1;
> }
>
> - if (rand & 1)
> + batch->position = position;
> + entropy = batch->entropy_bool;
> +
> + if (1ul & (entropy >> position))
Maybe something like this would be more readable:
if (entropy & BIT(position))
> add_to_free_area(page, area, migratetype);
> else
> add_to_free_area_tail(page, area, migratetype);
> - rand_bits--;
> - rand >>= 1;
> }
>
>
--
Kirill A. Shutemov