Date:   Fri, 22 Mar 2019 17:52:59 +0100
From:   Uladzislau Rezki <>
To:     Andrew Morton <>
Cc:     "Uladzislau Rezki (Sony)" <>,
        Michal Hocko <>,
        Matthew Wilcox <>,
        LKML <>,
        Thomas Garnier <>,
        Oleksiy Avramchenko <>,
        Steven Rostedt <>,
        Joel Fernandes <>,
        Thomas Gleixner <>,
        Ingo Molnar <>, Tejun Heo <>
Subject: Re: [RFC PATCH v2 0/1] improve vmap allocation

On Thu, Mar 21, 2019 at 03:01:06PM -0700, Andrew Morton wrote:
> On Thu, 21 Mar 2019 20:03:26 +0100 "Uladzislau Rezki (Sony)" <> wrote:
> > Hello.
> > 
> > This is the v2 of the rework. Instead of
> > referring you to that link, i will go through it again describing the improved
> > allocation method and provide changes between v1 and v2 in the end.
> > 
> > ...
> >
> > Performance analysis
> > --------------------
> Impressive numbers.  But this is presumably a worst-case microbenchmark.
> Are you able to describe the benefits which are observed in some
> real-world workload which someone cares about?
We work with Android. Google uses its own tool called UiBench to measure
UI performance. It counts dropped or delayed frames, or as they call it,
jank. Basically, if we deliver 59 frames per second (it should be 60) then
we get 1 jank/drop.

I see that avg-jank is lower on our devices. In our case the Android graphics
pipeline uses vmalloc allocations, which can delay delivery of UI content
to the GPU. But such behavior depends on your platform, on which parts of
the system make use of vmalloc, and on whether they are time-critical or not.

The second example is an indirect impact. During analysis of audio glitches
in high-resolution audio, the source of the drops was long alloc_vmap_area()
calls.

# Explanation is here

# Audio 10 seconds sample is here.
# The drop occurs at 00:09.295 you can hear it

> It's a lot of new code. It looks decent and I'll toss it in there for
> further testing.  Hopefully someone will be able to find the time for a
> detailed review.
Thank you :)

> Trivial point: the code uses "inline" a lot.  Nowadays gcc cheerfully
> ignores that and does its own thing.  You might want to look at the
> effects of simply deleting all that.  Is the generated code better or
> worse or the same?  If something really needs to be inlined then use
> __always_inline, preferably with a comment explaining why it is there.
When the main core functions are "inlined" I see the benefit.
At least, it is noticeable with the "test driver". But I agree that
I should check one more time to see what can be excluded and used
as a regular call. Thanks for the hint, it is worth going with
__always_inline instead.
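
To illustrate the difference discussed above, here is a minimal userspace
sketch, not code from the patch. The struct and the helpers (range_node,
range_size, range_fits, first_fit) are hypothetical names made up for the
example, and the __always_inline define is only a stand-in for the kernel
macro, assuming a GCC-compatible compiler.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace stand-in for the kernel's __always_inline (assumption:
 * GCC-compatible compiler). Guarded because some libcs already define it.
 */
#ifndef __always_inline
#define __always_inline inline __attribute__((__always_inline__))
#endif

/* Hypothetical type, for illustration only. */
struct range_node {
	unsigned long start;
	unsigned long end;
};

/* Plain "inline": only a hint, the compiler is free to ignore it. */
static inline unsigned long range_size(const struct range_node *n)
{
	return n->end - n->start;
}

/*
 * Forced inline: this check runs once per node in the lookup loop below,
 * so a call here would add overhead on the hot path. That is the kind of
 * comment that should accompany __always_inline.
 */
static __always_inline int range_fits(const struct range_node *n,
				      unsigned long size)
{
	return range_size(n) >= size;
}

/* Return the index of the first range that can hold "size", or -1. */
static long first_fit(const struct range_node *nodes, size_t count,
		      unsigned long size)
{
	for (size_t i = 0; i < count; i++)
		if (range_fits(&nodes[i], size))
			return (long)i;
	return -1;
}
```

One way to check the effect is to build such code at -O2 with and without
the attribute and compare the disassembly (for example with objdump -d) to
see whether a call remains in the loop.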

Vlad Rezki
