linux-kernel - Re: [Xen-devel] [PATCH v7 2/3] xen/blkback: Squeeze page pools if a memory pressure is detected

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAEjAshpuT_S44Fn12XRZz-aLs38awkJSiQ_J2ofXsJRXKopScQ@mail.gmail.com>
Date:   Fri, 13 Dec 2019 12:47:19 +0100
From:   SeongJae Park <sj38.park@...il.com>
To:     Jürgen Groß <jgross@...e.com>
Cc:     Roger Pau Monné <roger.pau@...rix.com>,
        Jens Axboe <axboe@...nel.dk>,
        SeongJae Park <sjpark@...zon.com>, konrad.wilk@...cle.com,
        pdurrant@...zon.com, SeongJae Park <sjpark@...zon.de>,
        LKML <linux-kernel@...r.kernel.org>, linux-block@...r.kernel.org,
        xen-devel@...ts.xenproject.org
Subject: Re: [Xen-devel] [PATCH v7 2/3] xen/blkback: Squeeze page pools if a
 memory pressure is detected

On Fri, Dec 13, 2019 at 10:33 AM Jürgen Groß <jgross@...e.com> wrote:
>
> On 13.12.19 10:27, Roger Pau Monné wrote:
> > On Thu, Dec 12, 2019 at 05:06:58PM +0100, SeongJae Park wrote:
> >> On Thu, 12 Dec 2019 16:27:57 +0100 "Roger Pau Monné" <roger.pau@...rix.com> wrote:
> >>
> >>>> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> >>>> index fd1e19f1a49f..98823d150905 100644
> >>>> --- a/drivers/block/xen-blkback/blkback.c
> >>>> +++ b/drivers/block/xen-blkback/blkback.c
> >>>> @@ -142,6 +142,21 @@ static inline bool persistent_gnt_timeout(struct persistent_gnt *persistent_gnt)
> >>>>            HZ * xen_blkif_pgrant_timeout);
> >>>>   }
> >>>>
> >>>> +/* Once a memory pressure is detected, squeeze free page pools for a while. */
> >>>> +static unsigned int buffer_squeeze_duration_ms = 10;
> >>>> +module_param_named(buffer_squeeze_duration_ms,
> >>>> +          buffer_squeeze_duration_ms, int, 0644);
> >>>> +MODULE_PARM_DESC(buffer_squeeze_duration_ms,
> >>>> +"Duration in ms to squeeze pages buffer when a memory pressure is detected");
> >>>> +
> >>>> +static unsigned long buffer_squeeze_end;
> >>>> +
> >>>> +void xen_blkbk_reclaim_memory(struct xenbus_device *dev)
> >>>> +{
> >>>> +  buffer_squeeze_end = jiffies +
> >>>> +          msecs_to_jiffies(buffer_squeeze_duration_ms);
> >>>
> >>> I'm not sure this is fully correct. This function will be called for
> >>> each blkback instance, but the timeout is stored in a global variable
> >>> that's shared between all blkback instances. Shouldn't this timeout be
> >>> stored in xen_blkif so each instance has it's own local variable?
> >>>
> >>> Or else in the case you have 1k blkback instances the timeout is
> >>> certainly going to be longer than expected, because each call to
> >>> xen_blkbk_reclaim_memory will move it forward.
> >>
> >> Agreed that.  I think the extended timeout would not make a visible
> >> performance, though, because the time that 1k-loop take would be short enough
> >> to be ignored compared to the millisecond-scope duration.
> >>
> >> I took this way because I wanted to minimize such structural changes as far as
> >> I can, as this is just a point-fix rather than ultimate solution.  That said,
> >> it is not fully correct and very confusing.  My another colleague also pointed
> >> out it in internal review.  Correct solution would be to adding a variable in
> >> the struct as you suggested or avoiding duplicated update of the variable by
> >> initializing the variable once the squeezing duration passes.  I would prefer
> >> the later way, as it is more straightforward and still not introducing
> >> structural change.  For example, it might be like below:
> >>
> >> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
> >> index f41c698dd854..6856c8ef88de 100644
> >> --- a/drivers/block/xen-blkback/blkback.c
> >> +++ b/drivers/block/xen-blkback/blkback.c
> >> @@ -152,8 +152,9 @@ static unsigned long buffer_squeeze_end;
> >>
> >>   void xen_blkbk_reclaim_memory(struct xenbus_device *dev)
> >>   {
> >> -       buffer_squeeze_end = jiffies +
> >> -               msecs_to_jiffies(buffer_squeeze_duration_ms);
> >> +       if (!buffer_squeeze_end)
> >> +               buffer_squeeze_end = jiffies +
> >> +                       msecs_to_jiffies(buffer_squeeze_duration_ms);
> >>   }
> >>
> >>   static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
> >> @@ -669,10 +670,13 @@ int xen_blkif_schedule(void *arg)
> >>                  }
> >>
> >>                  /* Shrink the free pages pool if it is too large. */
> >> -               if (time_before(jiffies, buffer_squeeze_end))
> >> +               if (time_before(jiffies, buffer_squeeze_end)) {
> >>                          shrink_free_pagepool(ring, 0);
> >> -               else
> >> +               } else {
> >> +                       if (unlikely(buffer_squeeze_end))
> >> +                               buffer_squeeze_end = 0;
> >>                          shrink_free_pagepool(ring, max_buffer_pages);
> >> +               }
> >>
> >>                  if (log_stats && time_after(jiffies, ring->st_print))
> >>                          print_stats(ring);
> >>
> >> May I ask you what way would you prefer?
> >
> > I'm not particularly found of this approach, as I think it's racy. Ie:
> > you would have to add some kind of lock to make sure the contents of
> > buffer_squeeze_end stay unmodified during the read and set cycle, or
> > else xen_blkif_schedule will race with xen_blkbk_reclaim_memory.
> >
> > This is likely not a big deal ATM since the code will work as
> > expected in most cases AFAICT, but I would still prefer to have a
> > per-instance buffer_squeeze_end added to xen_blkif, given that the
> > callback is per-instance. I wouldn't call it a structural change, it's
> > just adding a variable to a struct instead of having a shared one, but
> > the code is almost the same as the current version.
>
> FWIW, I agree.

Agreed, will send v8 soon!


Thanks,
SeongJae Park

>
>
> Juergen