Message-ID: <7c170029-e847-49cf-8332-dc1388fc5c86@bytedance.com>
Date:   Wed, 6 Dec 2023 16:37:38 +0800
From:   Qi Zheng <zhengqi.arch@...edance.com>
To:     Lai Jiangshan <jiangshanlai@...il.com>
Cc:     akpm@...ux-foundation.org, paulmck@...nel.org, david@...morbit.com,
        tkhai@...ru, vbabka@...e.cz, roman.gushchin@...ux.dev,
        djwong@...nel.org, brauner@...nel.org, tytso@....edu,
        steven.price@....com, cel@...nel.org, senozhatsky@...omium.org,
        yujie.liu@...el.com, gregkh@...uxfoundation.org,
        muchun.song@...ux.dev, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH v6 42/45] mm: shrinker: make global slab shrink lockless

Hi,

On 2023/12/6 16:23, Lai Jiangshan wrote:
> On Wed, Dec 6, 2023 at 3:55 PM Qi Zheng <zhengqi.arch@...edance.com> wrote:
>>
>> Hi,
>>
>> On 2023/12/6 15:47, Lai Jiangshan wrote:
>>> On Tue, Sep 12, 2023 at 9:57 PM Qi Zheng <zhengqi.arch@...edance.com> wrote:
>>>
>>>> -       if (!down_read_trylock(&shrinker_rwsem))
>>>> -               goto out;
>>>> -
>>>> -       list_for_each_entry(shrinker, &shrinker_list, list) {
>>>> +       /*
>>>> +        * lockless algorithm of global shrink.
>>>> +        *
>>>> +        * In the unregistration step, the shrinker will be freed asynchronously
>>>> +        * via RCU after its refcount reaches 0. So both rcu_read_lock() and
>>>> +        * shrinker_try_get() can be used to ensure the existence of the shrinker.
>>>> +        *
>>>> +        * So in the global shrink:
>>>> +        *  step 1: use rcu_read_lock() to guarantee existence of the shrinker
>>>> +        *          and the validity of the shrinker_list walk.
>>>> +        *  step 2: use shrinker_try_get() to try to take a refcount; if
>>>> +        *          successful, the existence of the shrinker is also
>>>> +        *          guaranteed, so we can release the RCU lock and call
>>>> +        *          do_shrink_slab(), which may sleep.
>>>> +        *  step 3: *MUST* reacquire the RCU lock before calling shrinker_put(),
>>>> +        *          which ensures that neither this shrinker nor the next shrinker
>>>> +        *          will be freed in the next traversal operation.
>>>
>>> Hello, Qi, Andrew, Paul,
>>>
>>> I wonder how RCU can ensure the lifespan of the next shrinker.
>>> It seems to diverge from the common RCU+refcount usage pattern.
>>>
>>> cpu1:
>>> rcu_read_lock();
>>> shrinker_try_get(this_shrinker);
>>> rcu_read_unlock();
>>>       cpu2: shrinker_free(this_shrinker);
>>>       cpu2: shrinker_free(next_shrinker); and free the memory of next_shrinker
>>>       cpu2: when shrinker_free(next_shrinker), no one updates this_shrinker's next
>>>       cpu2: since this_shrinker has been removed first.
>>
>> No, this_shrinker will not be removed from the shrinker_list until the
>> last refcount is released. See below:
>>
>>> rcu_read_lock();
>>> shrinker_put(this_shrinker);
>>
>>          CPU 1                                      CPU 2
>>
>>     --> if (refcount_dec_and_test(&shrinker->refcount))
>>                  complete(&shrinker->done);
>>
>>                                  wait_for_completion(&shrinker->done);
>>                                   list_del_rcu(&shrinker->list);
> 
> since shrinker will not be removed from the shrinker_list until the
> last refcount is released.
> 
> Is it possible that shrinker_free() can be starved by continuous
> scanners getting and putting the refcount?

I did consider this case, but judged the probability of it happening
to be low, so I dropped the relevant code (v2 --> v3).
If this problem actually occurs in a production environment, we
can fix it like below:

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 1a00be90d93a..e5ebbbf1414f 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -88,6 +88,7 @@ struct shrinker {
         long batch;     /* reclaim batch size, 0 = default */
         int seeks;      /* seeks to recreate an obj */
         unsigned flags;
+       bool registered;

         /*
          * The reference count of this shrinker. Registered shrinkers have an
@@ -138,7 +139,8 @@ void shrinker_free(struct shrinker *shrinker);

  static inline bool shrinker_try_get(struct shrinker *shrinker)
  {
-       return refcount_inc_not_zero(&shrinker->refcount);
+       return READ_ONCE(shrinker->registered) &&
+              refcount_inc_not_zero(&shrinker->refcount);
  }

  static inline void shrinker_put(struct shrinker *shrinker)
diff --git a/mm/shrinker.c b/mm/shrinker.c
index dd91eab43ed3..9b8881d178c6 100644
--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -753,6 +753,7 @@ void shrinker_register(struct shrinker *shrinker)
          * shrinker_try_get().
          */
         refcount_set(&shrinker->refcount, 1);
+       WRITE_ONCE(shrinker->registered, true);
  }
  EXPORT_SYMBOL_GPL(shrinker_register);

@@ -773,6 +774,7 @@ void shrinker_free(struct shrinker *shrinker)
                 return;

         if (shrinker->flags & SHRINKER_REGISTERED) {
+               WRITE_ONCE(shrinker->registered, false);
                 /* drop the initial refcount */
                 shrinker_put(shrinker);
                 /*

Thanks,
Qi

> 
> Thanks
> Lai
> 
> 
>>
>>> so the traversal can reach the freed next_shrinker.
>>>
>>> a quick simple fix:
>>>
>>> // called with other references other than RCU (i.e. refcount)
>>> static inline bool rcu_list_deleted(struct list_head *entry)
>>> {
>>>      // something like this:
>>>      return entry->prev == LIST_POISON2;
>>> }
>>>
>>> // in the loop
>>> if (rcu_list_deleted(&shrinker->list)) {
>>>      shrinker_put(shrinker);
>>>      goto restart;
>>> }
>>> rcu_read_lock();
>>> shrinker_put(shrinker);
>>>
>>> Thanks
>>> Lai
>>>
>>>> +        *  step 4: do shrinker_put() paired with step 2 to put the refcount,
>>>> +        *          if the refcount reaches 0, then wake up the waiter in
>>>> +        *          shrinker_free() by calling complete().
>>>> +        */
>>>> +       rcu_read_lock();
>>>> +       list_for_each_entry_rcu(shrinker, &shrinker_list, list) {
>>>>                   struct shrink_control sc = {
>>>>                           .gfp_mask = gfp_mask,
>>>>                           .nid = nid,
>>>>                           .memcg = memcg,
>>>>                   };
>>>>
>>>> +               if (!shrinker_try_get(shrinker))
>>>> +                       continue;
>>>> +
>>>> +               rcu_read_unlock();
>>>> +
>>>>                   ret = do_shrink_slab(&sc, shrinker, priority);
>>>>                   if (ret == SHRINK_EMPTY)
>>>>                           ret = 0;
>>>>                   freed += ret;
>>>> -               /*
>>>> -                * Bail out if someone want to register a new shrinker to
>>>> -                * prevent the registration from being stalled for long periods
>>>> -                * by parallel ongoing shrinking.
>>>> -                */
>>>> -               if (rwsem_is_contended(&shrinker_rwsem)) {
>>>> -                       freed = freed ? : 1;
>>>> -                       break;
>>>> -               }
>>>> +
>>>> +               rcu_read_lock();
>>>> +               shrinker_put(shrinker);
>>>>           }
>>>>
