lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Fri, 28 Jul 2023 09:29:13 +0800
From:   Ian Kent <raven@...maw.net>
To:     Imran Khan <imran.f.khan@...cle.com>,
        Anders Roxell <anders.roxell@...aro.org>
Cc:     Arnd Bergmann <arnd@...db.de>, Tejun Heo <tj@...nel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Minchan Kim <minchan@...nel.org>,
        Eric Sandeen <sandeen@...deen.net>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Rick Lindsley <ricklind@...ux.vnet.ibm.com>,
        David Howells <dhowells@...hat.com>,
        Miklos Szeredi <miklos@...redi.hu>,
        Carlos Maiolino <cmaiolino@...hat.com>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Kernel Mailing List <linux-kernel@...r.kernel.org>,
        elver@...gle.com
Subject: Re: [PATCH 1/2] kernfs: dont take i_lock on inode attr read

On 28/7/23 09:06, Imran Khan wrote:
> Hello Ian,
>
> On 28/7/2023 10:16 am, Ian Kent wrote:
>> On 28/7/23 08:00, Ian Kent wrote:
>>> On 27/7/23 12:30, Imran Khan wrote:
>>>> Hello Ian,
>>>> Sorry for late reply. I was about to reply this week.
>>>>
>>>> On 27/7/2023 10:38 am, Ian Kent wrote:
>>>>> On 20/7/23 10:03, Ian Kent wrote:
>>>>>> On Wed, 2023-07-19 at 12:23 +0800, Ian Kent wrote:
>>>> [...]
>>>>>> I do see a problem with recent changes.
>>>>>>
>>>>>> I'll send this off to Greg after I've done some testing (primarily just
>>>>>> compile and function).
>>>>>>
>>>>>> Here's a patch which describes what I found.
>>>>>>
>>>>>> Comments are, of course, welcome, ;)
>>>>> Anders I was hoping you would check if/what lockdep trace
>>>>>
>>>>> you get with this patch.
>>>>>
>>>>>
>>>>> Imran, I was hoping you would comment on my change as it
>>>>>
>>>>> relates to the kernfs_iattr_rwsem changes.
>>>>>
>>>>>
>>>>> Ian
>>>>>
>>>>>> kernfs: fix missing kernfs_iattr_rwsem locking
>>>>>>
>>>>>> From: Ian Kent <raven@...maw.net>
>>>>>>
>>>>>> When the kernfs_iattr_rwsem was introduced a case was missed.
>>>>>>
>>>>>> The update of the kernfs directory node child count was also protected
>>>>>> by the kernfs_rwsem and needs to be included in the change so that the
>>>>>> child count (and so the inode n_link attribute) does not change while
>>>>>> holding the rwsem for read.
>>>>>>
>>>> kernfs direcytory node's child count changes in kernfs_(un)link_sibling and
>>>> these are getting invoked while adding (kernfs_add_one),
>>>> removing(__kernfs_remove) or moving (kernfs_rename_ns)a node. Each of these
>>>> operations proceed under kernfs_rwsem and I see each invocation of
>>>> kernfs_link/unlink_sibling during the above mentioned operations is happening
>>>> under kernfs_rwsem.
>>>> So the child count should still be protected by kernfs_rwsem and we should not
>>>> need to acquire kernfs_iattr_rwsem in kernfs_link/unlink_sibling.
>>> Yes, that's exactly what I intended (assuming you mean write lock in those cases)
>>>
>>> when I did it so now I wonder what I saw that lead to my patch, I'll need to look
>>>
>>> again ...
>> Ahh, I see why I thought this ...
>>
>> It's the hunk:
>>
>> @@ -285,10 +285,10 @@ int kernfs_iop_permission(struct mnt_idmap *idmap,
>>          kn = inode->i_private;
>>          root = kernfs_root(kn);
>>
>> -       down_read(&root->kernfs_rwsem);
>> +       down_read(&root->kernfs_iattr_rwsem);
>>          kernfs_refresh_inode(kn, inode);
>>          ret = generic_permission(&nop_mnt_idmap, inode, mask);
>> -       up_read(&root->kernfs_rwsem);
>> +       up_read(&root->kernfs_iattr_rwsem);
>>
>>          return ret;
>>   }
>>
>> which takes away the kernfs_rwsem and introduces the possibility of
>>
>> the change. It may be more instructive to add back taking the read
>>
>> lock of kernfs_rwsem in .permission() than altering the sibling link
>>
>> and unlink functions, I mean I even caught myself on it.
>>
> Yes this was the block I referred to in my second comment [1]. However adding
> back read lock of kernfs_rwsem in .permission() will again introduce the
> bottleneck that I mentioned at [2]. So I think taking taking the locks in
> link/unlink is the better option.

Yes, sorry, I always fall into the trap of not reading through all my

mail before commenting, oops!


The fact that .permission() is called so much more than the create/remove

functions also occurred to be me too just after I posted my comment (and

is probably why I originally did it that way).


I'll forward the patch for merge but would really like to see the lockdep

trace so I'll wait a little while ...

> I understand having one lock to synchronize everything makes it easier
> debug/development wise but sometimes (such as the case mentioned in [2]), it is
> not optimum performance wise.

Indeed, the performance improvement work that has been happening over

quite some time now is very good.


I had seen some opportunities around the file open path long ago but

hadn't got to doing anything there as you have, the work looks good

to me, thanks for doing it.


Ian

> Thoughts ?
>
> Thanks,
> Imran
>
> [1]: https://lore.kernel.org/all/8b0a1619-1e39-fc3a-1226-f3b167e64646@oracle.com/
> [2]: https://lore.kernel.org/all/20230302043203.1695051-2-imran.f.khan@oracle.com/
>> Ian
>>
>>>
>>>> Kindly let me know your thoughts. I would still like to see new lockdep traces
>>>> with this change.
>>> Indeed, I hope Anders can find time to get the trace.
>>>
>>>
>>> Ian
>>>
>>>> Thanks,
>>>> Imran
>>>>
>>>>>> Fixes: 9caf696142 (kernfs: Introduce separate rwsem to protect inode
>>>>>> attributes)
>>>>>>
>>>>>> Signed-off-by: Ian Kent <raven@...maw.net>
>>>>>> Cc: Anders Roxell <anders.roxell@...aro.org>
>>>>>> Cc: Imran Khan <imran.f.khan@...cle.com>
>>>>>> Cc: Arnd Bergmann <arnd@...db.de>
>>>>>> Cc: Minchan Kim <minchan@...nel.org>
>>>>>> Cc: Eric Sandeen <sandeen@...deen.net>
>>>>>> ---
>>>>>>     fs/kernfs/dir.c |    4 ++++
>>>>>>     1 file changed, 4 insertions(+)
>>>>>>
>>>>>> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
>>>>>> index 45b6919903e6..6e84bb69602e 100644
>>>>>> --- a/fs/kernfs/dir.c
>>>>>> +++ b/fs/kernfs/dir.c
>>>>>> @@ -383,9 +383,11 @@ static int kernfs_link_sibling(struct kernfs_node
>>>>>> *kn)
>>>>>>         rb_insert_color(&kn->rb, &kn->parent->dir.children);
>>>>>>           /* successfully added, account subdir number */
>>>>>> + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>>>>         if (kernfs_type(kn) == KERNFS_DIR)
>>>>>>             kn->parent->dir.subdirs++;
>>>>>>         kernfs_inc_rev(kn->parent);
>>>>>> +    up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>>>>           return 0;
>>>>>>     }
>>>>>> @@ -408,9 +410,11 @@ static bool kernfs_unlink_sibling(struct
>>>>>> kernfs_node *kn)
>>>>>>         if (RB_EMPTY_NODE(&kn->rb))
>>>>>>             return false;
>>>>>>     + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>>>>         if (kernfs_type(kn) == KERNFS_DIR)
>>>>>>             kn->parent->dir.subdirs--;
>>>>>>         kernfs_inc_rev(kn->parent);
>>>>>> +    up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
>>>>>>           rb_erase(&kn->rb, &kn->parent->dir.children);
>>>>>>         RB_CLEAR_NODE(&kn->rb);
>>>>>>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ