netdev - Re: [PATCH net-next] net: sched: flower: insert filter to ht before offloading it to hw

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <vbfr2a952te.fsf@mellanox.com>
Date:   Wed, 10 Apr 2019 14:53:53 +0000
From:   Vlad Buslov <vladbu@...lanox.com>
To:     Jakub Kicinski <jakub.kicinski@...ronome.com>
CC:     Vlad Buslov <vladbu@...lanox.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "jhs@...atatu.com" <jhs@...atatu.com>,
        "xiyou.wangcong@...il.com" <xiyou.wangcong@...il.com>,
        "jiri@...nulli.us" <jiri@...nulli.us>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "john.hurley@...ronome.com" <john.hurley@...ronome.com>
Subject: Re: [PATCH net-next] net: sched: flower: insert filter to ht before
 offloading it to hw


On Tue 09 Apr 2019 at 20:10, Jakub Kicinski <jakub.kicinski@...ronome.com> wrote:
> On Tue, 9 Apr 2019 08:23:40 +0000, Vlad Buslov wrote:
>> On Tue 09 Apr 2019 at 01:26, Jakub Kicinski <jakub.kicinski@...ronome.com> wrote:
>> > On Fri,  5 Apr 2019 20:56:26 +0300, Vlad Buslov wrote:
>> >> John reports:
>> >>
>> >> Recent refactoring of fl_change aims to use the classifier spinlock to
>> >> avoid the need for rtnl lock. In doing so, the fl_hw_replace_filer()
>> >> function was moved to before the lock is taken. This can create problems
>> >> for drivers if duplicate filters are created (commmon in ovs tc offload
>> >> due to filters being triggered by user-space matches).
>> >>
>> >> Drivers registered for such filters will now receive multiple copies of
>> >> the same rule, each with a different cookie value. This means that the
>> >> drivers would need to do a full match field lookup to determine
>> >> duplicates, repeating work that will happen in flower __fl_lookup().
>> >> Currently, drivers do not expect to receive duplicate filters.
>> >>
>> >> To fix this, verify that filter with same key is not present in flower
>> >> classifier hash table and insert the new filter to the flower hash table
>> >> before offloading it to hardware. Implement helper function
>> >> fl_ht_insert_unique() to atomically verify/insert a filter.
>> >>
>> >> This change makes filter visible to fast path at the beginning of
>> >> fl_change() function, which means it can no longer be freed directly in
>> >> case of error. Refactor fl_change() error handling code to deallocate the
>> >> filter with rcu timeout.
>> >>
>> >> Fixes: 620da4860827 ("net: sched: flower: refactor fl_change")
>> >> Reported-by: John Hurley <john.hurley@...ronome.com>
>> >> Signed-off-by: Vlad Buslov <vladbu@...lanox.com>
>> >
>> > How is re-offload consistency guaranteed?  IIUC the code is:
>> >
>> >  insert into HT
>> >  offload
>> >  insert into IDR
>> >
>> > What guarantees re-offload consistency if new callback is added just
>> > after offload is requested but before rules ends up in IDR?
>>
>> Hi Jakub,
>>
>> At the moment cls hardware offloads API is always called with rtnl lock,
>> so rule can't be offloaded while reoffload is in progress.
>
> Does that somehow imply atomicity of offloading vs inserting into IDR?
> Doesn't seem so from a cursory look.  Or do you mean rtnl_held is
> always true?

Sorry, I forgot that we are discussing shared block for which rtnl is
not taken in tc_new_tfilter(). Now I understand the issue and will send
my 'reoffload_count' implementation as a fix.

>
>> For my next patch set that unlocks the offloads API I implemented the
>> algorithm to track reoffload count for each tp that works like this:
>>
>> 1. struct tcf_proto is extended with reoffload_count counter that
>>    incremented each time reoffload is called on particular tp instance.
>>    Counter is protected by tp->lock.
>>
>> 2. struct cls_fl_filter is also extended with reoffload_count counter.
>>    Its value is set to current tp->reoffload_count when offloading the
>>    filter.
>>
>> 3. After offloading the filter, but before inserting it to idr,
>>    f->reoffload_count is compared with tp->reoffload_count. If values
>>    don't match, filter is deleted and -EAGAIN is returned. Cls API
>>    retries filter insertion on -EAGAIN.
>
> Sounds good for add.  Does this solve delete case as well?
>
>    CPU 0                       CPU 1
>
> __fl_delete
>   IDR remove
>                            cb unregister
>                              hw delete all flows  <- doesn't see the
>                                                      remove in progress
>
>   hw delete  <- doesn't see
>                 the removed cb

Thanks for pointing that out! Looks like I need to move call to hw
delete in __fl_delete() function to be executed before idr removal.