Message-ID: <5e36a193-8ad9-77e7-e2ff-429fb521a79c@iogearbox.net>
Date: Wed, 4 Sep 2019 17:16:39 +0200
From: Daniel Borkmann <daniel@...earbox.net>
To: Alexei Starovoitov <ast@...com>,
"nicolas.dichtel@...nd.com" <nicolas.dichtel@...nd.com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Alexei Starovoitov <ast@...nel.org>,
"luto@...capital.net" <luto@...capital.net>,
"davem@...emloft.net" <davem@...emloft.net>,
"peterz@...radead.org" <peterz@...radead.org>,
"rostedt@...dmis.org" <rostedt@...dmis.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>,
Kernel Team <Kernel-team@...com>,
"linux-api@...r.kernel.org" <linux-api@...r.kernel.org>
Subject: Re: [PATCH v2 bpf-next 2/3] bpf: implement CAP_BPF
On 9/4/19 3:39 AM, Alexei Starovoitov wrote:
> On 8/30/19 8:19 AM, Nicolas Dichtel wrote:
>> Le 29/08/2019 à 19:30, Alexei Starovoitov a écrit :
>> [snip]
>>> These are the links showing that k8s can delegate caps.
>>> Are you saying that you know of folks who specifically
>>> delegate cap_sys_admin and cap_net_admin _only_ to a container to run bpf in there?
>>>
>> Yes, we need cap_sys_admin only to load bpf:
>> tc filter add dev eth0 ingress matchall action bpf obj ./tc_test_kern.o sec test
>>
>> I'm not sure I understand why cap_net_admin is not enough to run the previous
>> command (i.e. why the load is forbidden).
>
> because the bpf syscall's prog_load command requires cap_sys_admin in
> the current implementation.
>
>> I want to avoid sys_admin, so cap_bpf would be ok. But we need to manage
>> backward compatibility.
>
> re: backward compatibility...
> do you know of any case where a task is running under userid=nobody
> with cap_sys_admin and cap_net_admin in order to do bpf?
>
> If not then what is the concern about compatibility?
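For context on the prog_load point above: anything beyond the unprivileged
program types is rejected today without cap_sys_admin, which is why
cap_net_admin alone never gets past BPF_PROG_LOAD. A minimal user space
sketch (illustrative only, not from the deployment below; run without
cap_sys_admin on a pre-CAP_BPF kernel it is expected to come back with EPERM):

/* Load the smallest valid sched_cls program via bpf(2). Without
 * cap_sys_admin the load is expected to fail with EPERM, no matter
 * which other caps (e.g. cap_net_admin) the task holds. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

int main(void)
{
	/* r0 = 0; exit -- smallest program the verifier accepts */
	struct bpf_insn insns[2] = {
		{ .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = 0, .imm = 0 },
		{ .code = BPF_JMP | BPF_EXIT },
	};
	union bpf_attr attr;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.prog_type = BPF_PROG_TYPE_SCHED_CLS;
	attr.insns     = (__u64)(unsigned long)insns;
	attr.insn_cnt  = 2;
	attr.license   = (__u64)(unsigned long)"GPL";

	fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
	if (fd < 0)
		printf("BPF_PROG_LOAD: %s\n", strerror(errno));
	else
		printf("BPF_PROG_LOAD: ok, fd=%d\n", fd);
	return fd < 0;
}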
Finally managed to find some cycles to bring up a k8s cluster. Looks like the
patches as-is would break such deployments right away; meaning, any constellation
where BPF is used inside the pod.
With CAP_BPF patches applied on bpf-next:
# kubectl apply -f ./cilium.yaml
[...]
# kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system cilium-cz9qs 0/1 CrashLoopBackOff 4 2m36s 192.168.1.125 apoc <none> <none>
kube-system cilium-operator-6c7c6c788b-xcm9d 0/1 Pending 0 2m36s <none> <none> <none> <none>
kube-system coredns-5c98db65d4-6nhpg 0/1 ContainerCreating 0 4m12s <none> apoc <none> <none>
kube-system coredns-5c98db65d4-l5b94 0/1 ContainerCreating 0 4m12s <none> apoc <none> <none>
kube-system etcd-apoc 1/1 Running 0 3m26s 192.168.1.125 apoc <none> <none>
kube-system kube-apiserver-apoc 1/1 Running 0 3m32s 192.168.1.125 apoc <none> <none>
kube-system kube-controller-manager-apoc 1/1 Running 0 3m18s 192.168.1.125 apoc <none> <none>
kube-system kube-proxy-jj9kz 1/1 Running 0 4m12s 192.168.1.125 apoc <none> <none>
kube-system kube-scheduler-apoc 1/1 Running 0 3m26s 192.168.1.125 apoc <none> <none>
# kubectl -n kube-system logs --timestamps cilium-cz9qs
[...]
2019-09-04T14:11:46.399478585Z level=info msg="Cilium 1.6.90 ba0ed147b 2019-09-03T21:20:30+02:00 go version go1.12.8 linux/amd64" subsys=daemon
2019-09-04T14:11:46.410564471Z level=info msg="cilium-envoy version: b7a919ebdca3d3bbc6aae51357e78e9c603450ae/1.11.1/Modified/RELEASE/BoringSSL" subsys=daemon
2019-09-04T14:11:46.446983926Z level=info msg="clang (7.0.0) and kernel (5.3.0) versions: OK!" subsys=daemon
[...]
2019-09-04T14:11:47.27988188Z level=info msg="Mounting BPF filesystem at /run/cilium/bpffs" subsys=bpf
2019-09-04T14:11:47.279904256Z level=info msg="Detected mounted BPF filesystem at /run/cilium/bpffs" subsys=bpf
2019-09-04T14:11:47.280205098Z level=info msg="Valid label prefix configuration:" subsys=labels-filter
2019-09-04T14:11:47.280214528Z level=info msg=" - :io.kubernetes.pod.namespace" subsys=labels-filter
2019-09-04T14:11:47.28021738Z level=info msg=" - :io.cilium.k8s.namespace.labels" subsys=labels-filter
2019-09-04T14:11:47.280220836Z level=info msg=" - :app.kubernetes.io" subsys=labels-filter
2019-09-04T14:11:47.280223355Z level=info msg=" - !:io.kubernetes" subsys=labels-filter
2019-09-04T14:11:47.280225723Z level=info msg=" - !:kubernetes.io" subsys=labels-filter
2019-09-04T14:11:47.280228095Z level=info msg=" - !:.*beta.kubernetes.io" subsys=labels-filter
2019-09-04T14:11:47.280230409Z level=info msg=" - !:k8s.io" subsys=labels-filter
2019-09-04T14:11:47.280232699Z level=info msg=" - !:pod-template-generation" subsys=labels-filter
2019-09-04T14:11:47.280235569Z level=info msg=" - !:pod-template-hash" subsys=labels-filter
2019-09-04T14:11:47.28023792Z level=info msg=" - !:controller-revision-hash" subsys=labels-filter
2019-09-04T14:11:47.280240253Z level=info msg=" - !:annotation.*" subsys=labels-filter
2019-09-04T14:11:47.280242566Z level=info msg=" - !:etcd_node" subsys=labels-filter
2019-09-04T14:11:47.28026585Z level=info msg="Initializing daemon" subsys=daemon
2019-09-04T14:11:47.281344002Z level=info msg="Detected MTU 1500" subsys=mtu
2019-09-04T14:11:47.281771889Z level=error msg="Error while opening/creating BPF maps" error="Unable to create map /run/cilium/bpffs/tc/globals/cilium_lxc: operation not permitted" subsys=daemon
2019-09-04T14:11:47.28178666Z level=fatal msg="Error while creating daemon" error="Unable to create map /run/cilium/bpffs/tc/globals/cilium_lxc: operation not permitted" subsys=daemon
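The step that trips up there is a plain BPF_MAP_CREATE followed by pinning the
map under the bpffs mount; roughly, what the agent does at that point boils down
to something like the sketch below (map parameters and pin path are illustrative,
not cilium's actual ones), and with the patches applied the map creation already
returns EPERM inside the pod:

/* Create a map via bpf(2), then pin it under the bpffs mount. This is
 * the call that comes back with EPERM in the log above when the pod's
 * task lacks the new capability. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

int main(void)
{
	union bpf_attr attr;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.map_type    = BPF_MAP_TYPE_HASH;
	attr.key_size    = 4;
	attr.value_size  = 8;
	attr.max_entries = 1024;

	fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
	if (fd < 0) {
		printf("BPF_MAP_CREATE: %s\n", strerror(errno)); /* EPERM in the pod */
		return 1;
	}

	memset(&attr, 0, sizeof(attr));
	attr.pathname = (__u64)(unsigned long)"/run/cilium/bpffs/tc/globals/test_map";
	attr.bpf_fd   = fd;
	if (syscall(__NR_bpf, BPF_OBJ_PIN, &attr, sizeof(attr)))
		printf("BPF_OBJ_PIN: %s\n", strerror(errno));
	return 0;
}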
And the /same/ deployment with the patches reverted, hence no CAP_BPF, is up and running again:
# kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system cilium-cz9qs 1/1 Running 13 50m 192.168.1.125 apoc <none> <none>
kube-system cilium-operator-6c7c6c788b-xcm9d 0/1 Pending 0 50m <none> <none> <none> <none>
kube-system coredns-5c98db65d4-6nhpg 1/1 Running 0 52m 10.217.0.91 apoc <none> <none>
kube-system coredns-5c98db65d4-l5b94 1/1 Running 0 52m 10.217.0.225 apoc <none> <none>
kube-system etcd-apoc 1/1 Running 1 51m 192.168.1.125 apoc <none> <none>
kube-system kube-apiserver-apoc 1/1 Running 1 51m 192.168.1.125 apoc <none> <none>
kube-system kube-controller-manager-apoc 1/1 Running 1 51m 192.168.1.125 apoc <none> <none>
kube-system kube-proxy-jj9kz 1/1 Running 1 52m 192.168.1.125 apoc <none> <none>
kube-system kube-scheduler-apoc 1/1 Running 1 51m 192.168.1.125 apoc <none> <none>
Thanks,
Daniel