9.1 Security Observability Requires Policy and Context

A security tool needs to be able to distinguish between events that are expected under normal circumstances and events that suggest malicious activity might be taking place.

Policies have to take into account not just normal behavior when systems are fully functional, but also the expected error path behavior.

Defining what is and isn’t expected behavior is the job of a policy.

The more contextual information that’s available to the investigator, the more likely they will be able to find out the root cause of the event and determine whether it was an attack, which components were affected, how and when the attack took place, and who was responsible.

9.2 Using System Calls for Security Events

System calls (or syscalls) are the interface between user space applications and the kernel.

9.2.1 Seccomp

The name seccomp is short for SECure COMPuting.

In its original, or strict, mode, seccomp limits the set of syscalls a process can use to a very small subset: read(), write(), _exit(), and sigreturn().
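To make the effect of this restriction concrete, here is a minimal sketch, assuming a Linux system where the process is allowed to enable seccomp. The helper name strict_mode_demo is hypothetical. It forks a child, restricts it with prctl(), and lets it attempt getpid(), which is outside the permitted subset.

```c
#include <linux/seccomp.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the source text).
 * Returns the signal that terminated the child, 0 if the child exited
 * normally, or -1 if fork()/waitpid() failed. */
int strict_mode_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* From here on only read(), write(), _exit() and sigreturn()
         * are permitted for this process. */
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            _exit(2);               /* seccomp unavailable: exit normally */

        syscall(SYS_getpid);        /* not permitted: the kernel kills us */

        /* Not reached. Note that glibc's _exit() uses exit_group(), which
         * is also outside the permitted set, hence the raw exit syscall. */
        syscall(SYS_exit, 0);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On a system where the restriction takes effect, the parent should observe that the child was terminated by SIGKILL rather than exiting normally.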

A later mode, seccomp-bpf, is more flexible: instead of having a fixed subset of syscalls that it permits, this mode of seccomp uses BPF code to filter the syscalls that are and aren’t allowed.

The filter runs whenever the process makes a syscall, and its return value selects one of a set of possible actions that include:

  • Allowing the syscall to go ahead
  • Returning an error code to the user space application
  • Killing the thread
  • Notifying a user space application (seccomp-unotify, as of kernel version 5.0)
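The first two actions in this list can be sketched with a small filter. This is an illustration under assumptions, not a production profile: the helper name filter_demo is hypothetical, the choice of getppid() as the blocked syscall is arbitrary, and the architecture check that a real filter should perform on seccomp_data.arch is omitted for brevity. The filter returns SECCOMP_RET_ERRNO for one syscall and SECCOMP_RET_ALLOW for everything else. The kill and notify actions correspond to SECCOMP_RET_KILL_THREAD and SECCOMP_RET_USER_NOTIF.

```c
#include <errno.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the source text).
 * Returns the errno the filtered child saw from getppid(), 0 if the call
 * was not blocked, or -1 if the filter could not be installed. */
int filter_demo(void)
{
    struct sock_filter code[] = {
        /* Load the syscall number from struct seccomp_data. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* getppid: fall through to ERRNO. Anything else: skip to ALLOW. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_getppid, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(code) / sizeof(code[0])),
        .filter = code,
    };

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* no_new_privs lets an unprivileged process install a filter. */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
            prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
            _exit(255);
        errno = 0;
        long rc = syscall(SYS_getppid);
        _exit(rc == -1 ? errno : 0);   /* report the observed errno */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    int seen = WEXITSTATUS(status);
    return seen == 255 ? -1 : seen;
}
```

Running the filter in a forked child keeps the restriction away from the calling process, since a seccomp filter cannot be removed once installed.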

Generating Seccomp Profiles

In the early days, seccomp profiles were generally compiled using strace to gather the set of syscalls an application calls.

There are a couple of tools that automate profile generation, using eBPF to gather information about all the syscalls an application makes:

  • Inspektor Gadget includes a seccomp profiler that allows you to generate a custom seccomp profile for the containers in a Kubernetes pod.
  • Red Hat created a seccomp profiler in the form of an OCI runtime hook.

Syscall-Tracking Security Tools

One tool in this category is the CNCF project Falco, which provides security alerts.

Users can define rules to determine what events are security relevant, and Falco can generate alerts in a variety of formats when events happen that don’t match the policies defined in these rules.

Falco’s eBPF probes attach to the raw syscall entry and exit tracepoints:

BPF_PROBE("raw_syscalls/", sys_enter, sys_enter_args)

BPF_PROBE("raw_syscalls/", sys_exit, sys_exit_args)

Since eBPF programs can be loaded dynamically and can detect events triggered by preexisting processes, tools like Falco can apply policies to application workloads that are already running.

Unfortunately, there is a problem with using syscall entry points for security tooling: a Time Of Check to Time Of Use (TOCTOU) issue.

When an eBPF program is triggered at the entry point to a system call, it can access the arguments that user space has passed to that system call. If those arguments are pointers, the kernel will need to copy the pointed-to data into its own data structures before acting on that data.

This leaves a window of opportunity for an attacker to modify the data after the eBPF program has inspected it but before the kernel copies it.
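The ordering problem can be shown with a deliberately simplified simulation. Everything here is hypothetical and runs in a single user space thread: user_buf stands in for the memory a syscall argument points to, and the three steps stand in for the entry-point probe, the attacker, and the kernel's copy. In a real attack the rewrite would come from a second thread racing the syscall.

```c
#include <stdio.h>
#include <string.h>

#define DEMO_PATH_LEN 64

/* Hypothetical record of what each party saw (not from the source text). */
struct toctou_result {
    char inspected[DEMO_PATH_LEN];  /* what the entry-point probe read */
    char used[DEMO_PATH_LEN];       /* what the "kernel" acted on      */
};

static void copy_path(char *dst, const char *src)
{
    snprintf(dst, DEMO_PATH_LEN, "%s", src);
}

struct toctou_result simulate_toctou(void)
{
    struct toctou_result r;
    char user_buf[DEMO_PATH_LEN] = "/tmp/harmless";  /* user space memory */

    /* Time of check: the probe at syscall entry reads the argument. */
    copy_path(r.inspected, user_buf);

    /* Race window: the attacker rewrites the buffer in user space. */
    copy_path(user_buf, "/etc/shadow");

    /* Time of use: the kernel copies the argument and acts on it. */
    copy_path(r.used, user_buf);

    return r;
}
```

The security tool would log the path in inspected, while the operation is actually performed on the path in used.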

The Sysmon for Linux tool addresses the TOCTOU window by attaching to both the entry and exit points for syscalls.

For example, if the syscall returns a file descriptor, the eBPF program attached to the exit can retrieve correct information about the object that the file descriptor represents by looking into the related process’s file descriptor table.
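A user space analogue of this idea, offered as a rough sketch rather than as how Sysmon for Linux is implemented: once a file has been opened, the descriptor can be resolved through /proc/self/fd to learn what it really refers to, regardless of the string that was passed to open(). The helper name resolve_opened_path is hypothetical, and the sketch assumes /proc is mounted.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not from the source text).
 * Opens `path`, then resolves the resulting descriptor via /proc/self/fd.
 * Returns 0 and fills `resolved` on success, -1 on error.
 * `size` must be at least 1. */
int resolve_opened_path(const char *path, char *resolved, size_t size)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    char link[64];
    snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);

    /* The symlink target is derived from the open file itself,
     * not from the string originally supplied by the caller. */
    ssize_t n = readlink(link, resolved, size - 1);
    close(fd);
    if (n < 0)
        return -1;

    resolved[n] = '\0';   /* readlink() does not NUL-terminate */
    return 0;
}
```

Calling it with a roundabout path such as "/dev/../dev/null" yields the path of the object that was actually opened, which is the kind of after-the-fact information an exit-point probe can rely on.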

9.3 BPF LSM

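The LSM interface provides a set of hooks, each of which is triggered just before the kernel is about to act on a kernel data structure. The function called by a hook can decide whether to allow the action to go ahead. The interface was originally designed so that security tools could be implemented as kernel modules; BPF LSM extends it so that eBPF programs can attach to the same hook points.

There are hundreds of LSM hooks in the kernel source code. To be clear, there is no one-to-one mapping between syscalls and LSM hooks, but if a syscall could do something that is interesting from a security point of view, handling that syscall will trigger one or more LSM hooks.

Example: a program attached to the path_chmod hook, which is called when the chmod command is used.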

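#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "Dual BSD/GPL";

SEC("lsm/path_chmod")
int BPF_PROG(path_chmod, const struct path *path, umode_t mode)
{
    /* Log the name of the file whose mode is being changed. */
    bpf_printk("Change mode of file name %s\n", path->dentry->d_iname);
    return 0; /* 0 lets the operation go ahead */
}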

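Returning a nonzero value (conventionally a negative error code such as -EPERM) denies permission to make this change, so the kernel does not proceed with it. It is worth noting that a policy check like this, performed entirely within the kernel, performs very well.

NOTE:

LSM BPF was added in kernel version 5.7.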

9.4 Cilium Tetragon

Tetragon is part of the Cilium project.

Tetragon’s approach is to build a framework for attaching eBPF programs to arbitrary functions in the Linux kernel.

Tetragon is designed for use in a Kubernetes environment, and the project defines a custom Kubernetes resource type called a TracingPolicy.

spec:
  kprobes:
  - call: "fd_install"
    ...
    matchArgs:
    - index: 1
      operator: "Prefix"
      values:
      - "/etc/"
    ...

9.4.1 Attaching to Internal Kernel Functions

The policy attaches to fd_install, a function that is internal to the kernel. The “fd” stands for “file descriptor,” and the comment in the source code for this function tells us this function “Install[s] a file pointer in the fd array.”

This happens when a file is opened, and it’s called after the file’s data structure has been populated in the kernel.

9.4.2 Preventative Security

In kernel versions 5.3 and up, there is a BPF helper function called bpf_send_signal(). Tetragon uses this function to implement preventative security. If a policy defines a Sigkill action, any matching events will cause Tetragon eBPF code to generate a SIGKILL signal that terminates the process that was attempting the out-of-policy action.

Sigkill policies need to be used with care, because an incorrectly configured policy could result in terminating applications unnecessarily, but it’s an incredibly powerful use of eBPF for security purposes.

9.5 Network Security

Network security tools are very often used in a preventative mode, dropping packets rather than just auditing malicious activity.

  • Firewalling and DDoS protection are a natural fit for eBPF programs attached early in the ingress path for network packets. And with the possibility of XDP programs offloaded to hardware, malicious packets may never even reach the CPU!

  • For implementing more sophisticated network policies, such as Kubernetes policies determining which services are allowed to communicate with one another, eBPF programs that attach to points in the network stack can drop packets if they are determined to be out of policy.
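The drop-or-pass decision behind both bullets can be sketched in plain user space C. This is not an XDP program: the function name filter_packet, the verdict enum, and the single blocked source address are all hypothetical, and a real eBPF program would read the packet through its context pointers instead of a byte buffer. The sketch does mirror the shape of such a program, in that it checks bounds before parsing the headers and then returns a verdict.

```c
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical verdicts; the values match XDP_DROP and XDP_PASS. */
enum verdict { VERDICT_DROP = 1, VERDICT_PASS = 2 };

/* Hypothetical policy: drop IPv4 packets from one blocked source address
 * (given in network byte order) and pass everything else. */
enum verdict filter_packet(const unsigned char *data, size_t len,
                           uint32_t blocked_saddr_be)
{
    struct ethhdr eth;
    struct iphdr ip;

    /* Bounds check first, as the eBPF verifier would also insist on. */
    if (len < sizeof(eth) + sizeof(ip))
        return VERDICT_PASS;

    /* Copy the headers out to avoid unaligned accesses in user space. */
    memcpy(&eth, data, sizeof(eth));
    if (eth.h_proto != htons(ETH_P_IP))
        return VERDICT_PASS;            /* not IPv4: out of scope here */

    memcpy(&ip, data + sizeof(eth), sizeof(ip));
    return ip.saddr == blocked_saddr_be ? VERDICT_DROP : VERDICT_PASS;
}
```

A Kubernetes-style policy would replace the single address comparison with a lookup keyed on the identities of the communicating services, but the structure of the decision stays the same.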

9.6 Summary

In this chapter you saw how eBPF’s use in security has evolved from low-level checks on system calls to much more sophisticated use of eBPF programs for security policy checks, in-kernel event filtering, and runtime enforcement.