twx-linux/kernel
Alexei Starovoitov 969bf05eb3 bpf: direct packet access
Extended BPF carried over two instructions from classic to access
packet data: LD_ABS and LD_IND. They're highly optimized in JITs,
but due to their design they have to do length check for every access.
When BPF is processing 20M packets per second single LD_ABS after JIT
is consuming 3% cpu. Hence the need to optimize it further by amortizing
the cost of 'off < skb_headlen' over multiple packet accesses.
One option is to introduce two new eBPF instructions LD_ABS_DW and LD_IND_DW
with similar usage as skb_header_pointer().
The kernel part for interpreter and x64 JIT was implemented in [1], but such
new insns behave like old ld_abs and abort the program with 'return 0' if
access is beyond linear data. Such hidden control flow is hard to workaround
plus changing JITs and rolling out new llvm is incovenient.

Therefore allow cls_bpf/act_bpf program access skb->data directly:
int bpf_prog(struct __sk_buff *skb)
{
  struct iphdr *ip;

  if (skb->data + sizeof(struct iphdr) + ETH_HLEN > skb->data_end)
      /* packet too small */
      return 0;

  ip = skb->data + ETH_HLEN;

  /* access IP header fields with direct loads */
  if (ip->version != 4 || ip->saddr == 0x7f000001)
      return 1;
  [...]
}

This solution avoids introduction of new instructions. llvm stays
the same and all JITs stay the same, but verifier has to work extra hard
to prove safety of the above program.

For XDP the direct store instructions can be allowed as well.

The skb->data is NET_IP_ALIGNED, so for common cases the verifier can check
the alignment. The complex packet parsers where packet pointer is adjusted
incrementally cannot be tracked for alignment, so allow byte access in such cases
and misaligned access on architectures that define efficient_unaligned_access

[1] https://git.kernel.org/cgit/linux/kernel/git/ast/bpf.git/?h=ld_abs_dw

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 16:01:53 -04:00
..
bpf bpf: direct packet access 2016-05-06 16:01:53 -04:00
configs
debug mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings 2016-02-22 08:51:37 +01:00
events Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-05-04 00:52:29 -04:00
gcov gcov: use within_module() helper. 2015-12-04 22:46:25 +01:00
irq genirq: Dont allow affinity mask to be updated on IPIs 2016-04-21 12:05:15 +02:00
livepatch livepatch/module: remove livepatch module notifier 2016-03-17 09:45:10 +01:00
locking lockdep: Fix lock_chain::base size 2016-04-23 13:53:03 +02:00
power Power management and ACPI material for v4.6-rc1, part 2 2016-03-24 22:59:58 -07:00
printk printk: add clear_idx symbol to vmcoreinfo 2016-03-17 15:09:34 -07:00
rcu kernel: add kcov code coverage 2016-03-22 15:36:02 -07:00
sched locking/atomic, sched: Unexport fetch_or() 2016-03-29 11:52:11 +02:00
time timers/nohz: Convert tick dependency mask to atomic_t 2016-03-29 11:52:11 +02:00
trace bpf: add event output helper for notifications/sampling/logging 2016-04-19 20:26:11 -04:00
.gitignore certs: add .gitignore to stop git nagging about x509_certificate_list 2015-10-21 15:18:35 +01:00
acct.c
async.c async: export current_is_async() 2015-11-19 17:51:48 +01:00
audit_fsnotify.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
audit_tree.c audit: audit_tree_match can be boolean 2015-11-04 08:23:51 -05:00
audit_watch.c Merge branch 'stable-4.6' of git://git.infradead.org/users/pcmoore/audit 2016-03-19 17:52:49 -07:00
audit.c Merge branch 'stable-4.6' of git://git.infradead.org/users/pcmoore/audit 2016-03-19 17:52:49 -07:00
audit.h security: Make inode argument of inode_getsecid non-const 2015-12-24 11:09:39 -05:00
auditfilter.c audit: Fix typo in comment 2016-02-08 11:25:39 -05:00
auditsc.c auditsc: for seccomp events, log syscall compat state using in_compat_syscall 2016-03-22 15:36:02 -07:00
backtracetest.c
bounds.c
capability.c
cgroup_freezer.c cgroup: kill cgrp_ss_priv[CGROUP_CANFORK_COUNT] and friends 2015-12-03 10:24:08 -05:00
cgroup_pids.c cgroup_pids: fix a typo. 2015-12-14 14:54:37 -05:00
cgroup.c cgroup, cpuset: replace cpuset_post_attach_flush() with cgroup_subsys->post_attach callback 2016-04-25 15:45:14 -04:00
compat.c
configs.c
context_tracking.c context_tracking: Switch to new static_branch API 2015-11-24 09:56:43 +01:00
cpu_pm.c
cpu.c cpu/hotplug: Fix rollback during error-out in __cpu_disable() 2016-04-22 09:49:49 +02:00
cpuset.c cgroup, cpuset: replace cpuset_post_attach_flush() with cgroup_subsys->post_attach callback 2016-04-25 15:45:14 -04:00
crash_dump.c
cred.c kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
delayacct.c kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
dma.c
elfcore.c
exec_domain.c
exit.c oom: clear TIF_MEMDIE after oom_reaper managed to unmap the address space 2016-03-25 16:37:42 -07:00
extable.c
fork.c kernel: add kcov code coverage 2016-03-22 15:36:02 -07:00
freezer.c
futex_compat.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-01-20 17:09:18 -08:00
futex.c futex: Acknowledge a new waiter in counter before plist 2016-04-21 11:06:09 +02:00
groups.c
hung_task.c kernel/hung_task.c: use timeout diff when timeout is updated 2016-03-22 15:36:02 -07:00
irq_work.c treewide: Remove old email address 2015-11-23 09:44:58 +01:00
jump_label.c treewide: Remove old email address 2015-11-23 09:44:58 +01:00
kallsyms.c kallsyms: add support for relative offsets in kallsyms address table 2016-03-15 16:55:16 -07:00
kcmp.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-01-20 17:09:18 -08:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: don't profile branches in kcov 2016-04-28 19:34:04 -07:00
kexec_core.c kexec: export OFFSET(page.compound_head) to find out compound tail page 2016-04-28 19:34:04 -07:00
kexec_file.c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2016-03-17 11:33:45 -07:00
kexec_internal.h kexec: move some memembers and definitions within the scope of CONFIG_KEXEC_FILE 2016-01-20 17:09:18 -08:00
kexec.c kexec: set KEXEC_TYPE_CRASH before sanity_check_segment_list() 2016-01-20 17:09:18 -08:00
kmod.c kmod: don't run async usermode helper as a child of kworker thread 2015-10-23 17:55:10 +09:00
kprobes.c
ksysfs.c rcu: Remove TINY_RCU bloat from pointless boot parameters 2015-12-07 16:59:37 -08:00
kthread.c
latencytop.c sched/debug: Make schedstats a runtime tunable that is disabled by default 2016-02-09 11:54:23 +01:00
Makefile kernel: add kcov code coverage 2016-03-22 15:36:02 -07:00
membarrier.c
memremap.c memremap: add MEMREMAP_WC flag 2016-03-22 15:36:02 -07:00
module_signing.c X.509: Make algo identifiers text instead of enum 2016-03-03 21:49:27 +00:00
module-internal.h
module.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching 2016-03-17 21:46:32 -07:00
notifier.c
nsproxy.c cgroup: introduce cgroup namespaces 2016-02-16 13:04:58 -05:00
padata.c
panic.c panic: change nmi_panic from macro to function 2016-03-22 15:36:02 -07:00
params.c Nothing exciting, minor tweaks and cleanups. 2015-11-09 15:53:39 -08:00
pid_namespace.c
pid.c Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-01-31 15:44:04 -08:00
profile.c profile: hide unused functions when !CONFIG_PROC_FS 2016-03-22 15:36:02 -07:00
ptrace.c ptrace: change __ptrace_unlink() to clear ->ptrace under ->siglock 2016-03-22 15:36:02 -07:00
range.c
reboot.c
relay.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
resource.c /proc/iomem: only expose physical resource addresses to privileged users 2016-04-14 12:56:09 -07:00
seccomp.c seccomp: check in_compat_syscall, not is_compat_task, in strict mode 2016-03-22 15:36:02 -07:00
signal.c kernel/signal.c: add compile-time check for __ARCH_SI_PREAMBLE_SIZE 2016-03-22 15:36:02 -07:00
smp.c Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-03-15 13:50:29 -07:00
smpboot.c cpu/hotplug: Unpark smpboot threads from the state machine 2016-03-01 20:36:56 +01:00
smpboot.h cpu/hotplug: Create hotplug threads 2016-03-01 20:36:56 +01:00
softirq.c arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections 2016-03-25 16:37:42 -07:00
stacktrace.c
stop_machine.c kernel/stop_machine.c: remove CONFIG_SMP dependencies 2016-01-16 11:17:24 -08:00
sys_ni.c vfs: add copy_file_range syscall and vfs helper 2015-12-01 14:00:53 -05:00
sys.c timer: convert timer_slack_ns from unsigned long to u64 2016-03-17 15:09:34 -07:00
sysctl_binary.c fs/coredump: prevent fsuid=0 dumps into user-controlled directories 2016-03-22 15:36:02 -07:00
sysctl.c mm: scale kswapd watermarks in proportion to memory 2016-03-17 15:09:34 -07:00
task_work.c
taskstats.c taskstats: use the libnl API to align nlattr on 64-bit 2016-04-23 20:13:25 -04:00
test_kprobes.c
torture.c
tracepoint.c kernel/...: convert pr_warning to pr_warn 2016-03-22 15:36:02 -07:00
tsacct.c time, acct: Drop irq save & restore from __acct_update_integrals() 2016-02-29 09:53:09 +01:00
uid16.c
up.c
user_namespace.c kernel/*: switch to memdup_user_nul() 2016-01-04 10:27:55 -05:00
user-return-notifier.c
user.c
utsname_sysctl.c
utsname.c
watchdog.c watchdog: don't run proc_watchdog_update if new value is same as old 2016-03-17 15:09:34 -07:00
workqueue_internal.h sched/core: Get rid of 'cpu' argument in wq_worker_sleeping() 2016-03-02 10:28:47 -05:00
workqueue.c Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2016-04-27 12:03:59 -07:00