v5.0
971 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
3612af783c |
bpf: fix sanitation rewrite in case of non-pointers
Marek reported that he saw an issue with the below snippet in that
timing measurements where off when loaded as unpriv while results
were reasonable when loaded as privileged:
[...]
uint64_t a = bpf_ktime_get_ns();
uint64_t b = bpf_ktime_get_ns();
uint64_t delta = b - a;
if ((int64_t)delta > 0) {
[...]
Turns out there is a bug where a corner case is missing in the fix
|
||
|
|
352d20d611 |
bpf: drop refcount if bpf_map_new_fd() fails in map_create()
In bpf/syscall.c, map_create() first set map->usercnt to 1, a file
descriptor is supposed to return to userspace. When bpf_map_new_fd()
fails, drop the refcount.
Fixes:
|
||
|
|
781e62823c |
bpf: decrease usercnt if bpf_map_new_fd() fails in bpf_map_get_fd_by_id()
In bpf/syscall.c, bpf_map_get_fd_by_id() use bpf_map_inc_not_zero()
to increase the refcount, both map->refcnt and map->usercnt. Then, if
bpf_map_new_fd() fails, should handle map->usercnt too.
Fixes:
|
||
|
|
7c0cdf0b39 |
bpf, lpm: fix lookup bug in map_delete_elem
trie_delete_elem() was deleting an entry even though it was not matching
if the prefixlen was correct. This patch adds a check on matchlen.
Reproducer:
$ sudo bpftool map create /sys/fs/bpf/mylpm type lpm_trie key 8 value 1 entries 128 name mylpm flags 1
$ sudo bpftool map update pinned /sys/fs/bpf/mylpm key hex 10 00 00 00 aa bb cc dd value hex 01
$ sudo bpftool map dump pinned /sys/fs/bpf/mylpm
key: 10 00 00 00 aa bb cc dd value: 01
Found 1 element
$ sudo bpftool map delete pinned /sys/fs/bpf/mylpm key hex 10 00 00 00 ff ff ff ff
$ echo $?
0
$ sudo bpftool map dump pinned /sys/fs/bpf/mylpm
Found 0 elements
A similar reproducer is added in the selftests.
Without the patch:
$ sudo ./tools/testing/selftests/bpf/test_lpm_map
test_lpm_map: test_lpm_map.c:485: test_lpm_delete: Assertion `bpf_map_delete_elem(map_fd, key) == -1 && errno == ENOENT' failed.
Aborted
With the patch: test_lpm_map runs without errors.
Fixes:
|
||
|
|
3defaf2f15 |
bpf: fix lockdep false positive in stackmap
Lockdep warns about false positive:
[ 11.211460] ------------[ cut here ]------------
[ 11.211936] DEBUG_LOCKS_WARN_ON(depth <= 0)
[ 11.211985] WARNING: CPU: 0 PID: 141 at ../kernel/locking/lockdep.c:3592 lock_release+0x1ad/0x280
[ 11.213134] Modules linked in:
[ 11.214954] RIP: 0010:lock_release+0x1ad/0x280
[ 11.223508] Call Trace:
[ 11.223705] <IRQ>
[ 11.223874] ? __local_bh_enable+0x7a/0x80
[ 11.224199] up_read+0x1c/0xa0
[ 11.224446] do_up_read+0x12/0x20
[ 11.224713] irq_work_run_list+0x43/0x70
[ 11.225030] irq_work_run+0x26/0x50
[ 11.225310] smp_irq_work_interrupt+0x57/0x1f0
[ 11.225662] irq_work_interrupt+0xf/0x20
since rw_semaphore is released in a different task vs task that locked the sema.
It is expected behavior.
Fix the warning with up_read_non_owner() and rwsem_release() annotation.
Fixes:
|
||
|
|
d623876646 |
bpf: Fix narrow load on a bpf_sock returned from sk_lookup()
By adding this test to test_verifier:
{
"reference tracking: access sk->src_ip4 (narrow load)",
.insns = {
BPF_SK_LOOKUP,
BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
BPF_LDX_MEM(BPF_H, BPF_REG_2, BPF_REG_0, offsetof(struct bpf_sock, src_ip4) + 2),
BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
BPF_EMIT_CALL(BPF_FUNC_sk_release),
BPF_EXIT_INSN(),
},
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
.result = ACCEPT,
},
The above test loads 2 bytes from sk->src_ip4 where
sk is obtained by bpf_sk_lookup_tcp().
It hits an internal verifier error from convert_ctx_accesses():
[root@arch-fb-vm1 bpf]# ./test_verifier 665 665
Failed to load prog 'Invalid argument'!
0: (b7) r2 = 0
1: (63) *(u32 *)(r10 -8) = r2
2: (7b) *(u64 *)(r10 -16) = r2
3: (7b) *(u64 *)(r10 -24) = r2
4: (7b) *(u64 *)(r10 -32) = r2
5: (7b) *(u64 *)(r10 -40) = r2
6: (7b) *(u64 *)(r10 -48) = r2
7: (bf) r2 = r10
8: (07) r2 += -48
9: (b7) r3 = 36
10: (b7) r4 = 0
11: (b7) r5 = 0
12: (85) call bpf_sk_lookup_tcp#84
13: (bf) r6 = r0
14: (15) if r0 == 0x0 goto pc+3
R0=sock(id=1,off=0,imm=0) R6=sock(id=1,off=0,imm=0) R10=fp0,call_-1 fp-8=????0000 fp-16=0000mmmm fp-24=mmmmmmmm fp-32=mmmmmmmm fp-40=mmmmmmmm fp-48=mmmmmmmm refs=1
15: (69) r2 = *(u16 *)(r0 +26)
16: (bf) r1 = r6
17: (85) call bpf_sk_release#86
18: (95) exit
from 14 to 18: safe
processed 20 insns (limit 131072), stack depth 48
bpf verifier is misconfigured
Summary: 0 PASSED, 0 SKIPPED, 1 FAILED
The bpf_sock_is_valid_access() is expecting src_ip4 can be narrowly
loaded (meaning load any 1 or 2 bytes of the src_ip4) by
marking info->ctx_field_size. However, this marked
ctx_field_size is not used. This patch fixes it.
Due to the recent refactoring in test_verifier,
this new test will be added to the bpf-next branch
(together with the bpf_tcp_sock patchset)
to avoid merge conflict.
Fixes:
|
||
|
|
7c4cd051ad |
bpf: Fix syscall's stackmap lookup potential deadlock
The map_lookup_elem used to not acquiring spinlock in order to optimize the reader. It was true until commit |
||
|
|
a89fac57b5 |
bpf: fix lockdep false positive in percpu_freelist
Lockdep warns about false positive:
[ 12.492084] 00000000e6b28347 (&head->lock){+...}, at: pcpu_freelist_push+0x2a/0x40
[ 12.492696] but this lock was taken by another, HARDIRQ-safe lock in the past:
[ 12.493275] (&rq->lock){-.-.}
[ 12.493276]
[ 12.493276]
[ 12.493276] and interrupts could create inverse lock ordering between them.
[ 12.493276]
[ 12.494435]
[ 12.494435] other info that might help us debug this:
[ 12.494979] Possible interrupt unsafe locking scenario:
[ 12.494979]
[ 12.495518] CPU0 CPU1
[ 12.495879] ---- ----
[ 12.496243] lock(&head->lock);
[ 12.496502] local_irq_disable();
[ 12.496969] lock(&rq->lock);
[ 12.497431] lock(&head->lock);
[ 12.497890] <Interrupt>
[ 12.498104] lock(&rq->lock);
[ 12.498368]
[ 12.498368] *** DEADLOCK ***
[ 12.498368]
[ 12.498837] 1 lock held by dd/276:
[ 12.499110] #0: 00000000c58cb2ee (rcu_read_lock){....}, at: trace_call_bpf+0x5e/0x240
[ 12.499747]
[ 12.499747] the shortest dependencies between 2nd lock and 1st lock:
[ 12.500389] -> (&rq->lock){-.-.} {
[ 12.500669] IN-HARDIRQ-W at:
[ 12.500934] _raw_spin_lock+0x2f/0x40
[ 12.501373] scheduler_tick+0x4c/0xf0
[ 12.501812] update_process_times+0x40/0x50
[ 12.502294] tick_periodic+0x27/0xb0
[ 12.502723] tick_handle_periodic+0x1f/0x60
[ 12.503203] timer_interrupt+0x11/0x20
[ 12.503651] __handle_irq_event_percpu+0x43/0x2c0
[ 12.504167] handle_irq_event_percpu+0x20/0x50
[ 12.504674] handle_irq_event+0x37/0x60
[ 12.505139] handle_level_irq+0xa7/0x120
[ 12.505601] handle_irq+0xa1/0x150
[ 12.506018] do_IRQ+0x77/0x140
[ 12.506411] ret_from_intr+0x0/0x1d
[ 12.506834] _raw_spin_unlock_irqrestore+0x53/0x60
[ 12.507362] __setup_irq+0x481/0x730
[ 12.507789] setup_irq+0x49/0x80
[ 12.508195] hpet_time_init+0x21/0x32
[ 12.508644] x86_late_time_init+0xb/0x16
[ 12.509106] start_kernel+0x390/0x42a
[ 12.509554] secondary_startup_64+0xa4/0xb0
[ 12.510034] IN-SOFTIRQ-W at:
[ 12.510305] _raw_spin_lock+0x2f/0x40
[ 12.510772] try_to_wake_up+0x1c7/0x4e0
[ 12.511220] swake_up_locked+0x20/0x40
[ 12.511657] swake_up_one+0x1a/0x30
[ 12.512070] rcu_process_callbacks+0xc5/0x650
[ 12.512553] __do_softirq+0xe6/0x47b
[ 12.512978] irq_exit+0xc3/0xd0
[ 12.513372] smp_apic_timer_interrupt+0xa9/0x250
[ 12.513876] apic_timer_interrupt+0xf/0x20
[ 12.514343] default_idle+0x1c/0x170
[ 12.514765] do_idle+0x199/0x240
[ 12.515159] cpu_startup_entry+0x19/0x20
[ 12.515614] start_kernel+0x422/0x42a
[ 12.516045] secondary_startup_64+0xa4/0xb0
[ 12.516521] INITIAL USE at:
[ 12.516774] _raw_spin_lock_irqsave+0x38/0x50
[ 12.517258] rq_attach_root+0x16/0xd0
[ 12.517685] sched_init+0x2f2/0x3eb
[ 12.518096] start_kernel+0x1fb/0x42a
[ 12.518525] secondary_startup_64+0xa4/0xb0
[ 12.518986] }
[ 12.519132] ... key at: [<ffffffff82b7bc28>] __key.71384+0x0/0x8
[ 12.519649] ... acquired at:
[ 12.519892] pcpu_freelist_pop+0x7b/0xd0
[ 12.520221] bpf_get_stackid+0x1d2/0x4d0
[ 12.520563] ___bpf_prog_run+0x8b4/0x11a0
[ 12.520887]
[ 12.521008] -> (&head->lock){+...} {
[ 12.521292] HARDIRQ-ON-W at:
[ 12.521539] _raw_spin_lock+0x2f/0x40
[ 12.521950] pcpu_freelist_push+0x2a/0x40
[ 12.522396] bpf_get_stackid+0x494/0x4d0
[ 12.522828] ___bpf_prog_run+0x8b4/0x11a0
[ 12.523296] INITIAL USE at:
[ 12.523537] _raw_spin_lock+0x2f/0x40
[ 12.523944] pcpu_freelist_populate+0xc0/0x120
[ 12.524417] htab_map_alloc+0x405/0x500
[ 12.524835] __do_sys_bpf+0x1a3/0x1a90
[ 12.525253] do_syscall_64+0x4a/0x180
[ 12.525659] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 12.526167] }
[ 12.526311] ... key at: [<ffffffff838f7668>] __key.13130+0x0/0x8
[ 12.526812] ... acquired at:
[ 12.527047] __lock_acquire+0x521/0x1350
[ 12.527371] lock_acquire+0x98/0x190
[ 12.527680] _raw_spin_lock+0x2f/0x40
[ 12.527994] pcpu_freelist_push+0x2a/0x40
[ 12.528325] bpf_get_stackid+0x494/0x4d0
[ 12.528645] ___bpf_prog_run+0x8b4/0x11a0
[ 12.528970]
[ 12.529092]
[ 12.529092] stack backtrace:
[ 12.529444] CPU: 0 PID: 276 Comm: dd Not tainted 5.0.0-rc3-00018-g2fa53f892422 #475
[ 12.530043] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
[ 12.530750] Call Trace:
[ 12.530948] dump_stack+0x5f/0x8b
[ 12.531248] check_usage_backwards+0x10c/0x120
[ 12.531598] ? ___bpf_prog_run+0x8b4/0x11a0
[ 12.531935] ? mark_lock+0x382/0x560
[ 12.532229] mark_lock+0x382/0x560
[ 12.532496] ? print_shortest_lock_dependencies+0x180/0x180
[ 12.532928] __lock_acquire+0x521/0x1350
[ 12.533271] ? find_get_entry+0x17f/0x2e0
[ 12.533586] ? find_get_entry+0x19c/0x2e0
[ 12.533902] ? lock_acquire+0x98/0x190
[ 12.534196] lock_acquire+0x98/0x190
[ 12.534482] ? pcpu_freelist_push+0x2a/0x40
[ 12.534810] _raw_spin_lock+0x2f/0x40
[ 12.535099] ? pcpu_freelist_push+0x2a/0x40
[ 12.535432] pcpu_freelist_push+0x2a/0x40
[ 12.535750] bpf_get_stackid+0x494/0x4d0
[ 12.536062] ___bpf_prog_run+0x8b4/0x11a0
It has been explained that is a false positive here:
https://lkml.org/lkml/2018/7/25/756
Recap:
- stackmap uses pcpu_freelist
- The lock in pcpu_freelist is a percpu lock
- stackmap is only used by tracing bpf_prog
- A tracing bpf_prog cannot be run if another bpf_prog
has already been running (ensured by the percpu bpf_prog_active counter).
Eric pointed out that this lockdep splats stops other
legit lockdep splats in selftests/bpf/test_progs.c.
Fix this by calling local_irq_save/restore for stackmap.
Another false positive had also been worked around by calling
local_irq_save in commit
|
||
|
|
6cab5e90ab |
bpf: run bpf programs with preemption disabled
Disabled preemption is necessary for proper access to per-cpu maps from BPF programs. But the sender side of socket filters didn't have preemption disabled: unix_dgram_sendmsg->sk_filter->sk_filter_trim_cap->bpf_prog_run_save_cb->BPF_PROG_RUN and a combination of af_packet with tun device didn't disable either: tpacket_snd->packet_direct_xmit->packet_pick_tx_queue->ndo_select_queue-> tun_select_queue->tun_ebpf_select_queue->bpf_prog_run_clear_cb->BPF_PROG_RUN Disable preemption before executing BPF programs (both classic and extended). Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> |
||
|
|
81f5c6f5db |
bpf: btf: allow typedef func_proto
Current implementation does not allow typedef func_proto.
But it is actually allowed.
-bash-4.4$ cat t.c
typedef int (f) (int);
f *g;
-bash-4.4$ clang -O2 -g -c -target bpf t.c -Xclang -target-feature -Xclang +dwarfris
-bash-4.4$ pahole -JV t.o
File t.o:
[1] PTR (anon) type_id=2
[2] TYPEDEF f type_id=3
[3] FUNC_PROTO (anon) return=4 args=(4 (anon))
[4] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
-bash-4.4$
This patch related btf verifier to allow such (typedef func_proto)
patterns.
Fixes:
|
||
|
|
9d5564ddcf |
bpf: fix inner map masking to prevent oob under speculation
During review I noticed that inner meta map setup for map in
map is buggy in that it does not propagate all needed data
from the reference map which the verifier is later accessing.
In particular one such case is index masking to prevent out of
bounds access under speculative execution due to missing the
map's unpriv_array/index_mask field propagation. Fix this such
that the verifier is generating the correct code for inlined
lookups in case of unpriviledged use.
Before patch (test_verifier's 'map in map access' dump):
# bpftool prog dump xla id 3
0: (62) *(u32 *)(r10 -4) = 0
1: (bf) r2 = r10
2: (07) r2 += -4
3: (18) r1 = map[id:4]
5: (07) r1 += 272 |
6: (61) r0 = *(u32 *)(r2 +0) |
7: (35) if r0 >= 0x1 goto pc+6 | Inlined map in map lookup
8: (54) (u32) r0 &= (u32) 0 | with index masking for
9: (67) r0 <<= 3 | map->unpriv_array.
10: (0f) r0 += r1 |
11: (79) r0 = *(u64 *)(r0 +0) |
12: (15) if r0 == 0x0 goto pc+1 |
13: (05) goto pc+1 |
14: (b7) r0 = 0 |
15: (15) if r0 == 0x0 goto pc+11
16: (62) *(u32 *)(r10 -4) = 0
17: (bf) r2 = r10
18: (07) r2 += -4
19: (bf) r1 = r0
20: (07) r1 += 272 |
21: (61) r0 = *(u32 *)(r2 +0) | Index masking missing (!)
22: (35) if r0 >= 0x1 goto pc+3 | for inner map despite
23: (67) r0 <<= 3 | map->unpriv_array set.
24: (0f) r0 += r1 |
25: (05) goto pc+1 |
26: (b7) r0 = 0 |
27: (b7) r0 = 0
28: (95) exit
After patch:
# bpftool prog dump xla id 1
0: (62) *(u32 *)(r10 -4) = 0
1: (bf) r2 = r10
2: (07) r2 += -4
3: (18) r1 = map[id:2]
5: (07) r1 += 272 |
6: (61) r0 = *(u32 *)(r2 +0) |
7: (35) if r0 >= 0x1 goto pc+6 | Same inlined map in map lookup
8: (54) (u32) r0 &= (u32) 0 | with index masking due to
9: (67) r0 <<= 3 | map->unpriv_array.
10: (0f) r0 += r1 |
11: (79) r0 = *(u64 *)(r0 +0) |
12: (15) if r0 == 0x0 goto pc+1 |
13: (05) goto pc+1 |
14: (b7) r0 = 0 |
15: (15) if r0 == 0x0 goto pc+12
16: (62) *(u32 *)(r10 -4) = 0
17: (bf) r2 = r10
18: (07) r2 += -4
19: (bf) r1 = r0
20: (07) r1 += 272 |
21: (61) r0 = *(u32 *)(r2 +0) |
22: (35) if r0 >= 0x1 goto pc+4 | Now fixed inlined inner map
23: (54) (u32) r0 &= (u32) 0 | lookup with proper index masking
24: (67) r0 <<= 3 | for map->unpriv_array.
25: (0f) r0 += r1 |
26: (05) goto pc+1 |
27: (b7) r0 = 0 |
28: (b7) r0 = 0
29: (95) exit
Fixes:
|
||
|
|
c8dc79806e |
bpf: Annotate implicit fall through in cgroup_dev_func_proto
There is a plan to build the kernel with -Wimplicit-fallthrough and this place in the code produced a warnings (W=1). This commit removes the following warning: kernel/bpf/cgroup.c:719:6: warning: this statement may fall through [-Wimplicit-fallthrough=] Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> |
||
|
|
583c531853 |
bpf: Make function btf_name_offset_valid static
Initially in commit |
||
|
|
4af396ae48 |
bpf: zero out build_id for BPF_STACK_BUILD_ID_IP
When returning BPF_STACK_BUILD_ID_IP from stack_map_get_build_id_offset,
make sure that build_id field is empty. Since we are using percpu
free list, there is a possibility that we might reuse some previous
bpf_stack_build_id with non-zero build_id.
Fixes:
|
||
|
|
0b698005a9 |
bpf: don't assume build-id length is always 20 bytes
Build-id length is not fixed to 20, it can be (`man ld` /--build-id):
* 128-bit (uuid)
* 160-bit (sha1)
* any length specified in ld --build-id=0xhexstring
To fix the issue of missing BPF_STACK_BUILD_ID_VALID for shorter build-ids,
assume that build-id is somewhere in the range of 1 .. 20.
Set the remaining bytes to zero.
v2:
* don't introduce new "len = min(BPF_BUILD_ID_SIZE, nhdr->n_descsz)",
we already know that nhdr->n_descsz <= BPF_BUILD_ID_SIZE if we enter
this 'if' condition
Fixes:
|
||
|
|
17e3ac8125 |
bpf: fix bpffs bitfield pretty print
Commit |
||
|
|
beaf3d1901 |
bpf: fix panic in stack_map_get_build_id() on i386 and arm32
As Naresh reported, test_stacktrace_build_id() causes panic on i386 and
arm32 systems. This is caused by page_address() returns NULL in certain
cases.
This patch fixes this error by using kmap_atomic/kunmap_atomic instead
of page_address.
Fixes:
|
||
|
|
d3bd7413e0 |
bpf: fix sanitation of alu op with pointer / scalar type from different paths
While |
||
|
|
96d4f267e4 |
Remove 'type' argument from access_ok() function
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.
It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.
A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.
This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
There were a couple of notable cases:
- csky still had the old "verify_area()" name as an alias.
- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)
- microblaze used the type argument for a debug printout
but other than those oddities this should be a total no-op patch.
I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
||
|
|
979d63d50c |
bpf: prevent out of bounds speculation on pointer arithmetic
Jann reported that the original commit back in |
||
|
|
b7137c4eab |
bpf: fix check_map_access smin_value test when pointer contains offset
In check_map_access() we probe actual bounds through __check_map_access() with offset of reg->smin_value + off for lower bound and offset of reg->umax_value + off for the upper bound. However, even though the reg->smin_value could have a negative value, the final result of the sum with off could be positive when pointer arithmetic with known and unknown scalars is combined. In this case we reject the program with an error such as "R<x> min value is negative, either use unsigned index or do a if (index >=0) check." even though the access itself would be fine. Therefore extend the check to probe whether the actual resulting reg->smin_value + off is less than zero. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
9d7eceede7 |
bpf: restrict unknown scalars of mixed signed bounds for unprivileged
For unknown scalars of mixed signed bounds, meaning their smin_value is negative and their smax_value is positive, we need to reject arithmetic with pointer to map value. For unprivileged the goal is to mask every map pointer arithmetic and this cannot reliably be done when it is unknown at verification time whether the scalar value is negative or positive. Given this is a corner case, the likelihood of breaking should be very small. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
e4298d2583 |
bpf: restrict stack pointer arithmetic for unprivileged
Restrict stack pointer arithmetic for unprivileged users in that arithmetic itself must not go out of bounds as opposed to the actual access later on. Therefore after each adjust_ptr_min_max_vals() with a stack pointer as a destination we simulate a check_stack_access() of 1 byte on the destination and once that fails the program is rejected for unprivileged program loads. This is analog to map value pointer arithmetic and needed for masking later on. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
0d6303db79 |
bpf: restrict map value pointer arithmetic for unprivileged
Restrict map value pointer arithmetic for unprivileged users in that arithmetic itself must not go out of bounds as opposed to the actual access later on. Therefore after each adjust_ptr_min_max_vals() with a map value pointer as a destination it will simulate a check_map_access() of 1 byte on the destination and once that fails the program is rejected for unprivileged program loads. We use this later on for masking any pointer arithmetic with the remainder of the map value space. The likelihood of breaking any existing real-world unprivileged eBPF program is very small for this corner case. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
9b73bfdd08 |
bpf: enable access to ax register also from verifier rewrite
Right now we are using BPF ax register in JIT for constant blinding as well as in interpreter as temporary variable. Verifier will not be able to use it simply because its use will get overridden from the former in bpf_jit_blind_insn(). However, it can be made to work in that blinding will be skipped if there is prior use in either source or destination register on the instruction. Taking constraints of ax into account, the verifier is then open to use it in rewrites under some constraints. Note, ax register already has mappings in every eBPF JIT. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
144cd91c4c |
bpf: move tmp variable into ax register in interpreter
This change moves the on-stack 64 bit tmp variable in ___bpf_prog_run() into the hidden ax register. The latter is currently only used in JITs for constant blinding as a temporary scratch register, meaning the BPF interpreter will never see the use of ax. Therefore it is safe to use it for the cases where tmp has been used earlier. This is needed to later on allow restricted hidden use of ax in both interpreter and JITs. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
c08435ec7f |
bpf: move {prev_,}insn_idx into verifier env
Move prev_insn_idx and insn_idx from the do_check() function into the verifier environment, so they can be read inside the various helper functions for handling the instructions. It's easier to put this into the environment rather than changing all call-sites only to pass it along. insn_idx is useful in particular since this later on allows to hold state in env->insn_aux_data[env->insn_idx]. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
339bbff2d6 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-12-21
The following pull-request contains BPF updates for your *net-next* tree.
There is a merge conflict in test_verifier.c. Result looks as follows:
[...]
},
{
"calls: cross frame pruning",
.insns = {
[...]
.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
.errstr_unpriv = "function calls to other bpf functions are allowed for root only",
.result_unpriv = REJECT,
.errstr = "!read_ok",
.result = REJECT,
},
{
"jset: functional",
.insns = {
[...]
{
"jset: unknown const compare not taken",
.insns = {
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
BPF_FUNC_get_prandom_u32),
BPF_JMP_IMM(BPF_JSET, BPF_REG_0, 1, 1),
BPF_LDX_MEM(BPF_B, BPF_REG_8, BPF_REG_9, 0),
BPF_EXIT_INSN(),
},
.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
.errstr_unpriv = "!read_ok",
.result_unpriv = REJECT,
.errstr = "!read_ok",
.result = REJECT,
},
[...]
{
"jset: range",
.insns = {
[...]
},
.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
.result_unpriv = ACCEPT,
.result = ACCEPT,
},
The main changes are:
1) Various BTF related improvements in order to get line info
working. Meaning, verifier will now annotate the corresponding
BPF C code to the error log, from Martin and Yonghong.
2) Implement support for raw BPF tracepoints in modules, from Matt.
3) Add several improvements to verifier state logic, namely speeding
up stacksafe check, optimizations for stack state equivalence
test and safety checks for liveness analysis, from Alexei.
4) Teach verifier to make use of BPF_JSET instruction, add several
test cases to kselftests and remove nfp specific JSET optimization
now that verifier has awareness, from Jakub.
5) Improve BPF verifier's slot_type marking logic in order to
allow more stack slot sharing, from Jiong.
6) Add sk_msg->size member for context access and add set of fixes
and improvements to make sock_map with kTLS usable with openssl
based applications, from John.
7) Several cleanups and documentation updates in bpftool as well as
auto-mount of tracefs for "bpftool prog tracelog" command,
from Quentin.
8) Include sub-program tags from now on in bpf_prog_info in order to
have a reliable way for user space to get all tags of the program
e.g. needed for kallsyms correlation, from Song.
9) Add BTF annotations for cgroup_local_storage BPF maps and
implement bpf fs pretty print support, from Roman.
10) Fix bpftool in order to allow for cross-compilation, from Ivan.
11) Update of bpftool license to GPLv2-only + BSD-2-Clause in order
to be compatible with libbfd and allow for Debian packaging,
from Jakub.
12) Remove an obsolete prog->aux sanitation in dump and get rid of
version check for prog load, from Daniel.
13) Fix a memory leak in libbpf's line info handling, from Prashant.
14) Fix cpumap's frame alignment for build_skb() so that skb_shared_info
does not get unaligned, from Jesper.
15) Fix test_progs kselftest to work with older compilers which are less
smart in optimizing (and thus throwing build error), from Stanislav.
16) Cleanup and simplify AF_XDP socket teardown, from Björn.
17) Fix sk lookup in BPF kselftest's test_sock_addr with regards
to netns_id argument, from Andrey.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
||
|
|
77ea5f4cbe |
bpf/cpumap: make sure frame_size for build_skb is aligned if headroom isn't
The frame_size passed to build_skb must be aligned, else it is
possible that the embedded struct skb_shared_info gets unaligned.
For correctness make sure that xdpf->headroom in included in the
alignment. No upstream drivers can hit this, as all XDP drivers provide
an aligned headroom. This was discovered when playing with implementing
XDP support for mvneta, which have a 2 bytes DSA header, and this
Marvell ARM64 platform didn't like doing atomic operations on an
unaligned skb_shinfo(skb)->dataref addresses.
Fixes:
|
||
|
|
2be09de7d6 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Lots of conflicts, by happily all cases of overlapping changes, parallel adds, things of that nature. Thanks to Stephen Rothwell, Saeed Mahameed, and others for their guidance in these resolutions. Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
9b38c4056b |
bpf: verifier: reorder stack size check with dead code sanitization
Reorder the calls to check_max_stack_depth() and sanitize_dead_code() to separate functions which can rewrite instructions from pure checks. No functional changes. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> |
||
|
|
960ea05656 |
bpf: verifier: teach the verifier to reason about the BPF_JSET instruction
Some JITs (nfp) try to optimize code on their own. It could make sense in case of BPF_JSET instruction which is currently not interpreted by the verifier, meaning for instance that dead could would not be detected if it was under BPF_JSET branch. Teach the verifier basics of BPF_JSET, JIT optimizations will be removed shortly. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> |
||
|
|
fdbaa0beb7 |
bpf: Ensure line_info.insn_off cannot point to insn with zero code
This patch rejects a line_info if the bpf insn code referred by line_info.insn_off is 0. F.e. a broken userspace tool might generate a line_info.insn_off that points to the second 8 bytes of a BPF_LD_IMM64. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
76c43ae84e |
bpf: log struct/union attribute for forward type
Current btf internal verbose logger logs fwd type as
[2] FWD A type_id=0
where A is the type name.
Commit
|
||
|
|
0bae2d4d62 |
bpf: correct slot_type marking logic to allow more stack slot sharing
Verifier is supposed to support sharing stack slot allocated to ptr with SCALAR_VALUE for privileged program. However this doesn't happen for some cases. The reason is verifier is not clearing slot_type STACK_SPILL for all bytes, it only clears part of them, while verifier is using: slot_type[0] == STACK_SPILL as a convention to check one slot is ptr type. So, the consequence of partial clearing slot_type is verifier could treat a partially overridden ptr slot, which should now be a SCALAR_VALUE slot, still as ptr slot, and rejects some valid programs. Before this patch, test_xdp_noinline.o under bpf selftests, bpf_lxc.o and bpf_netdev.o under Cilium bpf repo, when built with -mattr=+alu32 are rejected due to this issue. After this patch, they all accepted. There is no processed insn number change before and after this patch on Cilium bpf programs. Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
a38d1107f9 |
bpf: support raw tracepoints in modules
Distributions build drivers as modules, including network and filesystem drivers which export numerous tracepoints. This enables bpf(BPF_RAW_TRACEPOINT_OPEN) to attach to those tracepoints. Signed-off-by: Matt Mullins <mmullins@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
ffa0c1cf59 |
bpf: enable cgroup local storage map pretty print with kind_flag
Commit 970289fc0a83 ("bpf: add bpffs pretty print for cgroup
local storage maps") added bpffs pretty print for cgroup
local storage maps. The commit worked for struct without kind_flag
set.
This patch refactored and made pretty print also work
with kind_flag set for the struct.
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
||
|
|
9d5f9f701b |
bpf: btf: fix struct/union/fwd types with kind_flag
This patch fixed two issues with BTF. One is related to
struct/union bitfield encoding and the other is related to
forward type.
Issue #1 and solution:
======================
Current btf encoding of bitfield follows what pahole generates.
For each bitfield, pahole will duplicate the type chain and
put the bitfield size at the final int or enum type.
Since the BTF enum type cannot encode bit size,
pahole workarounds the issue by generating
an int type whenever the enum bit size is not 32.
For example,
-bash-4.4$ cat t.c
typedef int ___int;
enum A { A1, A2, A3 };
struct t {
int a[5];
___int b:4;
volatile enum A c:4;
} g;
-bash-4.4$ gcc -c -O2 -g t.c
The current kernel supports the following BTF encoding:
$ pahole -JV t.o
[1] TYPEDEF ___int type_id=2
[2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
[3] ENUM A size=4 vlen=3
A1 val=0
A2 val=1
A3 val=2
[4] STRUCT t size=24 vlen=3
a type_id=5 bits_offset=0
b type_id=9 bits_offset=160
c type_id=11 bits_offset=164
[5] ARRAY (anon) type_id=2 index_type_id=2 nr_elems=5
[6] INT sizetype size=8 bit_offset=0 nr_bits=64 encoding=(none)
[7] VOLATILE (anon) type_id=3
[8] INT int size=1 bit_offset=0 nr_bits=4 encoding=(none)
[9] TYPEDEF ___int type_id=8
[10] INT (anon) size=1 bit_offset=0 nr_bits=4 encoding=SIGNED
[11] VOLATILE (anon) type_id=10
Two issues are in the above:
. by changing enum type to int, we lost the original
type information and this will not be ideal later
when we try to convert BTF to a header file.
. the type duplication for bitfields will cause
BTF bloat. Duplicated types cannot be deduplicated
later if the bitfield size is different.
To fix this issue, this patch implemented a compatible
change for BTF struct type encoding:
. the bit 31 of struct_type->info, previously reserved,
now is used to indicate whether bitfield_size is
encoded in btf_member or not.
. if bit 31 of struct_type->info is set,
btf_member->offset will encode like:
bit 0 - 23: bit offset
bit 24 - 31: bitfield size
if bit 31 is not set, the old behavior is preserved:
bit 0 - 31: bit offset
So if the struct contains a bit field, the maximum bit offset
will be reduced to (2^24 - 1) instead of MAX_UINT. The maximum
bitfield size will be 256 which is enough for today as maximum
bitfield in compiler can be 128 where int128 type is supported.
This kernel patch intends to support the new BTF encoding:
$ pahole -JV t.o
[1] TYPEDEF ___int type_id=2
[2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
[3] ENUM A size=4 vlen=3
A1 val=0
A2 val=1
A3 val=2
[4] STRUCT t kind_flag=1 size=24 vlen=3
a type_id=5 bitfield_size=0 bits_offset=0
b type_id=1 bitfield_size=4 bits_offset=160
c type_id=7 bitfield_size=4 bits_offset=164
[5] ARRAY (anon) type_id=2 index_type_id=2 nr_elems=5
[6] INT sizetype size=8 bit_offset=0 nr_bits=64 encoding=(none)
[7] VOLATILE (anon) type_id=3
Issue #2 and solution:
======================
Current forward type in BTF does not specify whether the original
type is struct or union. This will not work for type pretty print
and BTF-to-header-file conversion as struct/union must be specified.
$ cat tt.c
struct t;
union u;
int foo(struct t *t, union u *u) { return 0; }
$ gcc -c -g -O2 tt.c
$ pahole -JV tt.o
[1] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
[2] FWD t type_id=0
[3] PTR (anon) type_id=2
[4] FWD u type_id=0
[5] PTR (anon) type_id=4
To fix this issue, similar to issue #1, type->info bit 31
is used. If the bit is set, it is union type. Otherwise, it is
a struct type.
$ pahole -JV tt.o
[1] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
[2] FWD t kind_flag=0 type_id=0
[3] PTR (anon) kind_flag=0 type_id=2
[4] FWD u kind_flag=1 type_id=0
[5] PTR (anon) kind_flag=0 type_id=4
Pahole/LLVM change:
===================
The new kind_flag functionality has been implemented in pahole
and llvm:
https://github.com/yonghong-song/pahole/tree/bitfield
https://github.com/yonghong-song/llvm/tree/bitfield
Note that pahole hasn't implemented func/func_proto kind
and .BTF.ext. So to print function signature with bpftool,
the llvm compiler should be used.
Fixes:
|
||
|
|
f97be3ab04 |
bpf: btf: refactor btf_int_bits_seq_show()
Refactor function btf_int_bits_seq_show() by creating function btf_bitfield_seq_show() which has no dependence on btf and btf_type. The function btf_bitfield_seq_show() will be in later patch to directly dump bitfield member values. Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> |
||
|
|
6c4fc209fc |
bpf: remove useless version check for prog load
Existing libraries and tracing frameworks work around this kernel version check by automatically deriving the kernel version from uname(3) or similar such that the user does not need to do it manually; these workarounds also make the version check useless at the same time. Moreover, most other BPF tracing types enabling bpf_probe_read()-like functionality have /not/ adapted this check, and in general these days it is well understood anyway that all the tracing programs are not stable with regards to future kernels as kernel internal data structures are subject to change from release to release. Back at last netconf we discussed [0] and agreed to remove this check from bpf_prog_load() and instead document it here in the uapi header that there is no such guarantee for stable API for these programs. [0] http://vger.kernel.org/netconf2018_files/DanielBorkmann_netconf2018.pdf Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
9242b5f561 |
bpf: add self-check logic to liveness analysis
Introduce REG_LIVE_DONE to check the liveness propagation and prepare the states for merging. See algorithm description in clean_live_states(). Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> |
||
|
|
19e2dbb7dd |
bpf: improve stacksafe state comparison
"if (old->allocated_stack > cur->allocated_stack)" check is too conservative.
In some cases explored stack could have allocated more space,
but that stack space was not live.
The test case improves from 19 to 15 processed insns
and improvement on real programs is significant as well:
before after
bpf_lb-DLB_L3.o 1940 1831
bpf_lb-DLB_L4.o 3089 3029
bpf_lb-DUNKNOWN.o 1065 1064
bpf_lxc-DDROP_ALL.o 28052 26309
bpf_lxc-DUNKNOWN.o 35487 33517
bpf_netdev.o 10864 9713
bpf_overlay.o 6643 6184
bpf_lcx_jit.o 38437 37335
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Edward Cree <ecree@solarflare.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
||
|
|
b233920c97 |
bpf: speed up stacksafe check
Don't check the same stack liveness condition 8 times. once is enough. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Edward Cree <ecree@solarflare.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> |
||
|
|
d9762e84ed |
bpf: verbose log bpf_line_info in verifier
This patch adds bpf_line_info during the verifier's verbose.
It can give error context for debug purpose.
~~~~~~~~~~
Here is the verbose log for backedge:
while (a) {
a += bpf_get_smp_processor_id();
bpf_trace_printk(fmt, sizeof(fmt), a);
}
~> bpftool prog load ./test_loop.o /sys/fs/bpf/test_loop type tracepoint
13: while (a) {
3: a += bpf_get_smp_processor_id();
back-edge from insn 13 to 3
~~~~~~~~~~
Here is the verbose log for invalid pkt access:
Modification to test_xdp_noinline.c:
data = (void *)(long)xdp->data;
data_end = (void *)(long)xdp->data_end;
/*
if (data + 4 > data_end)
return XDP_DROP;
*/
*(u32 *)data = dst->dst;
~> bpftool prog load ./test_xdp_noinline.o /sys/fs/bpf/test_xdp_noinline type xdp
; data = (void *)(long)xdp->data;
224: (79) r2 = *(u64 *)(r10 -112)
225: (61) r2 = *(u32 *)(r2 +0)
; *(u32 *)data = dst->dst;
226: (63) *(u32 *)(r2 +0) = r1
invalid access to packet, off=0 size=4, R2(id=0,off=0,r=0)
R2 offset is outside of the packet
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
||
|
|
23127b33ec |
bpf: Create a new btf_name_by_offset() for non type name use case
The current btf_name_by_offset() is returning "(anon)" type name for the offset == 0 case and "(invalid-name-offset)" for the out-of-bound offset case. It fits well for the internal BTF verbose log purpose which is focusing on type. For example, offset == 0 => "(anon)" => anonymous type/name. Returning non-NULL for the bad offset case is needed during the BTF verification process because the BTF verifier may complain about another field first before discovering the name_off is invalid. However, it may not be ideal for the newer use case which does not necessary mean type name. For example, when logging line_info in the BPF verifier in the next patch, it is better to log an empty src line instead of logging "(anon)". The existing bpf_name_by_offset() is renamed to __bpf_name_by_offset() and static to btf.c. A new bpf_name_by_offset() is added for generic context usage. It returns "\0" for name_off == 0 (note that btf->strings[0] is "\0") and NULL for invalid offset. It allows the caller to decide what is the best output in its context. The new btf_name_by_offset() is overlapped with btf_name_offset_valid(). Hence, btf_name_offset_valid() is removed from btf.h to keep the btf.h API minimal. The existing btf_name_offset_valid() usage in btf.c could also be replaced later. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
|
|
9f8c1c5712 |
bpf: remove obsolete prog->aux sanitation in bpf_insn_prepare_dump
This logic is not needed anymore since we got rid of the verifier
rewrite that was using prog->aux address in
|
||
|
|
7640ead939 |
bpf: verifier: make sure callees don't prune with caller differences
Currently for liveness and state pruning the register parentage
chains don't include states of the callee. This makes some sense
as the callee can't access those registers. However, this means
that READs done after the callee returns will not propagate into
the states of the callee. Callee will then perform pruning
disregarding differences in caller state.
Example:
0: (85) call bpf_user_rnd_u32
1: (b7) r8 = 0
2: (55) if r0 != 0x0 goto pc+1
3: (b7) r8 = 1
4: (bf) r1 = r8
5: (85) call pc+4
6: (15) if r8 == 0x1 goto pc+1
7: (05) *(u64 *)(r9 - 8) = r3
8: (b7) r0 = 0
9: (95) exit
10: (15) if r1 == 0x0 goto pc+0
11: (95) exit
Here we acquire unknown state with call to get_random() [1]. Then
we store this random state in r8 (either 0 or 1) [1 - 3], and make
a call on line 5. Callee does nothing but a trivial conditional
jump (to create a pruning point). Upon return caller checks the
state of r8 and either performs an unsafe read or not.
Verifier will first explore the path with r8 == 1, creating a pruning
point at [11]. The parentage chain for r8 will include only callers
states so once verifier reaches [6] it will mark liveness only on states
in the caller, and not [11]. Now when verifier walks the paths with
r8 == 0 it will reach [11] and since REG_LIVE_READ on r8 was not
propagated there it will prune the walk entirely (stop walking
the entire program, not just the callee). Since [6] was never walked
with r8 == 0, [7] will be considered dead and replaced with "goto -1"
causing hang at runtime.
This patch weaves the callee's explored states onto the callers
parentage chain. Rough parentage for r8 would have looked like this
before:
[0] [1] [2] [3] [4] [5] [10] [11] [6] [7]
| | ,---|----. | | |
sl0: sl0: / sl0: \ sl0: sl0: sl0:
fr0: r8 <-- fr0: r8<+--fr0: r8 `fr0: r8 ,fr0: r8<-fr0: r8
\ fr1: r8 <- fr1: r8 /
\__________________/
after:
[0] [1] [2] [3] [4] [5] [10] [11] [6] [7]
| | | | | |
sl0: sl0: sl0: sl0: sl0: sl0:
fr0: r8 <-- fr0: r8 <- fr0: r8 <- fr0: r8 <-fr0: r8<-fr0: r8
fr1: r8 <- fr1: r8
Now the mark from instruction 6 will travel through callees states.
Note that we don't have to connect r0 because its overwritten by
callees state on return and r1 - r5 because those are not alive
any more once a call is made.
v2:
- don't connect the callees registers twice (Alexei: suggestion & code)
- add more details to the comment (Ed & Alexei)
v1: don't unnecessarily link caller saved regs (Jiong)
Fixes:
|
||
|
|
c872bdb38f |
bpf: include sub program tags in bpf_prog_info
Changes v2 -> v3: 1. remove check for bpf_dump_raw_ok(). Changes v1 -> v2: 1. Fix error path as Martin suggested. This patch adds nr_prog_tags and prog_tags to bpf_prog_info. This is a reliable way for user space to get tags of all sub programs. Before this patch, user space need to find sub program tags via kallsyms. This feature will be used in BPF introspection, where user space queries information about BPF programs via sys_bpf. Signed-off-by: Song Liu <songliubraving@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> |
||
|
|
9e794163a6 |
bpf: Remove bpf_dump_raw_ok() check for func_info and line_info
The func_info and line_info have the bpf insn offset but
they do not contain kernel address. They will still be useful
for the userspace tool to annotate the xlated insn.
This patch removes the bpf_dump_raw_ok() guard for the
func_info and line_info during bpf_prog_get_info_by_fd().
The guard stays for jited_line_info which contains the kernel
address.
Although this bpf_dump_raw_ok() guard behavior has started since
the earlier func_info patch series, I marked the Fixes tag to the
latest line_info patch series which contains both func_info and
line_info and this patch is fixing for both of them.
Fixes:
|
||
|
|
9a1126b631 |
bpf: add bpffs pretty print for cgroup local storage maps
Implement bpffs pretty printing for cgroup local storage maps
(both shared and per-cpu).
Output example (captured for tools/testing/selftests/bpf/netcnt_prog.c):
Shared:
$ cat /sys/fs/bpf/map_2
# WARNING!! The output is for debug purpose only
# WARNING!! The output format will change
{4294968594,1}: {9999,1039896}
Per-cpu:
$ cat /sys/fs/bpf/map_1
# WARNING!! The output is for debug purpose only
# WARNING!! The output format will change
{4294968594,1}: {
cpu0: {0,0,0,0,0}
cpu1: {0,0,0,0,0}
cpu2: {1,104,0,0,0}
cpu3: {0,0,0,0,0}
}
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|