twx-linux/kernel
Serge E. Hallyn db2e718a47 capabilities: require CAP_SETFCAP to map uid 0
cap_setfcap is required to create file capabilities.

Since commit 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"),
a process running as uid 0 but without cap_setfcap is able to work
around this as follows: unshare a new user namespace which maps parent
uid 0 into the child namespace.

While this task will not have new capabilities against the parent
namespace, there is a loophole due to the way namespaced file
capabilities are represented as xattrs.  File capabilities valid in
userns 1 are distinguished from file capabilities valid in userns 2 by
the kuid which underlies uid 0.  Therefore the restricted root process
can unshare a new self-mapping namespace, add a namespaced file
capability onto a file, then use that file capability in the parent
namespace.

To prevent that, do not allow mapping parent uid 0 if the process which
opened the uid_map file does not have CAP_SETFCAP, which is the
capability for setting file capabilities.

As a further wrinkle: a task can unshare its user namespace, then open
its uid_map file itself, and map (only) its own uid.  In this case we do
not have the credential from before unshare, which was potentially more
restricted.  So, when creating a user namespace, we record whether the
creator had CAP_SETFCAP.  Then we can use that during map_write().

With this patch:

1. Unprivileged user can still unshare -Ur

   ubuntu@caps:~$ unshare -Ur
   root@caps:~# logout

2. Root user can still unshare -Ur

   ubuntu@caps:~$ sudo bash
   root@caps:/home/ubuntu# unshare -Ur
   root@caps:/home/ubuntu# logout

3. Root user without CAP_SETFCAP cannot unshare -Ur:

   root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap --
   root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap
   unable to set CAP_SETFCAP effective capability: Operation not permitted
   root@caps:/home/ubuntu# unshare -Ur
   unshare: write failed /proc/self/uid_map: Operation not permitted

Note: an alternative solution would be to allow uid 0 mappings by
processes without CAP_SETFCAP, but to prevent such a namespace from
writing any file capabilities.  This approach can be seen at [1].

Background history: commit 95ebabde382 ("capabilities: Don't allow
writing ambiguous v3 file capabilities") tried to fix the issue by
preventing v3 fscaps to be written to disk when the root uid would map
to the same uid in nested user namespaces.  This led to regressions for
various workloads.  For example, see [2].  Ultimately this is a valid
use-case we have to support meaning we had to revert this change in
3b0c2d3eaa83 ("Revert 95ebabde382c ("capabilities: Don't allow writing
ambiguous v3 file capabilities")").

Link: https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4 [1]
Link: https://github.com/containers/buildah/issues/3071 [2]
Signed-off-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Andrew G. Morgan <morgan@kernel.org>
Tested-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-20 14:28:33 -07:00
..
bpf bpf: Tighten speculative pointer arithmetic mask 2021-04-16 23:52:01 +02:00
cgroup idmapped-mounts-v5.12 2021-02-23 13:39:45 -08:00
configs staging: ION: remove some references to CONFIG_ION 2021-01-06 17:39:38 +01:00
debug kgdb: fix to kill breakpoints on initmem after boot 2021-02-26 09:41:05 -08:00
dma Merge branch 'stable/for-linus-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb 2021-02-26 13:59:32 -08:00
entry entry: Explicitly flush pending rcuog wakeup before last rescheduling point 2021-02-17 14:12:43 +01:00
events perf/core: Flush PMU internal buffers for per-CPU events 2021-03-06 12:52:39 +01:00
gcov Revert "gcov: clang: fix clang-11+ build" 2021-04-19 15:08:49 -07:00
irq genirq: Disable interrupts for force threaded handlers 2021-03-21 00:17:52 +01:00
kcsan kcsan: Rewrite kcsan_prandom_u32_max() without prandom_u32_state() 2021-01-04 14:39:07 -08:00
livepatch kallsyms: refactor {,module_}kallsyms_on_each_symbol 2021-02-08 12:22:08 +01:00
locking Two minor fixes: one for a Clang warning, the other improves an 2021-04-11 11:47:03 -07:00
power PM: EM: postpone creating the debugfs dir till fs_initcall 2021-03-23 19:53:48 +01:00
printk Merge branch 'printk-rework' into for-linus 2021-02-22 13:43:55 +01:00
rcu Scheduler updates for v5.12: 2021-02-21 12:35:04 -08:00
sched sched/membarrier: fix missing local execution of ipi_sync_rq_state() 2021-03-06 12:40:21 +01:00
time kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data() 2021-03-16 22:13:10 +01:00
trace tracing/dynevent: Fix a memory link in dyn_event_release() 2021-04-13 18:40:00 -07:00
.gitignore
acct.c kernel/acct.c: use #elif instead of #end and #elif 2020-12-15 22:46:15 -08:00
async.c treewide: Remove uninitialized_var() usage 2020-07-16 12:35:15 -07:00
audit_fsnotify.c audit_alloc_mark(): don't open-code ERR_CAST() 2021-02-23 10:25:27 -05:00
audit_tree.c fsnotify: generalize handle_inode_event() 2020-12-03 14:58:35 +01:00
audit_watch.c fsnotify: generalize handle_inode_event() 2020-12-03 14:58:35 +01:00
audit.c audit: Remove leftover reference to the audit_tasklet 2021-01-15 11:58:10 -05:00
audit.h audit: change unnecessary globals into statics 2020-08-17 20:26:58 -04:00
auditfilter.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
auditsc.c idmapped-mounts-v5.12 2021-02-23 13:39:45 -08:00
backtracetest.c treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD() 2020-07-30 11:15:58 -07:00
bounds.c
capability.c capability: handle idmapped mounts 2021-01-24 14:27:16 +01:00
compat.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
configs.c
context_tracking.c
cpu_pm.c notifier: Fix broken error handling pattern 2020-09-01 09:58:03 +02:00
cpu.c cpu/hotplug: Add lockdep_is_cpus_held() 2021-01-06 16:24:59 -08:00
crash_core.c kdump: append uts_namespace.name offset to VMCOREINFO 2020-12-15 22:46:18 -08:00
crash_dump.c
cred.c
delayacct.c
dma.c
exec_domain.c
exit.c kernel/io_uring: cancel io_uring before task works 2020-12-30 19:36:54 -07:00
extable.c
fail_function.c fault-injection: handle EI_ETYPE_TRUE 2020-12-15 22:46:19 -08:00
fork.c io_uring-5.12-2021-03-27 2021-03-28 11:42:05 -07:00
freezer.c Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing" 2021-03-27 14:09:10 -06:00
futex.c kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data() 2021-03-16 22:13:10 +01:00
gen_kheaders.sh
groups.c groups: simplify struct group_info allocation 2021-02-26 09:41:03 -08:00
hung_task.c kernel/hung_task.c: make type annotations consistent 2020-11-02 12:14:19 -08:00
iomem.c
irq_work.c irq_work: Optimize irq_work_single() 2020-11-24 16:47:49 +01:00
jump_label.c static_call: Fix static_call_update() sanity check 2021-03-19 13:16:44 +01:00
kallsyms.c kallsyms: only build {,module_}kallsyms_on_each_symbol when required 2021-02-08 12:24:04 +01:00
kcmp.c Merge branch 'exec-update-lock-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-12-15 19:36:48 -08:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt preempt: Introduce CONFIG_PREEMPT_DYNAMIC 2021-02-17 14:12:24 +01:00
kcov.c kernel: make kcov_common_handle consider the current context 2020-11-02 18:00:20 -08:00
kexec_core.c Merge branch 'work.elf-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-02-21 09:29:23 -08:00
kexec_elf.c
kexec_file.c ima: Free IMA measurement buffer after kexec syscall 2021-02-10 15:49:38 -05:00
kexec_internal.h kexec: move machine_kexec_post_load() to public interface 2021-02-22 12:33:26 +00:00
kexec.c LSM: Introduce kernel_post_load_data() hook 2020-10-05 13:37:03 +02:00
kheaders.c
kmod.c kmod: remove redundant "be an" in the comment 2020-08-12 10:58:01 -07:00
kprobes.c kprobes: Fix to delay the kprobes jump optimization 2021-02-19 14:57:12 -05:00
ksysfs.c
kthread.c - Correct the marking of kthreads which are supposed to run on a specific, 2021-01-24 10:09:20 -08:00
latencytop.c
Makefile kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTORE 2021-02-16 09:59:41 +01:00
module_signature.c module: harden ELF info handling 2021-01-19 10:24:45 +01:00
module_signing.c module: harden ELF info handling 2021-01-19 10:24:45 +01:00
module-internal.h
module.c module: potential uninitialized return in module_kallsyms_on_each_symbol() 2021-02-10 16:57:04 +01:00
notifier.c notifier: Fix broken error handling pattern 2020-09-01 09:58:03 +02:00
nsproxy.c fixes-v5.11 2020-12-14 16:40:27 -08:00
padata.c padata: fix possible padata_works_lock deadlock 2020-09-04 17:51:55 +10:00
panic.c panic: don't dump stack twice on warn 2020-11-14 11:26:04 -08:00
params.c Modules updates for v5.11 2020-12-17 13:01:31 -08:00
pid_namespace.c fixes-v5.11 2020-12-14 16:40:27 -08:00
pid.c Merge branch 'exec-update-lock-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-12-15 19:36:48 -08:00
profile.c
ptrace.c Revert "kernel: treat PF_IO_WORKER like PF_KTHREAD for ptrace/signals" 2021-03-27 14:09:10 -06:00
range.c kernel.h: split out min()/max() et al. helpers 2020-10-16 11:11:19 -07:00
reboot.c Revert "PM: ACPI: reboot: Use S5 for reboot" 2021-03-18 16:58:02 +01:00
regset.c regset: kill ->get() 2020-07-27 14:31:12 -04:00
relay.c relay: allow the use of const callback structs 2020-12-15 22:46:18 -08:00
resource_kunit.c resource: provide meaningful MODULE_LICENSE() in test suite 2020-11-25 18:52:35 +01:00
resource.c resource: Move devmem revoke code to resource framework 2021-01-12 14:26:31 +01:00
rseq.c
scftorture.c scftorture: Add debug output for wrong-CPU warning 2021-01-04 13:53:41 -08:00
scs.c scs: switch to vmapped shadow stacks 2020-12-01 10:30:28 +00:00
seccomp.c seccomp: Improve performace by optimizing rmb() 2021-02-10 12:40:11 -08:00
signal.c Revert "signal: don't allow STOP on PF_IO_WORKER threads" 2021-03-27 14:09:11 -06:00
smp.c smp: Process pending softirqs in flush_smp_call_function_from_idle() 2021-02-17 14:12:42 +01:00
smpboot.c kthread: Extract KTHREAD_IS_PER_CPU 2021-01-22 15:09:42 +01:00
smpboot.h
softirq.c softirq: Move do_softirq_own_stack() to generic asm header 2021-02-10 23:34:16 +01:00
stackleak.c stackleak: let stack_erasing_sysctl take a kernel pointer buffer 2020-09-19 13:13:39 -07:00
stacktrace.c stacktrace: Remove reliable argument from arch_stack_walk() callback 2020-09-18 14:24:16 +01:00
static_call.c static_call: Fix static_call_update() sanity check 2021-03-19 13:16:44 +01:00
stop_machine.c Merge branch 'linus' into sched/core, to resolve semantic conflict 2020-11-27 11:10:50 +01:00
sys_ni.c epoll: wire up syscall epoll_pwait2 2020-12-19 11:18:38 -08:00
sys.c prctl: fix PR_SET_MM_AUXV kernel stack leak 2021-03-14 14:33:27 -07:00
sysctl-test.c
sysctl.c sysctl.c: fix underflow value setting risk in vm_table 2021-02-26 09:41:03 -08:00
task_work.c task_work: remove legacy TWA_SIGNAL path 2020-12-12 09:17:38 -07:00
taskstats.c treewide: rename nla_strlcpy to nla_strscpy. 2020-11-16 08:08:54 -08:00
test_kprobes.c
torture.c torture: Maintain torture-specific set of CPUs-online books 2021-01-06 17:17:22 -08:00
tracepoint.c tracepoints: Code clean up 2021-02-09 12:27:29 -05:00
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c usermodehelper: reset umask to default before executing user process 2020-10-06 10:31:52 -07:00
up.c
user_namespace.c capabilities: require CAP_SETFCAP to map uid 0 2021-04-20 14:28:33 -07:00
user-return-notifier.c
user.c user: Use generic ns_common::count 2020-08-19 14:14:12 +02:00
usermode_driver.c bpf: Fix umd memory leak in copy_process() 2021-03-19 22:23:19 +01:00
utsname_sysctl.c
utsname.c uts: Use generic ns_common::count 2020-08-19 14:13:20 +02:00
watch_queue.c watch_queue: rectify kernel-doc for init_watch() 2021-01-26 11:16:34 +00:00
watchdog_hld.c
watchdog.c workqueue/watchdog: Make unbound workqueues aware of touch_softlockup_watchdog() 2021-04-04 13:26:49 -04:00
workqueue_internal.h
workqueue.c workqueue/watchdog: Make unbound workqueues aware of touch_softlockup_watchdog() 2021-04-04 13:26:49 -04:00