Rustc: The Compile Process and Threads

Digest   2024-08-22 13:41   Hubei


The process

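Let's look at the process first. Rust is a native language, similar to C++. In practice, though, its compile pipeline looks much like that of a semi-managed platform such as .NET, apart from having no JIT at run time and no GC for memory reclamation: the source is first lowered to an intermediate representation, which is then turned into machine code.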

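First, rustc compiles the Rust source into MIR. The M stands for Mid-level: MIR is rustc's own intermediate representation. A further layer of IR follows it (LLVM IR), and that is the intermediate representation the backend actually consumes.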

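After MIR, rustc keeps transforming the intermediate representation, adding and removing constructs, until it has produced LLVM IR. Rust's backend is LLVM, and only LLVM IR can be recognized by LLVM and compiled to machine code. Rust's own MIR would not be recognized if passed to LLVM directly, and compilation would fail.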

Finally, the LLVM IR is handed to LLVM's code generator, which produces the final machine code.
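The stages above can each be dumped with rustc's `--emit` option. The following is a minimal sketch that only builds the command lines and does not run rustc. The `Stage` enum and the `rustc_args` helper are hypothetical names introduced for illustration; the emit kinds `mir`, `llvm-ir`, `asm` and `obj` are the ones rustc accepts.

```rust
/// Pipeline stages in the order described above.
/// (Hypothetical type, introduced only for this sketch.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    Mir,
    LlvmIr,
    Asm,
    Obj,
}

impl Stage {
    const ALL: [Stage; 4] = [Stage::Mir, Stage::LlvmIr, Stage::Asm, Stage::Obj];

    /// The value to pass to `rustc --emit=...` for this stage.
    fn emit_kind(self) -> &'static str {
        match self {
            Stage::Mir => "mir",
            Stage::LlvmIr => "llvm-ir",
            Stage::Asm => "asm",
            Stage::Obj => "obj",
        }
    }
}

/// Build the argument list for dumping one stage of `source`.
/// (Hypothetical helper; it does not invoke rustc.)
fn rustc_args(stage: Stage, source: &str) -> Vec<String> {
    vec![format!("--emit={}", stage.emit_kind()), source.to_string()]
}

fn main() {
    // Print one command line per stage.
    for stage in Stage::ALL.iter() {
        println!("rustc {}", rustc_args(*stage, "main.rs").join(" "));
    }
}
```

Running each printed command on a small source file leaves one output file per stage, which makes it easy to compare the representations side by side.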

Let's look at each stage in turn:

A simple piece of Rust source code:

fn main() {
    println!("hello welcome to rust-lang");
}

LLVM IR:

// rustc --emit=llvm-ir main.rs
; rustfirstproj::main
; Function Attrs: nonlazybind uwtable
define internal void @_ZN13rustfirstproj4main17h5b70493878db64c8E() unnamed_addr #1 {
start:
  %_2 = alloca [48 x i8], align 8
; call core::fmt::Arguments::new_const
  call void @_ZN4core3fmt9Arguments9new_const17h25f8d1540862b316E(ptr sret([48 x i8]) align 8 %_2, ptr align 8 @alloc_6eedf3fda69110149d615db56bf6a65c)
; call std::io::stdio::_print
  call void @_ZN3std2io5stdio6_print17h1a47f568e96855e1E(ptr align 8 %_2)
  ret void
}

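Machine code (MC). Note that what is shown here is not yet the final machine code; it is the textual assembly that still has to be assembled and linked.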

// rustc --emit=asm rustfirstproj.rs
        .section        .text.main,"ax",@progbits
        .globl  main
        .p2align        4, 0x90         // align to 2^4 bytes, padding with 0x90
        .type   main,@function
main:
        .cfi_startproc                  // start of the function's unwind record
        pushq   %rax
        .cfi_def_cfa_offset 16
        movq    %rsi, %rdx
        movslq  %edi, %rsi
        leaq    _ZN13rustfirstproj4main17h5b70493878db64c8E(%rip), %rdi
        xorl    %ecx, %ecx
        callq   _ZN3std2rt10lang_start17h212d3b6bbba16a8dE
        popq    %rcx
        .cfi_def_cfa_offset 8
        retq
.Lfunc_end10:                           // end of the function
        .size   main, .Lfunc_end10-main
        .cfi_endproc                    // end of the function's unwind record

The final machine code:

// objdump -d -M intel -S rustfirstproj > rustfirstproj.txt
0000000000007850 <main>:
    7850:       50                      push   %rax
    7851:       48 89 f2                mov    %rsi,%rdx
    7854:       48 63 f7                movslq %edi,%rsi
    7857:       48 8d 3d c2 ff ff ff    lea    -0x3e(%rip),%rdi        # 7820 <_ZN13rustfirstproj4main17h5b70493878db64c8E>
    785e:       31 c9                   xor    %ecx,%ecx
    7860:       e8 ab fe ff ff          call   7710 <_ZN3std2rt10lang_start17h212d3b6bbba16a8dE>
    7865:       59                      pop    %rcx
    7866:       c3                      ret

Thread analysis

From rustc's main entry point through to codegen, rustc runs its work on multiple threads. Underneath, the threading model is still glibc's.

use std::thread;

fn main() {
    let builder = thread::Builder::new();
    let handle = builder
        .spawn(|| {
            // Code for the new thread goes here.
            println!("Hello from the new thread!");
        })
        .unwrap();

    // Wait for the new thread to finish.
    handle.join().unwrap();
}

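Rust's thread API is fairly flexible. Both spawn and join return a Result, and unwrap extracts the success value, panicking if the Result is an Err.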
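As a minimal sketch of what `join` hands back: the closure's return value arrives in `Ok`, and a panic inside the thread arrives as `Err`. The function names `join_value` and `join_panicking_thread_is_err` are hypothetical and exist only for this example.

```rust
use std::thread;

/// Spawn a thread that returns a value and collect it through `join`.
fn join_value() -> i32 {
    let handle = thread::Builder::new()
        .spawn(|| 40 + 2)
        .expect("failed to spawn thread");
    // `join` returns Ok(value) when the thread finished normally.
    handle.join().expect("worker thread panicked")
}

/// Spawn a thread that panics and report whether `join` returned Err.
fn join_panicking_thread_is_err() -> bool {
    let handle = thread::spawn(|| -> () {
        panic!("worker failed on purpose");
    });
    // The panic is caught at the thread boundary and surfaces as Err here.
    handle.join().is_err()
}

fn main() {
    println!("value from thread: {}", join_value());
    println!("panicking thread gave Err: {}", join_panicking_thread_is_err());
}
```

The panicking thread still prints its panic message to stderr; only the calling thread is shielded from it, which is why calling `unwrap` on the result of `join` re-raises the failure in the caller.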

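You can also write handle.join().expect("error"); to attach a message. Here is a real example from inside rustc (it comes from the ctrlc crate that rustc uses):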

// vim /root/.cargo/registry/src/rsproxy.cn-0dccff568467c15b/ctrlc-3.4.4/src/lib.rs
142     thread::Builder::new()
143         .name("ctrl-c".into())
144         .spawn(move || loop {
145             unsafe {
146                 platform::block_ctrl_c().expect("Critical system error while waiting for Ctrl-C");
147             }
148             user_handler();
149         })
150         .expect("failed to spawn thread");

At lib.rs:142 (frame #8 below), a new thread is created through spawn. After a series of further calls, the stack looks like this:

(lldb) b __clone3
(lldb) r
(lldb) bt
* thread #1, name = 'rustc', stop reason = breakpoint 4.1
  * frame #0: 0x00007fffe7126820 libc.so.6`__clone3 at clone3.S:42
    frame #1: 0x00007fffe71268a1 libc.so.6`__GI___clone_internal(cl_args=0x00007fffffffcdf0, func=(libc.so.6`start_thread at pthread_create.c:336:1), arg=0x00007fffdca00640) at clone-internal.c:54:9
    frame #2: 0x00007fffe70946d9 libc.so.6`create_thread(pd=0x00007fffdca00640, attr=0x00007fffffffd0a0, stopped_start=0x00007fffffffcf0e, stackaddr=<unavailable>, stacksize=2094720, thread_ran=0x00007fffffffcf0f) at pthread_create.c:295:13
    frame #3: 0x00007fffe7095200 libc.so.6`pthread_create@GLIBC_2.2.5 at pthread_create.c:828:14
    frame #4: 0x00007fffe752ce1e libstd-c6e0b4b3b1ba5490.so`std::sys::pal::unix::thread::Thread::new::h1f05c92b31e0615c at thread.rs:84:19
    frame #5: 0x00007fffef432f7d librustc_driver-ce14868ced500872.so`<std::thread::Builder>::spawn_unchecked_::<ctrlc::set_handler_inner<rustc_driver_impl::install_ctrlc_handler::{closure#0}>::{closure#0}, ()> at mod.rs:561:30
    frame #6: 0x00007fffef431e6a librustc_driver-ce14868ced500872.so`<std::thread::Builder>::spawn_unchecked::<ctrlc::set_handler_inner<rustc_driver_impl::install_ctrlc_handler::{closure#0}>::{closure#0}, ()> at mod.rs:442:32
    frame #7: 0x00007fffef43448e librustc_driver-ce14868ced500872.so`<std::thread::Builder>::spawn::<ctrlc::set_handler_inner<rustc_driver_impl::install_ctrlc_handler::{closure#0}>::{closure#0}, ()> at mod.rs:375:18
    frame #8: 0x00007fffef4313a5 librustc_driver-ce14868ced500872.so`ctrlc::set_handler_inner::<rustc_driver_impl::install_ctrlc_handler::{closure#0}> at lib.rs:142:5

The frame of interest is frame #4. Its code is:

// vim /home/tang/opt/rust/compiler/rust/library/std/src/sys/pal/unix/thread.rs:84
let ret = libc::pthread_create(&mut native, &attr, thread_start, p as *mut _);

It creates a thread and passes thread_start as the thread's entry point. thread_start looks like this:

// vim /home/tang/opt/rust/compiler/rust/library/std/src/sys/pal/unix/thread.rs:99
 99         extern "C" fn thread_start(main: *mut libc::c_void) -> *mut libc::c_void {
100             unsafe {
101                 // Next, set up our stack overflow handler which may get triggered if we run
102                 // out of stack.
103                 let _handler = stack_overflow::Handler::new();
104                 // Finally, let's run some code.
105                 Box::from_raw(main as *mut Box<dyn FnOnce()>)();
106             }
107             ptr::null_mut()
108         }
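The cast at line 105 works because the closure is boxed twice: a `Box<dyn FnOnce()>` is a fat pointer, so it is boxed again to get a thin pointer that fits into a C `void *`. Here is a minimal sketch of that pattern without pthreads, calling the entry point directly on the current thread. The names `trampoline`, `into_thin_ptr` and `run_through_trampoline` are hypothetical and are not part of std.

```rust
use std::ffi::c_void;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// C-compatible entry point, shaped like the `thread_start` shown above.
extern "C" fn trampoline(data: *mut c_void) -> *mut c_void {
    // SAFETY: `data` must come from `into_thin_ptr` and be consumed exactly once.
    let closure: Box<Box<dyn FnOnce()>> =
        unsafe { Box::from_raw(data as *mut Box<dyn FnOnce()>) };
    closure();
    std::ptr::null_mut()
}

/// Box the (fat) trait object again so it fits in one thin `void *`.
fn into_thin_ptr(f: Box<dyn FnOnce()>) -> *mut c_void {
    Box::into_raw(Box::new(f)) as *mut c_void
}

/// Pass a closure through the raw pointer and count how often it ran.
fn run_through_trampoline() -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let seen = Arc::clone(&counter);
    let raw = into_thin_ptr(Box::new(move || {
        seen.fetch_add(1, Ordering::SeqCst);
    }));
    // A real thread library would call this on the new thread.
    trampoline(raw);
    counter.load(Ordering::SeqCst)
}

fn main() {
    println!("closure ran {} time(s)", run_through_trampoline());
}
```

In std the same raw pointer is the last argument of `pthread_create`, so the C side never needs to know anything about Rust closures; it only forwards one pointer to one function.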

Now we only need to find where thread_start is called.

Here is glibc's __clone3:

// https://elixir.bootlin.com/glibc/glibc-2.35/source/sysdeps/unix/sysv/linux/x86_64/clone3.S:42
ENTRY (__clone3)
  /* Sanity check arguments.  */
  movl  $-EINVAL, %eax
  test  %RDI_LP, %RDI_LP  /* No NULL cl_args pointer.  */
  jz  SYSCALL_ERROR_LABEL
  test  %RDX_LP, %RDX_LP  /* No NULL function pointer.  */
  jz  SYSCALL_ERROR_LABEL

  /* Save the cl_args pointer in R8 which is preserved by the
     syscall.  */
  mov  %RCX_LP, %R8_LP

  /* Do the system call.  */
  movl  $SYS_ify(clone3), %eax

  /* End FDE now, because in the child the unwind info will be
     wrong.  */
  cfi_endproc
  syscall

  test  %RAX_LP, %RAX_LP
  jl  SYSCALL_ERROR_LABEL
  jz  L(thread_start)

  ret

L(thread_start):
  cfi_startproc
  /* Clearing frame pointer is insufficient, use CFI.  */
  cfi_undefined (rip)
  /* Clear the frame pointer.  The ABI suggests this be done, to mark
     the outermost frame obviously.  */
  xorl  %ebp, %ebp

  /* Align stack to 16 bytes per the x86-64 psABI.  */
  and  $-16, %RSP_LP

  /* Set up arguments for the function call.  */
  mov  %R8_LP, %RDI_LP  /* Argument.  */
  call  *%rdx    /* Call function.  */
  /* Call exit with return value from function call. */
  movq  %rax, %rdi
  movl  $SYS_ify(exit), %eax
  syscall
  cfi_endproc

  cfi_startproc
PSEUDO_END (__clone3)

Next, let's see how __clone3 actually behaves at run time, setting a breakpoint on codegen as the example:

(lldb) b codegen
(lldb) r
(lldb) bt
  * frame #0: 0x00007fffefabf810 librustc_driver-ce14868ced500872.so`rustc_codegen_llvm::allocator::codegen at allocator.rs:13
    // intermediate frames omitted
    frame #40: 0x00007fffe752d108 libstd-c6e0b4b3b1ba5490.so`std::sys::pal::unix::thread::Thread::new::thread_start::h541d22c6499e5529 at thread.rs:105:17
    frame #41: 0x00007fffe7094ac3 libc.so.6`start_thread(arg=<unavailable>) at pthread_create.c:442:8
    frame #42: 0x00007fffe7126850 libc.so.6`__clone3 at clone3.S:81

You can clearly see that this instruction in __clone3:

  call  *%rdx    /* Call function.  */

calls the native C thread entry point, start_thread (glibc is written in C, not C++):

// https://elixir.bootlin.com/glibc/glibc-2.35/source/nptl/pthread_create.c:442
start_thread (void *arg)
{
  // part of the code omitted
  if (pd->c11)
    {
      /* The function pointer of the c11 thread start is cast to an incorrect
         type on __pthread_create_2_1 call, however it is casted back to correct
         one so the call behavior is well-defined (it is assumed that pointers
         to void are able to represent all values of int.  */
      int (*start)(void*) = (int (*) (void*)) pd->start_routine;
      ret = (void*) (uintptr_t) start (pd->arg);
    }
  else
    ret = pd->start_routine (pd->arg); // this calls thread_start at thread.rs:99
  THREAD_SETMEM (pd, result, ret);
}

The following line is the one that calls thread_start at thread.rs:99:

ret = pd->start_routine (pd->arg);

To summarize:

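1. Rust creates and starts a thread:
   let builder = thread::Builder::new();
   let handle = builder.spawn(|| {
       // Code for the new thread goes here.
       println!("Hello from the new thread!");
   }).unwrap();
2. Rust calls libc::pthread_create, passing thread_start as the thread entry point.
3. pthread_create calls glibc's __clone3, and __clone3 calls start_thread.
4. start_thread in turn calls the thread_start entry point passed in above, which returns control to Rust for the rest of the call chain.
5. The rest of the call chain is the actual work, for example codegen compiling to machine code.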
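The steps above can be observed from the Rust side with a small sketch: the closure passed to spawn runs on a different thread from its caller, under the name given to the Builder. The helper `spawn_named` is a hypothetical name for this example, and the 2 MiB stack size is an assumption chosen to be close to the stacksize value visible in the backtrace.

```rust
use std::thread;

/// Spawn a named thread and report, from inside it, the name it sees
/// and whether its ThreadId differs from the caller's.
fn spawn_named(name: &str) -> (Option<String>, bool) {
    let caller_id = thread::current().id();
    let handle = thread::Builder::new()
        .name(name.to_string())
        // Assumed value for illustration: 2 MiB.
        .stack_size(2 * 1024 * 1024)
        .spawn(move || {
            // This closure is what ends up running behind thread_start.
            let me = thread::current();
            (me.name().map(str::to_owned), me.id() != caller_id)
        })
        .expect("failed to spawn thread");
    handle.join().expect("worker thread panicked")
}

fn main() {
    let (name, is_other_thread) = spawn_named("worker");
    println!("name inside thread: {:?}", name);
    println!("runs on a different thread: {}", is_other_thread);
}
```

Setting a breakpoint on __clone3 while running this program under lldb should show the same pthread_create chain as the rustc backtrace, only with this closure at the top.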


Conclusion

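This post walked through rustc's compile process and the way its threads are created and called. The critical parts underneath Rust are still C and C++: the LLVM backend is C++, and threading relies on glibc, which is C. Rust can of course also use musl instead; that is something to look at in a later post.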

江湖评谈
Record, share, freedom.