内容简介:This is an ongoing series of posts on ELF Binary Relocations and Thread Local Storage. This article covers only Thread Local Storage and assumes the reader has had a primer in ELF Relocations, if not please start with my previous articleThis is the third p
This is an ongoing series of posts on ELF Binary Relocations and Thread Local Storage. This article covers only Thread Local Storage and assumes the reader has had a primer in ELF Relocations, if not please start with my previous article ELF Binaries and Relocation Entries .
This is the third part in an illustrated 3 part series covering:
- ELF Binaries and Relocation Entries
- Thread Local Storage
- How Relocations and Thread Local Store are implemented
In the last article we covered how Thread Local Storage (TLS) works at runtime, but how do we get there? How does the compiler and linker create the memory structures and code fragments described in the previous article?
In this article we will discuss how TLS relocations are is implemented. Our outline:
-
- GCC Legitimize Address
-
- Phase 1 - Book Keeping
- Phase 2 - Creating Space
- Phase 4 - Finishing Up
As before, the examples in this article can be found in my tls-examples project. Please check it out.
The GNU toolchain
I will assume here that most people understand what a compiler and assembler basically do. In the sense that compiler will compile routines written C code or something similar to assembly language. It is then up to the assembler to turn that assembly code into machine code to run on a CPU.
That is a big part of what a toolchain does, and it’s pretty much that simple if we have a single file of source code. But usually we don’t have a single file, we have the multiple files, the c runtime , crt0 and other libraries like libc . These all need to be put together into our final program, that is where the complexities of the linker comes in.
In this article I will cover how variables in our source code ( symbols ) traverse the toolchain from code to the memory in our final running program. A picture that looks something like this:
The Compiler
First we start off with how relocations are created and emitted in the compiler.
As I work primarily on the GNU toolchain with it’s GCC compiler we will look at that, let’s get started.
GCC Legitimize Address
To start we define a symbol as named address in memory. This address can be a program variable where data is stored or function reference to where a subroutine starts.
In GCC we have have TARGET_LEGITIMIZE_ADDRESS
, the OpenRISC implementation being or1k_legitimize_address() . It takes a symbol (memory address) and makes it usable in our CPU by generating RTX sequences that are possible on our CPU to load that address into a register.
RTX represents a tree node in GCC’s register transfer language (RTL). The RTL Expression is used to express our algorithm as a series of register transfers. This is used as register transfer is basically what a CPU does.
A snippet from legitimize_address()
function is below. The argument x
represents our input symbol (memory address) that we need to make usable by our CPU. This code uses GCC internal API’s to emit RTX code sequences.
static rtx or1k_legitimize_address (rtx x, rtx /* unused */, machine_mode /* unused */) ... case TLS_MODEL_NONE: t1 = can_create_pseudo_p () ? gen_reg_rtx (Pmode) : scratch; if (!flag_pic) { emit_insn (gen_rtx_SET (t1, gen_rtx_HIGH (Pmode, x))); return gen_rtx_LO_SUM (Pmode, t1, x); } else if (is_local) { crtl->uses_pic_offset_table = 1; t2 = gen_sym_unspec (x, UNSPEC_GOTOFF); emit_insn (gen_rtx_SET (t1, gen_rtx_HIGH (Pmode, t2))); emit_insn (gen_add3_insn (t1, t1, pic_offset_table_rtx)); return gen_rtx_LO_SUM (Pmode, t1, copy_rtx (t2)); } else { ...
We can read the code snippet above as follows:
- This is for the non
TLS
case as we seeTLS_MODEL_NONE
. - We reserve a temporary register
t1
. - If not using Position-independent code (
flag_pic
) we do:- Emit an instruction to put the high bits of
x
into our temporary registert1
. - Return the sum of
t1
and the low bits ofx
.
- Emit an instruction to put the high bits of
- Otherwise if the symbol is static (
is_local
) we do:- Mark the global state that this object file uses the
uses_pic_offset_table
. - We create a Global Offset Table offset variable
t2
. - Emit an instruction to put the high bits of
t2
(the GOT offset) into out temporary registert1
. - Emit an instruction to put the sum of
t1
( high bits oft2) and the GOT into
t1`. - Return the sum of
t1
and the low bits oft1
.
- Mark the global state that this object file uses the
You may have noticed that the local symbol still used the global offset table ( GOT ). This is because Position-idependent code requires using the GOT to reference symbols.
An example, from nontls.c :
static int x; int *get_x_addr() { return &x; }
Example of the non pic case above, when we look at the assembly code generated by GCC we can see the following:
.file "nontls.c" .section .text .local x .comm x,4,4 .align 4 .global get_x_addr .type get_x_addr, @function get_x_addr: l.addi r1, r1, -8 # \ l.sw 0(r1), r2 # | function prologue l.addi r2, r1, 8 # | l.sw 4(r1), r9 # / l.movhi r17, ha(x) # \__ legitimize address of x into r17 l.addi r17, r17, lo(x) # / l.or r11, r17, r17 # } place result in return register r11 l.lwz r2, 0(r1) # \ l.lwz r9, 4(r1) # | function epilogue l.addi r1, r1, 8 # | l.jr r9 # | l.nop # / .size get_x_addr, .-get_x_addr .ident "GCC: (GNU) 9.0.1 20190409 (experimental)"
Example of the local pic case above the same code compiled with the -fPIC
GCC option looks like the following:
.file "nontls.c" .section .text .local x .comm x,4,4 .align 4 .global get_x_addr .type get_x_addr, @function get_x_addr: l.addi r1, r1, -8 # \ l.sw 0(r1), r2 # | function prologue l.addi r2, r1, 8 # | l.sw 4(r1), r9 # / l.jal 8 # \ l.movhi r19, gotpchi(_GLOBAL_OFFSET_TABLE_-4) # | PC relative, put l.ori r19, r19, gotpclo(_GLOBAL_OFFSET_TABLE_+0) # | GOT into r19 l.add r19, r19, r9 # / l.movhi r17, gotoffha(x) # \ l.add r17, r17, r19 # | legitimize address of x into r17 l.addi r17, r17, gotofflo(x) # / l.or r11, r17, r17 # } place result in return register r11 l.lwz r2, 0(r1) # \ l.lwz r9, 4(r1) # | function epilogue l.addi r1, r1, 8 # | l.jr r9 # | l.nop # / .size get_x_addr, .-get_x_addr .ident "GCC: (GNU) 9.0.1 20190409 (experimental)"
TLS and Addend cases are also handled by or1k_legitimize_address()
.
GCC Print Operand
Once RTX is generated by legitimize address andGCC passes run all of their optimizations the RTX needs to be printed out as assembly code. During this process relocations are printed by GCC macros TARGET_PRINT_OPERAND_ADDRESS
and TARGET_PRINT_OPERAND
. In OpenRISC these defined by or1k_print_operand_address() and or1k_print_operand() .
Let us have a look at or1k_print_operand_address()
.
/* Worker for TARGET_PRINT_OPERAND_ADDRESS. Prints the argument ADDR, an address RTX, to the file FILE. The output is formed as expected by the OpenRISC assembler. Examples: RTX OUTPUT (reg:SI 3) 0(r3) (plus:SI (reg:SI 3) (const_int 4)) 0x4(r3) (lo_sum:SI (reg:SI 3) (symbol_ref:SI ("x"))) lo(x)(r3) */ static void or1k_print_operand_address (FILE *file, machine_mode, rtx addr) { rtx offset; switch (GET_CODE (addr)) { case REG: fputc ('0', file); break; case ... case LO_SUM: offset = XEXP (addr, 1); addr = XEXP (addr, 0); print_reloc (file, offset, 0, RKIND_LO); break; default: ... } fprintf (file, "(%s)", reg_names[REGNO (addr)]); }
The above code snippet can be read as we explain below, but let’s first make some notes:
- The input RTX
addr
forTARGET_PRINT_OPERAND_ADDRESS
will usually contain a register and an offset typically this is used for LOAD and STORE operations. - Think of the RTX
addr
as a node in an AST . - The RTX node with code
REG
andSYMBOL_REF
are always leaf nodes.
With that, and if we use the or1k_print_operand_address()
c comments above as examples of some RTX addr
input we will have:
RTX | (reg:SI 3) (lo_sum:SI (reg:SI 3) (symbol_ref:SI("x"))) -----------+-------------------------------------------------------------------- TREE | (code) | (code:REG regno:3) (code:LO_SUM) / \ | / \ (0) (1) | (code:REG regno:3) (code:SYMBOL_REF "x")
We can now read the above snippet as:
- First get the
CODE
of the RTX.- If
CODE
isREG
(a register) than our offset can be0
. - If
IS
isLO_SUM
(an addition operation) then we need to break it down to:- Arg
0
is our newaddr
RTX (which we assume is a register) - Arg
1
is an offset (which we then print with print_reloc() )
- Arg
- If
- Second print out the register name now in
addr
i.e. “r3”.
The code of or1k_print_operand()
is similar and the reader may be inclined to read more details. With that we can move on to the assembler.
TLS cases are also handled inside of the print_reloc()
function.
The Assembler
In the GNU Toolchain our assembler is GAS, part of binutils .
The code that handles relocations is found in the function parse_reloc() found in opcodes/or1k-asm.c
. The function parse_reloc()
is the direct counterpart of GCC’s print_reloc()
discussed above. This is actually part of or1k_cgen_parse_operand() which is wired into our assembler generator CGEN used for parsing operands.
If we are parsing a relocation like the one from above lo(x)
then we can isolate the code that processes that relocation.
static const bfd_reloc_code_real_type or1k_imm16_relocs[][6] = { { BFD_RELOC_LO16, BFD_RELOC_OR1K_SLO16, ... BFD_RELOC_OR1K_TLS_LE_AHI16 }, }; static int parse_reloc (const char **strp) { const char *str = *strp; enum or1k_rclass cls = RCLASS_DIRECT; enum or1k_rtype typ; ... else if (strncasecmp (str, "lo(", 3) == 0) { str += 3; typ = RTYPE_LO; } ... *strp = str; return (cls << RCLASS_SHIFT) | typ; }
This uses strncasecmp to match our "lo("
string pattern. The returned result is a relocation type and relocation class which are use to lookup the relocation BFD_RELOC_LO16
in the or1k_imm16_relocs[][]
table which is indexed by relocation class and relocation class .
The assembler will encode that into the ELF binary. For TLS relocations the exact same pattern is used.
The Linker
In the GNU Toolchain our object linker is the GNU linker LD, also part of the binutils project.
The GNU linker uses the framework BFD or Binary File Descriptor which is a beast. It is not only used in the linker but also used in GDB, the GNU Simulator and the objdump tool.
What makes this possible is a rather complex API.
BFD Linker API
The BFD API is a generic binary file access API. It has been designed to support multiple file formats and architectures via an object oriented, polymorphic API all written in c. It supports file formats including a.out , COFF and ELF as well as unexpected file formats like verilog hex memory dumps .
Here we will concentrate on the BFD ELF implementation.
The API definition is split across multiple files which include:
- bfd/bfd-in.h - top level generic APIs including
bfd_hash_table
- bfd/bfd-in2.h - top level binary file APIs including
bfd
andasection
- include/bfdlink.h - generic bfd linker APIs including
bfd_link_info
andbfd_link_hash_table
- bfd/elf-bfd.h - extensions to the APIs for ELF binaries including
elf_link_hash_table
-
bfd/elf{wordsize}-{architecture}.c
- architecture specific implementations
For each architecture implementations are defined in bfd/elf{wordsize}-{architecture}.c
. For example for OpenRISC we have bfd/elf32-or1k.c .
Throughout the linker code we see access to the BFD Linker and ELF APIs. Some key symbols to watch out for include:
-
info
- A reference tobfd_link_info
top level reference to all linker state. -
htab
- A pointer toelf_or1k_link_hash_table
fromor1k_elf_hash_table (info)
, a hash table on steroids which stores generic link state and arch specific state, it’s also a hash table of all global symbols by name, contains:-
htab->root.splt
- the output.plt
section -
htab->root.sgot
- the output.got
section -
htab->root.srelgot
- the output.relgot
section (relocations against the got) -
htab->root.sgotplt
- the output.gotplt
section -
htab->root.dynobj
- a specialbfd
to which sections are added (created inor1k_elf_check_relocs
)
-
-
sym_hashes
- Fromelf_sym_hashes (abfd)
a list of for global symbols in abfd
indexed by the relocation indexELF32_R_SYM (rel->r_info)
. -
h
- A pointer to astruct elf_link_hash_entry
, represents link state of a global symbol, contains:-
h->got
- A union of different attributes with different roles based on link phase. -
h->got.refcount
- used during phase 1 to count the symbol.got
section references -
h->got.offset
- used during phase 2 to record the symbol.got
section offset -
h->plt
- A union with the same function ash->got
but used for the.plt
section. -
h->root.root.string
- The symbol name
-
-
local_got
- an array ofunsigned long
fromelf_local_got_refcounts (ibfd)
with the same function toh->got
but for local symbols, the function of theunsigned long
is changed base on the link phase. Ideally this should also be a union. -
tls_type
- Retrieved by((struct elf_or1k_link_hash_entry *) h)->tls_type
used to store thetls_type
of a global symbol. -
local_tls_type
- Retrieved byelf_or1k_local_tls_type(abfd)
entry to storetls_type
for local symbols, whenh
isNULL
. -
root
- The struct fieldroot
is used in subclasses to represent the parent class, similar to howsuper
is used in other languages.
Putting it all together we have a diagram like the following:
Now that we have a bit of understanding of the data structures we can look to the link algorithm.
The link process in the GNU Linker can be thought of in phases.
Phase 1 - Book Keeping (check_relocs)
The or1k_elf_check_relocs()
function is called during the first phase to do book keeping on relocations. The function signature looks like:
static bfd_boolean or1k_elf_check_relocs (bfd *abfd, struct bfd_link_info *info, asection *sec, const Elf_Internal_Rela *relocs) #define elf_backend_check_relocs or1k_elf_check_relocs
The arguments being:
abfd info sec relocs
It does the book keeping by looping over relocations for the provided section and updating the local and global symbol properties.
For local symbols:
... else { unsigned char *local_tls_type; /* This is a TLS type record for a local symbol. */ local_tls_type = (unsigned char *) elf_or1k_local_tls_type (abfd); if (local_tls_type == NULL) { bfd_size_type size; size = symtab_hdr->sh_info; local_tls_type = bfd_zalloc (abfd, size); if (local_tls_type == NULL) return FALSE; elf_or1k_local_tls_type (abfd) = local_tls_type; } local_tls_type[r_symndx] |= tls_type; } ... else { bfd_signed_vma *local_got_refcounts; /* This is a global offset table entry for a local symbol. */ local_got_refcounts = elf_local_got_refcounts (abfd); if (local_got_refcounts == NULL) { bfd_size_type size; size = symtab_hdr->sh_info; size *= sizeof (bfd_signed_vma); local_got_refcounts = bfd_zalloc (abfd, size); if (local_got_refcounts == NULL) return FALSE; elf_local_got_refcounts (abfd) = local_got_refcounts; } local_got_refcounts[r_symndx] += 1; }
The above is pretty straight forward and we can read as:
- First part is for storing local symbol
TLS
type information:- If the
local_tls_type
array is not initialized:- Allocate it, 1 entry for each local variable
- Record the TLS type in
local_tls_type
for the current symbol
- If the
- Second part is for recording
.got
section references:- If the
local_got_refcounts
array is not initialized:- Allocate it, 1 entry for each local variable
- Record a reference by incrementing
local_got_refcounts
for the current symbol
- If the
For global symbols, it’s much more easy we see:
... if (h != NULL) ((struct elf_or1k_link_hash_entry *) h)->tls_type |= tls_type; else ... if (h != NULL) h->got.refcount += 1; else ...
As the tls_type
and refcount
fields are available directly on each hash_entry
handling global symbols is much easier.
- First part is for storing
TLS
type information:- Record the TLS type in
tls_type
for the currenthash_entry
- Record the TLS type in
- Second part is for recording
.got
section references:- Record a reference by incrementing
got.refcounts
for thehash_entry
- Record a reference by incrementing
The above is repeated for all relocations and all input sections. A few other things are also done including accounting for .plt
entries.
Phase 2 - creating space (size_dynamic_sections + _bfd_elf_create_dynamic_sections)
The or1k_elf_size_dynamic_sections() function iterates over all input object files to calculate the size required for output sections. The _bfd_elf_create_dynamic_sections()
function does the actual section allocation, we use the generic version.
Setting up the sizes of the .got
section (global offset table) and .plt
section (procedure link table) is done here.
The definition is as below:
static bfd_boolean or1k_elf_size_dynamic_sections (bfd *output_bfd ATTRIBUTE_UNUSED, struct bfd_link_info *info) #define elf_backend_size_dynamic_sections or1k_elf_size_dynamic_sections #define elf_backend_create_dynamic_sections _bfd_elf_create_dynamic_sections
The arguments to or1k_elf_size_dynamic_sections()
being:
-
output_bfd
- Unused , the output elf object -
info
- the BFD API which provides access to everything we need
Internally the function uses:
-
htab
- fromor1k_elf_hash_table (info)
-
htab->root.dynamic_sections_created
-true
if sections like.interp
have been created by the linker
-
-
ibfd
- abfd
pointer frominfo->input_bfds
, represents an input object when iterating. -
s->size
- represents the output.got
section size, which we will be incrementing. -
srel->size
- represents the output.got.rela
section size, which will contain relocations against the.got
section
During the first part of phase 2 we set .got
and .got.rela
section sizes for local symbols with this code:
/* Set up .got offsets for local syms, and space for local dynamic relocs. */ for (ibfd = info->input_bfds; ibfd != NULL; ibfd = ibfd->link.next) { ... local_got = elf_local_got_refcounts (ibfd); if (!local_got) continue; symtab_hdr = &elf_tdata (ibfd)->symtab_hdr; locsymcount = symtab_hdr->sh_info; end_local_got = local_got + locsymcount; s = htab->root.sgot; srel = htab->root.srelgot; local_tls_type = (unsigned char *) elf_or1k_local_tls_type (ibfd); for (; local_got < end_local_got; ++local_got) { if (*local_got > 0) { unsigned char tls_type = (local_tls_type == NULL) ? TLS_UNKNOWN : *local_tls_type; *local_got = s->size; or1k_set_got_and_rela_sizes (tls_type, bfd_link_pic (info), &s->size, &srel->size); } else *local_got = (bfd_vma) -1; if (local_tls_type) ++local_tls_type; } }
Here, for example, we can see we iterate over each input elf object ibfd
and each local symbol ( local_got
) we try and update s->size
and srel->size
to account for the required size.
The above can be read as:
- For each
local_got
entry:- If the local symbol is used in the
.got
section:- Get the
tls_type
byte stored in thelocal_tls_type
array - Set the offset
local_got
to the section offsets->size
, that is used in phase 3 to tell us where we need to write the symbol into the.got
section. - Update
s->size
andsrel->size
usingor1k_set_got_and_rela_sizes()
- Get the
- If the local symbol is used in the
- If the local symbol is not used in the
.got
section:- Set the offset
local_got
to the-1
, to indicate not used
- Set the offset
In the next part of phase 2 we allocate space for all global symbols by iterating through symbols in htab
with the allocate_dynrelocs
iterator. To do that we call:
elf_link_hash_traverse (&htab->root, allocate_dynrelocs, info);
Inside allocate_dynrelocs()
we record the space used for relocations and the .got
and .plt
sections. Example:
if (h->got.refcount > 0) { asection *sgot; bfd_boolean dyn; unsigned char tls_type; ... sgot = htab->root.sgot; h->got.offset = sgot->size; tls_type = ((struct elf_or1k_link_hash_entry *) h)->tls_type; dyn = htab->root.dynamic_sections_created; dyn = WILL_CALL_FINISH_DYNAMIC_SYMBOL (dyn, bfd_link_pic (info), h); or1k_set_got_and_rela_sizes (tls_type, dyn, &sgot->size, &htab->root.srelgot->size); } else h->got.offset = (bfd_vma) -1;
The above, with h
being our global symbol, a pointer to struct elf_link_hash_entry
, can be read as:
- If the symbol will be in the
.got
section:- Get the global reference to the
.got
section and put it insgot
- Set the got location
h->got.offset
for the symbol to the current got section sizehtab->root.sgot
. - Set
dyn
totrue
if we will be doing a dynamic link. - Call
or1k_set_got_and_rela_sizes()
to update the sizes for the.got
and.got.rela
sections.
- Get the global reference to the
- If the symbol is going to be in the
.got
section:- Set the got location
h->got.offset
to-1
- Set the got location
The function or1k_set_got_and_rela_sizes()
used above is used to increment .got
and .rela
section sizes accounting for if these are TLS symbols, which need additional entries and relocations.
Phase 3 - linking (relocate_section)
The or1k_elf_relocate_section() function is called to fill in the relocation holes in the output binary .text
section. It does this by looping over relocations and writing to the .text
section the correct symbol value (memory address). It also updates other output binary sections like the .got
section. Also, for dynamic executables and libraries new relocations may be written to .rela
sections.
The function signature looks as follows:
static bfd_boolean or1k_elf_relocate_section (bfd *output_bfd, struct bfd_link_info *info, bfd *input_bfd, asection *input_section, bfd_byte *contents, Elf_Internal_Rela *relocs, Elf_Internal_Sym *local_syms, asection **local_sections) #define elf_backend_relocate_section or1k_elf_relocate_section
The arguments to or1k_elf_relocate_sectioni()
being:
-
output_bfd
- the output elf object we will be writing to -
info
- the BFD API which provides access to everything we need -
input_bfd
- the current input elf object being iterated over -
input_section
the current.text
section in the input elf object being iterated over. From here we get.text
section output details for pc relative relocations:input_section->output_section->vma input_section->output_offset
-
contents
- the output file buffer we will write to -
relocs
- relocations from the current input section -
local_syms
- an array of local symbols used to get therelocation
value for local symbols -
local_sections
- an array input sections for local symbols, used to get therelocation
value for local symbols
Internally the function uses:
- or1k_elf_howto_table - not mentioned until now, but an array of
howto
structs indexed by relocation enum. Thehowto
struct expresses the algorithm required to update the relocation. -
relocation
- abfd_vma
the value of the relocation symbol (memory address) to be written to the output file. in the output file that needs to be updated for the relocation. -
value
- the value that needs to be written to the relocation location.
During the first part of relocate_section
we see:
if (r_symndx < symtab_hdr->sh_info) { sym = local_syms + r_symndx; sec = local_sections[r_symndx]; relocation = _bfd_elf_rela_local_sym (output_bfd, sym, &sec, rel); name = bfd_elf_string_from_elf_section (input_bfd, symtab_hdr->sh_link, sym->st_name); name = name == NULL ? bfd_section_name (sec) : name; } else { bfd_boolean unresolved_reloc, warned, ignored; RELOC_FOR_GLOBAL_SYMBOL (info, input_bfd, input_section, rel, r_symndx, symtab_hdr, sym_hashes, h, sec, relocation, unresolved_reloc, warned, ignored); name = h->root.root.string; }
This can be read as:
- If the current symbol is a local symbol:
- We initialize
relocation
to the local symbol value using_bfd_elf_rela_local_sym()
.
- We initialize
- Otherwise the current symbol is global:
- We use the
RELOC_FOR_GLOBAL_SYMBOL()
macro to initializerelocation
.
- We use the
During the next part we use the howto
information to update the relocation
value, and also add relocations to the output file. For example:
case R_OR1K_TLS_GD_HI16: case R_OR1K_TLS_GD_LO16: case R_OR1K_TLS_GD_PG21: case R_OR1K_TLS_GD_LO13: case R_OR1K_TLS_IE_HI16: case R_OR1K_TLS_IE_LO16: case R_OR1K_TLS_IE_PG21: case R_OR1K_TLS_IE_LO13: case R_OR1K_TLS_IE_AHI16: { bfd_vma gotoff; Elf_Internal_Rela rela; asection *srelgot; bfd_byte *loc; bfd_boolean dynamic; int indx = 0; unsigned char tls_type; srelgot = htab->root.srelgot; /* Mark as TLS related GOT entry by setting bit 2 to indcate TLS and bit 1 to indicate GOT. */ if (h != NULL) { gotoff = h->got.offset; tls_type = ((struct elf_or1k_link_hash_entry *) h)->tls_type; h->got.offset |= 3; } else { unsigned char *local_tls_type; gotoff = local_got_offsets[r_symndx]; local_tls_type = (unsigned char *) elf_or1k_local_tls_type (input_bfd); tls_type = local_tls_type == NULL ? TLS_NONE : local_tls_type[r_symndx]; local_got_offsets[r_symndx] |= 3; } /* Only process the relocation once. */ if ((gotoff & 1) != 0) { gotoff += or1k_initial_exec_offset (howto, tls_type); /* The PG21 and LO13 relocs are pc-relative, while the rest are GOT relative. */ relocation = got_base + (gotoff & ~3); if (!(r_type == R_OR1K_TLS_GD_PG21 || r_type == R_OR1K_TLS_GD_LO13 || r_type == R_OR1K_TLS_IE_PG21 || r_type == R_OR1K_TLS_IE_LO13)) relocation -= got_sym_value; break; } ... /* Static GD. */ else if ((tls_type & TLS_GD) != 0) { bfd_put_32 (output_bfd, 1, sgot->contents + gotoff); bfd_put_32 (output_bfd, tpoff (info, relocation, dynamic), sgot->contents + gotoff + 4); } gotoff += or1k_initial_exec_offset (howto, tls_type); ... /* Static IE. */ else if ((tls_type & TLS_IE) != 0) bfd_put_32 (output_bfd, tpoff (info, relocation, dynamic), sgot->contents + gotoff); /* The PG21 and LO13 relocs are pc-relative, while the rest are GOT relative. */ relocation = got_base + gotoff; if (!(r_type == R_OR1K_TLS_GD_PG21 || r_type == R_OR1K_TLS_GD_LO13 || r_type == R_OR1K_TLS_IE_PG21 || r_type == R_OR1K_TLS_IE_LO13)) relocation -= got_sym_value; } break;
Here we process the relocation for TLS General Dynamic and Initial Exec relocations. I have trimmed out the shared cases to save space.
This can be read as:
- Get a reference to the output relocation section
sreloc
. - Get the got offset which we setup during phase 3 for global or local symbols.
- Mark the symbol as using a TLS got entry, this
offset |= 3
trick is possible because on 32-bit machines we have 2 lower bits free. This is used during phase 4. - If we have already processed this symbol once:
- Update
relocation
to the location in the output.got
section and break , we only need to create.got
entries 1 time
- Update
- Otherwise populate
.got
section entries- For General Dynamic
- Put 2 entries into the output elf object
.got
section, a literal1
and the thread pointer offset
- Put 2 entries into the output elf object
- For Initial Exec
- Put 1 entry into the output elf object
.got
section, the thread pointer offset
- Put 1 entry into the output elf object
- For General Dynamic
- Finally update the
relocation
to the location in the output.got
section
In the last part of the loop we write the relocation
value to the output .text
section. This is done with the or1k_final_link_relocate() function.
r = or1k_final_link_relocate (howto, input_bfd, input_section, contents, rel->r_offset, relocation + rel->r_addend);
With this the .text
section is complete.
Phase 4 - finishing up (finish_dynamic_symbol + finish_dynamic_sections)
During phase 3 above we wrote the .text
section out to file. During the final finishing up phase we need to write the remaining sections. This includes the .plt
section an more writes to the .got
section.
This also includes the .plt.rela
and .got.rela
sections which contain dynamic relocation entries.
Writing of the data sections is handled by or1k_elf_finish_dynamic_sections() and writing of the relocation sections is handled by or1k_elf_finish_dynamic_symbol() . These are defined as below.
static bfd_boolean or1k_elf_finish_dynamic_sections (bfd *output_bfd, struct bfd_link_info *info) static bfd_boolean or1k_elf_finish_dynamic_symbol (bfd *output_bfd, struct bfd_link_info *info, struct elf_link_hash_entry *h, Elf_Internal_Sym *sym) #define elf_backend_finish_dynamic_sections or1k_elf_finish_dynamic_sections #define elf_backend_finish_dynamic_symbol or1k_elf_finish_dynamic_symbol
A snippet for the or1k_elf_finish_dynamic_sections()
shows how when writing to the .plt
section assembly code needs to be injected. This is where the first entry in the .plt
section is written.
else if (bfd_link_pic (info)) { plt0 = OR1K_LWZ(15, 16) | 8; /* .got+8 */ plt1 = OR1K_LWZ(12, 16) | 4; /* .got+4 */ plt2 = OR1K_NOP; } else { unsigned ha = ((got_addr + 0x8000) >> 16) & 0xffff; unsigned lo = got_addr & 0xffff; plt0 = OR1K_MOVHI(12) | ha; plt1 = OR1K_LWZ(15,12) | (lo + 8); plt2 = OR1K_LWZ(12,12) | (lo + 4); } or1k_write_plt_entry (output_bfd, splt->contents, plt0, plt1, plt2, OR1K_JR(15)); elf_section_data (splt->output_section)->this_hdr.sh_entsize = 4;
Here we see a write to output_bfd
, this represents the output object file which we are writing to. The argument splt->contents
represents the object file offset to write to for the .plt
section. Next we see the line elf_section_data (splt->output_section)->this_hdr.sh_entsize = 4
this allows the linker to calculate the size of the section.
A snippet from the or1k_elf_finish_dynamic_symbol()
function shows where we write out the code and dynamic relocation entries for each symbol to the .plt
section.
splt = htab->root.splt; sgot = htab->root.sgotplt; srela = htab->root.srelplt; ... else { unsigned ha = ((got_addr + 0x8000) >> 16) & 0xffff; unsigned lo = got_addr & 0xffff; plt0 = OR1K_MOVHI(12) | ha; plt1 = OR1K_LWZ(12,12) | lo; plt2 = OR1K_ORI0(11) | plt_reloc; } or1k_write_plt_entry (output_bfd, splt->contents + h->plt.offset, plt0, plt1, plt2, OR1K_JR(12)); /* Fill in the entry in the global offset table. We initialize it to point to the top of the plt. This is done to lazy lookup the actual symbol as the first plt entry will be setup by libc to call the runtime dynamic linker. */ bfd_put_32 (output_bfd, plt_base_addr, sgot->contents + got_offset); /* Fill in the entry in the .rela.plt section. */ rela.r_offset = got_addr; rela.r_info = ELF32_R_INFO (h->dynindx, R_OR1K_JMP_SLOT); rela.r_addend = 0; loc = srela->contents; loc += plt_index * sizeof (Elf32_External_Rela); bfd_elf32_swap_reloca_out (output_bfd, &rela, loc);
Here we can see we write 3 things to output_bfd
for the single .plt
entry. We write:
- The assembly code to the
.plt
section. - The
plt_base_addr
(the first entry in the.plt
for runtime lookup) to the.got
section. - And finally a dynamic relocation for our symbol to the
.plt.rela
.
With that we have written all of the sections out to our final elf object, and it’s ready to be used.
GLIBC Runtime Linker
The runtime linker, also referred to as the dynamic linker, will do the final linking as we load our program and shared libraries into memory. It can process a limited set of relocation entries that were setup above during phase 4 of linking.
The runtime linker implementation is found mostly in the elf/dl-*
GLIBC source files. Dynamic relocation processing is handled in by the _dl_relocate_object() function in the elf/dl-reloc.c
file. The back end macro used for relocation ELF_DYNAMIC_RELOCATE is defined across several files including elf/dynamic-link.h
and elf/do-rel.h Architecture specific relocations are handled by the function elf_machine_rela()
, the implementation for OpenRISC being in sysdeps/or1k/dl-machine.h .
In summary from top down:
- elf/rtld.c - implements
dl_main()
the top level entry for the dynamic linker. - elf/dl-open.c - function
dl_open_worker()
calls_dl_relocate_object()
, you may also recognize this from dlopen(3) . - elf/dl-reloc.c - function
_dl_relocate_object
callsELF_DYNAMIC_RELOCATE
-
elf/dynamic-link.h
- defined macroELF_DYNAMIC_RELOCATE
callself_dynamic_do_Rel()
via several macros -
elf/do-rel.h
- functionelf_dynamic_do_Rel()
callself_machine_rela()
-
sysdeps/or1k/dl-machine.h
- architecture specific functionelf_machine_rela()
implements dynamic relocation handling
It supports relocations for:
-
R_OR1K_NONE
- do nothing -
R_OR1K_COPY
- used to copy initial values from shared objects to process memory. -
R_OR1K_32
- a32-bit
value -
R_OR1K_GLOB_DAT
- aligned32-bit
values forGOT
entries -
R_OR1K_JMP_SLOT
- aligned32-bit
values forPLT
entries -
R_OR1K_TLS_DTPMOD/R_OR1K_TLS_DTPOFF
- for shared TLS GDGOT
entries -
R_OR1K_TLS_TPOFF
- for shared TLS IEGOT
entries
A snippet of the OpenRISC implementation of elf_machine_rela()
can be seen below. It is pretty straight forward.
/* Perform the relocation specified by RELOC and SYM (which is fully resolved). MAP is the object containing the reloc. */ auto inline void __attribute ((always_inline)) elf_machine_rela (struct link_map *map, const Elf32_Rela *reloc, const Elf32_Sym *sym, const struct r_found_version *version, void *const reloc_addr_arg, int skip_ifunc) { struct link_map *sym_map = RESOLVE_MAP (&sym, version, r_type); Elf32_Addr value = SYMBOL_ADDRESS (sym_map, sym, true); ... switch (r_type) { ... case R_OR1K_32: /* Support relocations on mis-aligned offsets. */ value += reloc->r_addend; memcpy (reloc_addr_arg, &value, 4); break; case R_OR1K_GLOB_DAT: case R_OR1K_JMP_SLOT: *reloc_addr = value + reloc->r_addend; break; ... } }
Handling TLS
The complicated part of the runtime linker is how it handles TLS variables.
This is done in the following files and functions.
-
elf/rtld.c
- implements init_tls() which initializes the TLS data structures. - elf/dl-tls.c - The runtime linker tls implementation the top level initialization code including _dl_allocate_tls_storage() and _dl_allocate_tls_init() .
The reader can read through the initialization code which is pretty straight forward, except for the macros. Like most GNU code the code relies heavily on untyped macros. These macros are defined in the architecture specific implementation files. For OpenRISC this is:
- sysdeps/or1k/nptl/tls.h - contains the definition of the TLS structures used for OpenRISC.
From theprevious article on TLS we have the TLS data structure that looks as follows:
dtv[] [ dtv[0], dtv[1], dtv[2], .... ] counter ^ | \ ----/ / \________ / V V /------TCB-------\/----TLS[1]----\ /----TLS[2]----\ | pthread tcbhead | tbss tdata | | tbss tdata | \----------------/\--------------/ \--------------/ ^ | TP-----/
The symbols and macros defined in sysdeps/or1k/nptl/tls.h
are:
-
__thread_self
- a symbol representing the current thread always -
TLS_DTV_AT_TP
- used throughout the TLS code to adjust offsets -
TLS_TCB_AT_TP
- used throughout the TLS code to adjust offsets -
TLS_TCB_SIZE
- used duringinit_tls()
to allocate memory for TLS -
TLS_PRE_TCB_SIZE
- used duringinit_tls()
to allocate space for thepthread
struct -
INSTALL_DTV
- used during initialization to update a new dtv pointer into the given tcb -
GET_DTV
- gets dtv via the provided tcb pointer -
INSTALL_NEW_DTV
- used during resizing to update the dtv into the current runtime__thread_self
-
TLS_INIT_TP
- sets__thread_self
this is the final step ininit_tls()
-
THREAD_DTV
- gets dtv via _thread_self -
THREAD_SELF
- get the pthread pointer via__thread_self
Implementations for OpenRISC are:
register tcbhead_t *__thread_self __asm__("r10"); #define TLS_DTV_AT_TP 1 #define TLS_TCB_AT_TP 0 #define TLS_TCB_SIZE sizeof (tcbhead_t) #define TLS_PRE_TCB_SIZE sizeof (struct pthread) #define INSTALL_DTV(tcbp, dtvp) (((tcbhead_t *) (tcbp))->dtv = (dtvp) + 1) #define GET_DTV(tcbp) (((tcbhead_t *) (tcbp))->dtv) #define TLS_INIT_TP(tcbp) ({__thread_self = ((tcbhead_t *)tcbp + 1); NULL;}) #define THREAD_DTV() ((((tcbhead_t *)__thread_self)-1)->dtv) #define INSTALL_NEW_DTV(dtv) (THREAD_DTV() = (dtv)) #define THREAD_SELF \ ((struct pthread *) ((char *) __thread_self - TLS_INIT_TCB_SIZE \ - TLS_PRE_TCB_SIZE))
Summary
We have looked at how symbols move from the Compiler, to Assembler, to Linker to Runtime linker.
This has ended up being a long article to explain a rather complicated subject. Let’s hope it helps provide a good reference for others who want to work on the GNU toolchain in the future.
Further Reading
- GCC Passes - My blog entry on GCC passes
- bfdint - The BFD developer’s manual
- ldint - The LD developer’s manual
- LD and BFD Gist - Dump of notes I collected while working on this article.
很遗憾的说,推酷将在这个月底关闭。人生海海,几度秋凉,感谢那些有你的时光。
以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网
猜你喜欢:本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们。
UNIX编程环境
Brian W.Kernighan、Rob Pike / 陈向群/等 / 机械工业出版社 / 1999-10-1 / 24.00
本书对UNIX操作系统的编程环境做了详细而深入的讨论,内容包括UNIX的文件系统、Shell、过滤程序、I/O编程、系统调用等,并对UNIX中的程序开发方法做了有针对性的指导。本书内容深入浅出,实例丰富,无论是UNIX系统的初学者还是专业人员都可从本书受益。本书亦可作为大学生、研究生学习UNIX的教材。一起来看看 《UNIX编程环境》 这本书的介绍吧!