Well, here it is, the ubiquitous “Hello, World!” example.
Despite the unremarkable title, there are a couple of interesting things to note in the code below. The first is how the length of the string hellotxt is calculated; the second is the two different ways the $ prefix is used in the GNU assembler.
The string length is calculated using the special dot symbol (.), which refers to “the current address that as is assembling into” (the gas documentation covers this under “The Special Dot Symbol”). I’ve entered three equivalent lines below, two of which are commented out. Each of them defines the symbol msg_len and sets its value to the length of the string hellotxt using the technique described above. This value is referenced later in the instruction movq $msg_len, %rdx, which loads it into %rdx.
The reason I wanted to draw your attention to this is the seemingly inconsistent meaning of the $ symbol. In the preceding line (movq $hellotxt, %rsi) it means “copy the address of hellotxt to %rsi”, whereas with msg_len it means “replace this placeholder with the value of the symbol msg_len”. Strictly speaking, $ always marks an immediate operand, which is the symbol’s value itself; the value of a label such as hellotxt simply happens to be an address. So to get this right, you need to know your symbols from your memory references.
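To make the role of . more concrete, here is a toy model in Python. It is purely illustrative and is not how gas is implemented. A location counter advances as bytes are emitted, a label records the counter's current value, and msg_len = . - hellotxt is evaluated at the point where it appears. The starting address 0x6000e0 is taken from the objdump output further down. The class and method names are invented for this sketch.

```python
class ToyAssembler:
    """Hypothetical model of a location counter and symbol table (not gas itself)."""

    def __init__(self, origin):
        self.dot = origin       # the "." symbol: the address currently being assembled into
        self.symbols = {}       # symbol name -> integer value
        self.data = bytearray() # bytes emitted so far

    def label(self, name):
        # A label such as "hellotxt:" takes the current value of "."
        self.symbols[name] = self.dot

    def asciz(self, text):
        # .asciz emits the string's bytes followed by a terminating NUL, advancing "."
        encoded = text.encode("ascii") + b"\x00"
        self.data += encoded
        self.dot += len(encoded)

    def define(self, name, value):
        # "name = expr" (or .equ/.set) binds a symbol to a value; no bytes are emitted
        self.symbols[name] = value


asm = ToyAssembler(origin=0x6000E0)
asm.label("hellotxt")
asm.asciz("Hello, World!\n")
asm.define("msg_len", asm.dot - asm.symbols["hellotxt"])  # msg_len = . - hellotxt

# Both values below would be encoded as immediates when written with "$"
print(hex(asm.symbols["hellotxt"]))  # 0x6000e0 (a label's value is an address)
print(asm.symbols["msg_len"])        # 15 (an absolute symbol's value is a plain number)
```

Note that msg_len occupies no storage of its own. It exists only in the assembler's symbol table until it is substituted into an instruction.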
Anyway, without further ado, here’s the code and a Makefile which should build it on a 64-bit Linux distro. I’m currently using as from GNU Binutils 2.20.1-20100303. My kernel is 2.6.32-33 and I’m using Xubuntu.
hello.s
.section .data
hellotxt: .asciz "Hello, World!\n"
msg_len = . - hellotxt # define a *symbol* to represent the length of the hellotxt string
#.equ msg_len , . - hellotxt # defines the same symbol using an equate
#.set msg_len , . - hellotxt # defines the same symbol using the .set directive
.section .text
.globl _start
_start:
movq $1, %rax # sys_write
movq $1, %rdi # stdout
movq $hellotxt, %rsi # use '$' to get address-of 'hellotxt'
movq $msg_len, %rdx # use '$' to reference the symbol 'msg_len', defined above
syscall
movq $60, %rax # sys_exit
movq $0, %rdi # exit code
syscall
Makefile
hello: hello.o
	ld -o hello hello.o
hello.o: hello.s
	as -gstabs -o hello.o hello.s
clean:
	rm hello.o hello
A quick examination of the output of objdump -D hello shows the following:
Disassembly of section .text:
00000000004000b0 <_start>:
4000b0: 48 c7 c0 01 00 00 00 mov $0x1,%rax
4000b7: 48 c7 c7 01 00 00 00 mov $0x1,%rdi
4000be: 48 c7 c6 e0 00 60 00 mov $0x6000e0,%rsi
4000c5: 48 c7 c2 0f 00 00 00 mov $0xf,%rdx
You can see that the value 0xf (15) has been substituted for $msg_len. That is the 14 characters of hellotxt plus the trailing null byte added by the .asciz directive.
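The substitution is also visible in the instruction bytes. Each of these encodings consists of the REX.W prefix 48, the opcode c7, and a ModRM byte that selects the destination register. The final four bytes are a little-endian 32-bit immediate, which the CPU sign-extends to 64 bits. The following Python sketch extracts the immediate from the byte strings copied from the listing above. The helper name is my own, and it handles only this one register-direct form of mov.

```python
import struct

REGISTERS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]  # ModRM rm field 0-7


def mov_imm32_value(hex_bytes):
    """Hypothetical helper: decode 'REX.W C7 /0 imm32' with a register destination."""
    raw = bytes.fromhex(hex_bytes)
    if len(raw) != 7 or raw[0] != 0x48 or raw[1] != 0xC7:
        raise ValueError("expected REX.W (48) + opcode C7 + ModRM + imm32")
    modrm = raw[2]
    # mod must be 11 (register operand) and the reg field must be /0
    if (modrm >> 6) != 0b11 or ((modrm >> 3) & 0b111) != 0:
        raise ValueError("only the register-direct /0 form is handled")
    register = REGISTERS[modrm & 0b111]
    # The last four bytes are a little-endian signed imm32 (sign-extended by the CPU)
    (imm,) = struct.unpack("<i", raw[3:7])
    return register, imm


for listing in ["48 c7 c0 01 00 00 00", "48 c7 c7 01 00 00 00",
                "48 c7 c6 e0 00 60 00", "48 c7 c2 0f 00 00 00"]:
    register, imm = mov_imm32_value(listing)
    print(f"%{register} <- {imm:#x}")
# %rax <- 0x1
# %rdi <- 0x1
# %rsi <- 0x6000e0
# %rdx <- 0xf
```

Both the address of hellotxt and the value of msg_len end up embedded in the instruction stream as immediates. This is the sense in which $ behaves the same way in both lines.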
If, however, you define msg_len as a .quad in the data section instead, the code looks like this:
hello.s
.section .data
hellotxt: .asciz "Hello, World!\n"
msg_len: .quad . - hellotxt
.section .text
.globl _start
_start:
movq $1, %rax # sys_write
movq $1, %rdi # stdout
movq $hellotxt, %rsi # use '$' to get address-of 'hellotxt'
movq msg_len, %rdx # value-at 'msg_len'
syscall
movq $60, %rax # sys_exit
movq $0, %rdi # exit code
syscall
The output of objdump -D hello then looks like this:
Disassembly of section .text:
00000000004000b0 <_start>:
4000b0: 48 c7 c0 01 00 00 00 mov $0x1,%rax
4000b7: 48 c7 c7 01 00 00 00 mov $0x1,%rdi
4000be: 48 c7 c6 e0 00 60 00 mov $0x6000e0,%rsi
4000c5: 48 8b 14 25 ef 00 60 mov 0x6000ef,%rdx
...
Disassembly of section .data:
00000000006000e0 :
6000e0: 48 'H'
6000e1: 65 'e'
6000e2: 6c 'l'
6000e3: 6c 'l'
6000e4: 6f 'o'
...
00000000006000ef :
6000ef: 0f 00 00 The value '15'
6000f2: 00 00
6000f4: 00 00
...
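Here, instead of loading the literal 0xf into %rdx, the instruction loads the value stored at 0x6000ef, the address of msg_len. The opcode changes accordingly, from c7 (move an immediate) to 8b (move from memory). Helpfully, objdump shows that the value at 0x6000ef is… 15, stored as a little-endian quad that begins with the byte 0f.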
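The difference between the two versions comes down to an immediate operand versus a memory operand. The toy Python model below assumes the addresses from the listings: hellotxt is at 0x6000e0, and the .quad follows the 15-byte string at 0x6000ef. The memory dictionary and the load_quad helper are invented for illustration.

```python
import struct

HELLOTXT = 0x6000E0                          # address of hellotxt, taken from the listing
string_bytes = b"Hello, World!\n\x00"        # what .asciz emits: 14 characters + NUL
MSG_LEN_ADDR = HELLOTXT + len(string_bytes)  # the .quad is placed right after the string

# Hypothetical flat memory (address -> byte) laid out like the second .data section;
# the quad holds ". - hellotxt" evaluated at its own address, packed little-endian.
data_section = string_bytes + struct.pack("<Q", MSG_LEN_ADDR - HELLOTXT)
memory = {HELLOTXT + offset: byte for offset, byte in enumerate(data_section)}


def load_quad(addr):
    """Read 8 bytes at addr as a little-endian unsigned 64-bit value (a 64-bit load)."""
    return struct.unpack("<Q", bytes(memory[addr + i] for i in range(8)))[0]


# First version: msg_len is an absolute symbol; "movq $msg_len, %rdx" encodes its value
rdx_immediate = len(string_bytes)

# Second version: msg_len is a label (an address); "movq msg_len, %rdx" loads what is stored there
rdx_memory = load_quad(MSG_LEN_ADDR)

print(hex(MSG_LEN_ADDR))                                           # 0x6000ef
print(bytes(memory[MSG_LEN_ADDR + i] for i in range(8)).hex(" "))  # 0f 00 00 00 00 00 00 00
print(rdx_immediate, rdx_memory)                                   # 15 15
```

Either way, 15 ends up in %rdx. What changes is where it comes from: in the first version it is baked into the instruction, and in the second it is fetched from the data section at run time.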