What’s an assembler?

An assembler is a translator of assembly language into binary machine language.

a furniture assembler

In The Elements of Computing Systems (EoCS), we use an assembler called the Hack Assembler. The binary machine language it translates into is for the Hack computer. Note that neither the assembler nor the computer is an industry standard; both are meant for education.

Assembly code is unlike your usual programming language. A typical computer engineer would work with a programming language higher up on the computer stack and wouldn’t have to write or worry about assembly languages. Why would we? It’s hard. You’ll see why.

Assembly Code

Let’s go through a quick example of the input and output of the assembler so we have a better understanding of the abstraction layer we’re working with.

// Add.asm
@2
D=A
@3
D=D+A
@0
M=D

This is Add.asm, a Hack Assembly code program that adds the numbers 2 and 3 together. You can find this file on the nand2tetris.org website under Project 06 here.

Without diving into specific details too much, allow me to annotate what each line is doing for us.

// Add.asm
@2 // Put 2 into the A-register
D=A // Put 2 into the D-register
@3 // Put 3 into the A-register
D=D+A // Put 2+3 into the D-register
@0 // Put 0 into the A-register
M=D // Save 5 into memory at address 0

I think we’ll both agree that that was a very long-winded way to do an addition, because in Ruby we can just write…

2+3

Machine Language

Anyway, the assembly code in Add.asm will be translated by the assembler to look like this:

0000000000000010
1110110000010000
0000000000000011
1110000010010000
0000000000000000
1110001100001000

If you can understand that, then you must be a computer! Otherwise… I’m sorry to inform you that you’re probably human.

Six lines of code to match the six lines of assembly code we had above. Each line of this binary code instructs the computer to perform an action.
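
To make the mapping concrete, here is a minimal Ruby sketch of how the @2, @3, and @0 lines become the first, third, and fifth binary lines above. An A-instruction is simply its number written as a 16-bit binary string. The helper name encode_a_instruction is hypothetical and not taken from the project, and the sketch assumes the operand is a plain decimal number (symbols come later).

```ruby
# Hypothetical helper: encode an A-instruction such as "@2" as 16 bits.
# Assumes the operand is a non-negative decimal number, not a symbol.
def encode_a_instruction(line)
  value = Integer(line.delete_prefix('@'), 10)
  # The leading bit must stay 0, so only 15 bits are available for the value.
  raise ArgumentError, "address out of range: #{value}" unless value.between?(0, 32_767)

  format('%016b', value)
end

puts encode_a_instruction('@2') # prints 0000000000000010
puts encode_a_instruction('@3') # prints 0000000000000011
```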

Anatomy of the Hack Assembler

Upon dissecting the Hack Assembler, you’ll see three major organs: a parser, code generator, and symbol table.

  1. Parser: parses the assembly code and breaks it down into individual components

    • For example: The parser will be able to inform us that the line, M=D, is a C-instruction.
    • The parser also strips comments and ignores whitespace in the assembly file.
  2. Code generator: translates the Hack Assembly mnemonics into machine language

    • A C-instruction can be broken down into three components: a destination, a computation, and a jump.
    • For example: the line M=D;JEQ tells us that the destination is M, which means the memory location whose address is in the A-register. The computation is D, which is whatever value is in the D-register. And the jump mnemonic, JEQ, indicates a jump if the output of the computation is equal to 0.
    • You can find the mnemonics for all three components of the C-instruction in Chapter 6: Assembler in EoCS, on pages 109-110.

    anatomy of a c-instruction

  3. Symbol table: creates and maintains a registry of the symbols in the program and their meanings

    • In assembly code, we’re able to use symbols to represent memory addresses. For Hack, several symbols are predefined, such as SCREEN, which refers to the base address (16384) of the memory region mapped to the Hack computer’s screen (monitor).
    • In addition to the predefined symbols, the writer of an assembly program can also create their own symbols, such as a label to mark the start of a loop, using a pseudo-command or a variable symbol.

    • A pseudo-command looks like (XXX):
      (INFINITE_LOOP) // a pseudo-command/label
      [....]
      @INFINITE_LOOP // points to instruction where label was created
      0;JMP
    
    • A variable symbol looks like Xxx, where Xxx isn’t predefined. Variables are mapped to consecutive memory locations starting at address 16.
      @sum_var // points to memory address 16
      M=0 // sum_var = 0
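
The parser and code generator steps described above can be sketched in a few lines of Ruby. This is an illustration only, not the project's actual code: the names parse, encode_c, COMP, DEST, and JUMP are hypothetical, and the lookup tables hold only the mnemonics that appear in this article (the full tables are in the book).

```ruby
# Partial lookup tables: only the mnemonics used in this article.
COMP = { '0' => '0101010', 'A' => '0110000', 'D' => '0001100', 'D+A' => '0000010' }.freeze
DEST = { nil => '000', 'M' => '001', 'D' => '010' }.freeze
JUMP = { nil => '000', 'JEQ' => '010', 'JMP' => '111' }.freeze

# Parser step: strip comments and whitespace, then classify the line.
# Returns nil for lines that contain no code.
def parse(line)
  code = line.sub(%r{//.*}, '').gsub(/\s+/, '')
  return nil if code.empty?
  return { type: :label, symbol: code[1..-2] } if code.start_with?('(')
  return { type: :a, symbol: code[1..-1] } if code.start_with?('@')

  # C-instruction layout: dest=comp;jump, where dest and jump are optional.
  rest, jump = code.split(';', 2)
  dest, comp = rest.include?('=') ? rest.split('=', 2) : [nil, rest]
  { type: :c, dest: dest, comp: comp, jump: jump }
end

# Code generator step: turn the three C-instruction fields into 16 bits.
def encode_c(parsed)
  '111' + COMP.fetch(parsed[:comp]) + DEST.fetch(parsed[:dest]) + JUMP.fetch(parsed[:jump])
end

puts encode_c(parse('M=D // Save 5 into memory at address 0')) # prints 1110001100001000
puts encode_c(parse('0;JMP'))                                   # prints 1110101010000111
```

Using fetch instead of [] means an unknown mnemonic raises a KeyError rather than silently producing a malformed instruction.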
    

Hooking Up All the Parts

With the three components of the assembler in hand, we can now piece together the assembler class, Assembler (See it on GitHub). What we want now is to pass in an .asm file containing assembly code and have the assembler spit out the machine language.

gif of assembler

The assembler class uses a two-pass method to translate the assembly code. First, the file goes through the Parser module, where any user-defined symbols are saved to the symbol table along with the address of the instruction each one points to.

Next, we parse the instructions once more. This time we have an updated symbol table, so we’re able to successfully translate each symbol, both user-defined and predefined. At the same time, any C-instructions are encoded in binary using the Code module and then returned.
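
Here is a rough Ruby sketch of the two passes, limited to symbol resolution. It follows the book's description, in which labels are recorded on the first pass and variables are allocated from address 16 on the second pass; the actual Assembler class may split the work differently. The names resolve_symbols and PREDEFINED are hypothetical, the predefined table is deliberately partial, and the input lines are assumed to be already stripped of comments and whitespace.

```ruby
# A few of Hack's predefined symbols; the full list is in the book.
PREDEFINED = { 'SP' => 0, 'R0' => 0, 'R1' => 1, 'SCREEN' => 16_384, 'KBD' => 24_576 }.freeze

# Replace every symbol in the program with its numeric address.
def resolve_symbols(lines)
  table = PREDEFINED.dup

  # Pass 1: map each (LABEL) to the address of the instruction that follows it.
  address = 0
  lines.each do |line|
    if line.start_with?('(')
      table[line[1..-2]] = address
    else
      address += 1 # labels do not occupy an instruction slot
    end
  end

  # Pass 2: drop the label lines and resolve the symbols in A-instructions.
  next_variable = 16
  lines.reject { |line| line.start_with?('(') }.map do |line|
    next line unless line.start_with?('@')

    symbol = line[1..-1]
    next line if symbol.match?(/\A\d+\z/) # already a number

    # A symbol that is still unknown is a new variable.
    unless table.key?(symbol)
      table[symbol] = next_variable
      next_variable += 1
    end
    "@#{table[symbol]}"
  end
end

program = ['@sum_var', 'M=0', '(INFINITE_LOOP)', '@INFINITE_LOOP', '0;JMP']
p resolve_symbols(program) # prints ["@16", "M=0", "@2", "0;JMP"]
```

After this step every line is either a numeric A-instruction or a C-instruction, so the remaining translation no longer needs the symbol table.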

The Assembler

To summarize, the assembler takes in assembly code and returns binary machine code. In many ways, an assembler is just a translator between two languages at different abstraction levels. In our case, we have an assembler that translates Hack assembly code into Hack machine language. The technique we used here is two-pass assembly, but one-pass assemblers also exist in practice.

Resources