3 - Characters, Strings, and Arrays

Characters, strings, and arrays are simply sequences of bytes stored in memory. Assembly does not treat strings as special types; they are just contiguous blocks of memory.

This note demonstrates how characters, strings, and lists are stored and accessed in assembly.

The Experiment

Consider the following program (NASM syntax, 32-bit x86 Linux):

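section .data
    char    DB "A", 0             ; 'A' (65) followed by a null terminator
    string  DB "Suyash", 0        ; 6 characters + null terminator = 7 bytes
    list    DB 43, 77, 9, 18      ; array of four 1-byte values

    string2 DW "Hello", 0         ; "Hello" padded to 6 bytes, then 0 as a 2-byte word

section .text
global _start

_start:
    MOV eax, 0x1                  ; syscall 1 = sys_exit on 32-bit Linux (INT 0x80)

    MOV bl, [char]                ; bl = 65 ('A')
    MOV cl, [string]              ; cl = 83 ('S')
    MOV dl, [list]                ; dl = 43

    MOV bl, [string + 2]          ; bl = 121 ('y')
    MOV bh, [string + 1]          ; bh = 117 ('u')
    MOV cl, [list + 3]            ; cl = 18
    MOV ch, [list + 2]            ; ch = 9

    MOV dx, [string2]             ; dx = 0x6548: dl = 'H', dh = 'e'

    INT 0x80                      ; exit; status = low byte of ebx (bl)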

1. Characters in Assembly

Example:

char DB "A", 0

Explanation:

  • "A" is stored as its ASCII value.
  • ASCII value of 'A' is 65.
  • The extra 0 represents the null terminator.

Memory layout:

Address     Value
char        65 ('A')
char + 1    0 (null terminator)

Reading the value:

MOV bl, [char]

Result:

bl = 65
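In C a char is also just a 1-byte integer, so the same two-byte layout can be modeled directly. This is a minimal sketch, assuming an ASCII execution character set (true on mainstream platforms); char_data, load_first_byte and char_data_size are hypothetical names, and the label is renamed because char is a C keyword.

```c
#include <stddef.h>

/* Mirrors `char DB "A", 0`: one byte for 'A', one byte for the null
   terminator. (Renamed because `char` is a C keyword.) */
static const unsigned char char_data[] = { 'A', 0 };

/* Analogue of MOV bl, [char]: read the single byte stored at the label. */
unsigned char load_first_byte(const unsigned char *label) {
    return label[0];
}

/* Total size of the definition in bytes, terminator included. */
size_t char_data_size(void) {
    return sizeof char_data;
}
```

Reading the first byte gives 65, and the definition occupies 2 bytes, matching the table.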

2. Strings in Assembly

Example:

string DB "Suyash", 0

Each character occupies 1 byte.

Memory layout:

Offset      Character   ASCII
string+0    S           83
string+1    u           117
string+2    y           121
string+3    a           97
string+4    s           115
string+5    h           104
string+6    \0          0 (null terminator)

Example access:

MOV cl, [string]

Result:

cl = 83 (ASCII 'S')
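The string definition can be reproduced in C, where a string literal also ends with an implicit null terminator, so the array holds the same 7 bytes as the table. A sketch assuming ASCII; string_data, string_data_size and string_first_byte are hypothetical names.

```c
#include <stddef.h>

/* Mirrors `string DB "Suyash", 0`. The C literal supplies the trailing 0,
   so the array is 'S','u','y','a','s','h',0: 7 bytes in total. */
static const char string_data[] = "Suyash";

/* Number of bytes the definition occupies, terminator included. */
size_t string_data_size(void) {
    return sizeof string_data;
}

/* Analogue of MOV cl, [string]: the byte at the label is the first character. */
unsigned char string_first_byte(void) {
    return (unsigned char)string_data[0];
}
```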

3. Accessing Characters Using Offsets

Since characters are stored sequentially, individual characters can be accessed using offsets.

Example:

MOV bl, [string + 2]

This accesses:

string + 2 → 'y'

Result:

bl = 121 (ASCII 'y')

Another example:

MOV bh, [string + 1]

Result:

bh = 117 (ASCII 'u')
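Offset addressing is pointer arithmetic: a base address plus a byte count. A minimal C sketch of the same idea, with hypothetical helpers byte_at (the analogue of [label + offset]) and length_until_null, which also shows why the null terminator matters: code that only has the starting address scans offsets until it reaches the 0 byte.

```c
#include <stddef.h>

/* Analogue of MOV reg8, [label + offset]: add a byte offset to the base
   address and read one byte there. */
unsigned char byte_at(const void *label, size_t offset) {
    const unsigned char *base = (const unsigned char *)label;
    return *(base + offset);
}

/* Walk offsets 0, 1, 2, ... until the null terminator is reached; the
   offset of the terminator is the string's length. */
size_t length_until_null(const void *label) {
    size_t offset = 0;
    while (byte_at(label, offset) != 0) {
        offset++;
    }
    return offset;
}
```

For example, byte_at("Suyash", 2) reads 'y' (121) and length_until_null("Suyash") stops at offset 6.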

4. Arrays in Assembly

Assembly arrays are simply sequences of values stored in contiguous memory.

Example:

list DB 43,77,9,18

Memory layout:

Offset      Value
list+0      43
list+1      77
list+2      9
list+3      18

Access examples:

MOV dl, [list]

Result:

dl = 43

Accessing elements by index:

MOV cl, [list + 3]

Result:

cl = 18

Another example:

MOV ch, [list + 2]

Result:

ch = 9
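With DB, every element is 1 byte, so an element's index and its byte offset coincide. In general the byte offset of element i is i × element size, so a word (DW) array would need list + 2*i. The C sketch below makes that explicit; list_db, list_dw, byte_offset, db_element and dw_element are hypothetical names, and list_dw stands in for a DW version of the array that the original program does not define.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors `list DB 43,77,9,18`: four 1-byte elements. */
static const uint8_t list_db[] = { 43, 77, 9, 18 };

/* Hypothetical word array, as `DW 43,77,9,18` would define it:
   four 2-byte elements, 8 bytes in total. */
static const uint16_t list_dw[] = { 43, 77, 9, 18 };

/* Byte offset of element `index` when each element is `elem_size` bytes. */
size_t byte_offset(size_t index, size_t elem_size) {
    return index * elem_size;
}

/* Element `index` of list_db, addressed by byte offset (elem_size = 1). */
uint8_t db_element(size_t index) {
    const uint8_t *base = list_db;
    return base[byte_offset(index, sizeof list_db[0])];
}

/* Element `index` of list_dw, addressed by byte offset (elem_size = 2);
   memcpy copies the two bytes found at that offset into a uint16_t. */
uint16_t dw_element(size_t index) {
    const unsigned char *base = (const unsigned char *)list_dw;
    uint16_t value;
    memcpy(&value, base + byte_offset(index, sizeof list_dw[0]), sizeof value);
    return value;
}
```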

5. Using DW for Strings

Example:

string2 DW "Hello", 0

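Directive DW means define word (2 bytes).

This does not mean each character gets 2 bytes. In NASM, a quoted string inside DW is stored 1 byte per character, exactly as DB would store it, and then padded with zero bytes to a multiple of 2. Only a single number, including a one-character constant such as 'H', fills a whole 2-byte word.

Memory usage:

"Hello" → 5 bytes, padded to 6
0 (terminator) → 2 bytes

Total memory:

6 + 2 = 8 bytes, in hex: 48 65 6C 6C 6F 00 00 00

To give each character its own 2-byte word, list the characters individually:

string2 DW 'H','e','l','l','o',0

That version stores 48 00 65 00 6C 00 6C 00 6F 00 00 00, i.e. 6 × 2 = 12 bytes.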
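The NASM manual's description of DB and its relatives says a quoted string inside DW is stored 1 byte per character and zero-padded to a multiple of 2 bytes, while a lone number (or a single-character constant) fills a full word. The C sketch below re-implements that rule by hand to compute the layout; dw_string_layout and dw_separate_chars_size are hypothetical helpers, not part of any assembler or library.

```c
#include <stddef.h>
#include <string.h>

/* Models `DW "<text>", 0` as NASM lays it out:
   - the quoted string is stored byte for byte, then zero-padded to a
     multiple of 2 (the word size);
   - the trailing 0 is a number, so it takes one full 2-byte word.
   Writes the bytes into `out` (caller supplies enough space) and returns
   the total size in bytes. */
size_t dw_string_layout(const char *text, unsigned char *out) {
    size_t len = strlen(text);
    size_t padded = (len + 1) / 2 * 2;   /* round up to an even length */
    memset(out, 0, padded + 2);          /* zero padding plus the final word */
    memcpy(out, text, len);              /* characters, 1 byte each */
    return padded + 2;
}

/* Models `DW 'H','e','l','l','o',0`: every listed item is its own word,
   so each character and the terminator take 2 bytes. */
size_t dw_separate_chars_size(const char *text) {
    return (strlen(text) + 1) * 2;
}
```

For "Hello" the first helper reports 8 bytes and the second 12, which is the difference between the two DW forms.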

6. Reading a Word From Memory

Instruction:

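MOV dx, [string2]

  • dx is a 16-bit register.
  • It reads 2 bytes from memory, starting at string2.

Because NASM stores the string "Hello" inside DW byte for byte (padded to an even length), those 2 bytes are 'H' (0x48) and 'e' (0x65). x86 is little-endian, so the byte at the lower address goes into the low half of the register: dl = 'H', dh = 'e', and dx = 0x6548. dx would hold just 'H' (0x0048) only if each character had its own word, as in DW 'H','e','l','l','o',0.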
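The little-endian load can be modeled in C by combining two bytes by hand, which gives the same result on any host. In this sketch load_word_le, packed and per_char are hypothetical names; packed assumes NASM's layout for DW "Hello", 0, and per_char assumes the layout of DW 'H','e','l','l','o',0.

```c
#include <stdint.h>

/* Models MOV dx, [addr] on x86: read two consecutive bytes; the byte at
   the lower address becomes the low half (dl), the next one the high
   half (dh). */
uint16_t load_word_le(const unsigned char *addr) {
    return (uint16_t)(addr[0] | (addr[1] << 8));
}

/* Bytes of `DW "Hello", 0`: characters packed, padded, then a zero word. */
static const unsigned char packed[] = { 'H', 'e', 'l', 'l', 'o', 0, 0, 0 };

/* Bytes of `DW 'H','e','l','l','o',0`: one 2-byte word per character. */
static const unsigned char per_char[] = { 'H', 0, 'e', 0, 'l', 0,
                                          'l', 0, 'o', 0, 0, 0 };
```

load_word_le(packed) yields 0x6548 (dl = 'H', dh = 'e'), while load_word_le(per_char) yields 0x0048, the "H only" value.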

Summary

  • Assembly has no built-in string type; strings are simply arrays of bytes.
  • DB stores 1 byte per value.
  • Characters are stored using their ASCII values.
  • Arrays and strings are stored in contiguous memory locations.
  • Elements can be accessed using offsets (label + index).
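  • DW allocates 2 bytes per number, but a quoted string in a DW line is stored 1 byte per character and zero-padded to an even length; to get 2 bytes per character, list the characters individually (or, for UTF-16 text, use NASM's __utf16__ operator).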
  • The register size determines how many bytes are read from memory, and x86 is little-endian: the byte at the lowest address lands in the lowest part of the register.