What makes a Prog8 program

This chapter gives a high-level overview of the elements that make up a program. Details about the syntax can be found in the Syntax Reference chapter.

Elements of a program

Program

Consists of one or more modules.

Module

A file on disk with the .p8 suffix. It can contain directives and code blocks. Whitespace and indentation in the source code are arbitrary and can be a mix of tabs and spaces. A module file can import other modules, including library modules. It should be saved in UTF-8 encoding.

Comments

Everything on the line after a semicolon ; is a comment and is ignored by the compiler. If the whole line is just a comment, the line will be copied into the resulting assembly source code for reference. There’s also a block comment: everything between /* and */ is ignored, and it can span multiple lines. This block comment is experimental for now: it may change or even be removed again in a future compiler version. The recommended way to comment out a bunch of lines remains to comment out each line individually with ;.

Directive

These are special instructions for the compiler, to change how it processes the code and what kind of program it creates. A directive is on its own line in the file, and starts with %, optionally followed by some arguments.

Code block

A block of actual program code. It has a starting address in memory, and defines a scope (also known as ‘namespace’). It contains variables and subroutines. More details about this below: Blocks, Scopes, and accessing Symbols.

Variable declarations

The data that the code works on is stored in variables (‘named values that can change’). The compiler allocates the required memory for them. There is no dynamic memory allocation. The storage size of all variables is fixed and is determined at compile time. Variable declarations tend to appear at the top of the code block that uses them, but this is not mandatory. They define the name and type of the variable, and its initial value. Prog8 supports a small list of data types, including special ‘memory mapped’ types that don’t allocate storage but instead point to a fixed location in the address space.

Code

These are the instructions that make up the program’s logic. Code can only occur inside a subroutine. There are different kinds of instructions (‘statements’ is a better name) such as:

  • value assignment

  • looping (for, while, do-until, repeat, unconditional jumps)

  • conditional execution (if - then - else, when, and conditional jumps)

  • subroutine calls

  • label definition

Subroutine

Defines a piece of code that can be called by its name from different locations in your code. It accepts parameters and can return a value (optional). It can define its own variables, and it is also possible to define subroutines within other subroutines. Nested subroutines can access the variables from outer scopes easily, which removes the need and overhead to pass everything via parameters all the time. Subroutines do not have to be declared in the source code before they can be called.

Label

This is a named position in your code that you can jump to from another place with a jump statement. It is also possible to call a label like a subroutine (but without parameters and return value).

Scope

Also known as ‘namespace’, this is a named box around the symbols defined in it. This prevents name collisions (or ‘namespace pollution’), because the name of the scope is needed as prefix to be able to access the symbols in it. Anything inside the scope can refer to symbols in the same scope without using a prefix. There are three scope levels in Prog8:

  • global (no prefix), everything in a module file goes in here;

  • block;

  • subroutine, can be nested in another subroutine.

Even though modules are separate files, they are not separate scopes! Everything defined in a module is merged into the global scope. This is different from most other languages that have modules. The global scope can only contain blocks and some directives, while the others can contain variables and subroutines too. Some more details about how to deal with scopes and names are discussed below.

Blocks, Scopes, and accessing Symbols

Blocks are the top level separate pieces of code and data of your program. They have a starting address in memory and will be combined together into a single output program. They can only contain directives, variable declarations, subroutines and inline assembly code. Your actual program code can only exist inside these subroutines (except for the occasional inline assembly).

Here’s an example:

main $c000 {
    ; this is code inside the block...
}

The name of a block must be unique in your entire program. Be careful when importing other modules; blocks in your own code cannot have the same name as a block defined in an imported module or library.

The address can be used to place a block at a specific location in memory. Usually it is omitted, and the compiler will automatically choose the location (usually immediately after the previous block in memory). It must be >= $0200, because $00-$ff is the zeropage and $100-$1ff is the CPU stack.

Symbols are names defined in a certain scope. Inside the same scope, you can refer to them by their ‘short’ name directly. If the symbol is not found in the same scope, the enclosing scope is searched for it, and so on, up to the top level block, until the symbol is found. If the symbol was not found the compiler will issue an error message.
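The lookup rule above can be modelled in a few lines of Python. This is a sketch of the rule as described in this chapter, not the compiler's actual implementation. The Scope class, resolve_qualified and the symbol names are hypothetical.

```python
# Python model of Prog8 symbol lookup (hypothetical names, not compiler code)
class Scope:
    """A named scope with its own symbols and a link to the enclosing scope."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.symbols = {}    # short name -> symbol (a description string here)
        self.children = {}   # nested scope name -> Scope
        if parent is not None:
            parent.children[name] = self

    def resolve(self, name):
        """Look up a short name in this scope, then in each enclosing scope."""
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        raise NameError(f"undefined symbol: {name}")


def resolve_qualified(root, dotted_name):
    """Resolve a fully qualified name such as 'main.start.nested.counter'."""
    *path, name = dotted_name.split(".")
    scope = root
    for part in path:
        if part not in scope.children:
            raise NameError(f"undefined symbol: {dotted_name}")
        scope = scope.children[part]
    if name not in scope.symbols:
        raise NameError(f"undefined symbol: {dotted_name}")
    return scope.symbols[name]


# The nesting used in the Prog8 example that follows: main -> start -> nested
root = Scope("")
main = Scope("main", root)
start = Scope("start", main)
nested = Scope("nested", start)
main.symbols["limit"] = "ubyte limit"
nested.symbols["counter"] = "ubyte counter"
```

In this model, `nested.resolve("limit")` succeeds because lookup walks outward to main. `start.resolve("counter")` fails because lookup never looks inward into nested. The dotted name `main.start.nested.counter` does resolve.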

Subroutines create a new scope. All variables inside a subroutine are hoisted up to the scope of the subroutine they are declared in. Note that you can define nested subroutines in Prog8, and such a nested subroutine has its own scope! This also means that you have to use a fully qualified name to access a variable of a nested subroutine from outside that nested subroutine:

main {
    sub start() {
        sub nested() {
            ubyte counter
            ...
        }
        ...
        txt.print_ub(counter)                       ; Error: undefined symbol
        txt.print_ub(main.start.nested.counter)     ; OK
    }
}

Important

Emphasizing this once more: unlike most other programming languages, a new scope is not created inside for, while, repeat, and do-until statements, the if statement, and the branching conditionals. These all share the same scope from the subroutine they’re defined in. You can define variables in these blocks, but these will be treated as if they were defined in the subroutine instead.

Program Start and Entry Point

Your program must have a single entry point where code execution begins. The compiler expects a start subroutine in the main block for this, taking no parameters and having no return value.

Like any subroutine, it has to end with a return statement (or a goto):

main {
    sub start() {
        ; program entrypoint code here
        return
    }
}

The main block is always relocated to the start of your program’s address space, and the start subroutine (the entry point) will be at the first address. This will also be the address that the BASIC loader program (if generated) calls with the SYS statement.

Variables and values

Variables are named values that can change during the execution of the program. They can be defined inside any scope (blocks, subroutines, etc.); see Blocks, Scopes, and accessing Symbols. When declaring a numeric variable, you can specify an initial value if you don’t want it to be zero. For some other data types, such as strings, an initial value is required. Values will usually be part of an expression or assignment statement:

12345                 ; integer number
$aa43                 ; hex integer number
%100101               ; binary integer number (% is also remainder operator so be careful)
false                 ; boolean false
-33.456e52            ; floating point number
"Hi, I am a string"   ; text string, encoded with default encoding
'a'                   ; byte value (ubyte) for the letter a
sc:"Alternate"        ; text string, encoded with c64 screencode encoding
sc:'a'                ; byte value of the letter a in c64 screencode encoding

byte  counter  = 42   ; variable of size 8 bits, with initial value 42

Putting a variable in zeropage: if you add the @zp tag to the variable declaration, the compiler will prioritize this variable when selecting variables to put into zeropage (but there are no guarantees). If there are enough free locations in the zeropage, it will try to fill it with as many other variables as possible (before they are put in regular memory pages). Use the @requirezp tag to force the variable into zeropage; if there is no more free space, the compilation will fail. It’s possible to put strings, arrays and floats into zeropage too, but because zeropage space is really scarce this is not advised: they will eat up the available space very quickly. It’s best to only put byte or word variables in zeropage.

Example:

byte   @zp  smallcounter = 42
uword  @requirezp  zppointer = $4000

Shared tag: if you add the @shared tag to the variable declaration, the compiler will know that this variable is a Prog8 variable shared with some assembly code elsewhere. This means that the assembly code can refer to the variable even if it’s otherwise not used in Prog8 code itself. (Usually such ‘unused’ variables are optimized away by the compiler, which would result in an error when assembling the rest of the code.) Example:

byte  @shared  assemblyVariable = 42

Integers

Integers are 8 or 16 bit numbers and can be written in normal decimal notation, in hexadecimal and in binary notation. There is no octal notation. You can use underscores to group digits to make long numbers more readable. A single character in single quotes such as 'a' is translated into a byte integer, which is the PETSCII value for that character.

Unsigned integers are in the range 0..255 for unsigned byte types and 0..65535 for unsigned word types. Signed integers are in the range -128..127 for bytes and -32768..32767 for words.

Attention

Doing math on signed integers can result in code that is a lot larger and slower than when using unsigned integers. Make sure you really need the signed numbers, otherwise stick to unsigned integers for efficiency.

Booleans

Booleans are a distinct type in Prog8 and can have only the values true or false. They can be cast to and from other integer types, though: a nonzero integer is considered to be true, and zero is false. Logical expressions, comparisons and some other code tend to compile more efficiently if you explicitly use bool types instead of 0/1 integers. The in-memory representation of a boolean value is just a byte containing 0 or 1.

If you find that you need a whole bunch of boolean variables or perhaps even an array of them, consider using an integer bit mask variable and bitwise operators instead. This saves a lot of memory and may be faster as well.
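The following Python sketch shows the bit mask approach. It packs several flags into one byte-sized integer. The flag names are hypothetical. In Prog8 the same idea would use a ubyte variable and the bitwise operators.

```python
# Python model of a bit mask of flags (hypothetical flag names)
FLAG_SOUND = 0b0000_0001
FLAG_MUSIC = 0b0000_0010
FLAG_JOYSTICK = 0b0000_0100


def set_flag(mask, flag):
    """Turn a flag on; the result stays within 8 bits like a ubyte."""
    return (mask | flag) & 0xFF


def clear_flag(mask, flag):
    """Turn a flag off by AND-ing with the inverted flag bit."""
    return mask & ~flag & 0xFF


def has_flag(mask, flag):
    """A flag is set when its bit is nonzero in the mask."""
    return (mask & flag) != 0


flags = 0
flags = set_flag(flags, FLAG_SOUND)
flags = set_flag(flags, FLAG_JOYSTICK)
```

Eight such flags fit in the single byte that one bool variable would occupy on its own.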

Floating point numbers

You can use underscores to group digits to make long numbers more readable.

Floats are stored in the 5-byte ‘MFLPT’ format that is used on CBM machines. Floating point support is available on the c64 and cx16 (and virtual) compiler targets. On the c64 and cx16, the ROM routines are used for floating point operations, so on both systems the correct ROM banks have to be banked in to make this work. Although the C128 shares the same floating point format, Prog8 currently doesn’t support floating point on that system: the C128 floating point routines require the floating point variables to be in a different RAM bank than the program, which Prog8 doesn’t do.

Your code also needs to import the floats library, both to enable floating point support in the compiler and to gain access to the floating point routines. (The library itself contains the directive that enables floating point, so you don’t have to add it yourself.)

The largest 5-byte MFLPT float that can be stored is: 1.7014118345e+38 (negative: -1.7014118345e+38)
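The figure above follows from the layout of the format. The Python sketch below decodes a 5-byte MFLPT value and assumes the standard CBM layout:

- an exponent byte with a bias of 128, where 0 means the value zero;
- four mantissa bytes, where the top bit of the first byte holds the sign and otherwise stands for an implicit leading 1.

It is an illustration only. mflpt_to_float is a hypothetical helper, not part of Prog8's floats library.

```python
# Python model of decoding a CBM MFLPT float (hypothetical helper)
def mflpt_to_float(b):
    """Decode a 5-byte MFLPT value: exponent byte, then 4 mantissa bytes."""
    if len(b) != 5:
        raise ValueError("MFLPT values are exactly 5 bytes")
    exponent = b[0]
    if exponent == 0:
        return 0.0                      # an exponent byte of 0 means zero
    negative = bool(b[1] & 0x80)        # the top mantissa bit holds the sign...
    # ...and is replaced by the implicit leading 1 bit of the mantissa
    mantissa = ((b[1] | 0x80) << 24) | (b[2] << 16) | (b[3] << 8) | b[4]
    value = mantissa / 2**32 * 2.0 ** (exponent - 128)
    return -value if negative else value


largest = mflpt_to_float(bytes([0xFF, 0x7F, 0xFF, 0xFF, 0xFF]))
```

Decoding ff 7f ff ff ff (the largest exponent and mantissa) gives (1 - 2**-32) * 2**127. That agrees with the maximum quoted above to about nine significant digits.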

Arrays

Array types are also supported. They can be formed from a list of booleans, bytes, words, floats, or addresses of other variables (such as explicit address-of expressions, strings, or other array variables) - values in an array literal always have to be constants. Putting variables inside an array has to be done on a value-by-value basis. Here are some examples of arrays:

byte[10]  array                   ; array of 10 bytes, initially set to 0
byte[]  array = [1, 2, 3, 4]      ; initialize the array, size taken from value
ubyte[99] array = 255             ; initialize array with 99 times 255 [255, 255, 255, 255, ...]
ubyte[] array = 100 to 199        ; initialize array with [100, 101, ..., 198, 199]
str[] names = ["ally", "pete"]    ; array of string pointers/addresses (equivalent to array of uwords)
uword[] others = [names, array]   ; array of pointers/addresses to other arrays
bool[2] flags = [true, false]     ; array of two boolean values  (take up 1 byte each, like a byte array)

value = array[3]            ; the fourth value in the array (index is 0-based)
char = string[4]            ; the fifth character (=byte) in the string
char = string[-2]           ; the second-to-last character in the string (Python-style indexing from the end)

Note

Right now, the array should be small enough to be indexable by a single byte index. This means byte arrays should be <= 256 elements, word arrays <= 128 elements (256 if it’s a split array - see below), and float arrays <= 51 elements.
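One way to see where these limits come from: if the whole array must be reachable with byte offsets 0..255, the maximum element count is 256 divided by the element size, rounded down. This Python sketch assumes that reasoning; the note above only says the array must be indexable by a single byte index. It uses element sizes of 1 byte for bytes, 2 for words and 5 for MFLPT floats.

```python
# Python model of the array length limits (assumed reasoning, hypothetical names)
INDEX_RANGE = 256  # a single byte index reaches byte offsets 0..255

# Element sizes in bytes; floats use the 5-byte MFLPT format
ELEMENT_SIZE = {"byte": 1, "word": 2, "float": 5}


def max_array_length(element_type, split=False):
    """Largest element count whose total size fits in the byte-index range."""
    if split:
        if element_type != "word":
            raise ValueError("only word arrays can be split")
        # Each half (LSBs and MSBs) is its own byte array, indexed separately
        return INDEX_RANGE
    return INDEX_RANGE // ELEMENT_SIZE[element_type]
```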

You can write out an array initializer list over several lines if you want to improve readability.

Note that the various keywords for the data type and variable type (byte, word, const, etc.) can’t be used as identifiers elsewhere. You can’t make a variable, block or subroutine with the name byte for instance.

It’s possible to assign a new array to another array, this will overwrite all elements in the original array with those in the value array. The number and types of elements have to match. For large arrays this is a slow operation because every element is copied over. It should probably be avoided.

Using the in operator you can easily check if a value is present in an array, example: if choice in [1,2,3,4] {....}

Arrays at a specific memory location: Using the memory-mapped syntax it is possible to define an array to be located at a specific memory location. For instance to reference the first 5 rows of the Commodore 64’s screen matrix as an array, you can define:

&ubyte[5*40]  top5screenrows = $0400

This way you can set the second character on the second row from the top like this:

top5screenrows[41] = '!'
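The index 41 is one full row of 40 characters plus one. Here is a small Python sketch of that arithmetic. screen_index and screen_address are hypothetical helper names; $0400 and 40 columns are the C64 values used above.

```python
# Python model of indexing the memory-mapped screen array (hypothetical helpers)
SCREEN_BASE = 0x0400  # start of the C64 screen matrix, as in the example above
COLUMNS = 40          # characters per row on the C64 text screen


def screen_index(row, col):
    """Array index for a 0-based row and column: full rows first, then the column."""
    return row * COLUMNS + col


def screen_address(row, col):
    """The memory address that the same array element refers to."""
    return SCREEN_BASE + screen_index(row, col)
```

The last element of the 5*40 array, row 4 and column 39, has index 199.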

Array indexing on a pointer variable: a uword variable can be used in limited scenarios as a ‘pointer’ to a byte in memory at a specific, dynamic location. You can use array indexing on a pointer variable to use it as a byte array at a dynamic location in memory; currently this is equivalent to directly referencing the bytes in memory at the given index. In contrast to a real array variable, the index value can be a full word. Unlike with array variables, you cannot use a negative index to count from the end, because the size of the array is unknown. See also Direct access to memory locations (‘peek’ and ‘poke’).

LSB/MSB split word arrays: for (u)word arrays, you can make the compiler lay out the array in memory as two separate arrays, one with the LSBs and one with the MSBs of the word values. This is more efficient when storing and reading words from the array, because the same index can be used for both halves. Add the @split tag to the variable declaration to do this. In the assembly code, the array will be generated as two byte arrays, namely name_lsb and name_msb. Note that the maximum length of a split word array is 256! (Regular word arrays are limited to 128 elements.)
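A short Python model shows the split layout. Each word is stored as its low byte in one array and its high byte in the other, so element i is read back from position i of both. This only illustrates the memory layout; split_words, read_split and regular_offset are hypothetical names.

```python
# Python model of an LSB/MSB split word array (hypothetical names)
def split_words(words):
    """Store 16-bit words as two byte arrays: all LSBs, then all MSBs."""
    lsb = [w & 0xFF for w in words]
    msb = [(w >> 8) & 0xFF for w in words]
    return lsb, msb


def read_split(lsb, msb, index):
    """Element `index` uses the same index into both byte arrays."""
    return lsb[index] | (msb[index] << 8)


def regular_offset(index):
    """In a regular word array, element `index` starts at byte offset 2*index."""
    return 2 * index
```

The doubled offset of a regular word array explains its lower limit: element 127 already starts at byte offset 254.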

Caution

Not all array operations are supported yet on “split word arrays”. The compiler may give an unpleasant error or crash when you hit such a case in your code. If this happens simply revert to a regular word array and please report the issue, so that more support can be added in the future where it is needed.

Strings

Strings are a sequence of characters enclosed in double quotes. The length is limited to 255 characters. They’re stored and treated much the same as a byte array, but they have some special properties because they are considered to be text. Strings (without encoding prefix) will be encoded (translated from ASCII/UTF-8) into bytes via the default encoding for the target platform. On the CBM machines, this is CBM PETSCII.

Alternative encodings can be specified with an encodingname: prefix to the string or character literal. The following encodings are currently recognised:

  • petscii: PETSCII, the default encoding on CBM machines (c64, c128, cx16)

  • sc: CBM screencodes aka ‘poke’ codes (c64, c128, cx16)

  • iso: iso-8859-15 text (supported on cx16)

So the following is a string literal that will be encoded into memory bytes using the iso encoding. It can be correctly displayed on the screen only if an iso-8859-15 charset has been activated first (the Commander X16 has this feature built in):

iso:"Käse, Straße"

You can concatenate two string literals using ‘+’, which can be useful to split long strings over separate lines. But remember that the length of the total string still cannot exceed 255 characters. A string literal can also be repeated a given number of times using ‘*’, where the repeat number must be a constant value. And a new string value can be assigned to another string, but no bounds check is done so be sure the destination string is large enough to contain the new value (it is overwritten in memory):

str string1 = "first part" + "second part"
str string2 = "hello!" * 10

string2 = string1     ; OK: string2 (60 characters) has room for string1 (21 characters)
string1 = "new value"

There are several ‘escape sequences’ to help you put special characters into strings, such as newlines, quote characters themselves, and so on. The ones used most often are \\, \", \n, \r. For a detailed description of all of them and what they mean, read the syntax reference on strings.

Using the in operator you can easily check if a character is present in a string, for example: if '@' in email_address {....}. This gives no clue about where in the string the character is; if you need that, use the string.find() library function instead. Caution: the in check examines every position of the string as it was initially declared. Even when a string has been changed and is now terminated early with a 0 byte, the containment check with in still looks at all character positions of the initial string. Consider using string.find followed by if_cs (for instance) to do a “safer” search for a character in such strings, one that stops at the first 0 byte.
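The difference between the two kinds of search can be modelled in Python by treating a string as its declared bytes. This is a sketch of the behaviour described above. contains_all_positions and find_until_zero are hypothetical stand-ins for the in check and for a string.find-style search; they are not their actual implementations.

```python
# Python model of the two string searches (hypothetical stand-ins)
def contains_all_positions(buffer, char):
    """Model of `char in string`: scans every byte of the declared string."""
    return char in buffer


def find_until_zero(buffer, char):
    """Model of a search that stops at the first 0 byte; index, or None if absent."""
    for index, value in enumerate(buffer):
        if value == 0:
            return None        # end of the string as it is now terminated
        if value == char:
            return index
    return None


email = bytearray(b"ab@cd")   # declared contents (terminating 0 byte not shown)
email[1] = 0                  # the string now effectively reads "a"
```

After the second byte is set to 0, the first check still reports the @, while the second stops at the 0 byte and finds nothing.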

Hint

Strings/arrays and uwords (=memory address) can often be interchanged. An array of strings is actually an array of uwords where every element is the memory address of the string. You can pass a memory address to assembly functions that require a string as an argument. For regular assignments you still need to use an explicit & (address-of) to take the address of the string or array.

Hint

You can declare parameters and return values of subroutines as str, but in this case that is equivalent to declaring them as uword (because in this case, the address of the string is passed as argument or returned as value).

Note

Strings and their (im)mutability

String literals, other than the initial value of a string variable, are considered to be “constant”, i.e. the string isn’t going to change during the execution of the program. The compiler takes advantage of this in certain ways. For instance, multiple identical occurrences of a string literal are folded into just one string allocation in memory. Examples of such strings are the string literals passed to a subroutine as arguments.

Strings that aren’t such string literals are considered to be unique, even if they are the same as a string defined elsewhere. This includes the strings assigned to a string variable in its declaration! These kinds of strings are not deduplicated and are just copied into the program in their own unique part of memory. This means that it is okay to treat those strings as mutable; you can safely change the contents of such a string without destroying other occurrences (as long as you stay within the size of the allocated string!)

Special types: const and memory-mapped

When using const, the value of the ‘variable’ cannot be changed; it has become a compile-time constant value instead. You’ll have to specify the initial value expression. This value is then used by the compiler everywhere you refer to the constant (and no memory is allocated for the constant itself). Only the simple numeric types (byte, word, float) can be defined as a constant. If something is defined as a constant, very efficient code can usually be generated from it. Variables, on the other hand, can’t be optimized as much, need memory, and need more code to manipulate them. Note that a subset of the library routines in the math, string and floats modules is recognised in compile-time expressions. For example, the compiler knows what math.sin8u(12) is and replaces it with the computed result.

When using & (the address-of operator, but now applied to a datatype), the variable will point to a specific location in memory rather than being newly allocated. The initial value (mandatory) must be a valid memory address. Reading the variable will read the given data type from the address you specified, and setting the variable will directly modify that memory location (or locations):

const  byte  max_age = 2000 - 1974      ; max_age will be the constant value 26
&word  SCREENCOLORS = $d020             ; a 16-bit word at the address $d020-$d021
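The 6502 stores words little-endian, so a memory-mapped word such as SCREENCOLORS spans two consecutive bytes, low byte first. On the C64, $d020 (border color) ends up in the LSB and $d021 (background color) in the MSB. Below is a Python model of that, using a bytearray as a hypothetical stand-in for the 64 KB address space. write_word and read_word are hypothetical names.

```python
# Python model of a memory-mapped 16-bit word (hypothetical names)
memory = bytearray(0x10000)  # stand-in for the 64 KB address space


def write_word(mem, address, value):
    """Store a 16-bit value little-endian: LSB at address, MSB at address + 1."""
    mem[address] = value & 0xFF
    mem[address + 1] = (value >> 8) & 0xFF


def read_word(mem, address):
    """Read a 16-bit value back from two consecutive bytes, LSB first."""
    return mem[address] | (mem[address + 1] << 8)


write_word(memory, 0xD020, 0x0605)  # like SCREENCOLORS = $0605
```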

Direct access to memory locations (‘peek’ and ‘poke’)

Normally memory locations are accessed by a memory mapped name, such as cbm.BGCOL0 that is defined as the memory mapped address $d021 (on the c64 target).

If you want to access a memory location directly (by using the address itself or via an uword pointer variable), without defining a memory mapped location, you can do so by enclosing the address in @(...):

color = @($d020)  ; set the variable 'color' to the current c64 screen border color ("peek(53280)")
@($d020) = 0      ; set the c64 screen border to black ("poke 53280,0")
@(vic+$20) = 6    ; you can also use expressions to 'calculate' the address

This is the official syntax to ‘dereference a pointer’, as it is often called in other languages. You can actually also use the array indexing notation for this; it will be silently converted into the direct memory access expression explained above. Note that, unlike with regular arrays, the index is not limited to a ubyte value. You can use a full uword to index a pointer variable like this:

pointervar[999] = 0     ; set memory byte to zero at location pointervar + 999.
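In the same spirit, direct memory access and pointer indexing can be modelled with a bytearray standing in for memory. peek, poke and pointer_store are hypothetical helper names echoing the BASIC commands. The point is only that pointervar[index] and @(pointervar + index) address the same byte.

```python
# Python model of @(...) access and pointer indexing (hypothetical helpers)
memory = bytearray(0x10000)  # stand-in for the 64 KB address space


def peek(address):
    """Model of @(address) used as a value; the address is kept to 16 bits."""
    return memory[address & 0xFFFF]


def poke(address, value):
    """Model of @(address) = value; this model keeps only the low 8 bits."""
    memory[address & 0xFFFF] = value & 0xFF


def pointer_store(pointervar, index, value):
    """Model of pointervar[index] = value, i.e. @(pointervar + index) = value."""
    poke(pointervar + index, value)


poke(0xD020, 0)                  # like @($d020) = 0
pointervar = 0x4000
pointer_store(pointervar, 999, 0x2A)
```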

Converting types into other types

Sometimes you need an unsigned word where you have an unsigned byte, or you need some other type conversion. Many type conversions are possible by just writing as <type> at the end of an expression:

uword  uw = $ea31
ubyte  ub = uw as ubyte     ; ub will be $31, identical to lsb(uw)
float  f = uw as float      ; f will be 59953, but this conversion can be omitted in this case
word   w = uw as word       ; w will be -5583 (simply reinterpret $ea31 as 2-complement negative number)
f = 56.777
ub = f as ubyte             ; ub will be 56

Sometimes it is a straight reinterpretation of the given value as being of the other type, sometimes an actual value conversion is done to convert it into the other type. Try to avoid those type conversions as much as possible.
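The three kinds of conversion in the example can be modelled in Python. This sketch reproduces the values shown above; it is not the code the compiler generates. The function names are hypothetical. float_as_ubyte deliberately refuses values outside the ubyte range, because this chapter does not describe their behaviour.

```python
# Python model of the conversions in the example above (hypothetical names)
def uword_as_ubyte(value):
    """uword -> ubyte: keep only the low 8 bits, like lsb()."""
    return value & 0xFF


def uword_as_word(value):
    """uword -> word: reinterpret the same 16 bits as two's complement."""
    value &= 0xFFFF
    return value - 0x10000 if value >= 0x8000 else value


def uword_as_float(value):
    """uword -> float: an actual value conversion; the number stays the same."""
    return float(value)


def float_as_ubyte(value):
    """float -> ubyte: drop the fraction (only 0 <= value < 256 is modelled)."""
    if not 0 <= value < 256:
        raise ValueError("out-of-range conversions are not modelled here")
    return int(value)
```

The first two functions are pure reinterpretations of bits. The last two perform an actual value conversion.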

Initial values across multiple runs of the program

When declaring a variable with an initial value, this value will be set into the variable each time the program reaches the declaration again. This can be in loops, multiple subroutine calls, or even multiple invocations of the entire program. If you omit the initial value, zero will be used instead.

This only works for simple types, and not for string variables and arrays. It is assumed these are left unchanged by the program; they are not re-initialized on a second run. If you do modify them in-place, you should take care yourself that they work as expected when the program is restarted. (This is an optimization choice to avoid having to store two copies of every string and array)

Loops

The for-loop is used to let a variable iterate over a range of values. Iteration is done in steps of 1, but you can change this. The loop variable must be declared separately as byte or word earlier, so that you can reuse it for multiple occasions. Iterating with a floating point variable is not supported. If you want to loop over a floating-point array, use a loop with an integer index variable instead. If the from value is already outside of the loop range, the whole for loop is skipped.

The while-loop is used to repeat a piece of code while a certain condition is still true. The do-until loop is used to repeat a piece of code until a certain condition is true. The repeat loop is a short notation for a for loop where the loop variable doesn’t matter and you’re only interested in the number of iterations; without an iteration count it simply loops forever. A repeat loop results in the most efficient generated code, so use it if possible.

You can also create loops by using the goto statement, but this should usually be avoided.

Breaking out of a loop prematurely is possible with the break statement, and the continue statement immediately continues with the next cycle of the loop. (These are just shorthands for a goto and a label.)

The unroll loop is not really a loop, but looks like one. It actually duplicates the statements in its block on the spot by the given number of times. It’s meant to “unroll loops” - trade memory for speed by avoiding the actual repeat loop counting code. Only simple statements are allowed to be inside an unroll loop (assignments, function calls etc.).

Attention

The value of the loop variable after executing the loop is undefined - you cannot rely on it to be the last value in the range for instance! The value of the variable should only be used inside the for loop body. (this is an optimization issue to avoid having to deal with mostly useless post-loop logic to adjust the loop variable’s value)

Conditional Execution

if statements

Conditional execution means that the flow of execution changes based on certain conditions, rather than having fixed gotos or subroutine calls:

if xx==5 {
    yy = 99
    zz = 42
} else {
    aa = 3
    bb = 9
}

if xx==5
    yy = 42
else if xx==6
    yy = 43
else
    yy = 44

if aa>4 goto some_label

if xx==3  yy = 4

if xx==3  yy = 4 else  aa = 2

Conditional jumps (if condition goto label) are compiled using the 6502’s relative branch instructions (such as bne and bcc), so their rather strict range limit applies: the target must lie within -128 to +127 bytes of the instruction following the branch. Unfortunately, the compiler itself can’t check this, so it is entirely possible to create code that cannot be assembled successfully. Thankfully, the 64tass assembler that is used has an option to automatically convert such branches into the opposite branch plus a normal jmp. This is slower and takes up more space, and you will get a warning when it happens. You may then want to restructure your branches (place target labels closer to the branch, or reduce code complexity).
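The branch range rule is simple arithmetic. A relative branch instruction is two bytes long, and its signed 8-bit offset is counted from the address right after it. Here is a Python sketch; the helper names are hypothetical.

```python
# Python model of the 6502 relative branch range (hypothetical helper names)
def branch_offset(branch_address, target_address):
    """Offset encoded in a 2-byte relative branch located at branch_address.

    The 6502 adds the offset to the address of the next instruction,
    which is branch_address + 2.
    """
    return target_address - (branch_address + 2)


def branch_in_range(branch_address, target_address):
    """True if the offset fits in a signed byte (-128..127)."""
    return -128 <= branch_offset(branch_address, target_address) <= 127
```

A branch at $1000 can reach forward to $1081 but not to $1082, and backward to $0f82 but not to $0f81. When the check fails, the assembler performs the opposite-branch-plus-jmp substitution described above.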

There is a special form of the if-statement that immediately translates into one of the 6502’s branching instructions. This allows you to write a conditional jump or block execution directly acting on the current values of the CPU’s status register bits. The eight branching instructions of the CPU each have an if-equivalent (and there are some easier to understand aliases):

condition          meaning
if_cs              if carry status is set
if_cc              if carry status is clear
if_vs              if overflow status is set
if_vc              if overflow status is clear
if_eq / if_z       if result is equal to zero
if_ne / if_nz      if result is not equal to zero
if_pl / if_pos     if result is ‘plus’ (>= zero)
if_mi / if_neg     if result is ‘minus’ (< zero)

So if_cc goto target will directly translate into the single CPU instruction BCC target.

Caution

These special if_XX branching statements are only useful in certain specific situations where you are certain that the status register (still) contains the correct status bits. This is not always the case after a function call or other operations! If in doubt, check the generated assembly code!

Note

For now, the symbols used or declared in the statement block(s) are shared with the same scope the if statement itself is in. Maybe in the future this will be a separate nested scope, but for now, that is only possible when defining a subroutine.

when statement (‘jump table’)

Instead of writing a bunch of sequential if-elseif statements, it is more readable to use a when statement (it will also result in greatly improved assembly code generation). Use a when statement if you have a set of fixed choices that each should result in a certain action. It is possible to combine several choices to result in the same action:

when value {
    4 -> txt.print("four")
    5 -> txt.print("five")
    10,20,30 -> {
        txt.print("ten or twenty or thirty")
    }
    else -> txt.print("don't know")
}

The when-value can be any expression but the choice values have to evaluate to compile-time constant integers (bytes or words). They also have to be the same datatype as the when-value, otherwise no efficient comparison can be done.

Note

Instead of chaining several value equality checks together using or (e.g. if xx==1 or xx==5 or xx==9), consider using a when statement or an in containment check instead. These are more efficient.

Assignments

Assignment statements assign a single value to a target variable or memory location. Augmented assignments (such as aa += xx) are also available, but these are just shorthands for normal assignments (aa = aa + xx).

Only variables of the simple types (such as byte, word and float) can be assigned a new value directly. String and array variables get a fixed amount of memory allocated that never changes. Assigning a new string or array to them copies the new contents into that same memory (see Arrays and Strings above), and you can also change individual elements of a string or array.

It is possible to “chain” assignments: x = y = z = 42, this is just a shorthand for the three individual assignments with the same value 42.

Attention

Data type conversion (in assignments): when assigning a value with a ‘smaller’ datatype to a variable with a ‘larger’ datatype, the value will be automatically converted to the target datatype: byte -> word -> float. So assigning a byte to a word variable, or a word to a floating point variable, is fine. The reverse is not true: you cannot assign a value of a ‘larger’ datatype to a variable of a ‘smaller’ datatype without an explicit conversion; otherwise you’ll get an error telling you that there is a loss of precision. You can use builtin functions such as round and lsb to convert to a smaller datatype, or revert to integer arithmetic.

Expressions

Expressions tell the program to calculate something. They consist of values, variables, operators such as + and -, function calls, type casts, or other expressions. Here is an example that calculates the number of seconds in a certain time period:

num_hours * 3600 + num_minutes * 60 + num_seconds

Long expressions can be split over multiple lines by inserting a line break before or after an operator:

num_hours * 3600
 + num_minutes * 60
 + num_seconds

In most places where a number or other value is expected, you can use just the number, or a constant expression. If possible, the expression is parsed and evaluated by the compiler itself at compile time, and the (constant) resulting value is used in its place. Expressions that cannot be compile-time evaluated will result in code that calculates them at runtime. Expressions can contain procedure and function calls. There are various built-in functions that can be used in expressions (see Built-in Functions). You can also reference identifiers defined elsewhere in your code.

Read the Syntax Reference chapter for all details on the available operators and kinds of expressions you can write.

Note

Order of evaluation:

The order of evaluation of expression operands is unspecified and should not be relied upon. There is no guarantee of a left-to-right or right-to-left evaluation. Don’t confuse this with operator precedence (multiplication comes before addition, etc.).

Attention

Floating point values used in expressions:

When a floating point value is used in a calculation, the result will be a floating point, and byte or word values will be automatically converted into floats in this case. The compiler will issue a warning though when this happens, because floating point calculations are very slow and possibly unintended!

Calculations with integer variables will not result in floating point values. If you divide two integer variables, say 32500 and 99, the result will be the integer floor division (328) rather than the floating point result (328.2828282828283). If you need the full precision, you’ll have to make sure at least the first operand is a floating point value. You can do this by using a floating point value or variable, or by using a type cast. When the compiler can calculate the result at compile time, it tries to avoid loss of precision and gives an error if a floating point result might be lost.
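To make the difference concrete, here is a small Python model (not Prog8 code) of the two divisions from the paragraph above. It assumes both operands are unsigned words (uword), so truncating and flooring give the same quotient. Note that Prog8 floats on CBM targets use the 5-byte machine format, so the trailing digits of a real Prog8 float result may differ from Python’s doubles.

```python
# Python model of integer vs. floating point division in Prog8.
# Assumption: both integer operands are unsigned words (0..65535).

def uword_div(a: int, b: int) -> int:
    """Both operands are integers: the result stays an integer (quotient only)."""
    return (a // b) & 0xFFFF

def float_div(a: float, b: float) -> float:
    """At least one operand is a float: the fractional part is kept."""
    return a / b

print(uword_div(32500, 99))    # 328
print(float_div(32500.0, 99))  # about 328.28
```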

Arithmetic and Logical expressions

Arithmetic expressions are expressions that calculate a numeric result (integer or floating point). Many common arithmetic operators can be used and follow the regular precedence rules. Logical expressions are expressions that calculate a boolean result: true or false (which in reality are just a 1 or 0 integer value). When using variables of the type bool, logical expressions will compile more efficiently than when you’re using regular integer type operands (because these have to be converted to 0 or 1 every time). Prog8 applies short-circuit (also known as McCarthy) evaluation to the ‘and’ and ‘or’ operators in boolean expressions.
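Short-circuit evaluation means the right-hand operand is only evaluated when it can still change the result. The Python sketch below (Python’s own and/or short-circuit the same way) uses a hypothetical check() helper that records which operands were actually evaluated.

```python
# Python model of short-circuit evaluation for 'and' / 'or'.
calls = []

def check(name: str, result: bool) -> bool:
    """Hypothetical operand: record that it was evaluated, then return its value."""
    calls.append(name)
    return result

# 'and': the right operand is skipped because the left one is already false
r1 = check("left", False) and check("right", True)
# 'or': the right operand is skipped because the left one is already true
r2 = check("left2", True) or check("right2", False)

print(r1, r2, calls)  # False True ['left', 'left2']
```

This matters when the right operand is a subroutine call with side effects: it may not be called at all.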

You can use parentheses to group parts of an expression to change the precedence. Usually the normal precedence rules apply (* goes before + etc.) but subexpressions within parentheses will be evaluated first. So (4 + 8) * 2 is 24 and not 20, and (true or false) and false is false instead of true.

Attention

Calculations keep their datatype even if the target variable is larger: when you do calculations on a BYTE type, the result will remain a BYTE. When you do calculations on a WORD type, the result will remain a WORD. For instance:

byte b = 44
word w = b*55   ; the result will be 116! (even though the target variable is a word)
w *= 999        ; the result will be -15188  (the multiplication stays within a word, but overflows)

The compiler does NOT warn about this! It’s doing this for performance reasons - so you won’t get sudden 16 bit (or even float) calculations where you needed only simple fast byte arithmetic. If you do need the extended resulting value, cast at least one of the operands explicitly to the larger datatype. For example:

byte b = 44
word w = (b as word)*55   ; 2420, because the multiplication is now done on words
w = b*(55 as word)        ; also 2420
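The numbers in the example above follow from plain two’s complement wraparound. This Python model (not Prog8 code) wraps each intermediate result to the width of the operands, which reproduces the 116 and -15188 results, and the full 2420 once a word operand is involved.

```python
# Python model of Prog8's fixed-width signed integer arithmetic.
def to_byte(v: int) -> int:
    """Wrap to a signed 8-bit value (-128..127)."""
    v &= 0xFF
    return v - 0x100 if v >= 0x80 else v

def to_word(v: int) -> int:
    """Wrap to a signed 16-bit value (-32768..32767)."""
    v &= 0xFFFF
    return v - 0x10000 if v >= 0x8000 else v

b = 44
w = to_byte(b * 55)   # byte*byte stays a byte: 2420 wraps to 116
print(w)              # 116
w = to_word(w * 999)  # word*word stays a word: 115884 wraps to -15188
print(w)              # -15188
w = to_word(b * 55)   # with one operand cast to word, 2420 fits and is kept
print(w)              # 2420
```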

Subroutines

Defining a subroutine

Subroutines are parts of the code that can be repeatedly invoked using a subroutine call from elsewhere. Their definition, using the sub statement, includes the specification of the required parameters and return value. Subroutines can be defined in a Block, but also nested inside another subroutine. Everything is scoped accordingly. With asmsub you can define a low-level subroutine that is implemented directly in assembly and takes parameters directly in registers.

Trivial asmsub routines can be tagged as inline to tell the compiler to copy their code in-place to the locations where the subroutine is called, rather than inserting an actual call and return to the subroutine. This may increase code size significantly and can only be used in limited scenarios, so YMMV. Note that the routine’s code is copied verbatim into the place of the subroutine call in this case, so pay attention to any jumps and rts instructions in the inlined code! Inlining regular Prog8 subroutines is at the discretion of the compiler.

Calling a subroutine

The arguments in parentheses after the function name should match the parameters in the subroutine definition. If you want to ignore the return value of a subroutine, prefix the call with the void keyword; otherwise the compiler will issue a warning about discarding a result value.

Note

Order of evaluation:

The order of evaluation of arguments to a single function call is unspecified and should not be relied upon. There is no guarantee of a left-to-right or right-to-left evaluation of the call arguments.

Caution

Note that due to the way parameters are processed by the compiler, subroutines are non-reentrant. This means you cannot create recursive calls. If you do need a recursive algorithm, you’ll have to hand code it in embedded assembly for now, or rewrite it into an iterative algorithm. Also, subroutines used in the main program should not be used from an IRQ handler: the subroutine may be interrupted while it is running in the main program and then be called again from the IRQ handler. The results are then undefined, because its variables will get overwritten.
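The following Python sketch only roughly models why recursion breaks (it is not how the compiler actually generates code): the parameter of a hypothetical fact subroutine lives in a single, statically allocated slot, so each recursive call overwrites the caller’s value. The iterative rewrite recommended above does not depend on that slot surviving a call.

```python
# Rough model: the parameter 'n' is stored in ONE fixed memory location,
# similar to a statically allocated subroutine parameter.
static_slot = {"n": 0}

def fact_static(n: int) -> int:
    """Recursive factorial using a statically allocated parameter (broken)."""
    static_slot["n"] = n                      # store the argument in the fixed slot
    if static_slot["n"] <= 1:
        return 1
    sub = fact_static(static_slot["n"] - 1)   # this call overwrites the slot...
    return static_slot["n"] * sub             # ...so the caller's n is lost

def fact_iterative(n: int) -> int:
    """Iterative rewrite: no recursion, so nothing gets clobbered."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

print(fact_static(4))     # 1  (wrong: every level ends up seeing n == 1)
print(fact_iterative(4))  # 24
```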

Built-in Functions

There’s a set of predefined functions in the language. These are fixed and can’t be redefined in user code. You can use them in expressions and the compiler will evaluate them at compile-time if possible.

Math

abs (x)

Returns the absolute value of a number (integer or floating point).

min (x, y)

Returns the smallest of x and y. Supported for integer types only; for floats, use floats.minf() instead.

max (x, y)

Returns the largest of x and y. Supported for integer types only; for floats, use floats.maxf() instead.

clamp (value, minimum, maximum)

Returns the value restricted to the given minimum and maximum. Supported for integer types only; for floats, use floats.clampf() instead.

sgn (x)

Get the sign of the value (integer or floating point). The result is a byte: -1, 0 or 1 (negative, zero, positive).

sqrt (w)

Returns the square root of the number. Supports unsigned integer (result is ubyte) and floating point numbers. To do the reverse - squaring a number - just write x*x.

divmod (dividend, divisor, quotient, remainder)

Performs division only once and returns both quotient and remainder in a single call, where using ‘/’ and ‘%’ separately would perform the division operation twice. All values are ubytes or all are uwords. The last two arguments must be variables to receive the quotient and remainder results, respectively.
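For unsigned values, divmod yields exactly what separate / and % operations would, but from a single division. The Python model below (not Prog8 code) uses Python’s built-in divmod, which behaves the same for non-negative integers; the check that the divisor is nonzero is an assumption of this model.

```python
# Python model of Prog8's divmod() for ubyte values.
def divmod_ubyte(dividend: int, divisor: int) -> tuple[int, int]:
    """Return (quotient, remainder) from one division; both stay in 0..255."""
    assert 0 <= dividend <= 255 and 0 < divisor <= 255  # model assumes valid ubytes
    quotient, remainder = divmod(dividend, divisor)
    return quotient, remainder

q, r = divmod_ubyte(200, 7)
print(q, r)  # 28 4
```

In Prog8 itself, the quotient and remainder are written into the two variables passed as the last arguments instead of being returned.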

Array operations

any (x)

true if any of the values in the array value x is ‘true’ (not zero), else false.

all (x)

true if all of the values in the array value x are ‘true’ (not zero), else false.

len (x)

Number of values in the array value x, or the number of characters in a string (excluding the 0-byte). Note: this can be different from the number of bytes in memory if the datatype isn’t a byte. See sizeof(). Note: lengths of strings and arrays are determined at compile-time! If your program modifies the actual length of the string during execution, the value of len(s) may no longer be correct! (use the string.length routine if you want to dynamically determine the length by counting to the first 0-byte)
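The difference between the compile-time len() and a runtime count such as string.length can be modeled in Python (not Prog8 code) by treating a string as a fixed byte buffer terminated by a 0-byte. ASCII is used here for simplicity, whereas Prog8 strings are normally PETSCII encoded; the 5-byte float size is the C64 value mentioned under sizeof().

```python
# Python model: a Prog8 string is a fixed buffer ending in a 0-byte.
FLOAT_SIZE = 5  # assumption: 5-byte floats, as on the C64

def compile_time_len(text: str) -> int:
    """What len(s) reports: fixed when the program is compiled."""
    return len(text)

def runtime_length(buf: bytearray) -> int:
    """What a runtime length routine does: count bytes up to the first 0-byte."""
    return buf.index(0)

s = "hello"
buf = bytearray(s.encode("ascii") + b"\x00")
fixed_len = compile_time_len(s)
buf[2] = 0                              # the program shortens the string at runtime
print(fixed_len, runtime_length(buf))   # 5 2

# len() counts elements, sizeof() counts bytes
floats = [0.0] * 10
print(len(floats), len(floats) * FLOAT_SIZE)  # 10 50
```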

reverse (array)

Reverse the values in the array (in-place). Can be used after sort() to sort an array in descending order.

sort (array)

Sort the array in ascending order (in-place). Arrays of byte or word values are supported. Sorting a floating-point array is not supported right now, as a general sorting routine for this would be extremely slow; either build one yourself or find another solution that doesn’t require sorting. Finally, note that sorting an array with strings in it will not do what you might think: it treats the array as just an array of integer words and sorts the string pointers accordingly. Sorting strings alphabetically has to be programmed yourself if you need it.
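The Python model below (not Prog8 code) shows both points: sort() followed by reverse() gives descending order, and an array of strings is sorted by the addresses of the strings, not by their text. The memory addresses of the strings are hypothetical.

```python
# Hypothetical addresses where the compiler might have placed three strings.
address_of = {"zebra": 0x1000, "mango": 0x1008, "apple": 0x1010}
name_at = {addr: name for name, addr in address_of.items()}
names = ["mango", "apple", "zebra"]

# An array of strings holds uword pointers, so sort() orders those numbers.
pointers = sorted(address_of[n] for n in names)
print([name_at[p] for p in pointers])  # ['zebra', 'mango', 'apple']
print(sorted(names))                   # ['apple', 'mango', 'zebra'] (alphabetical)

values = [5, 1, 4]
values.sort()      # ascending, like sort()
values.reverse()   # then reverse() gives descending order
print(values)      # [5, 4, 1]
```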

Miscellaneous

cmp (x,y)

Compare the integer value x to integer value y. Doesn’t return a value or boolean result, only sets the processor’s status bits! You can use conditional branches (if_cc etc.) to act on this. Normally you should just use a comparison expression (x < y).
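Which status bits end up set can be modeled after the 6502 CMP instruction. This Python sketch (not Prog8 code) assumes unsigned byte operands; signed or word comparisons are not covered by it. A clear carry means x < y, which is the case an if_cc branch acts on.

```python
# Python model of the 6502 CMP flags, assuming unsigned byte operands.
def cmp_flags(x: int, y: int) -> dict[str, bool]:
    diff = (x - y) & 0xFF
    return {
        "carry": x >= y,           # set when x >= y (unsigned)
        "zero": x == y,            # set when x == y
        "negative": diff >= 0x80,  # bit 7 of the 8-bit difference x - y
    }

flags = cmp_flags(10, 20)
print(flags["carry"], flags["zero"])  # False False  (carry clear: x < y)
```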

lsb (x)

Get the least significant byte of the word x. Equivalent to the cast “x as ubyte”.

msb (x)

Get the most significant byte of the word x.

mkword (msb, lsb)

Efficiently create a word value from two bytes (the msb and the lsb). Avoids multiplication and shifting. So mkword($80, $22) results in $8022.

Note

The arguments to the mkword() function are in ‘natural’ order: first the msb, then the lsb. Don’t get confused by how the system actually stores this 16-bit word value in memory (which is in little-endian format, so lsb first, then msb).
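The byte arithmetic behind lsb, msb and mkword is simple masking and shifting. This Python model (not Prog8 code) also uses the standard struct module to show the little-endian layout the note above describes.

```python
import struct

def lsb(w: int) -> int:
    return w & 0xFF           # low byte, like "w as ubyte"

def msb(w: int) -> int:
    return (w >> 8) & 0xFF    # high byte

def mkword(hi: int, lo: int) -> int:
    return (hi << 8) | lo     # natural order: msb first, then lsb

w = mkword(0x80, 0x22)
print(hex(w), hex(msb(w)), hex(lsb(w)))  # 0x8022 0x80 0x22
# In memory the word is stored little-endian: lsb first, then msb.
print(struct.pack("<H", w).hex())        # 2280
```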

peek (address)

same as @(address) - reads the byte at the given address in memory.

peekw (address)

reads the word value at the given address in memory. The word is read in the usual little-endian lsb/msb byte order.

peekf (address)

reads the float value at the given address in memory. On CBM machines, this reads 5 bytes.

poke (address, value)

same as @(address)=value - writes the byte value at the given address in memory.

pokew (address, value)

writes the word value at the given address in memory, in usual little-endian lsb/msb byte order.

pokef (address, value)

writes the float value at the given address in memory. On CBM machines, this writes 5 bytes.

pokemon (address, value)

Like poke(), but also returns the previous value in the given address. Also doesn’t have anything to do with a certain video game.
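The integer peek/poke family can be modeled in Python (not Prog8 code) as operations on a 64 KB bytearray. The word variants store and load the lsb at the lower address. The float variants are left out because their 5-byte format is machine specific.

```python
# Python model of a 64 KB address space with the integer peek/poke functions.
memory = bytearray(65536)

def poke(address: int, value: int) -> None:
    memory[address] = value & 0xFF

def peek(address: int) -> int:
    return memory[address]

def pokew(address: int, value: int) -> None:
    memory[address] = value & 0xFF             # lsb first
    memory[address + 1] = (value >> 8) & 0xFF  # then msb

def peekw(address: int) -> int:
    return memory[address] | (memory[address + 1] << 8)

def pokemon(address: int, value: int) -> int:
    previous = memory[address]                 # remember the old byte
    memory[address] = value & 0xFF
    return previous

pokew(0x5000, 0x1234)
print(hex(peek(0x5000)), hex(peek(0x5001)), hex(peekw(0x5000)))  # 0x34 0x12 0x1234
poke(0x6000, 7)
print(pokemon(0x6000, 9), peek(0x6000))  # 7 9
```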

rol (x)

Rotate the bits in x (byte or word) one position to the left. This uses the CPU’s rotate semantics: bit 0 will be set to the current value of the Carry flag, while the highest bit will become the new Carry flag value (essentially, it is a 9-bit or 17-bit rotation). Modifies in-place and doesn’t return a value (so it can’t be used in an expression). You can rol a memory location directly by using the direct memory access syntax, like rol(@($5000)).

rol2 (x)

Like rol, but as a plain 8-bit or 16-bit rotation: it uses some extra logic so that the Carry flag is not used as an extra rotation bit, and the highest bit wraps around into bit 0. Modifies in-place and doesn’t return a value (so it can’t be used in an expression). You can rol2 a memory location directly by using the direct memory access syntax, like rol2(@($5000)).

ror (x)

Rotate the bits in x (byte or word) one position to the right. This uses the CPU’s rotate semantics: the highest bit will be set to the current value of the Carry flag, while bit 0 will become the new Carry flag value (essentially, it is a 9-bit or 17-bit rotation). Modifies in-place and doesn’t return a value (so it can’t be used in an expression). You can ror a memory location directly by using the direct memory access syntax, like ror(@($5000)).

ror2 (x)

Like ror, but as a plain 8-bit or 16-bit rotation: it uses some extra logic so that the Carry flag is not used as an extra rotation bit, and bit 0 wraps around into the highest bit. Modifies in-place and doesn’t return a value (so it can’t be used in an expression). You can ror2 a memory location directly by using the direct memory access syntax, like ror2(@($5000)).
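The difference between rotating through the Carry flag and a plain rotation is easiest to see bit by bit. In this Python model (not Prog8 code) the Carry flag is passed in and returned explicitly, whereas the real functions modify x in place and update the processor flag.

```python
# Python model of the four rotate functions; bits=8 for bytes, bits=16 for words.

def rol(x: int, carry: int, bits: int = 8) -> tuple[int, int]:
    """Rotate left through carry: a (bits+1)-bit rotation. Returns (x, new_carry)."""
    top = (x >> (bits - 1)) & 1                  # highest bit becomes the new carry
    x = ((x << 1) | carry) & ((1 << bits) - 1)   # old carry enters bit 0
    return x, top

def ror(x: int, carry: int, bits: int = 8) -> tuple[int, int]:
    """Rotate right through carry: a (bits+1)-bit rotation. Returns (x, new_carry)."""
    bottom = x & 1                               # bit 0 becomes the new carry
    x = (x >> 1) | (carry << (bits - 1))         # old carry enters the highest bit
    return x, bottom

def rol2(x: int, bits: int = 8) -> int:
    """Plain rotation: the highest bit wraps around to bit 0."""
    return ((x << 1) | (x >> (bits - 1))) & ((1 << bits) - 1)

def ror2(x: int, bits: int = 8) -> int:
    """Plain rotation: bit 0 wraps around to the highest bit."""
    return (x >> 1) | ((x & 1) << (bits - 1))

print(rol(0x81, carry=0))  # (2, 1): bit 7 went into carry, carry 0 entered bit 0
print(rol2(0x81))          # 3: bit 7 wrapped around to bit 0
print(ror(0x01, carry=0))  # (0, 1)
print(ror2(0x01))          # 128
```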

setlsb (x, value)

Sets the least significant byte of word variable x to a new value. Leaves the MSB untouched.

setmsb (x, value)

Sets the most significant byte of word variable x to a new value. Leaves the LSB untouched.
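Both functions replace one byte of a 16-bit word and keep the other. This Python model (not Prog8 code) returns the new word instead of modifying a variable in place.

```python
def setlsb(x: int, value: int) -> int:
    """Replace the low byte of a 16-bit word, keep the high byte."""
    return (x & 0xFF00) | (value & 0xFF)

def setmsb(x: int, value: int) -> int:
    """Replace the high byte of a 16-bit word, keep the low byte."""
    return ((value & 0xFF) << 8) | (x & 0x00FF)

print(hex(setlsb(0x1234, 0xAB)))  # 0x12ab
print(hex(setmsb(0x1234, 0xAB)))  # 0xab34
```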

sizeof (name) ; sizeof (number)

Number of bytes that the object ‘name’, or the number ‘number’ occupies in memory. This is a constant determined by the data type of the object. For instance, for a variable of type uword, the sizeof is 2. For a 10-element array of floats, it is 50 (on the C64, where a float is 5 bytes). Note: if you need the number of elements in an array, use len() instead.

memory (name, size, alignment)

Returns the address of the first location of a statically “reserved” block of memory of the given size in bytes, with the given name. The block is uninitialized memory; it is not set to zero! If you specify an alignment value >1, the block of memory will be aligned to an address that is a multiple of that value; for instance, an alignment of $100 means the memory block is aligned on a page boundary, and $2 means word aligned (even addresses). Requesting the address of such a named memory block again later with the same name will result in the same address as before. When reusing blocks in that way, the size argument must be the same, otherwise you’ll get a compilation error. This routine can be used to “reserve” parts of the memory where a normal byte array variable would not suffice, for instance if you need more than 256 consecutive bytes. The return value is just a simple uword address, so it cannot be used as an array in your program. You can only treat it as a pointer or use it in inline assembly.
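The alignment rule and the same-name behavior can be illustrated with a small bookkeeping model. This is a hypothetical Python sketch (not how the compiler actually lays out memory): the MemoryReserver class and its starting address are invented for illustration, and only the rounding to a multiple of the alignment and the size check on reuse mirror the description above.

```python
# Hypothetical model of memory(name, size, alignment) bookkeeping.
class MemoryReserver:
    def __init__(self, start: int):
        self.next_free = start   # hypothetical first free address
        self.blocks = {}         # name -> (address, size)

    def memory(self, name: str, size: int, alignment: int = 0) -> int:
        if name in self.blocks:
            address, old_size = self.blocks[name]
            if old_size != size:
                raise ValueError("memory block reused with a different size")
            return address       # same name -> same address
        address = self.next_free
        if alignment > 1:
            # round up to the next multiple of the alignment value
            address = (address + alignment - 1) // alignment * alignment
        self.blocks[name] = (address, size)
        self.next_free = address + size
        return address

r = MemoryReserver(0x2345)
buf = r.memory("buffer", 300, 0x100)
print(hex(buf))                               # 0x2400 (page aligned)
print(r.memory("buffer", 300, 0x100) == buf)  # True
```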

call (address) -> uword

Calls a subroutine given by its memory address. You cannot pass arguments directly, although it is of course possible to do this via the global cx16.r0... registers, for example. It is assumed the subroutine returns a word value (in AY); if it does not, just add void to the call to ignore the result value. This function effectively creates an “indirect JSR” if you use it on a uword pointer variable. Because it doesn’t handle bank switching etc. by itself, it is a lot faster than callfar, and it also works on systems other than the Commander X16.

callfar (bank, address, argumentword) -> uword ; NOTE: specific to cx16 target for now

Calls an assembly routine in another bank on the Commander X16 (using its JSRFAR routine). Be aware that the RAM or ROM bank may be changed depending on the address it jumps to! The argumentword will be loaded into the A+Y registers before calling the routine. The uword value that the routine returns in the A+Y registers will be returned. NOTE: this routine is very inefficient, so don’t use it for frequent calls. Set the bank yourself, or even write a custom tailored trampoline routine if you need to. Or use call if you can.

syscall (callnr), syscall1 (callnr, arg), syscall2 (callnr, arg1, arg2), syscall3 (callnr, arg1, arg2, arg3)

Functions for doing a system call on targets that support this. Currently no actual target uses this though except, possibly, the experimental code generation target! The regular 6502 based compiler targets just use a subroutine call to asmsub Kernal routines at specific memory locations. So these builtin function calls are not useful yet except for experimentation in new code generation targets.

rsave

Saves all registers including status (or only X) on the stack. Note: the 16 bit ‘virtual’ registers of the Commander X16 are not saved, but you can use cx16.save_virtual_registers() for that.

rrestore

Restores all registers including status (or only X) from the CPU hardware stack. Note: the 16 bit ‘virtual’ registers of the Commander X16 are not restored, but you can use cx16.restore_virtual_registers() for that.

Library routines

There are many routines available in the compiler libraries. Some are used internally by the compiler as well.

The most important ones can be found in the Library modules chapter.

There are too many to list here; just have a look through the source code of the library modules to see what’s there (they can be found in the compiler/res directory). The example programs also use a small set of the library routines; you can study their source code to see how they might be used.