---
title: "Creating a Sinclair BASIC interpreter"
date: 2023-01-03T17:24:00
slug: creating-a-sinclair-basic-interpreter
---

Given my new website design, I figured I'd also have a go at making an interpreter for Sinclair BASIC so that I can run my own ZX Spectrum programs.

I've got a few aims:

* Create something that can properly parse Sinclair BASIC
* Run the interpreted software in the browser
* Allow user input

What this isn't supposed to be:

* A perfectly-performant implementation
* An emulator

I'm not a Rust developer, and I'm still learning the language myself, so everything I write here will be suboptimal. If anyone reading wants to give me some pointers (pun intended), I'd be forever grateful.

I'm going to use the [ZX Basic Instruction Manual](https://worldofspectrum.org/ZXBasicManual/) as my main reference for this project.

Source code for the project is available on [GitHub](https://github.com/lewisdaleuk/basic-interpreter).

## Getting started

I'm using Rust for this, so I create a new Cargo project:

```bash
cargo new basic-interpreter
```

And I know I'm going to need to parse input, so I'm going to use [nom](https://docs.rs/nom/latest/nom/) for parsing:

```bash
cargo add nom
```

## Hello, World

Okay, so to begin with we're going to implement the simplest program we can: Hello World. It'll be a single-line program that just prints out the string. Here's the program in question:

```basic
10 PRINT "Hello, World"
```

There are three parts to this statement:

1. The line number - this is in theory optional, but we'll handle that later
2. The command, in this case `PRINT`
3. The input. There are a few different variations of input, but for now we're just going to handle single strings

### Parsing

Okay so let's get started with our parser!
We'll start by writing a test for a line to make sure it parses okay:

```rust
#[test]
fn it_parses_a_print_command() {
    let input = "10 PRINT \"Hello, world\"";
    let expected = (10, super::Command::Print(String::from("Hello, world")));
    let (_, result) = super::parse_line(input).unwrap();
    assert_eq!(expected, result);
}
```

And let's create our types:

```rust
pub type Line = (u32, Command);

#[derive(Debug, PartialEq, Eq)]
pub enum Command {
    Print(String),
    None,
}
```

To start with, we'll extract the line number:

```rust
pub fn parse_line(line: &str) -> IResult<&str, Line> {
    // Read the line number, discard the space after it, and return what's left.
    let (i, line_number) = terminated(ccu32, tag(" "))(line)?;
    Ok((i, (line_number, Command::None)))
}
```

Then we need to parse the command:

```rust
// Assumption: `ccu32` is nom's decimal-text `u32` parser, imported under an alias.
use nom::{
    bytes::complete::{tag, take_until},
    character::complete::u32 as ccu32,
    combinator::map,
    sequence::{delimited, terminated, tuple},
    IResult,
};

fn read_string(i: &str) -> IResult<&str, &str> {
    take_until("\"")(i)
}

fn parse_command(i: &str) -> IResult<&str, Command> {
    // Everything up to the first space is the command name.
    let (i, (command, _)) = tuple((take_until(" "), tag(" ")))(i)?;
    let (i, cmd) = match command {
        // `read_string` yields a `&str`, so convert it before wrapping it.
        "PRINT" => map(delimited(tag("\""), read_string, tag("\"")), |s: &str| {
            Command::Print(s.to_string())
        })(i)?,
        _ => (i, Command::None),
    };
    Ok((i, cmd))
}

pub fn parse_line(line: &str) -> IResult<&str, Line> {
    let (i, line_number) = terminated(ccu32, tag(" "))(line)?;
    let (i, command) = parse_command(i)?;
    Ok((i, (line_number, command)))
}
```

Finally, let's write some code to quickly run our program:

```rust
use std::fs;

mod basic;

fn main() {
    let file = fs::read_to_string("./src/inputs/hello_world.bas").unwrap();
    // Only the first line of the file is parsed for now.
    let line = file.lines().next().unwrap();
    let (_, (_, command)) = basic::parse_line(line).unwrap();
    match command {
        basic::Command::Print(input) => {
            println!("{}", input);
        }
        _ => {
            panic!("Command not recognised");
        }
    };
}
```

And we can run it:

```bash
$ cargo run
   Compiling basic-interpreter v0.1.0 (/Users/lewis/development/personal/basic-interpreter)
    Finished dev [unoptimized + debuginfo] target(s) in 0.51s
     Running `target/debug/basic-interpreter`
Hello, world
```

Cool, that works!
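
As an aside, the three-part split (line number, command, input) can also be sketched with nothing but the standard library. This isn't how the project does it: it swaps nom for plain string splitting, and `SketchCommand` and `split_line` are hypothetical names made up for the illustration. It also assumes a single quoted string with no escapes.

```rust
/// Hypothetical stand-in for the post's `Command`, just for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum SketchCommand {
    Print(String),
    Unknown,
}

/// Splits a line such as `10 PRINT "Hello"` into (line number, command).
/// Returns `None` when the line number or the quoted string is malformed.
fn split_line(line: &str) -> Option<(u32, SketchCommand)> {
    // Part 1: the line number is everything before the first space.
    let (number, rest) = line.split_once(' ')?;
    let number: u32 = number.parse().ok()?;

    // Part 2: the command name is the next word.
    let (name, arg) = rest.split_once(' ').unwrap_or((rest, ""));

    // Part 3: the input, here a single quoted string with no escapes.
    let command = match name {
        "PRINT" => {
            let text = arg.strip_prefix('"')?.strip_suffix('"')?;
            SketchCommand::Print(text.to_string())
        }
        _ => SketchCommand::Unknown,
    };
    Some((number, command))
}

fn main() {
    let parsed = split_line("10 PRINT \"Hello, world\"");
    println!("{:?}", parsed);
}
```

The nom version earns its keep once commands and inputs get more varied, which is where hand-written splitting like this tends to fall apart.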
### Escaped characters

Okay, but what if I change my program to print quote characters (`"`)? To do this, we need to escape the quotes inside the string:

```basic
10 PRINT "Hello, \"World\""
```

Which we would expect to result in:

```bash
Hello, "World"
```

However, because we're using `take_until`, our parser stops at the first escaped quote, resulting in:

```bash
Hello, \
```

To fix this, we need to use the `escaped_transform` parser:

```rust
use nom::{
    branch::alt,
    bytes::complete::{escaped_transform, tag},
    character::complete::none_of,
    combinator::value,
    sequence::delimited,
    IResult,
};

// `escaped_transform` builds a new `String`, so the return type changes too.
fn read_string(i: &str) -> IResult<&str, String> {
    delimited(
        tag("\""),
        escaped_transform(
            none_of("\\\""),
            '\\',
            alt((value("\\", tag("\\")), value("\"", tag("\"")))),
        ),
        tag("\""),
    )(i)
}
```

What we're saying here is: accept any character that isn't `\` or `"` (`none_of("\\\"")`), with `\` as our escape character. Finally, we match escaped quote characters and escaped backslashes and un-escape them so that they print properly (otherwise our output will include escape characters when printed). Note that `read_string` now consumes the surrounding quotes itself.

## Basic looping

Alright, next is everybody's favourite command: `GO TO`, which lets us jump to a different part of the program. Here's a short program using our two commands that will print "Hello World" infinitely:

```basic
10 PRINT "Hello World"
20 GO TO 10
```

### Parsing

The first thing that leaps out to me from this command is that `GO TO` contains a space. That won't work with our current parser, which reads a string until it meets a space.
Instead, we should try and be more specific:

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Command {
    Print(String),
    GoTo(usize),
    None,
}

fn match_command(i: &str) -> IResult<&str, &str> {
    alt((tag("PRINT"), tag("GO TO")))(i)
}

fn parse_command(i: &str) -> IResult<&str, Command> {
    // Fall back to an empty command name when nothing matches.
    let (i, command): (&str, &str) = match_command(i).unwrap_or((i, ""));
    let (i, _) = tag(" ")(i)?;
    let (i, cmd) = match command {
        "PRINT" => map(read_string, Command::Print)(i)?,
        // Assumption: `ccu64` is nom's decimal-text `u64` parser under an alias.
        "GO TO" => map(ccu64, |line| Command::GoTo(line as usize))(i)?,
        _ => (i, Command::None),
    };
    Ok((i, cmd))
}
```

### Building a program

For `GO TO` to function, we need a structure to actually store our program. We need to:

* Store a command as a line
* Easily move to the next line
* Search a program for a line by number

(A real compiler would do lots of clever things here that would enable it to drop unreachable code and optimise things, but that's not what we're here for).

We... might need a Linked List. They're pretty notoriously a headache in Rust due to ownership rules - but we can use `Box` to help mitigate this:

```rust
pub enum Node {
    None,
    Link { item: Line, next: Box<Node> },
}
```

We'll need to add the `Clone` trait to our `Command` enum for this to work (`Copy` can't be derived here, because `Print` holds a `String`):

```rust
#[derive(Debug, PartialEq, Eq, Clone)]
pub enum Command {
    Print(String),
    GoTo(usize),
    None,
}
```

And then implement a (very basic) Linked List:

```rust
#[derive(Debug, PartialEq, Eq, Clone)]
pub enum Node {
    None,
    Link { item: Line, next: Box<Node> },
}

impl Node {
    fn push(&mut self, val: Line) {
        *self = match self {
            // Not the tail yet: push onto the rest of the list, then rebuild this link.
            Self::Link { item, next } => {
                next.push(val);
                Self::Link {
                    item: item.clone(),
                    next: next.clone(),
                }
            }
            // Reached the tail: replace it with a new link.
            Self::None => Self::Link {
                item: val,
                next: Box::new(Self::None),
            },
        }
    }
}
```

We also want to be able to find a line, so we'll write a simple `find_line` function too:

```rust
fn find_line(&self, line: usize) -> Option<Node> {
    if let Self::Link { item, next } = self {
        // Line numbers are stored as `u32`, so cast before comparing.
        if item.0 as usize == line {
            Some(self.clone())
        } else {
            next.find_line(line)
        }
    } else {
        None
    }
}
```

Finally, build a parser to read every line and store it in a list of nodes:

```rust
pub fn read_program(i: &str) -> IResult<&str, Node> {
    let (i, lines) = separated_list0(tag("\n"), parse_line)(i)?;
    let mut node = Node::None;
    for line in lines.iter() {
        node.push(line.clone());
    }
    Ok((i, node))
}
```

### Running the program

We have a list of instructions. Now we need to marshal them and provide an interface to run them. To do this, I've created a `Program` struct that holds a copy of the complete program, and a cursor for the instruction currently being executed:

```rust
#[derive(Debug, PartialEq, Eq, Clone)]
pub struct Program {
    nodes: Node,
    current: Node,
}
```

I'm also going to implement `Iterator` for the struct, so that we can easily loop over all of the instructions:

```rust
impl Iterator for Program {
    type Item = Node;

    fn next(&mut self) -> Option<Self::Item> {
        let curr = self.current.clone();
        match &self.current {
            // Advance the cursor and hand back the node we were on.
            Node::Link { item: _, next } => {
                self.current = *next.clone();
                Some(curr)
            }
            Node::None => None,
        }
    }
}
```

Then we add an `execute` function, as well as a function to jump to a line:

```rust
impl Program {
    pub fn new(node: Node) -> Self {
        Program {
            nodes: node.clone(),
            current: node,
        }
    }

    pub fn to_line(&mut self, line: usize) {
        if let Some(node) = self.nodes.find_line(line) {
            self.current = node;
        } else {
            panic!("Cannot jump to line {}, it does not exist", line);
        }
    }

    pub fn execute(&mut self) {
        let mut iter = self.clone();
        while let Some(node) = iter.next() {
            match node {
                Node::Link { item, .. } => match item.1 {
                    Command::Print(line) => println!("{}", line),
                    // Move the cursor; the next iteration resumes from there.
                    Command::GoTo(line) => iter.to_line(line),
                    _ => panic!("Unrecognised command"),
                },
                _ => (),
            };
        }
    }
}
```

Now, we can run a new sample program, using the provided struct:

```rust
fn main() {
    let file = fs::read_to_string("./inputs/simple_program.bas").unwrap();
    // `read_program` returns the list of nodes, which we wrap in a `Program`.
    let (_, nodes) = basic::read_program(&file).unwrap();
    let mut program = basic::Program::new(nodes);
    program.execute();
}
```

```
$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/basic-interpreter`
Hello World
Hello World
Hello World
Hello World
Hello World
(truncated ∞ lines)
```

I think that's where I'll leave this (surprisingly long) post. This was loads of fun, to be honest. I think for my next post I'll be adding some logic statements (`IF`). The aim is to get this as close to a functioning piece of software as quickly as possible, and then work on some of the more fun commands. I'm also going to refactor things a bit, because my `basic.rs` file got quite messy towards the end.
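
On that refactor: one possible direction, purely as a sketch and not what the repository does, is to swap the linked list for a `Vec` of lines and an index cursor, so that `GO TO` becomes a lookup rather than a clone of the remaining nodes. Everything below is hypothetical, including the `run` function and its `max_steps` cap, which only exists so that the infinite loop terminates in the example.

```rust
/// Hypothetical, trimmed-down command set for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Command {
    Print(String),
    GoTo(usize),
}

/// Runs `program` and returns what was printed, stopping after `max_steps`
/// commands so that an endless `GO TO` loop still terminates here.
fn run(program: &[(usize, Command)], max_steps: usize) -> Vec<String> {
    let mut output = Vec::new();
    let mut cursor = 0; // index into `program`, not a BASIC line number
    let mut steps = 0;
    while cursor < program.len() && steps < max_steps {
        steps += 1;
        match &program[cursor].1 {
            Command::Print(text) => {
                output.push(text.clone());
                cursor += 1;
            }
            Command::GoTo(target) => {
                // Find the index of the requested line number.
                cursor = program
                    .iter()
                    .position(|(number, _)| number == target)
                    .unwrap_or_else(|| {
                        panic!("Cannot jump to line {}, it does not exist", target)
                    });
            }
        }
    }
    output
}

fn main() {
    let program = vec![
        (10, Command::Print(String::from("Hello World"))),
        (20, Command::GoTo(10)),
    ];
    // Six steps: three PRINTs and three GO TOs.
    for line in run(&program, 6) {
        println!("{}", line);
    }
}
```

Returning the output as a `Vec<String>` instead of printing directly is also an assumption of the sketch; it makes the loop easy to test and might suit running in the browser later.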