---
title: "Creating a Sinclair BASIC interpreter"
date: 2023-01-03T17:24:00
slug: creating-a-sinclair-basic-interpreter

---
Given my new website design, I figured I'd also have a go at making an interpreter for Sinclair BASIC so that I can run my own ZX Spectrum programs. I've got a few aims:

* Create something that can properly parse Sinclair BASIC
* Run the interpreted software in the browser
* Allow user input

What this isn't supposed to be:

* A perfectly-performant implementation
* An emulator

I'm not a Rust developer, and I'm still learning the language myself, so everything I write here will be suboptimal. If anyone reading wants to give me some pointers (pun intended), I'd be forever grateful.

I'm going to use the [ZX Basic Instruction Manual](https://worldofspectrum.org/ZXBasicManual/) as my main reference for this project.

Source code for the project is available on [GitHub](https://github.com/lewisdaleuk/basic-interpreter).

## Getting started

I'm using Rust for this, so I create a new Cargo project:

```bash
cargo new basic-interpreter
```

And I know I'm going to need to parse input, so I'm going to use [nom](https://docs.rs/nom/latest/nom/) for parsing:

```bash
cargo add nom
```

## Hello, World

Okay, so to begin with we're going to implement the simplest program we can: Hello World. It'll be a single-line program that just prints out the string. Here's the program in question:

```basic
10 PRINT "Hello, World"
```

There are three parts to this statement:

1. The line number - this is in theory optional, but we'll handle that later
2. The command, in this case `PRINT`
3. The input. There are several variations of input, but for now we're just going to handle single strings

### Parsing

Okay, so let's get started with our parser! We'll start by writing a test for a line to make sure it parses okay:

```rust
#[test]
fn it_parses_a_print_command() {
    let input = "10 PRINT \"Hello, world\"";
    let expected = (10, super::Command::Print(String::from("Hello, world")));

    let (_, result) = super::parse_line(input).unwrap();
    assert_eq!(expected, result);
}
```

And let's create our types:

```rust
pub type Line = (u32, Command);

#[derive(Debug, PartialEq, Eq)]
pub enum Command {
    Print(String),
    None
}
```

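To start with, we'll extract the line number:

```rust
// `ccu32` is assumed to be an alias for nom's `character::complete::u32`
use nom::{bytes::complete::tag, character::complete::u32 as ccu32, sequence::terminated, IResult};

pub fn parse_line(line: &str) -> IResult<&str, Line> {
    // Read the leading line number and the space that follows it
    let (i, line_number) = terminated(ccu32, tag(" "))(line)?;

    // Return the remaining input (`i`), not the original `line`
    Ok((i, (line_number, Command::None)))
}
```
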
Then we need to parse the command:

```rust
fn read_string(i: &str) -> IResult<&str, &str> {
    take_until("\"")(i)
}

fn parse_command(i: &str) -> IResult<&str, Command> {
    // The keyword is everything up to the first space
    let (i, (command, _)) = tuple((take_until(" "), tag(" ")))(i)?;

    let (i, cmd) = match command {
        // `read_string` yields a `&str`, so convert it before wrapping it in `Command::Print`
        "PRINT" => map(delimited(tag("\""), read_string, tag("\"")), |s: &str| {
            Command::Print(s.to_string())
        })(i)?,
        _ => (i, Command::None),
    };

    Ok((i, cmd))
}

pub fn parse_line(line: &str) -> IResult<&str, Line> {
    let (i, line_number) = terminated(ccu32, tag(" "))(line)?;
    let (i, command) = parse_command(i)?;
    Ok((i, (line_number, command)))
}
```

Finally, let's write some code to quickly run our program:

```rust
use std::fs;

mod basic;

fn main() {
    let file = fs::read_to_string("./src/inputs/hello_world.bas").unwrap();
    let lines = file.lines().next().unwrap();
    let (_, (_, command)) = basic::parse_line(lines).unwrap();
    match command {
        basic::Command::Print(input) => {
            println!("{}", input);
        }
        _ => {
            panic!("Command not recognised");
        }
    };
}
```

And we can run it:

```bash
$ cargo run
   Compiling basic-interpreter v0.1.0 (/Users/lewis/development/personal/basic-interpreter)
    Finished dev [unoptimized + debuginfo] target(s) in 0.51s
     Running `target/debug/basic-interpreter`
Hello, world
```

Cool, that works!

### Escaped characters

Okay, but what if I change my program to print quote characters (`"`)? To do this, we need to escape the quotes inside the string:

```basic
10 PRINT "Hello, \"World\""
```

Which we would expect to result in:

```bash
Hello, "World"
```

However, because we're using `take_until`, our parser stops at the first escaped quote, resulting in:

```bash
Hello, \
```

To fix this, we need to use the `escaped_transform` parser:

```rust
// `escaped_transform` builds a new `String`, so the return type changes from `&str`
fn read_string(i: &str) -> IResult<&str, String> {
    delimited(
        tag("\""),
        escaped_transform(
            none_of("\\\""),
            '\\',
            alt((value("\\", tag("\\")), value("\"", tag("\"")))),
        ),
        tag("\""),
    )(i)
}
```

What we're saying here is: accept any character that isn't `\` or `"` (`none_of("\\\"")`), and treat `\` as our escape character. Finally, we match escaped quote characters and escaped backslashes and un-escape them so that they print properly (otherwise our output would include the escape characters when printed). Because `read_string` now consumes the surrounding quotes itself and returns a `String`, the `PRINT` arm of `parse_command` becomes `map(read_string, Command::Print)`.
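To make the un-escaping step concrete, here's a minimal sketch of the same idea written with only the standard library. It isn't how nom implements `escaped_transform`, and the name `unescape` is made up for illustration; it assumes the surrounding quotes have already been stripped:

```rust
/// Un-escape `\"` and `\\` in the body of a string literal.
/// Returns `None` for a dangling backslash or an unknown escape sequence.
fn unescape(body: &str) -> Option<String> {
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars();

    while let Some(c) = chars.next() {
        if c == '\\' {
            // The backslash is the control character: decide based on what follows it
            match chars.next()? {
                '"' => out.push('"'),
                '\\' => out.push('\\'),
                _ => return None,
            }
        } else {
            // Any other character is copied through unchanged
            out.push(c);
        }
    }

    Some(out)
}
```

Calling `unescape` on the body of the program above (`Hello, \"World\"`) gives back the string with the backslashes removed, which is the text we want `PRINT` to show.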

## Basic looping

Alright, next is everybody's favourite command: `GO TO`, which lets us jump to a different part of the program. Here's a short program using our two commands that will print "Hello World" infinitely:

```basic
10 PRINT "Hello World"
20 GO TO 10
```

### Parsing

The first thing that leaps out at me from this command is that `GO TO` contains a space. That won't work with our current parser, which reads a string until it meets a space. Instead, we should try to be more specific:

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Command {
    Print(String),
    GoTo(usize),
    None,
}

fn match_command(i: &str) -> IResult<&str, &str> {
    alt((
        tag("PRINT"),
        tag("GO TO")
    ))(i)
}

fn parse_command(i: &str) -> IResult<&str, Command> {
    // Fall back to an empty keyword if nothing matches
    let (i, command): (&str, &str) = match_command(i).unwrap_or((i, ""));
    let (i, _) = tag(" ")(i)?;

    let (i, cmd) = match command {
        "PRINT" => map(read_string, Command::Print)(i)?,
        // `ccu64` is assumed to be an alias for nom's `character::complete::u64`
        "GO TO" => map(ccu64, |line| Command::GoTo(line as usize))(i)?,
        _ => (i, Command::None),
    };

    Ok((i, cmd))
}
```
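To see why splitting on the first space breaks for `GO TO`, here's a small sketch using only the standard library (no nom involved). The function names `split_keyword_naive` and `split_keyword` are hypothetical and aren't part of the project; the second one mimics the "try each known keyword in turn" behaviour of `alt` over `tag`s:

```rust
/// Naive approach: treat everything up to the first space as the keyword.
fn split_keyword_naive(input: &str) -> Option<(&str, &str)> {
    input.split_once(' ')
}

/// Prefix approach: try each known keyword in turn and return the first match,
/// together with whatever follows the keyword and its trailing space.
fn split_keyword<'a>(input: &'a str, keywords: &[&'a str]) -> Option<(&'a str, &'a str)> {
    keywords.iter().find_map(|kw| {
        let rest = input.strip_prefix(*kw)?;
        // Require a space after the keyword, as the nom parser does with `tag(" ")`
        let rest = rest.strip_prefix(' ')?;
        Some((*kw, rest))
    })
}
```

With the naive version, `GO TO 10` is split into the keyword `GO` and the argument `TO 10`, which matches nothing we know about. The prefix version returns `GO TO` and `10`.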

### Building a program

For `GO TO` to function, we need a structure to actually store our program. We need to:

* Store a command as a line
* Easily move to the next line
* Search a program for a line by number

(A real compiler would do lots of clever things here that would enable it to drop unreachable code and optimise things, but that's not what we're here for.)

We... might need a Linked List. They're pretty notoriously a headache in Rust due to ownership rules, but we can use `Box` to help mitigate this:

```rust
pub enum Node {
    None,
    Link { item: Line, next: Box<Node> }
}
```

We'll need to derive the `Clone` trait on our `Command` enum for this to work (`Copy` isn't an option, because `Command::Print` holds a `String`):

```rust
#[derive(Debug, PartialEq, Eq, Clone)]
pub enum Command {
    Print(String),
    GoTo(usize),
    None,
}
```

And then implement a (very basic) Linked List:

```rust
#[derive(Debug, PartialEq, Eq, Clone)]
pub enum Node {
    None,
    Link { item: Line, next: Box<Node> }
}

impl Node {
    fn push(&mut self, val: Line) {
        match self {
            // Not at the end yet: keep walking down the list
            Self::Link { next, .. } => next.push(val),
            // Reached the end: replace the terminator with a new link
            Self::None => *self = Self::Link { item: val, next: Box::new(Self::None) },
        }
    }
}
```

We also want to be able to find a line, so we'll add a simple `find_line` function to the `impl` block too:

```rust
fn find_line(&self, line: usize) -> Option<Node> {
    if let Self::Link { item, next } = self {
        // Line numbers are stored as `u32`, so widen before comparing with `usize`
        if item.0 as usize == line {
            Some(self.clone())
        } else {
            next.find_line(line)
        }
    } else {
        None
    }
}
```

Finally, build a parser to read every line and store it in a list of nodes:

```rust
pub fn read_program(i: &str) -> IResult<&str, Node> {
    let (i, lines) = separated_list0(tag("\n"), parse_line)(i)?;
    let mut node = Node::None;

    // Consume the parsed lines, appending each one to the end of the list
    for line in lines {
        node.push(line);
    }

    Ok((i, node))
}
```
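As an aside, the three requirements listed at the start of this section (store a command per line, move to the next line, look a line up by number) can also be met with a sorted map from the standard library. This is not what the project does; it's a sketch of a swapped-in technique, and `Cmd`, `Listing`, `first_line` and `next_line` are hypothetical names used only here:

```rust
use std::collections::BTreeMap;

// A cut-down stand-in for the post's `Command` enum
#[derive(Debug, PartialEq, Eq, Clone)]
enum Cmd {
    Print(String),
    GoTo(usize),
}

/// Program lines keyed by line number; a `BTreeMap` keeps the keys sorted.
type Listing = BTreeMap<usize, Cmd>;

/// The lowest-numbered line, if the program has any lines at all.
fn first_line(listing: &Listing) -> Option<usize> {
    listing.keys().next().copied()
}

/// The line that follows `current` in numeric order, if there is one.
fn next_line(listing: &Listing, current: usize) -> Option<usize> {
    listing
        .range(current + 1..)
        .next()
        .map(|(number, _)| *number)
}
```

Looking up the target of a `GO TO` is then just `listing.get(&line)`, and lines can be inserted in any order. The trade-off is that this moves away from the linked-list exercise, which is half the fun here.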

### Running the program

We have a list of instructions. Now we need to marshal them and provide an interface to run them. To do this, I've created a `Program` struct that holds the complete program, and a cursor for the instruction currently being executed:

```rust
#[derive(Debug, PartialEq, Eq, Clone)]
pub struct Program {
    nodes: Node,
    current: Node,
}
```

I'm also going to implement `Iterator` for the struct, so that we can easily loop over all of the instructions:

```rust
impl Iterator for Program {
    type Item = Node;

    fn next(&mut self) -> Option<Self::Item> {
        let curr = self.current.clone();
        match &self.current {
            Node::Link { item: _, next } => {
                self.current = *next.clone();
                Some(curr)
            }
            Node::None => None,
        }
    }
}
```

Then we add an `execute` function, as well as a function to jump to a line:

```rust
impl Program {
    pub fn new(node: Node) -> Self {
        Program {
            nodes: node.clone(),
            current: node,
        }
    }

    pub fn to_line(&mut self, line: usize) {
        if let Some(node) = self.nodes.find_line(line) {
            self.current = node;
        } else {
            panic!("Cannot jump to line {}, it does not exist", line);
        }
    }

    pub fn execute(&mut self) {
        let mut iter = self.clone();

        while let Some(node) = iter.next() {
            match node {
                Node::Link { item, .. } => {
                    match item.1 {
                        Command::Print(line) => println!("{}", line),
                        Command::GoTo(line) => iter.to_line(line),
                        _ => panic!("Unrecognised command")
                    }
                },
                _ => ()
            };
        }
    }
}
```

Now we can run a new sample program using the `Program` struct:

```rust
fn main() {
    let file = fs::read_to_string("./inputs/simple_program.bas").unwrap();
    // `read_program` returns the list of nodes; wrap it in a `Program` to run it
    let (_, nodes) = basic::read_program(&file).unwrap();
    let mut program = basic::Program::new(nodes);
    program.execute();
}
```

```
$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/basic-interpreter`
Hello World
Hello World
Hello World
Hello World
Hello World
(truncated ∞ lines)
```
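An infinite loop is awkward to check in a test, so here's a rough sketch of a bounded variant of the same run loop. It's a simplified, self-contained stand-in rather than the project's code: it uses a slice of lines instead of the linked list, collects output instead of printing it, and the names `Cmd` and `run_bounded` are hypothetical:

```rust
// A cut-down stand-in for the post's `Command` enum
#[derive(Debug, Clone, PartialEq, Eq)]
enum Cmd {
    Print(String),
    GoTo(usize),
}

/// Run `lines` for at most `max_steps` commands and return everything printed.
/// Fails if a `GO TO` targets a line number that doesn't exist.
fn run_bounded(lines: &[(usize, Cmd)], max_steps: usize) -> Result<Vec<String>, String> {
    let mut output = Vec::new();
    let mut pc = 0; // index into `lines`, not a BASIC line number

    for _ in 0..max_steps {
        let cmd = match lines.get(pc) {
            Some((_, cmd)) => cmd,
            None => break, // ran off the end: the program has finished
        };

        match cmd {
            Cmd::Print(text) => {
                output.push(text.clone());
                pc += 1;
            }
            Cmd::GoTo(target) => {
                // Move the cursor to the entry whose line number matches the target
                pc = lines
                    .iter()
                    .position(|(number, _)| number == target)
                    .ok_or_else(|| format!("Cannot jump to line {}, it does not exist", target))?;
            }
        }
    }

    Ok(output)
}
```

Running the two-line program above with a small `max_steps` returns a handful of "Hello World" strings and then stops, which is enough to assert on without waiting forever.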

I think that's where I'll leave this (surprisingly long) post. This was loads of fun, to be honest. I think for my next post I'll be adding some logic statements (`IF`). The aim is to get this as close to a functioning piece of software as quickly as possible, and then work on some of the more fun commands.

I'm also going to refactor things a bit, because my `basic.rs` file got quite messy towards the end.