Start of Parsing
Table of Contents
1. Parser
1.1. How it works
Model
program -> [parser] -> Abstract Syntax tree (AST) -> [Interp] -> Result
transcript and Host Language langauge
EX: Python is the transcript langauge and the host language c
1+2 -> (plusC (NumC1) -> 3 (NumC2))
1.2. extended Backus–Naur form (EBNF)
Stuff in curly angle brackets are variables
<expr>::=<num> | {* <expr> <expr>} | {+ <expr> <expr>}
{* {+ 2 3} 7} -> the second type of expression
Actually, all of these are expressions defined earlier!
The expression can be split into a tree where the <num> is the leaf.
1.3. Now in Racket
NOTE ’()’ and ’{}’ are interchangable in racket
;;1. Data Definitions (define-type ExprC (U NumC PlusC MultC)) (struct NumC ([n : Real]) #:transparent) (struct PlusC ([left : ExprC] [right : ExprC]) #:transparent) (struct MultC ([left : ExprC] [right : ExprC]) #:transparent) ;;2. Purpose Statement and header ;;parse the given program into an AST (define (parse [prog : Sexp]) : ExprC (match prog ;;<num> [(? real? n) (NumC n)] ;; {+ <expr> <expr>} [(list '+ l r) (Plusc (parse l) (parse r))] ;; {* <expr> <expr>} [(list '* l r) (MultC (parse l) (parse r))])) ;;3. tests (check-equal? (parse '{* {+ 2 3} 7}) (MultC (PlusC (NumC 2) (NumC 3)) (NumC 7))) (check-equal? (parse '2) (NumC 2)) (check-equal? (parse '{+ 3 4}) (PlusC (NumC 3) (NumC 4)))
Why is the program type a String?
- Parsing Traditionally : lexer/tokenizer
- { * { + 2 3 } 7} <BRACE, ASTERISK, LBRACE, PLUS, Num(2), Num(3), …
- Tokenizer: chuncks string into individual tokens.
- Parser
- WE DONT WORRY ABOUT THIS LET RACKET HANDLE THE TOKENIZING
So we use Sexp or Sexpression instead!