Start of Parsing

Table of Contents

1. Parser

1.1. How it works

Model

program -> [parser] -> Abstract Syntax tree (AST) -> [Interp] -> Result

transcript and Host Language langauge

EX: Python is the transcript langauge and the host language c

1+2 -> (plusC (NumC1) -> 3 (NumC2))

1.2. extended Backus–Naur form (EBNF)

Stuff in curly angle brackets are variables

<expr>::=<num>
       | {* <expr> <expr>}
       | {+ <expr> <expr>}

{* {+ 2 3} 7} -> the second type of expression

Actually, all of these are expressions defined earlier!

The expression can be split into a tree where the <num> is the leaf.

1.3. Now in Racket

NOTE ’()’ and ’{}’ are interchangable in racket

;;1. Data Definitions
(define-type ExprC (U NumC PlusC MultC))

(struct NumC ([n : Real]) #:transparent)
(struct PlusC ([left : ExprC] [right : ExprC]) #:transparent)
(struct MultC ([left : ExprC] [right : ExprC]) #:transparent)

;;2. Purpose Statement and header
;;parse the given program into an AST
(define (parse [prog : Sexp]) : ExprC
    (match prog
     ;;<num>
     [(? real? n) (NumC n)]
     ;; {+ <expr> <expr>}
     [(list '+ l r) (Plusc (parse l) (parse r))]
     ;; {* <expr> <expr>}
     [(list '* l r) (MultC (parse l) (parse r))]))

;;3. tests

(check-equal? (parse '{* {+ 2 3} 7})
              (MultC (PlusC (NumC 2) (NumC 3))
                     (NumC 7)))

(check-equal? (parse '2) (NumC 2))
(check-equal? (parse '{+ 3 4})
              (PlusC (NumC 3) (NumC 4)))

Why is the program type a String?

  • Parsing Traditionally : lexer/tokenizer
    • { * { + 2 3 } 7} <BRACE, ASTERISK, LBRACE, PLUS, Num(2), Num(3), …
    • Tokenizer: chuncks string into individual tokens.
  • Parser
    • WE DONT WORRY ABOUT THIS LET RACKET HANDLE THE TOKENIZING

So we use Sexp or Sexpression instead!

Date: 2024-10-04 Fri 00:00

Author: Anthony Rossi

Created: 2024-10-04 Fri 17:46