Infix tokenizer

Objectives

  • Use an associative array (map)
  • Use queues
  • Create a finite state machine that can parse arithmetic expressions into tokens
  • Implement a complex algorithm using C++

Overview

The objectives for this assignment are to use associative arrays (maps in this case), a queue, and an extended finite state machine to parse a string containing an arithmetic expression into tokens and place those tokens into a queue for further processing. The next assignment translates the expression from the current infix notation into postfix notation, and then evaluates the postfix expression. Tests should be done to make sure your program works properly before submitting it for grading. Notice that output from a sample run has been provided.

This assignment looks difficult at first, but the steps that are laid out are detailed. You are likely to have a few questions, but the assignment should primarily be a translation exercise from pseudocode into C++. I'll be available to answer questions, provide guidance, and check out your code as it develops to help debug it and get it running.

There are three user-defined classes and the main program that make up this assignment.

  • FSM: contains data and methods needed for the finite state machine
  • FSMRule: used internally within the finite state machine
  • Token: represents each token in the input expression
  • infixTokenizer.cpp: the main program which runs everything (and can contain the other classes)

Finite state machines

Finite state machines are a concept that is often used in computer programming. You may even have done something equivalent or similar without even realizing it. The basic idea is that there are a finite number of states. You begin in one particular state and start processing input. Each state has rules for what it will do with different types of input. As input is read, the appropriate rule is looked up based on the current state and the input that was read. The rule describes what actions should be taken and the next state to move into. This assignment uses an extended finite state machine (a finite state machine with memory) to parse tokens in an input stream and output them to a queue. The basic diagram for this state machine, without the associated actions, is shown below.

finite state machine diagram
Finite state machine diagram

Description of states in state machine

  • waiting for start of operand or open parenthesis
  • operand with sign only so far
  • operand with digits, but no decimal point
  • operand with decimal point, but no digits yet
  • operand with digits and decimal point
  • waiting for operator or close parenthesis

Template

A template is available so you can start with the Token, FSM, and FSMRule classes. It is located at infixStart.txt.

infixTokenizer.cpp

This is what you have to write to do all the work. You can start by grabbing the template for the assignment and renaming it infixTokenizer.cpp. This assignment uses maps, a queue, and implements a finite state machine. There are three global variables and seven functions you have to implement.

Note: In the pseudocode below, you may see something like: if rule.actions includes FSM::ERROR. That means that you are actually checking a bit in rule.actions to see if it is set. Fortunately, there are constants already set up in the FSM class to make this easy. Here's the code that would check rule.actions to see if it includes FSM::ERROR: if ((rule.actions & FSM::ERROR) != 0) { ... }

Note: Adding a new Token to the infixQueue involves creating the Token and then adding it. Here's how you would add a new Token using: Token::OPERAND, operand, 0, true Token tkn(Token::OPERAND, operand, 0, true); infixQueue.push_back(tkn);

infixTokenizer global variables

  • a reference variable to hold a new FSM
  • a map named precedences which associates a string with an int
  • a map named leftAssociative which associates a string with a bool

int main()

This is the method that does most of the work. It has two main loops, one nested inside the other. The outer loop keeps getting one line of input at a time from the user. The inner loop processes one character at a time from the input data.

  • create a bool named DEBUG initialized to true
  • create a deque named infixQueue which contains Token objects
  • call initPrecedences
  • call initAssociativities
  • call initFSM
  • create a boolean named done and set it to false
  • cout << "Enter an expression to evaluate (or QUIT to exit)\n"; if (DEBUG) cout << "Program is in DEBUG mode\n";
  • while not done
    • create a boolean named error and set it to false
    • create a variable to hold the state and set it to 1 (the initial state)
    • create a variable to keep track of the current position and set it to 0
    • create a variable to count the number of parentheses and set it to 0
    • create a string variable to hold the operand and set it to be a zero-length string
    • char expr[256]; cout << "\nEnter expression: "; cin.getline(expr, 256); if (expr[0] == 'Q' || expr[0] == 'q') done = true;
    • clear the infixQueue
    • while not done and error is false and position is less than strlen(expr)
      • create a char named ch to hold the current character and set it; the current character is at position in expr
      • create a variable to hold an FSM::FSMRule, and set it to the rule obtained using the current state and the current character of input; remember that the finite state machine class has a method for this
      • if rule.actions includes FSM::ERROR, then do the following: error = true; cout << "Error: " << expr << '\n'; cout << " "; for (int j=0; j<pos; j++) cout << ' '; cout << '^';
      • otherwise do the following
        • if rules.actions includes FSM::APPEND_OPERAND
          • if the length of operand is greater than 0, add a new Token to the end of the infixQueue using: Token::OPERAND, operand, 0, true
          • set operand to a zero length string
        • if rules.actions includes FSM::CLEAR_OPERAND then set operand to a zero length string
        • if rules.actions includes FSM::ACCUM_OPERAND then append the current input character to the end of operand
        • if rules.actions includes FSM::APPEND_OPERATOR
          • turn the current character into a string: string strCh; strCh += ch;
          • get the left associativity of the current character
          • get the precedence of the current character
          • add a new Token to the end of the infixQueue using: Token::getType(ch), the current character turned into a string, the precedence, and the left associativity
        • if rules.actions includes FSM::NONE then do nothing
        • set the state to whatever rule.nextState is
        • if the current character is an open parenthesis, add one to the parenthesis counter
        • if the current character is a close parenthesis, subtract one from the parenthesis counter
        • if error is false and the parenthesis counter is less than 0 then error = true; cout << "Error: " << expr << '\n'; cout << " "; for (int j=0; j<pos; j++) cout << ' '; cout << '^';
      • if error is false, increment the current position in the input string
    • if error is false and the parenthesis counter is not 0 then error = true; cout << "Error: " << expr << '\n'; cout << " "; for (int j=0; j<pos; j++) cout << ' '; cout << '^';
    • if done or error is true, then continue (do next iteration of loop)
    • if the operand length is greater than 0, then add the operand as a new Token to the end of the infixQueue using: Token::OPERAND, operand, 0,true
    • if DEBUG is true cout << "Infix queue:\n"; for (int i=0; i<infixQueue.size(); i++) cout << (i+1) << ": " << infixQueue[i].toString() << '\n';

bool isLeftAssociative(string oper)

This is a pretty simple function whose job is to return true if an operator is left associative, and false if it isn't. The associativities are already stored in the leftAssociative map, which makes this routine rather short.

  • use the operator passed in to get a count of the number of matches in the leftAssociative map
  • if the number found is zero, then return true
  • if the number found was not zero, then return the value found in the leftAssociative map for the key: oper

int getPrecedence(string oper)

This is a pretty simple function whose job is to return the integer precedence value of an operator. The precedences are already stored in the precedences map, which makes this routine rather short.

  • use the operator passed in to get a count of the number of matches in the precedences map
  • if the number found is zero, then return 0
  • if the number found was not zero, then return the value found in the precedences map for the key: oper

void initPrecedences()

This is a pretty straight-forward function. It just has to add the following key/value pairs to the precedences map.

  • "-", 1
  • "+", 1
  • "*", 2
  • "/", 2
  • "%", 2
  • "^", 3

void initAssociativities()

This is a pretty straight-forward function. It just has to add the following key value pairs to the leftAssociative map.

  • "-", true
  • "+", true
  • "*", true
  • "/", true
  • "%", true
  • "^", false

void errorExit(string msg)

This method is simple. Print "Error: " followed by the msg passed in, followed by a newline. Then end the program.

void initFSM()

The initFSM function has to add rules to the state machine. Keep in mind that you already have a function named put that is part of the FSM object which you can use for this task. You also have an FSM object referred to by a variable named fsm. The rules in the list below are in the following format: state, character type, actions, next state.

  • 1, FSM::SIGN, FSM::ACCUM_OPERAND, 2
  • 1, FSM::DIGIT, FSM::ACCUM_OPERAND, 3
  • 1, FSM::PERIOD, FSM::ACCUM_OPERAND, 4
  • 1, FSM::OPEN, FSM::APPEND_OPERATOR, 1
  • 1, FSM::BLANK, FSM::NONE, 1
  • 2, FSM::DIGIT, FSM::ACCUM_OPERAND, 3
  • 2, FSM::PERIOD, FSM::ACCUM_OPERAND, 4
  • 3, FSM::SIGN, FSM::APPEND_OPERAND | FSM::APPEND_OPERATOR, 1
  • 3, FSM::DIGIT, FSM::ACCUM_OPERAND, 3
  • 3, FSM::PERIOD, FSM::ACCUM_OPERAND, 5
  • 3, FSM::CLOSE, FSM::APPEND_OPERAND | FSM::APPEND_OPERATOR, 6
  • 3, FSM::OPERATOR, FSM::APPEND_OPERAND | FSM::APPEND_OPERATOR, 1
  • 3, FSM::BLANK, FSM::APPEND_OPERAND, 6
  • 4, FSM::DIGIT, FSM::ACCUM_OPERAND, 5
  • 5, FSM::SIGN, FSM::APPEND_OPERAND | FSM::APPEND_OPERATOR, 1
  • 5, FSM::DIGIT, FSM::ACCUM_OPERAND, 5
  • 5, FSM::CLOSE, FSM::APPEND_OPERAND | FSM::APPEND_OPERATOR, 6
  • 5, FSM::OPERATOR, FSM::APPEND_OPERAND | FSM::APPEND_OPERATOR, 1
  • 5, FSM::BLANK, FSM::APPEND_OPERAND, 6
  • 6, FSM::SIGN, FSM::APPEND_OPERATOR, 1
  • 6, FSM::CLOSE, FSM::APPEND_OPERATOR, 6
  • 6, FSM::OPERATOR, FSM::APPEND_OPERATOR, 1
  • 6, FSM::BLANK, FSM::NONE, 6

Sample run

Rubric

  • 5 pts: variables are declared properly (and initialized as needed)
  • 3 pts: The initPrecedences() function works properly
  • 3 pts: The initAssociativities() function works properly
  • 3 pts: The initFSM() function works properly
  • 3 pts: The isLeftAssociative(string) function works properly
  • 3 pts: The errorExit(string) function works properly
  • 3 pts: The getPrecedence(string) function works properly
  • 27 pts: The main() function properly tokenizes the input and places it in the infix queue
  • Up to 5 points can be deducted for missing documentation
  • Up to 5 points can be deducted for not following style conventions specified for this course