Binary search tree sort and count

Objectives

  • Modify a templated binary search tree class to handle duplicates.
  • Process a text file to isolate words and store them in a binary search tree.
  • Modify an overloaded insertion operator
  • Test and debug the binary search tree's delete function.

Resources

The objective of this assignment is to use a binary search tree to sort and obtain a count of all the words in a lengthy text file. There are several text files available to test your code. All of the text files are modified versions from Project Gutenberg. The files are available via the links below. They are also available on the server we are using in the the directory: ~dklick/public_html/datasets/

Assignment notes

  1. The Bintree.h file is also available on the server we use: ~dklick/examples/cis250/bintree/Bintree.h
  2. The Bintree.h file includes the templated Node class
  3. The WordCount.cpp file is also available on the server we use: ~dklick/examples/cis250/bintree/WordCount.cpp

Node class

The Node class contains the data that is stored in the binary search tree. Any data that is stored in the tree should support the relational operators. The Node class will need some modificatuions so that duplicate data items can be stored in the tree. Those modifications are:

  1. (2 pts) Add a private integer variable to this class to keep a count.
  2. (3 pts) Make sure all the constructors initialize the count.
  3. (2 pts) Create a getCount() accessor function.
  4. (2 pts) Create an increment() method to increase the count by 1 and return its value.
  5. (2 pts) Create a decrement() method to decrease the count by 1 and return its value.
  6. (3 pts) Modify the operator<< overload to display a count in parentheses after the data.

Bintree class

The Bintree class needs modifications to handle duplicate data. Those modifications are:

  1. (6 pts) In the remove(const T&) function, the count of the node containing the data to be removed must be checked. This is done after the matching node is found. If the count is greater than 1, then decrement the count and return true. If the count is 1, then let the code continue on with a normal node deletion.
  2. (6 pts) In the insert(const T&) function, the behavior if a matching node is found is to return false. That must be changed to increment that node's count, delete the new node created upon entry to the insert function, and then return true.

WordCount driver program

The WordCount program needs some additional code to become functional. You need to do the following:

  1. (2 pts) At the start of the program, create a variable named tree that can hold string objects.
  2. In the first loop that reads the text file:
    1. (2 pts) Convert the line read in using the convert function provided. This will convert the line to lowercase and change some difficult to handle characters into spaces.
    2. (2 pts) Use the tokenize function to break the converted line into tokens that will be stored in a vector you send to the function.
    3. (3 pts) Add each of the strings (tokens, words) in the vector to the binary search tree.
  3. (2 pts) After all the words have been inserted, write code that will display the tree in alphabetical order.
  4. In the second loop that reads the text file:
    1. (2 pts) Convert the line read in using the convert function provided. This will convert the line to lowercase and change some difficult to handle characters into spaces.
    2. (2 pts Use the tokenize function to break the converted line into tokens that will be stored in a vector you send to the function.
    3. (3 pts) Remove each of the strings (tokens, words) in the vector from the binary search tree.
    4. (4 pts) The remove function returns false when a removal fails. If the remove function returns false, display an error message.
  5. (2 pts) After all the words have been deleted, write code that will display the tree in alphabetical order. Nothing should display since the tree should be empty.

Sample output

Processing file: cities.txt Tree after insertions: 1757 (1) 1767 (2) 1792 (1) 21 (1) a (2953) aback (1) abandon (1) abandoned (10) abandoning (1) ... many lines omitted ... yours (16) yourself (42) yourselves (3) youth (9) youthful (3) youthfulness (1) youths (1) zealous (2) Tree after deletions: