CIS 250 Binary search tree sort and count

The objective of this assignment is to use a binary search tree to sort and obtain a count of all the words in a lengthy text file. There are several text files available to test your code. All of the text files are modified versions from Project Gutenberg. The files are available via the links below. They are also available on the server we are using in the the directory: ~dklick/public_html/datasets/

Assignment notes

Here's a list of things you will need.

  1. A templated binary search tree class that you can store a user-defined Word class in. You will have to define the Word class with specific requirements that will be detailed below. We will work through the creation of a templated BST class that you can use for this assignment with minor modifications, which will also be detailed below.
  2. bintree.h is also available on the server we use in the ~dklick/examples/cis250/bintree directory.
  3. The Word class must meet the following requirements:
  4. The Node class should contain the usual templated data type, and left and right pointers. It should also contain:
    1. a private integer to hold a count (for duplicated nodes); initialize to 1 in constructor(s)
    2. a getCount() method that returns the count
    3. a setCount(int) method that sets the count and returns the result
    4. an overload of the insertion operator that displays a node using the data type's insertion operator overload, and then displays the count in parentheses following that.
  5. The BST must accept duplicate node data values by doing the following:
    1. when inserting, if a duplicate value is found in the tree, increment the count and return true instead of returning false
    2. when deleting, whenever the value is found in the tree, see if the count for that node is greater than 1; if it is, just decrement the node's count; otherwise, execute the normal code to delete the node
  6. Here's what the mainline will have to do:

Sample output

Note that some of the words include punctuation because the characters in the text file did not exactly match the delimiters we provided. In addition, you may see slightly different output where special characters are displayed in the list below.

Processing file: alice.txt
1 (1)
100 (1)
111 (1)
116 (1)
126 (1)
... many lines omitted ...
yours (3)
yourself (10)
youth (6)
zealand (1)
zigzag (1)

Processing file: cities.txt
1757 (1)
1767 (2)
1792 (1)
21 (1)
a (2953)
... many lines omitted ...
youth (9)
youthful (3)
youthfulness (1)
youths (1)
zealous (2)

Processing file: gulliver.txt
&c (1)
10 (1)
10th (1)
11th (1)
13th (1)
... many lines omitted ...
"you (3)
"' (1)
"'that (3)
"'whereas (1)
" (313)

Processing file: mobydick.txt
$20 (1)
$7 (1)
& (2)
000 (20)
1 (2)
... many lines omitted ...
zone (5)
zoned (2)
zones (3)
zoology (2)
zoroaster (1)

Processing file: warAndPeace.txt
000 (8)
07 (2)
1 (21)
10 (6)
100 (2)
... many lines omitted ...
zubovski (2)
zum (1)
zweck (1)
à (4)
être (1)