CIS 250 Binary search tree sort and count
The objective of this assignment is to use a binary search tree
to sort and obtain a count of all the words in a lengthy text file.
There are several text files available to test your code. All of the
text files are modified versions from
Project Gutenberg.
The files are available via the links below. They are also available on
the server we are using in the the directory: ~dklick/public_html/datasets/
Assignment notes
Here's a list of things you will need.
- A templated binary search tree class
that you can store a user-defined Word class in. You will have to define the Word
class with specific requirements that will be detailed below. We will work through the
creation of a templated BST class that you can use for this assignment
with minor modifications, which will also be detailed below.
- bintree.h is also available on the server we use in the ~dklick/examples/cis250/bintree directory.
- The Word class must meet the following requirements:
- a private string variable to hold the word
- a default constructor
- a constructor that accepts a string, converts that string to lowercase,
and uses that to set the private string variable [you can do this by
including the algorithm, functional and
cctype libraries, using std::ptr_fun, and executing the following statement:
transform(word.begin(), word.end(), word.begin(), ptr_fun(::tolower));]
- overloads of the six relational operators (this should be easy since
you just have to return the results of comparing the string objects
that the Word objects contain)
- an overload of the insertion operator that will insert the private word variable into the stream
- The Node class should contain the usual templated data type, and left and
right pointers. It should also contain:
- a private integer to hold a count (for duplicated nodes); initialize to 1 in constructor(s)
- a getCount() method that returns the count
- a setCount(int) method that sets the count and returns the result
- an overload of the insertion operator that displays a node using the data type's
insertion operator overload, and then displays the count in parentheses
following that.
- The BST must accept duplicate node data values by doing the following:
- when inserting, if a duplicate value is found in the tree, increment
the count and return true instead of returning false
- when deleting, whenever the value is found in the tree, see if the count
for that node is greater than 1; if it is, just decrement the node's count;
otherwise, execute the normal code to delete the node
- Here's what the mainline will have to do:
- create a binary search tree with nodes that hold Word objects as data
- open the data file
- for each line in the file
- read in the line
- tokenize the line read in using " -\"\'?!,;:.()_`*" for the delimiters
- for each token in the line, make it into a Word object and insert it into the BST
- close the file
- display all the words in the BST in alphabetical order along with their count
Sample output
Note that some of the words include punctuation because the characters
in the text file did not exactly match the delimiters we provided. In
addition, you may see slightly different output where special characters
are displayed in the list below.
Processing file: alice.txt
1 (1)
100 (1)
111 (1)
116 (1)
126 (1)
... many lines omitted ...
yours (3)
yourself (10)
youth (6)
zealand (1)
zigzag (1)
Processing file: cities.txt
1757 (1)
1767 (2)
1792 (1)
21 (1)
a (2953)
... many lines omitted ...
youth (9)
youthful (3)
youthfulness (1)
youths (1)
zealous (2)
Processing file: gulliver.txt
&c (1)
10 (1)
10th (1)
11th (1)
13th (1)
... many lines omitted ...
"you (3)
"' (1)
"'that (3)
"'whereas (1)
" (313)
Processing file: mobydick.txt
$20 (1)
$7 (1)
& (2)
000 (20)
1 (2)
... many lines omitted ...
zone (5)
zoned (2)
zones (3)
zoology (2)
zoroaster (1)
Processing file: warAndPeace.txt
000 (8)
07 (2)
1 (21)
10 (6)
100 (2)
... many lines omitted ...
zubovski (2)
zum (1)
zweck (1)
à (4)
être (1)