Text as Data (H) - Lecture 01 Quiz
1. Tokenisation is simply the process of splitting text into words on space characters
True
False
2. A one-hot encoding records the frequency of each word in a piece of text
True
False
3. We must store the offsets of words in the vectors using a dictionary in order to implement one-hot encoding
True
False
4. A cosine similarity of 1.0 means two texts:
Contain both identical and orthogonal words
Contain identical words
Contain completely different words
5. Normalising the dot product in the cosine similarity function allows documents of different lengths to be compared more fairly
True
False
6. Why is stemming useful?
It allows matching of misspelled words
It allows matching of words with morphological variations
It allows matching of words that sound the same
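
For revision, the concepts behind questions 2 to 5 can be tried out with a short Python sketch. It is only an illustration under stated assumptions, not the lecture's implementation: the helper names (tokenise, build_vocab, one_hot, cosine) and the sample texts are hypothetical, the regular-expression tokeniser is one simple choice among many, and a dictionary is just one convenient way to map words to vector offsets.

```python
import math
import re


def tokenise(text):
    # Assumption: lowercase the text and keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(token_lists):
    # Map each distinct word to a vector offset, in order of first appearance.
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def one_hot(tokens, vocab):
    # Binary vector: 1 if the word occurs at all, 0 otherwise (no counts).
    vec = [0] * len(vocab)
    for tok in tokens:
        vec[vocab[tok]] = 1
    return vec


def cosine(a, b):
    # Dot product divided by the product of the two vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample texts: the second repeats the words of the first.
texts = ["the cat sat", "The cat sat. The cat sat!", "dogs bark loudly"]
token_lists = [tokenise(t) for t in texts]
vocab = build_vocab(token_lists)
vectors = [one_hot(tokens, vocab) for tokens in token_lists]

print(cosine(vectors[0], vectors[1]))  # same set of words, different lengths
print(cosine(vectors[0], vectors[2]))  # no words in common
```

Comparing the two printed values, and then swapping one_hot for a version that counts occurrences, is a quick way to check your answers against actual numbers.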