Text as Data (H) - Lecture 01 Quiz
1. Tokenisation is simply the process of splitting text into words on space characters
False
True
2. A one-hot encoding records the frequency of each word in a piece of text
False
True
3. We must store the offsets of words in the vectors using a dictionary in order to implement one-hot encoding
True
False
4. A cosine similarity of 1.0 means two texts:
Contain both identical and orthogonal words
Contain completely different words
Contain identical words
5. Normalising the dot product in the cosine similarity function allows documents of different lengths to be compared more fairly
False
True
6. Why is stemming useful?
It allows matching of words with morphological variations
It allows matching of misspelled words
It allows matching of words that sound the same
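For experimenting with the ideas the questions refer to, here is a minimal Python sketch of tokenisation, one-hot encoding, and cosine similarity. It is only one possible implementation, not the lecture's own code: the function names (`tokenise`, `build_vocab`, `one_hot`, `cosine`) and the regular-expression tokeniser are assumptions made for illustration.

```python
import math
import re


def tokenise(text):
    # Assumed tokeniser: lowercase, then keep runs of letters, digits, or apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(docs):
    # Map each distinct token to a vector offset, in order of first appearance.
    vocab = {}
    for doc in docs:
        for tok in tokenise(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab


def one_hot(text, vocab):
    # Set the position to 1 for every vocabulary word that occurs in the text.
    vec = [0] * len(vocab)
    for tok in tokenise(text):
        if tok in vocab:
            vec[vocab[tok]] = 1
    return vec


def cosine(a, b):
    # Dot product divided by the product of the two vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A possible way to try it: build a vocabulary from a few short texts with `build_vocab`, encode each text with `one_hot`, and compare pairs with `cosine` to see how the score changes as the texts share more or fewer words.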