Text as Data (H) - Lecture 02/03 Quiz

1.

TF.IDF

We sometimes use log() for term frequency. Does this:

2.

TF.IDF

The more a term is observed in the corpus, the

3.

TF.IDF

Some search engines will remove terms that occur in more than 50% of documents. Is this because:

4. In hierarchical clustering, all documents belong to the root node.
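
As a way to check the statement in question 4, here is a minimal sketch of bottom-up (agglomerative) clustering, assuming single linkage and squared Euclidean distance. The toy vectors and the helper names `leaves` and `agglomerate` are hypothetical and are not taken from the lecture.

```python
from itertools import combinations


def leaves(node):
    """Return the document ids under a node (an int leaf or a (left, right) pair)."""
    if isinstance(node, int):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)


def agglomerate(vectors):
    """Merge the two closest clusters repeatedly; return the root of the merge tree."""

    def dist(a, b):
        # Squared Euclidean distance between documents a and b.
        return sum((x - y) ** 2 for x, y in zip(vectors[a], vectors[b]))

    def linkage(n1, n2):
        # Single linkage: distance between the closest pair of members.
        return min(dist(a, b) for a in leaves(n1) for b in leaves(n2))

    nodes = list(range(len(vectors)))  # start with one cluster per document
    while len(nodes) > 1:
        i, j = min(
            combinations(range(len(nodes)), 2),
            key=lambda ij: linkage(nodes[ij[0]], nodes[ij[1]]),
        )
        merged = (nodes[i], nodes[j])
        nodes = [n for idx, n in enumerate(nodes) if idx not in (i, j)] + [merged]
    return nodes[0]


# Hypothetical toy document vectors (e.g. two-term weights).
doc_vectors = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
root = agglomerate(doc_vectors)
root_members = sorted(leaves(root))
```

Comparing `root_members` with the full list of document ids shows which documents sit under the root of the tree built by this sketch.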

5. You are clustering a set of documents with k-means. Will the clustering be deterministic?
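
One way to explore question 5 is to run a small k-means yourself and vary the random seed. The sketch below assumes the common variant that picks the initial centroids at random from the data points; the function name `kmeans`, the toy vectors, and the fixed iteration count are assumptions made for illustration only.

```python
import random


def kmeans(points, k, seed, iters=20):
    """Plain k-means with random initial centroids drawn from the points."""
    rng = random.Random(seed)  # the only source of randomness in this sketch
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])),
            )
            for pt in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical toy document vectors.
points = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9), (0.5, 0.5), (0.45, 0.55)]

run_a = kmeans(points, k=2, seed=0)
run_b = kmeans(points, k=2, seed=0)  # same seed as run_a
run_c = kmeans(points, k=2, seed=1)  # different seed
```

In this sketch the initial centroids are the only random ingredient, so `run_a` and `run_b` are identical, while `run_c` starts from a possibly different initialisation and may or may not end in the same clusters.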