Word2Vec

W_ord2vec_ solves the problem of sparsity since it uses a Neural Network to produce a dense vector for each word in a document.

Furthermore, the vector that represents the word will keep some algebraic properties, for example $V_{kin g} - V_{q u ee n} = V_{m e n} - V_{w o man}$ .

The catch is that we have to aggregate all those vectors in order to represent a document. A basic idea is just to use the sum, the average or other aggregation operations.

A better idea is to multiply each word vector for its TF-IDF score, and then sum them all, in order to have a more well weighted final document vector.

nlp

Quartz 4

Explorer

Word2Vec

Graph View

Backlinks