Latent Semantic Analysis
One component of the system described therein that I'm currently grappling with is the difference between Latent and Explicit Semantic Analysis.
I've been writing up a document to capture my understanding, but it's somewhat cobbled together from sources that I don't fully understand, so I'd like to know whether what I've come up with is accurate. Here it is:
When applying a technique such as singular value decomposition (SVD) or Markov chain Monte Carlo methods, a corpus of documents can be partitioned on the basis of inherent characteristics and assigned to categories by applying different weights to the features that constitute each individual data item. In this high-dimensional space it is often difficult to determine the combination of factors leading to an outcome or result; the variables of interest are “hidden”, or latent. By defining a set of humanly intelligible categories, i.e. Wikipedia article pages, as a basis for comparison, [Gabrilovich et al. 2007] have devised a system whereby the criteria used to distinguish a datum are readily comprehensible. From the text we note that “semantic analysis is explicit in the sense that we manipulate manifest concepts grounded in human cognition, rather than ‘latent concepts’ used by Latent Semantic Analysis”. With that we have established Explicit Semantic Analysis in opposition to Latent Semantic Analysis.
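
To make the contrast concrete for myself, here is a minimal sketch in Python with NumPy. It is a toy illustration under my own assumptions, not the method from the paper: the vocabulary, the four-document corpus, and the two "concept" texts standing in for Wikipedia articles are all made up, and raw term counts are used where a real system would more likely use something like TF-IDF weighting. The names bag_of_words, esa_vector and cosine are hypothetical helpers.

```python
import numpy as np

# Hypothetical toy vocabulary; a real system would build this from the corpus.
vocab = ["cat", "dog", "pet", "stock", "market", "trade"]

def bag_of_words(text):
    """Count how often each vocabulary term occurs in the text."""
    tokens = text.lower().split()
    return np.array([tokens.count(term) for term in vocab], dtype=float)

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up corpus: two documents about pets, two about finance.
docs = ["cat dog pet", "dog pet pet", "stock market trade", "market trade"]
X = np.stack([bag_of_words(d) for d in docs], axis=1)  # terms x documents

# --- LSA-style: keep the top k singular directions of the term-document matrix.
# The resulting dimensions carry no names; they are the "latent" concepts.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
doc_latent = (np.diag(s[:k]) @ Vt[:k]).T  # documents x k latent coordinates

# --- ESA-style: every dimension is a named, human-readable concept.
# These two short texts stand in for Wikipedia articles (an assumption).
concepts = {"Pets": "cat dog pet pet", "Finance": "stock market trade market"}
concept_names = list(concepts)
C = np.stack([bag_of_words(t) for t in concepts.values()], axis=1)  # terms x concepts

def esa_vector(text):
    """Represent a text as one weight per named concept."""
    return bag_of_words(text) @ C

if __name__ == "__main__":
    print("latent coordinates of each document (unlabelled axes):")
    print(doc_latent)
    print("explicit concept weights for 'cat dog':")
    print(dict(zip(concept_names, esa_vector("cat dog"))))
```

In both cases documents end up as vectors that can be compared with cosine similarity. The difference is in what the axes mean: the SVD axes are whatever directions best reconstruct the matrix and have to be interpreted after the fact, whereas each ESA axis is labelled up front with the concept it corresponds to.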