NLP (MOC)
Notes
NLP, or "natural language processing", is the method by which we "teach" a computer to read text. Since computers don't understand words, we have to convert text into numbers. The challenge is to do this in a way that captures not only the meaning of a word but also its role within a sentence and its connection to the overall meaning of the text.
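A common way to do that is TF-IDF (term frequency-inverse document frequency), which weights each word by how often it appears in a document and how rare it is across all documents, turning each text into a numeric vector. Those vectors (or plain word counts) can then be fed to a classifier such as Naive Bayes.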
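A minimal sketch of TF-IDF in plain Python, assuming a naive whitespace tokenizer and the textbook weighting tf × log(N / df). The helper names `tokenize` and `tf_idf` are hypothetical. Libraries such as scikit-learn's `TfidfVectorizer` use smoothed and normalized variants by default, so their numbers will differ.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Very naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def tf_idf(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return the vocabulary and one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Plain IDF: log(N / df). A term present in every document gets weight 0.
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        # Term frequency (count / document length) scaled by IDF.
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cats and the dogs",
]
vocab, vectors = tf_idf(docs)
for doc, vec in zip(docs, vectors):
    print(doc, "->", [round(x, 3) for x in vec])
```

Because "the" appears in every document, its weight is 0 everywhere, while a word like "cat" that occurs in only one document gets a high weight there.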
Types of Analysis
With NLP, you can:
- Text Classification - For example, to identify and categorize text as either "spam" or "not spam"
- Text Generation - The basis of chat AI models, which generate text based on a prompt
- Sentiment Analysis - To analyze whether a text (for example, a review) is positive, negative, or neutral
- Topic Modeling - To group text by topic, for example news articles into political, economics, etc.
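Text classification such as spam detection is often done with a Naive Bayes classifier. The sketch below is a minimal multinomial Naive Bayes in plain Python, assuming bag-of-words counts (not TF-IDF), add-one smoothing, and a tiny made-up training set. The class name `NaiveBayes` is hypothetical and not a library API.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts, with add-one smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # Log prior: how common this class is in the training data.
            score = math.log(n_label / n_docs)
            for word in text.lower().split():
                # Add-one (Laplace) smoothing so an unseen word doesn't zero out the score.
                score += math.log((counts[word] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


texts = [
    "win a free prize now",
    "free money click now",
    "meeting at noon tomorrow",
    "lunch with the team tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayes().fit(texts, labels)
print(model.predict("free prize money"))       # spam
print(model.predict("team meeting tomorrow"))  # ham
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.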
Techniques
Common techniques used in NLP:
- Named Entity Recognition - To detect named entities such as people, companies, and places within the text
- Regex - To search for matches within the text based on a special pattern. Also see pattern matching
- Tokenization - To break down a sentence into its base components, or tokens (which can then be converted into a vector)
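A small sketch of tokenization and regex search using Python's built-in `re` module. The sample sentence and patterns are made up for illustration and deliberately simple (the email pattern, for instance, will not match every valid address). Production tokenizers, especially the subword tokenizers behind chat models, are considerably more elaborate.

```python
import re

text = "Apple opened a new office in Berlin on 2023-05-14. Contact: press@example.com!"

# Tokenization: split the text into word tokens and single punctuation tokens.
# \w+ matches runs of letters/digits/underscore; [^\w\s] matches one punctuation mark.
tokens = re.findall(r"\w+|[^\w\s]", text)

# Regex search: pull out ISO-style dates (YYYY-MM-DD) and email-like strings.
dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)

# Map each distinct token to an integer id: a first step toward a numeric vector.
vocab = {tok: i for i, tok in enumerate(dict.fromkeys(tokens))}
ids = [vocab[tok] for tok in tokens]

print(tokens)
print(dates)   # ['2023-05-14']
print(emails)  # ['press@example.com']
print(ids)
```

Note that this simple tokenizer splits the date into several tokens, which is why dates are extracted with a dedicated pattern instead.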
Courses
- natural language processing course
- NLP with python