NLP (MOC)

Notes

NLP, or "natural language processing", is the set of methods by which we "teach" a computer to read text. Since computers don't understand words, we have to convert text into numbers. The challenge is to convert it in a way that not only captures the meaning of each word, but also preserves its role within the sentence and its connection to the sentence's overall meaning.

One common way to do that is TF-IDF, a method of converting text into a vector of term weights. Such vectors (or plain word counts) can then be fed to a model such as a Naive Bayes classifier.
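As a rough illustration of how TF-IDF turns text into numbers, here is a minimal sketch using only the Python standard library. The function names `tokenize` and `tf_idf` and the sample sentences are made up for this note, and the weighting used (term frequency divided by document length, times the unsmoothed log of inverse document frequency) is just one of several variants; libraries typically add smoothing and normalization.

```python
import math
from collections import Counter

def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()

def tf_idf(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # tf = share of the document taken up by the term,
            # idf = log(total documents / documents containing the term).
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]
vectors = tf_idf(docs)
```

A word that appears in every document (here "the") gets a weight of zero, while a word unique to one document (such as "mat") gets a higher weight than one shared by several documents (such as "cat").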

Types of Analysis

With NLP, you can:

  1. Text Classification - For example, to categorize text as either "spam" or "not spam"
  2. Text Generation - The basis for chat AI models, which generate text based on a prompt
  3. Sentiment Analysis - To analyze whether a text (perhaps a review) is positive, negative, or neutral
  4. Topic Modeling - To group text by topic, for example news articles into politics, economics, etc.
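To make the text classification case concrete, here is a small sketch of a Naive Bayes spam filter written with only the standard library. The `train` and `predict` helpers and the four training sentences are invented for illustration, and the model assumed here is the multinomial variant with add-one (Laplace) smoothing; a real project would more likely use a library implementation and far more data.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (text, label) pairs. Returns the counts the model needs."""
    label_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocab = set()
    for text, label in examples:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab

def predict(text, model):
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Start from the log prior: how common the label is overall.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    # Pick the label with the highest log probability.
    return max(scores, key=scores.get)

training = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("lunch with the team tomorrow", "ham"),
]
model = train(training)
```

With this toy data, `predict("free money", model)` returns `"spam"` and `predict("team meeting tomorrow", model)` returns `"ham"`.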

Techniques

Common NLP techniques:

  1. Named Entity Recognition - To detect named entities, such as companies, within the text
  2. Regex - To search for matches within the text based on a special pattern. Also see pattern matching
  3. Tokenization - To break down a sentence into base components (which can then be converted into a vector)
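The regex and tokenization items above can be sketched with Python's built-in `re` module. The sample sentence, the simplified email pattern, and the count-based vector are assumptions chosen for illustration; production tokenizers and email validators are considerably more careful.

```python
import re

sentence = "Contact Acme Corp. at sales@acme.example or call 555-0142 today!"

# Tokenization: extract runs of word characters, lowercased.
tokens = re.findall(r"\w+", sentence.lower())

# Regex: find substrings matching a pattern (here, a simplified email shape).
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", sentence)

# Turning tokens into a vector: one count per distinct token, in sorted order.
vocab = sorted(set(tokens))
vector = [tokens.count(term) for term in vocab]
```

Here `emails` contains `"sales@acme.example"`, and the entry of `vector` at the position of `"acme"` in `vocab` is 2 because that token appears twice in the sentence.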

Courses

natural language processing course NLP with python

Websites

Other MOC

Overview
