NLP: The Language of Data Scientists
Natural Language Processing (NLP)
Natural Language Processing, usually called NLP, is a branch of artificial intelligence that deals with the interaction between computers and humans using natural language. The ultimate objective of NLP is to read, decipher, understand, and make sense of human languages in a way that is valuable. Most NLP techniques rely on machine learning to derive meaning from human languages. NLP plays a critical role in supporting machine-human interactions.
In this article, I will walk you through some of the NLP tasks I implemented; later we will deploy them to the web to make it a complete package.
The tasks are listed below.
- Analyzing the text to get its tokens and lemmas.
- Getting the named entities from the entered text with NER (Named Entity Recognition).
- Sentiment Analysis.
- Text Summarization (Extractive Summarization).
- Machine Translation.
I will shed some light on each of these tasks as we proceed.
1. Tokens and Lemma
A token is the smallest unit of a corpus, and tokenization is the task of chopping the text up into these pieces.
Input: NLP and Machine learning go hand in hand.
After tokenization, the output is simply each word in the sentence: “NLP” is one token, “Machine” is another, and so on.
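As a rough illustration of the idea, here is a minimal tokenizer built only on Python's standard `re` module. The function name `tokenize` and the regular expression are my own assumptions for this sketch; NLP libraries use more careful rules for contractions, URLs, and so on.

```python
import re

def tokenize(text):
    """Split text into word tokens and standalone punctuation tokens."""
    # \w+ matches a run of letters, digits or underscores;
    # [^\w\s] matches a single punctuation character.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenize("NLP and Machine learning go hand in hand.")
print(tokens)
# ['NLP', 'and', 'Machine', 'learning', 'go', 'hand', 'in', 'hand', '.']
```

Note that this sketch also returns the final full stop as a token, which is how many tokenizers treat punctuation.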
A lemma is the root (dictionary) form of a given word. Lemmatization can rely on a lexical resource such as the WordNet corpus. It is useful when we want human-readable words, because the output of lemmatization is always a proper word. An example makes this clearer.
Let's take three words: “going”, “goes”, and “gone”. The lemma of each of them is the root word “go”.
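To make the example concrete, here is a toy lemmatizer. The lookup table `LEMMA_TABLE` and the function `lemmatize` are hypothetical and hand-written for this sketch; a real lemmatizer uses a full vocabulary and usually the part of speech of the word as well.

```python
# Hypothetical, hand-written lookup table for illustration only.
# Real lemmatizers rely on a full lexical resource plus part-of-speech tags.
LEMMA_TABLE = {
    "going": "go",
    "goes": "go",
    "gone": "go",
    "went": "go",
}

def lemmatize(word):
    """Return the dictionary form of a word, or the lowercased word if it is unknown."""
    key = word.lower()
    return LEMMA_TABLE.get(key, key)

lemmas = [lemmatize(w) for w in ["going", "goes", "gone"]]
print(lemmas)  # ['go', 'go', 'go']
```

Words that are missing from the table are returned unchanged (in lowercase), which is a deliberate simplification.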
2. NER (Named Entity Recognition)
In any text document, there will be particular terms that represent specific entities that are more informative and have a unique context. These entities are known as named entities, which more specifically are real-world objects like people, places, organizations, and so on, which are often denoted by proper names.
Named entity recognition (NER), also known as entity chunking/extraction, is a popular technique used in information extraction to identify and segment the named entities and classify or categorize them under various predefined classes.
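The "identify, segment, and classify" steps can be sketched with a simple dictionary (gazetteer) lookup. This is not how trained NER models work; they learn to recognize entities they have never seen. The `GAZETTEER` entries, the labels, and the function `find_entities` are all assumptions made up for this example.

```python
import re

# Hypothetical gazetteer: entity text -> predefined class label.
GAZETTEER = {
    "Sundar Pichai": "PERSON",
    "Google": "ORG",
    "California": "LOC",
}

def find_entities(text):
    """Return (entity, label, start_offset) tuples sorted by position in the text."""
    found = []
    for name, label in GAZETTEER.items():
        # \b keeps "Google" from matching inside a longer word.
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((match.group(), label, match.start()))
    return sorted(found, key=lambda item: item[2])

entities = find_entities("Sundar Pichai leads Google, which is based in California.")
for entity, label, start in entities:
    print(f"{entity!r} -> {label} (offset {start})")
```

Each result carries the matched span (segmentation) and its class (classification), which is the same shape of output that NER libraries return.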
3. Sentiment Analysis
Sentiment analysis is the interpretation and classification of emotions (positive, negative, and neutral) within text data using text analysis techniques. Sentiment analysis tools allow businesses to identify customer sentiment toward products, brands, or services in online feedback.
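One simple way to classify text as positive, negative, or neutral is to count words from sentiment lexicons. The sketch below assumes two tiny hand-picked word sets (`POSITIVE` and `NEGATIVE`) and a hypothetical function `classify_sentiment`; real tools use far larger scored lexicons or trained models.

```python
import re

# Hypothetical mini-lexicons; real tools ship thousands of scored words.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}

def classify_sentiment(text):
    """Label text as positive, negative or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify_sentiment("I love this phone, the camera is excellent"))  # positive
print(classify_sentiment("Terrible battery and slow delivery"))          # negative
print(classify_sentiment("The parcel arrived on Tuesday"))               # neutral
```

This sketch ignores negation ("not good") and intensity ("very bad"), so it should be read as an illustration of the idea only.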
4. Text Summarization
Text summarization refers to the technique of shortening long pieces of text. The intention is to create a fluent summary while preserving the key information and overall meaning. Extractive summarization, the variant used here, builds the summary by selecting the most important sentences from the original text rather than writing new ones. Applying text summarization reduces reading time, speeds up the process of researching information, and increases the amount of information that can fit in a given space.
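A common baseline for extractive summarization scores each sentence by how frequent its words are in the whole text and keeps the top-scoring sentences. The sketch below follows that idea using only the standard library; the `STOP_WORDS` list, the function `summarize`, and the sample text are assumptions for this example and not necessarily the method used in the deployed app.

```python
import re
from collections import Counter

# Small hypothetical stop-word list; real pipelines use a much longer one.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and",
              "in", "it", "that", "this", "on", "for"}

def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    freq = Counter(words)

    def score(sentence):
        # Stop words are absent from freq, so they contribute 0.
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)

sample = ("Text summarization shortens long text. "
          "Summarization keeps the key information of the text. "
          "I had coffee this morning.")
print(summarize(sample, max_sentences=2))
```

Because the score is a plain sum, longer sentences are favored; dividing by sentence length is a common refinement.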
5. Machine Translation
Machine translation (MT) is the automatic translation of text from one language to another. It refers to fully automated software that can translate source content into target languages. Humans may use MT to help them render text and speech into another language.
Here I used the TextBlob package for the translation. The input can be text in any language, and the output is produced in the language that the end user specifies.
I hope this article has given you a good understanding of these NLP tasks.
Follow the author on LinkedIn: www.linkedin.com/in/karteek-menda