All content on this page by eGov Foundation is licensed under a Creative Commons Attribution 4.0 International License.

NLP Chatbot

Documentation on NLP Chatbot

Overview

NLP (Natural Language Processing) is a branch of artificial intelligence that deals with interactions between humans and computers. Its primary task is to interpret the intent of the user's input (speech or text) and respond with the appropriate output. Common challenges in natural language processing include speech recognition, audio transcription, natural language understanding and natural language generation.

In this project, we empower a regular menu-driven chatbot with NLP. This significantly increases user convenience and, in turn, helps expand our customer reach. It is also a cost-efficient approach: fewer dialogues between the user and the chatbot mean significant savings on messaging charges.

Use Cases

The first use case is free-form queries. Instead of bombarding the user with a sequence of menu messages to select from, we can simply ask the user to type in a query. Using intent classification and entity recognition, the chatbot can return the appropriate output.

The second use case is complaint classification. Instead of visiting a link and selecting a complaint category from a long list of categories, the user can simply type in the complaint and leave it to the chatbot to identify the category using NLP techniques.

The third use case is city recognition. In the existing version of the Punjab UAT chatbot, recognizing the user's locality is inconvenient: the user needs to visit a link, select his/her city from a drop-down menu of around 170 cities, and then return to WhatsApp to continue the chat. Using NLP, we can simply ask the user to type in the city name and detect the location from that input.

Concepts

  • Preparing a virtual dataset

One of the major challenges in this project was the absence of a real-world dataset, which is of utmost importance in any NLP-based project. The first phase of the project consisted of recognizing the user intent and classifying whether the user wants to pay bills or to retrieve receipts. The idea was to exploit the fact that inputs expressing the intent to pay bills contain, on average, a certain number of past tense and present tense verbs, and likewise for the intent of retrieving paid bills. The tense of each word was determined using the built-in POS tagging and tokenization functions in nltk.
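The tense counting can be illustrated with a toy stand-in. Note this is a simplification for illustration only: the real project uses nltk's `word_tokenize` and `pos_tag` (with Penn Treebank tags such as VBD/VBN for past tense); the suffix heuristic and word lists below are hypothetical.

```python
import re

# Hypothetical simplification of nltk POS tagging: guess "past tense" from
# common verb suffixes and "present tense" from a small sample word list.
PAST_SUFFIXES = ("ed", "id", "aid")
PRESENT_VERBS = {"pay", "want", "have", "is", "are"}

def count_tenses(sentence):
    """Return rough past/present verb counts for a sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    past = sum(1 for t in tokens if t.endswith(PAST_SUFFIXES))
    present = sum(1 for t in tokens if t in PRESENT_VERBS)
    return {"past": past, "present": present}
```

In the actual pipeline these per-tense counts become the classifier's input features.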

The sentences in the Brown corpus of the nltk library were segregated into two files, ‘paid.txt’ and ‘unpaid.txt’. The criteria used for segregation were as follows:

A. If a certain sentence consists of one or more words synonymous with ‘paid’ in its past tense, the sentence is added to ‘paid.txt’.

B. If the sentence consists of one or more words synonymous with ‘paid’ but in the present tense, the sentence is added to ‘unpaid.txt’.

C. If a certain sentence consists of one or more words synonymous with ‘unpaid’ in its past tense, the sentence is added to ‘unpaid.txt’.

D. If a certain sentence consists of one or more words synonymous with ‘unpaid’ in its present tense, the sentence is added to ‘paid.txt’.

Thus, we had our virtual dataset ready. Sentences containing negative words such as ‘not’, ‘non’ etc. are handled by the n-gram analysis, explained later in this document.
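Rules A–D above can be sketched as a small routing function. The synonym sets here are hypothetical placeholders; the real project derives them from nltk.

```python
# Hypothetical synonym sets; the real project derives these via nltk.
PAID_PAST = {"paid", "settled", "cleared"}
PAID_PRESENT = {"pay", "settle", "clear"}
UNPAID_PAST = {"owed", "defaulted"}
UNPAID_PRESENT = {"owe", "default"}

def segregate(sentence):
    """Apply rules A-D: return the target file name, or None if no rule fires."""
    words = set(sentence.lower().split())
    if words & PAID_PAST:        # rule A: past-tense 'paid' synonym
        return "paid.txt"
    if words & PAID_PRESENT:     # rule B: present-tense 'paid' synonym
        return "unpaid.txt"
    if words & UNPAID_PAST:      # rule C: past-tense 'unpaid' synonym
        return "unpaid.txt"
    if words & UNPAID_PRESENT:   # rule D: present-tense 'unpaid' synonym
        return "paid.txt"
    return None
```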

  • Classification

The classifier used here is the Decision Tree Classifier available in the nltk library. ‘paid.txt’ and ‘unpaid.txt’ files were used as training data sets for the model. The following have been used as training features:

1. The number of simple past tense verbs in the sentence.

2. The number of past perfect tense verbs in the sentence.

3. The number of simple present tense verbs in the sentence.

4. The number of past perfect continuous tense verbs in the sentence.

5. The number of present perfect tense verbs in the sentence.

6. The number of words synonymous with ‘paid’.

7. The number of words synonymous with ‘unpaid’.

The classifier was then stored in a pickle file so that it does not need to be trained from scratch every time the program is executed, saving a significant amount of startup time. The run time of classification itself remains the same.
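The train-once-then-pickle pattern looks roughly like this. The file path is hypothetical, and `train_classifier` is a stand-in for the real step that feeds the seven feature counts into nltk's `DecisionTreeClassifier.train(...)`:

```python
import os
import pickle

MODEL_PATH = "classifier.pickle"  # hypothetical cache location

def train_classifier():
    # Stand-in for the real training step (nltk DecisionTreeClassifier
    # trained on the seven tense/synonym feature counts).
    return {"trained": True}

def load_or_train():
    # Load the pickled classifier if present; otherwise train once and
    # cache it, so training happens only on the first run.
    if os.path.exists(MODEL_PATH):
        with open(MODEL_PATH, "rb") as f:
            return pickle.load(f)
    clf = train_classifier()
    with open(MODEL_PATH, "wb") as f:
        pickle.dump(clf, f)
    return clf
```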

  • Text pre-processing

The following refinements are carried out on the text before sending it through the classifier.

  1. Remove punctuation from the sentence, expanding contractions where possible. For example, convert n’t to not. Other punctuation marks such as , . / ? ; can be removed from the string safely.

  2. Remove stop-words from the sentence. Stop-words in nltk are a list of trivial words like ‘I’, ‘me’, ‘myself’, ‘and’, ‘or’ etc. However, the words ‘are’, ‘to’, ‘be’ and ‘not’ are not removed from the sentence as they are useful in n-gram analysis.

  3. If any character in the sentence appears consecutively more than two times, remove the extra occurrences and limit the run to two characters. For example, if ‘iiii’ appears in the string, the extra i’s are removed to leave ‘ii’. This is based on the fact that hardly any English words contain more than two consecutive repeating characters. It also helps remove extra spaces in the input.

  4. Convert the entire sentence to lowercase, since Python string comparisons are case-sensitive, as are the functions in many of the libraries used.
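The four refinements above can be sketched as a single pipeline using only the standard library; the stop-word set here is a small hypothetical sample:

```python
import re

# Small hypothetical stop-word sample; 'are', 'to', 'be' and 'not' are
# deliberately kept for the later n-gram analysis.
STOPWORDS = {"i", "me", "myself", "and", "or"}

def preprocess(sentence):
    text = sentence.lower()                       # step 4: lowercase
    text = re.sub(r"n[’']t\b", " not", text)      # step 1: n't -> not
    text = re.sub(r"[^\w\s]", "", text)           # step 1: drop other punctuation
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)    # step 3: cap repeated chars at two
    words = [w for w in text.split() if w not in STOPWORDS]  # step 2: stop-words
    return " ".join(words)
```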

The latter half of the text pre-processing comprises Fuzzy String Matching.

This is accomplished using the ratio function in the fuzzywuzzy library of Python. Each word in the sentence is checked against each word in the master list (the list of all keywords used for training the classifier). If the match ratio is above a certain threshold, the word in the sentence is replaced by the word from the master list; otherwise, the word is replaced by its closest matching word from the English vocabulary.
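The replacement step can be sketched with the standard library's `difflib` as a stand-in for fuzzywuzzy's `fuzz.ratio` (which returns 0–100 rather than 0–1); the master list and threshold below are hypothetical:

```python
from difflib import SequenceMatcher

# Hypothetical master list of classifier keywords and threshold.
MASTER_LIST = ["paid", "unpaid", "receipt", "bill"]
THRESHOLD = 0.6

def ratio(a, b):
    # difflib analogue of fuzzywuzzy's fuzz.ratio, scaled 0-1 here.
    return SequenceMatcher(None, a, b).ratio()

def correct_word(word):
    best = max(MASTER_LIST, key=lambda m: ratio(word, m))
    if ratio(word, best) >= THRESHOLD:
        return best
    # The real project would instead fall back to the closest English word.
    return word

def correct_sentence(sentence):
    return " ".join(correct_word(w) for w in sentence.split())
```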

  • Intent Recognition

After text pre-processing, the input sentence is checked for words matching synonyms of ‘quit’. If such a word is found, the program assumes the user entered the bill payment section by mistake and exits. Otherwise, execution continues.

The input sentence is then checked for n-grams. An n-gram is simply a sequence of n consecutive words in a sentence. For example, the 2-grams of the sentence “I want to pay property taxes” are [(‘I’, ‘want’), (‘want’, ‘to’), (‘to’, ‘pay’), (‘pay’, ‘property’), (‘property’, ‘taxes’)]. Some of the 2-grams considered for analysis are ['not paid', 'not complete', 'not done', 'not given']; some of the 3-grams are ['not to be paid', 'not to be given']. We have observed that n-grams determine the user intent more strongly than verb tenses, so the n-gram analysis is done prior to classification. If the text does not contain any relevant n-grams, it is then classified using the Decision Tree Classifier.
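N-gram extraction and the flagged-n-gram check can be written in a few lines (nltk offers an equivalent `ngrams` helper); the flagged set is taken from the examples above:

```python
def ngrams(sentence, n):
    """Return all sequences of n consecutive words in the sentence."""
    words = sentence.split()
    return list(zip(*(words[i:] for i in range(n))))

# 2-grams flagged as negative intent, per the examples in the text.
FLAGGED_2GRAMS = {("not", "paid"), ("not", "complete"),
                  ("not", "done"), ("not", "given")}

def has_flagged_ngram(sentence):
    return any(g in FLAGGED_2GRAMS for g in ngrams(sentence.lower(), 2))
```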

  • Entity Recognition

Entity recognition means finding the entity mentioned in the user’s input statement. For example, in the statement “I want to pay property taxes”, the entity mentioned is ‘property’. At this point, we deal with three entities, namely ‘water and sewerage’, ‘property’, and ‘trade license’. Every word in the user's input is checked against every word in the list of entities. If the fuzzy match ratio between two words is above a certain threshold, the entity is marked. Out of all the marked entities, the one with the highest fuzz ratio is chosen. For example, if the threshold is 60 and the fuzzy match ratios for the entities are as follows:

Water and sewerage = 65

Property = 75

Trade license = 70

Then ‘Property’ is identified as the entity the user is talking about. If none of the entities has a match ratio above the threshold, the program prompts the user to re-enter the input; for example, if the threshold in the above example were set to 80, none of the entities would clear it, so the user would be asked to re-enter. The fuzzy match ratio makes the system robust to spelling mistakes in the user’s input.
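The word-by-word scoring just described can be sketched as follows, again using `difflib` in place of fuzzywuzzy and a hypothetical 0–1 threshold (the project's thresholds, like 60 or 80 above, are on fuzzywuzzy's 0–100 scale):

```python
from difflib import SequenceMatcher

ENTITIES = ["water and sewerage", "property", "trade license"]
THRESHOLD = 0.6  # hypothetical 0-1 equivalent of a fuzz-ratio threshold of 60

def best_entity(sentence):
    """Score each entity by its best word-vs-word fuzzy match; pick the top one."""
    scores = {}
    for entity in ENTITIES:
        for e_word in entity.split():
            for word in sentence.lower().split():
                r = SequenceMatcher(None, word, e_word).ratio()
                scores[entity] = max(scores.get(entity, 0.0), r)
    best = max(scores, key=scores.get)
    # None signals that the user should be prompted to re-enter the entity.
    return best if scores[best] >= THRESHOLD else None
```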

  • Audio transcription

Audio transcription is the process of converting an audio file to written text. Several pre-trained models are available in various languages in Python for audio transcription; in this project, we used the pydub and SpeechRecognition libraries. Audio messages sent on WhatsApp arrive as .ogg files, but the Python libraries in use accept only .wav files, so we first convert the received .ogg files to .wav. The second important step is audio segmentation: breaking the audio file into chunks based on the pauses between words. This helps analyse the meaning of the sentence precisely and increases the accuracy of the system. In this way, the audio message is converted to plain text, and further analysis is carried out as described above.

Bill payment chatbot working (screenshot):

Complaint classification chatbot working (screenshot):

Type & Search API

In the existing version of the chatbot, the user has to select his/her city from a drop-down menu on the mSeva website. This significantly reduces user convenience, as the user has to keep switching pages. We have therefore implemented an algorithm that uses fuzzy matching and pattern recognition to recognize the city typed by the user. A list of all city names in English, Punjabi and Hindi was used as the reference data. Based on the user input, the cities with the highest match ratios are returned as a list of options; the user then enters the option number of his/her desired city. If the user does not find the city in the list, he/she can go back to the main menu and start over. This API works in Hindi and Punjabi as well. A one-time integration with WhatsApp was done using the GupShup platform. Here are some snapshots of the type and search API.
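The core of the type-and-search flow can be sketched with the standard library's `difflib.get_close_matches` as a stand-in for the fuzzy matcher actually used; the city list below is a small hypothetical sample of the ~170-city reference list:

```python
from difflib import get_close_matches

# Hypothetical sample; the real reference list holds ~170 city names in
# English, Hindi and Punjabi, sourced from MDMS.
CITIES = ["Amritsar", "Ludhiana", "Jalandhar", "Patiala", "Bathinda"]

def search_city(user_input, limit=3):
    """Return the closest-matching cities for the user to pick from."""
    return get_close_matches(user_input.title(), CITIES, n=limit, cutoff=0.6)
```

An empty result would correspond to the "city not found, go back to the main menu" path described above.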

Scope & Limitations

As evident from the above discussions, we are now able to:

  1. Classify user intent and recognize entities to facilitate bill payments/receipt retrievals in multiple languages.

  2. Categorize user complaints into appropriate categories based on the keywords present in the user input.

  3. Locate the user's city using fuzzy string matching and pattern recognition, with minimal effort from the user; this works in English, Hindi and Punjabi.

  4. Respond appropriately to audio messages from the user regarding bill payment queries.

However, the project has some limitations:

  1. Due to the scarcity of data, rigorous testing could not be performed and tangible accuracy metrics could not be produced.

  2. The city recognition algorithm works on the basis of the information provided by the MDMS database. However, there are some anomalies in the data and some missing cities as well. Due to these reasons, the algorithm is not yet foolproof.

  3. The audio transcription feature could not be implemented in regional languages due to limitations in the available Python modules: scarce data corpora in regional languages and a lack of pre-trained models for Indian languages.


