Review Analysis System

The project involves analyzing product reviews from the G2 platform to extract and identify key features mentioned by users. The aim is to provide insights into what customers are saying about various products and services.

The G2 Product Feature Analysis project offers both a library for seamless integration into existing workflows and a GUI-based Streamlit web application for intuitive visualization.

The library provides easy-to-use functions for extracting and analyzing product features from user reviews, utilizing Natural Language Processing (NLP) techniques and machine learning algorithms. It allows developers to seamlessly integrate feature analysis capabilities into their applications.

Additionally, the Streamlit web application offers a user-friendly interface for visualizing the results of feature analysis. Users can interactively explore the top features extracted from the reviews and gain insights into the most prominent aspects of their products.

Together, these components enable efficient and insightful analysis of product features, empowering businesses to make data-driven decisions to enhance their offerings.

Results Screenshots

Word clouds generated from product reviews help identify the most prominent features or aspects mentioned by users, facilitating feature analysis and understanding customer sentiment towards specific product attributes.

Streamlit Application

Demo

Methodology

Run Locally

Clone the project

  https://github.com/KumarSampurn/G2-Hackathon

Go to the project directory

  cd G2-Hackathon

Environment Variables : Create a .env file with your G2 API token

Install dependencies

  pip install -r requirements.txt

Start the server

  streamlit run app.py

Features

Feature Analysis: Analyzes product features mentioned in reviews and identifies their importance using clustering algorithms.
Geographic Analysis: Filters reviews based on the specified country name and analyzes the features mentioned.
Time Series Analysis: Filters reviews based on the specified date and analyzes the features mentioned over time.

Usage

Alternatively, you can explore and run the provided Jupyter notebooks:

g2-hackathon.ipynb: Works with the entire dataset.
g2-hackathon-lite.ipynb : Works with only 100 reviews.

API Reference

G2 API

Endpoint: https://data.g2.com/api/v1/survey-responses
Authentication: Token-based authentication using the API key.

Parameter	Type	Description
`api_key`	`string`	Required. Your G2 API key.
`number_of_reviews`	`integer`	Number of reviews to fetch (default is 10).
`page_num`	`integer`	Page number of reviews to fetch (default is 1).

Hugging Face API

Model: Mistral-7B-Instruct-v0.2
Authentication: Bearer token authentication.

Parameter	Type	Description
`inputs`	`string`	Required. The review text to analyze.
`temperature`	`float`	Control the randomness of the outputs (default is 0.9).

Methods

In the dump folder, various failed attempts are documented:

Initially, data analytics techniques were employed to extract features, but encountered numerous false positives.
Subsequently, NLP techniques were explored, but proved ineffective in extracting features accurately.
Recognizing the need for contextual attention models, integration of Large Language Models (LLMs) commenced. Initially, models such as Phi Dolphin Phi and Orca Mini (2B parameter models) were tested but yielded unsatisfactory results.
Transitioning to more powerful models, attempts were made with Gemma and Mistral (7B parameter models), leading to system crashes. Consequently, Mistral 7B v2 model API was adopted for the project.

Direct transmission of data to the Mistral API was not viable due to sensitive information such as company and personal names. As a solution, Spacy library was utilized for name imputation, followed by extraction of user-preferred features. Post preprocessing, TFIDF vectorization segmented the features into clusters based on semantic similarity. Subsequently, feature importance was determined for each cluster using term frequency and normalization techniques. The top features from each cluster were selected as representative features.

Tech Stack

Python: Core programming language for development.
Streamlit: Frontend framework used for building interactive web applications.
Requests: HTTP library for making API requests.
Spacy: Natural Language Processing (NLP) library for text processing.
NLTK: NLP library for tokenization, stemming, and other text processing tasks.
Scikit-learn: Machine learning library for clustering and feature extraction.
Matplotlib: Data visualization library for creating plots and charts.
WordCloud: Library for generating word clouds based on text data.
dotenv: Library for loading environment variables from a .env file.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
__pycache__		__pycache__
dump		dump
images		images
src		src
.env		.env
G2-Hackathon-lite.ipynb		G2-Hackathon-lite.ipynb
G2-Hackathon.ipynb		G2-Hackathon.ipynb
G2-library.ipynb		G2-library.ipynb
G2lib.py		G2lib.py
Readme.md		Readme.md
app.py		app.py
requirements.txt		requirements.txt
test.ipynb		test.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Review Analysis System

Results Screenshots

Run Locally

Features

Usage

API Reference

G2 API

Hugging Face API

Methods

Tech Stack

About

Uh oh!

Releases

Packages

Uh oh!

Languages

KumarSampurn/G2-Hackathon

Folders and files

Latest commit

History

Repository files navigation

Review Analysis System

Results Screenshots

Run Locally

Features

Usage

API Reference

G2 API

Hugging Face API

Methods

Tech Stack

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages