kamjula/enterprise-data-detective

🕵️ Enterprise Data Detective

A Python tool I built to profile enterprise transaction data, detect fraud and anomalies, and let you ask questions about your data in plain English — without needing a data warehouse.


✨ What It Does

Feature | Description
📊 Data Profiling | Automatically reports rows, columns, types, missing values, duplicates, and date range from any CSV
🚨 Anomaly / Fraud Detection | Flags suspicious transactions using 6 rule-based detectors, each with a plain-English reason
💬 Natural-Language Q&A | Ask "which 5 accounts spent the most?" → the tool generates SQL → DuckDB runs it → returns a table
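
As a rough illustration of the profiling step, here is a minimal pandas sketch. It is not the code in detective/profile_data.py; the function name `profile`, the `timestamp` column, and the sample rows are assumptions made for this example.

```python
import pandas as pd


def profile(df: pd.DataFrame, date_col: str = "timestamp") -> dict:
    """Summarise a transactions table (hypothetical helper, not the repo's API)."""
    report = {
        "rows": len(df),
        "columns": df.shape[1],
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": df.isna().sum().to_dict(),      # missing values per column
        "duplicates": int(df.duplicated().sum()),  # fully identical rows
    }
    if date_col in df.columns:
        dates = pd.to_datetime(df[date_col], errors="coerce")
        report["date_range"] = (dates.min(), dates.max())
    return report


# Tiny made-up sample: one duplicated row and one missing amount.
sample = pd.DataFrame(
    {
        "account_id": ["A1", "A1", "B2", "B2"],
        "amount": [20.0, 20.0, None, 75.5],
        "timestamp": [
            "2024-01-01 09:00",
            "2024-01-01 09:00",
            "2024-01-02 10:30",
            "2024-01-03 23:15",
        ],
    }
)
report = profile(sample)
print(report)
```

In practice the DataFrame would come from `pd.read_csv("data/transactions.csv")` rather than an inline sample.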

🚨 Fraud Detection Rules

  • Amount outliers — transactions far above the statistical norm (mean + 3σ)
  • Large round numbers — suspiciously clean amounts like $5,000 or $10,000
  • Velocity bursts — same account with 3+ transactions within 1 hour
  • Duplicate charges — same account + amount + day, charged twice
  • High-risk countries — transactions from known fraud-prone regions
  • Off-hours activity — transactions between 11 PM and 5 AM
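
The rules above could be sketched in pandas roughly as follows. This is an illustration under stated assumptions, not the contents of detective/detect_anomalies.py: the column names (`account_id`, `amount`, `timestamp`), the function names, and the round-number cutoffs (at least 5,000 and a multiple of 1,000) are all invented for the example. The high-risk country rule is left out because the README does not list the countries.

```python
import pandas as pd


def flag_amount_outliers(df, k=3.0):
    # Amounts above mean + k standard deviations.
    return df["amount"] > df["amount"].mean() + k * df["amount"].std()


def flag_round_amounts(df, minimum=5000, step=1000):
    # Assumed definition of "large round number".
    return (df["amount"] >= minimum) & (df["amount"] % step == 0)


def flag_velocity(df, count=3, window="1h"):
    # Flags the transaction that completes a run of `count` within `window`.
    ordered = df.sort_values(["account_id", "timestamp"])
    earlier = ordered.groupby("account_id")["timestamp"].shift(count - 1)
    burst = (ordered["timestamp"] - earlier) <= pd.Timedelta(window)
    return burst.reindex(df.index)


def flag_duplicates(df):
    # Same account + amount + calendar day; every copy is flagged.
    keyed = df.assign(day=df["timestamp"].dt.date)
    return keyed.duplicated(subset=["account_id", "amount", "day"], keep=False)


def flag_off_hours(df):
    hour = df["timestamp"].dt.hour
    return (hour >= 23) | (hour < 5)  # 11 PM to 5 AM


def detect(df):
    flags = pd.DataFrame(
        {
            "amount outlier": flag_amount_outliers(df),
            "large round number": flag_round_amounts(df),
            "velocity burst": flag_velocity(df),
            "duplicate charge": flag_duplicates(df),
            "off-hours activity": flag_off_hours(df),
        }
    )
    reasons = flags.apply(
        lambda row: ", ".join(name for name, hit in row.items() if hit), axis=1
    )
    return df.assign(reasons=reasons)[flags.any(axis=1)]


# Made-up sample: 19 ordinary rows plus a few deliberately suspicious ones.
rows = [(f"A{i}", 40.0 + i, f"2024-01-{i + 1:02d} 12:00") for i in range(19)]
rows += [
    ("A19", 10000.0, "2024-01-20 12:00"),
    ("V1", 12.0, "2024-02-01 10:00"),
    ("V1", 13.0, "2024-02-01 10:20"),
    ("V1", 14.0, "2024-02-01 10:40"),
    ("D1", 30.0, "2024-02-02 09:00"),
    ("D1", 30.0, "2024-02-02 15:00"),
    ("N1", 25.0, "2024-02-03 02:30"),
]
sample = pd.DataFrame(rows, columns=["account_id", "amount", "timestamp"])
sample["timestamp"] = pd.to_datetime(sample["timestamp"])
flagged = detect(sample)
print(flagged[["account_id", "amount", "reasons"]])
```

Keeping each rule as its own function returning a boolean Series makes it easy to attach a plain-English reason per flag and to add or drop rules independently.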

🛠 Tech Stack

Python · Pandas · DuckDB · OpenAI API · Streamlit · Statistical Anomaly Detection

Layer | Technology
Data handling | Python + Pandas
In-process SQL | DuckDB (no warehouse needed)
Natural language Q&A | OpenAI API
Web UI | Streamlit

📁 Project Structure

enterprise-data-detective/
├── data/
│   ├── generate_sample_data.py
│   └── transactions.csv
├── detective/
│   ├── profile_data.py
│   ├── detect_anomalies.py
│   └── ask_data.py
├── app.py
└── requirements.txt

🚀 How to Run

pip install -r requirements.txt
python data/generate_sample_data.py
python detective/detect_anomalies.py data/transactions.csv
streamlit run app.py

For the natural language Q&A feature, set your API key:

export OPENAI_API_KEY=your-key-here

💡 Why I Built This

I wanted to build something that works the way analysts actually think — profile the data first, find what's wrong, then answer questions about it in plain English instead of writing SQL every time.

📌 The sample data is synthetic — generated by the script in this repo with suspicious transactions deliberately injected. Not real customer data.


👩‍💻 Author

Sravani Kamjula | Data Analyst   LinkedIn · Portfolio

About

AI-powered tool to profile, audit, and query enterprise transaction data — flags suspicious transactions and answers questions in plain English using Python, SQL (DuckDB), and the Claude API.
