Build A Flask App For Data Visualization
Hey guys! Let's build a Flask app that processes and visualizes data. We'll load JSONL files, build a dashboard with interactive charts, and leave room for a search feature. The project covers data loading, display, and visualization, so it's a good way to get hands-on with web app development and data analysis. Along the way you'll see which libraries are required, how to load the data, build the homepage, create detail pages, and render interactive charts.
1. Project Overview: What We're Building
So, what's the plan? We're building a Flask-based web application. It will load data from four JSONL files, allowing users to explore and visualize the data through interactive charts and tables. Here's a quick look at the main features:
- Data Loading: Load all the data from JSONL files into Python data structures at startup.
- Homepage: A dashboard with a summary of the data and links to different datasets.
- Dataset Detail Pages: Detailed views for each record, with links to related data.
- Visualization Page: Interactive charts to visualize the data, with filtering options.
- Tech Stack: Flask, Jinja2, Bootstrap, Plotly (with pandas), and optionally DataTables.js for client-side search.
Sounds fun, right? Let's get started!
2. Setting Up Your Project: Requirements and Structure
First things first, we need to set up our project. Here's how to structure your files and install the necessary libraries:
2.1 Project Structure
Create the following directory structure:
my_app/
├── app.py
├── requirements.txt
├── templates/
│   ├── index.html
│   ├── dataset_detail.html
│   └── visualize.html
├── static/
│   ├── css/
│   │   └── style.css
│   └── js/
│       └── scripts.js
├── data/
│   ├── pr_issue_single.jsonl
│   ├── pr_detail.jsonl
│   ├── issue_detail.jsonl
│   └── merged_data.jsonl
└── README.md
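If you'd rather not create every file by hand, the layout can be scaffolded with a few lines of Python. This is only a convenience sketch: the `scaffold` helper and the `FILES` list are names made up for this example, and the files it creates are empty placeholders that you fill in during the following sections.

```python
from pathlib import Path

# Relative file paths from the project layout; parent folders are created as needed.
FILES = [
    "app.py",
    "requirements.txt",
    "README.md",
    "templates/index.html",
    "templates/dataset_detail.html",
    "templates/visualize.html",
    "static/css/style.css",
    "static/js/scripts.js",
]
DATA_DIR = "data"  # the JSONL files are copied in here separately


def scaffold(root):
    """Create empty placeholder files under root and return the paths created."""
    root = Path(root)
    created = []
    for rel in FILES:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():  # never overwrite existing work
            path.touch()
            created.append(path)
    (root / DATA_DIR).mkdir(parents=True, exist_ok=True)
    return created
```

Calling `scaffold("my_app")` from the parent folder creates the skeleton; running it a second time leaves existing files untouched.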
2.2 requirements.txt
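Create a requirements.txt file to manage our project dependencies. Flask installs Jinja2 for you, and Bootstrap is loaded from a CDN in the templates, so neither needs its own pip entry. pandas is required because app.py builds DataFrames for the charts. Include the following libraries:
Flask
pandas
plotly
# Optional
# Flask-Caching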
Install the dependencies using pip:
pip install -r requirements.txt
2.3 README.md
Create a README.md file to describe the project. This should include project goals, instructions on how to set up and run the app, and any additional notes. It's a good practice to keep the documentation updated as you develop.
# Flask Data Visualization App
## Description
A Flask-based web application to load, process, and visualize data from JSONL files.
## Features
* Data loading and storage.
* Dashboard with dataset summaries.
* Dataset detail pages.
* Interactive visualizations using Plotly.
* Client-side table search with DataTables.js (optional).
* JSON API endpoints for data (planned).
## Prerequisites
* Python 3.x
* pip
## Installation
1. Clone the repository (or create your own).
2. Create a virtual environment: `python -m venv venv`
3. Activate the environment: `source venv/bin/activate` (Linux/macOS) or `venv\Scripts\activate` (Windows)
4. Install dependencies: `pip install -r requirements.txt`
## Running the App
1. Run the Flask app: `python app.py`
2. Open your browser and go to `http://127.0.0.1:5000/`
## Technologies Used
* Flask
* Jinja2
* Bootstrap
* Plotly
* DataTables.js (optional)
## Files
* `app.py`: The main Flask application file.
* `requirements.txt`: Project dependencies.
* `templates/`: HTML templates.
* `static/`: Static files (CSS, JavaScript).
* `data/`: JSONL data files.
## Future Enhancements
* Implement user authentication.
* Add more interactive charts.
* Improve error handling.
## Contributing
Feel free to contribute to the project by submitting pull requests or opening issues.
## License
[Your License]
## Contact
[Your Contact Information]
3. Core Implementation: app.py
Let's get into the heart of the app: the app.py file. This is where we'll handle data loading, routing, and all the logic of our application.
# app.py
from flask import Flask, render_template, request, jsonify
import json
import pandas as pd  # used by visualize(); imported at module level so `flask run` works too
import plotly
import plotly.express as px
from collections import defaultdict

app = Flask(__name__)

# --- Data Loading ---

def load_data(file_paths):
    """Load each JSONL file into a list of dicts, keyed by dataset name."""
    data = {}
    for name, path in file_paths.items():
        try:
            with open(path, 'r', encoding='utf-8') as f:
                # One JSON object per line; blank lines are skipped
                data[name] = [json.loads(line) for line in f if line.strip()]
        except FileNotFoundError:
            print(f"File not found: {path}")
            data[name] = []  # Or handle the error as needed
    return data

# File paths - update these with your actual paths if needed
file_paths = {
    'pr_issue': 'data/pr_issue_single.jsonl',
    'pr_detail': 'data/pr_detail.jsonl',
    'issue_detail': 'data/issue_detail.jsonl',
    'merged_data': 'data/merged_data.jsonl'
}

data = load_data(file_paths)

# --- Helper Functions ---

def get_record_by_id(dataset_name, record_id):
    """Return the first record whose id (or issue/pull number) equals record_id."""
    for record in data.get(dataset_name, []):
        key = record.get('id', record.get('issue_number', record.get('pull_number')))
        if str(key) == record_id:
            return record
    return None

# --- Routes ---

@app.route('/')
def index():
    dataset_summaries = {
        name: len(records)
        for name, records in data.items()
    }
    return render_template('index.html', dataset_summaries=dataset_summaries)

@app.route('/dataset/<dataset_name>/<record_id>')
def dataset_detail(dataset_name, record_id):
    record = get_record_by_id(dataset_name, record_id)
    related_data = {}
    if record:
        if dataset_name == 'pr_detail':
            # A PR links to the issue it references
            issue_number = record.get('issue_number')
            if issue_number:
                related_issue = get_record_by_id('issue_detail', str(issue_number))
                if related_issue:
                    related_data['issue'] = related_issue
        elif dataset_name == 'issue_detail':
            # An issue links to every PR that references it
            pull_requests = [pr for pr in data['pr_detail'] if pr.get('issue_number') == record.get('id')]
            if pull_requests:
                related_data['pull_requests'] = pull_requests
        elif dataset_name == 'merged_data':
            pr_issue_id = record.get('pr_issue_id')
            pr_id = record.get('pull_number')
            issue_id = record.get('issue_number')
            if pr_issue_id:
                related_pr_issue = get_record_by_id('pr_issue', str(pr_issue_id))
                if related_pr_issue:
                    related_data['pr_issue'] = related_pr_issue
            if pr_id:
                related_pr = get_record_by_id('pr_detail', str(pr_id))
                if related_pr:
                    related_data['pr'] = related_pr
            if issue_id:
                related_issue = get_record_by_id('issue_detail', str(issue_id))
                if related_issue:
                    related_data['issue'] = related_issue
    return render_template('dataset_detail.html', dataset_name=dataset_name, record=record, related_data=related_data)

@app.route('/visualize')
def visualize():
    repo_filter = request.args.get('repo')
    graphs = {}

    def matches_repo(item):
        """True when no filter is set or the record belongs to the filtered repo."""
        return not repo_filter or item.get('repo') == repo_filter

    # Merged Data Visualizations
    df_merged = data['merged_data']
    if df_merged:
        # Top 10 Repos by PR Count (the filter is applied to the top-10 table)
        df_merged_repo_counts = defaultdict(int)
        for item in df_merged:
            df_merged_repo_counts[item.get('repo', 'N/A')] += 1
        sorted_repo_counts = sorted(df_merged_repo_counts.items(), key=lambda item: item[1], reverse=True)[:10]
        df_merged_top_repos = pd.DataFrame(sorted_repo_counts, columns=['repo', 'count'])
        if repo_filter:
            df_merged_top_repos = df_merged_top_repos[df_merged_top_repos['repo'] == repo_filter]
        fig_top_repos = px.bar(df_merged_top_repos, x='repo', y='count', title='Top 10 Repos by PR Count')
        graphs['top_repos'] = json.dumps(fig_top_repos, cls=plotly.utils.PlotlyJSONEncoder)

        # Time Series of PR Creation Dates
        created_at = [item['created_at'] for item in df_merged if matches_repo(item) and item.get('created_at')]
        if created_at:
            df_merged_time_series = pd.DataFrame(created_at, columns=['created_at'])
            df_merged_time_series['created_at'] = pd.to_datetime(df_merged_time_series['created_at'])
            # Sort by date so the line shows the running total of PRs
            df_merged_time_series = df_merged_time_series.sort_values('created_at').reset_index(drop=True)
            df_merged_time_series['cumulative_prs'] = df_merged_time_series.index + 1
            fig_time_series = px.line(df_merged_time_series, x='created_at', y='cumulative_prs', title='PR Creation Dates Over Time')
            graphs['time_series'] = json.dumps(fig_time_series, cls=plotly.utils.PlotlyJSONEncoder)

    # pr_issue Visualizations
    df_pr_issue = data['pr_issue']
    if df_pr_issue:
        df_pr_issue_counts = defaultdict(int)
        for item in df_pr_issue:
            df_pr_issue_counts[item.get('repo', 'N/A')] += item.get('closing_issue', 0)
        df_pr_issue_counts_sorted = sorted(df_pr_issue_counts.items(), key=lambda item: item[1], reverse=True)[:10]
        df_pr_issue_counts_df = pd.DataFrame(df_pr_issue_counts_sorted, columns=['repo', 'closing_issue_count'])
        if repo_filter:
            df_pr_issue_counts_df = df_pr_issue_counts_df[df_pr_issue_counts_df['repo'] == repo_filter]
        fig_closing_issue = px.bar(df_pr_issue_counts_df, x='repo', y='closing_issue_count', title='Closing Issue Counts per Repo')
        graphs['closing_issue'] = json.dumps(fig_closing_issue, cls=plotly.utils.PlotlyJSONEncoder)

    # pr_detail Visualizations
    df_pr_detail = data['pr_detail']
    if df_pr_detail:
        pr_dates = [item['created_at'] for item in df_pr_detail if matches_repo(item) and item.get('created_at')]
        if pr_dates:
            df_pr_detail_dates = pd.DataFrame(pr_dates, columns=['created_at'])
            df_pr_detail_dates['created_at'] = pd.to_datetime(df_pr_detail_dates['created_at'])
            fig_pr_by_month = px.histogram(df_pr_detail_dates, x='created_at', title='PRs by Month/Year')
            graphs['pr_by_month'] = json.dumps(fig_pr_by_month, cls=plotly.utils.PlotlyJSONEncoder)

    # issue_detail Visualizations
    df_issue_detail = data['issue_detail']
    if df_issue_detail:
        label_counts = defaultdict(int)
        for item in df_issue_detail:
            if not matches_repo(item):
                continue
            for label in item.get('labels', []):
                label_counts[label.get('name', 'N/A')] += 1
        sorted_labels = sorted(label_counts.items(), key=lambda item: item[1], reverse=True)[:10]
        df_labels = pd.DataFrame(sorted_labels, columns=['label', 'count'])
        fig_top_labels = px.bar(df_labels, x='label', y='count', title='Top 10 Labels Used Across Issues')
        graphs['top_labels'] = json.dumps(fig_top_labels, cls=plotly.utils.PlotlyJSONEncoder)

    return render_template('visualize.html', graphs=graphs)

if __name__ == '__main__':
    app.run(debug=True)
3.1 Data Loading and Processing
The load_data function reads each JSONL file once at startup and returns a dictionary that maps every dataset name to a list of records (one dict per line). The get_record_by_id function scans a dataset and returns the first record whose id, issue_number, or pull_number matches the requested ID.
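As written, load_data stops with an exception at the first line that is not valid JSON. If your files may contain a few damaged lines, a more tolerant loader could look like the sketch below. The `load_jsonl` name and its `(records, errors)` return shape are assumptions made for this example, not part of the app above.

```python
import json


def load_jsonl(path):
    """Return (records, errors) for one JSONL file.

    Blank lines are skipped. Malformed lines are collected together with
    their line number instead of aborting the whole load.
    """
    records, errors = [], []
    with open(path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                errors.append((line_no, str(exc)))
    return records, errors
```

You could log the `errors` list at startup so a bad export is visible without taking the whole dashboard down.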
3.2 Routing and Templates
We define the routes for the homepage (/), dataset detail pages (/dataset/<dataset_name>/<record_id>), and the visualization page (/visualize).
The index() function displays a summary of the datasets.
The dataset_detail() function retrieves a specific record and displays its details. It also handles related data by linking PRs to issues and vice versa.
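Every detail page calls get_record_by_id, which walks the whole list on each request. For larger files you might build a lookup table once at startup instead. The sketch below is one way to do that; `record_key` and `build_index` are hypothetical helper names, and the key order simply mirrors the id, issue_number, pull_number fallback used in app.py.

```python
ID_KEYS = ("id", "issue_number", "pull_number")  # same fallback order as get_record_by_id


def record_key(record):
    """Return the record's identifier as a string, or None if it has none."""
    for key in ID_KEYS:
        if key in record:
            return str(record[key])
    return None


def build_index(records):
    """Map identifier -> record.

    On duplicate identifiers the first record wins, which matches a
    linear scan that returns the first match.
    """
    index = {}
    for record in records:
        key = record_key(record)
        if key is not None:
            index.setdefault(key, record)
    return index
```

With one index per dataset, a lookup becomes `index.get(record_id)`, which returns None for unknown IDs just like the original helper.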
The visualize() function generates interactive charts using Plotly. It includes a filter by repository name, allowing users to focus on specific data. Each chart is converted to JSON for rendering in the template.
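The counting step behind the bar charts does not need pandas. Here is a small sketch of a "top N" helper built on collections.Counter; the `top_counts` name and its parameters are made up for illustration. Note one deliberate difference: visualize() applies the repository filter to the top-10 table after counting, while this helper filters the records first, so a repository outside the overall top 10 still gets a bar.

```python
from collections import Counter


def top_counts(records, field, n=10, repo=None):
    """Count values of `field` and return the n most common (value, count) pairs.

    If repo is given, only records whose 'repo' equals it are counted.
    Records missing the field are counted under 'N/A'.
    """
    if repo:
        records = [r for r in records if r.get("repo") == repo]
    counts = Counter(r.get(field, "N/A") for r in records)
    return counts.most_common(n)
```

The returned pairs can go straight into `pd.DataFrame(pairs, columns=['repo', 'count'])`, which is the same shape the bar charts in app.py are built from.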
4. Building the Templates: Jinja2 and Bootstrap
Now, let's create the HTML templates using Jinja2 and Bootstrap. We'll create three templates: index.html, dataset_detail.html, and visualize.html.
4.1 index.html
This template will display a dashboard with a summary of each dataset and links to explore each dataset.
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Data Dashboard</title>
    <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css">
    <link rel="stylesheet" href="https://cdn.datatables.net/1.10.25/css/jquery.dataTables.min.css">
    <link rel="stylesheet" href="/static/css/style.css">
</head>
<body>
    <div class="container">
        <h1>Data Dashboard</h1>
        <div class="row">
            {% for name, count in dataset_summaries.items() %}
            <div class="col-md-3">
                <div class="card">
                    <div class="card-body">
                        <h5 class="card-title">{{ name.replace('_', ' ').title() }}</h5>
                        <p class="card-text">Total Records: {{ count }}</p>
                        <a href="#" class="btn btn-primary" onclick="showTable('{{ name }}'); return false;">View Data</a>
                    </div>
                </div>
            </div>
            {% endfor %}
        </div>
        <div id="dataTablesContainer">
            <!-- DataTables will be loaded here -->
        </div>
        <a href="/visualize" class="btn btn-secondary mt-3">Go to Visualizations</a>
    </div>
    <!-- Full jQuery build: the slim build omits the Ajax functions that data loading relies on -->
    <script src="https://code.jquery.com/jquery-3.5.1.min.js"></script>
    <script src="https://cdn.datatables.net/1.10.25/js/jquery.dataTables.min.js"></script>
    <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/js/bootstrap.min.js"></script>
    <script src="/static/js/scripts.js"></script>
</body>
</html>
4.2 dataset_detail.html
This template will display detailed information about a single record, including any related data.
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Record Detail</title>
    <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css">
    <link rel="stylesheet" href="/static/css/style.css">
</head>
<body>
    <div class="container">
        <h1>{{ dataset_name.replace('_', ' ').title() }} Detail</h1>
        {% if record %}
        <table class="table table-bordered">
            <tbody>
                {% for key, value in record.items() %}
                <tr>
                    <th>{{ key.replace('_', ' ').title() }}</th>
                    <td>{{ value if value else 'N/A' }}</td>
                </tr>
                {% endfor %}
            </tbody>
        </table>
        <hr>
        <h2>Related Data</h2>
        {% if related_data %}
            {% for relation_type, related_records in related_data.items() %}
            <h3>{{ relation_type.replace('_', ' ').title() }}</h3>
            <ul class="list-group">
                {% if relation_type == 'issue' or relation_type == 'pr' %}
                    {# A single related record #}
                    {% set related = related_records %}
                    <li class="list-group-item">
                        <a href="/dataset/{{ 'issue_detail' if relation_type == 'issue' else 'pr_detail' }}/{{ related.get('id', related.get('issue_number', related.get('pull_number'))) }}">
                            {{ related.get('title', 'No Title') }}
                        </a>
                    </li>
                {% elif relation_type == 'pull_requests' %}
                    {# A list of related pull requests #}
                    {% for pr in related_records %}
                    <li class="list-group-item">
                        <a href="/dataset/pr_detail/{{ pr.get('id') }}">
                            {{ pr.get('title', 'No Title') }}
                        </a>
                    </li>
                    {% endfor %}
                {% elif relation_type == 'pr_issue' %}
                    {% set related = related_records %}
                    <li class="list-group-item">
                        <a href="/dataset/pr_issue/{{ related.get('id') }}">
                            {{ related.get('title', 'No Title') }}
                        </a>
                    </li>
                {% endif %}
            </ul>
            {% endfor %}
        {% else %}
            <p>No related data found.</p>
        {% endif %}
        {% else %}
        <p>Record not found.</p>
        {% endif %}
        <a href="/" class="btn btn-secondary">Back to Dashboard</a>
    </div>
    <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/js/bootstrap.min.js"></script>
</body>
</html>
4.3 visualize.html
This template will hold the interactive charts. We'll use Plotly.js to render the charts.
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Data Visualization</title>
    <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css">
    <link rel="stylesheet" href="/static/css/style.css">
    <!-- Pinned version: plotly-latest.min.js is frozen at v1.58.5 and no longer tracks new releases -->
    <script src="https://cdn.plot.ly/plotly-2.35.2.min.js"></script>
</head>
<body>
    <div class="container">
        <h1>Data Visualizations</h1>
        <form method="GET" action="/visualize">
            <div class="form-group">
                <label for="repo">Filter by Repository:</label>
                <input type="text" class="form-control" id="repo" name="repo" value="{{ request.args.get('repo', '') }}">
            </div>
            <button type="submit" class="btn btn-primary">Filter</button>
        </form>
        {% for graph_name, graph_json in graphs.items() %}
        <h2>{{ graph_name.replace('_', ' ').title() }}</h2>
        <div id="{{ graph_name }}" class="plotly-graph"></div>
        <script>
            // The figure JSON produced by Flask carries both the traces and the layout
            var fig = {{ graph_json | safe }};
            Plotly.newPlot('{{ graph_name }}', fig.data, fig.layout);
        </script>
        {% endfor %}
        <a href="/" class="btn btn-secondary mt-3">Back to Dashboard</a>
    </div>
    <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/js/bootstrap.min.js"></script>
</body>
</html>
5. Adding the Finishing Touches: Static Files
Let's add some CSS to style our pages and create a JavaScript file for any client-side functionalities.
5.1 static/css/style.css
Here's a simple CSS file to style our app. This will help make the app visually appealing and user-friendly.
.plotly-graph {
    width: 100%;
    margin-bottom: 20px;
}
5.2 static/js/scripts.js
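This file can hold any JavaScript code, such as functions to show/hide tables. Below is a stub for showTable, the function that the dashboard's View Data buttons call; the DataTables initialization itself is left for you to fill in.
// static/js/scripts.js
function showTable(datasetName) {
    // Implement DataTables initialization and data loading here
}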
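To fill in showTable, the browser needs the records in a shape DataTables understands: a list of row objects plus a list of column definitions. The server side of that could be sketched as below. The `datatables_payload` helper and the `max_rows` cap are assumptions for this example; in the app you would return its result with jsonify from a route such as /api/<dataset_name>.

```python
import json


def datatables_payload(records, max_rows=500):
    """Build a dict with 'columns' and 'data' in the shape DataTables accepts.

    Columns are the union of keys across the returned rows, in first-seen
    order. Nested values are serialized to JSON strings so each cell is
    plain text, and missing values become empty strings.
    """
    rows = records[:max_rows]
    keys = []
    for row in rows:
        for key in row:
            if key not in keys:
                keys.append(key)
    data = []
    for row in rows:
        flat = {}
        for key in keys:
            value = row.get(key)
            if isinstance(value, (dict, list)):
                value = json.dumps(value)
            flat[key] = "" if value is None else value
        data.append(flat)
    columns = [{"data": key, "title": key} for key in keys]
    return {"columns": columns, "data": data}
```

On the client, the two parts map onto the `columns` and `data` options passed when the table is initialized.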
6. Running the App and Testing
After setting up the files, run the app using python app.py. Open your web browser and go to http://127.0.0.1:5000/. You should see the dashboard with dataset summaries. Click on the links to view the details and visualizations. Use the filter in the visualization page to test the functionality.
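If you don't have the four JSONL files yet, you can try the app with a handful of placeholder records. The sketch below writes tiny sample files; every value in it is made up, and the field names are only inferred from the keys that app.py reads (repo, created_at, issue_number, pull_number, pr_issue_id, closing_issue, labels), so adjust them to your real schema.

```python
import json
from pathlib import Path

# Made-up records; field names mirror the keys that app.py looks up.
SAMPLE = {
    "pr_issue_single.jsonl": [
        {"id": 1, "repo": "example/alpha", "title": "Fix typo", "closing_issue": 1},
        {"id": 2, "repo": "example/beta", "title": "Add docs", "closing_issue": 2},
    ],
    "pr_detail.jsonl": [
        {"id": 101, "repo": "example/alpha", "title": "Fix typo",
         "issue_number": 11, "created_at": "2024-01-15T10:00:00Z"},
        {"id": 102, "repo": "example/beta", "title": "Add docs",
         "issue_number": 12, "created_at": "2024-02-03T09:30:00Z"},
    ],
    "issue_detail.jsonl": [
        {"id": 11, "repo": "example/alpha", "title": "Typo in README",
         "labels": [{"name": "bug"}]},
        {"id": 12, "repo": "example/beta", "title": "Docs are missing",
         "labels": [{"name": "documentation"}]},
    ],
    "merged_data.jsonl": [
        {"id": 1001, "repo": "example/alpha", "pr_issue_id": 1,
         "pull_number": 101, "issue_number": 11, "created_at": "2024-01-15T10:00:00Z"},
        {"id": 1002, "repo": "example/beta", "pr_issue_id": 2,
         "pull_number": 102, "issue_number": 12, "created_at": "2024-02-03T09:30:00Z"},
    ],
}


def write_sample_data(data_dir="data"):
    """Write the sample records as JSONL files and return the file names."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    for filename, records in SAMPLE.items():
        with open(data_dir / filename, "w", encoding="utf-8") as f:
            for record in records:
                f.write(json.dumps(record) + "\n")
    return sorted(p.name for p in data_dir.glob("*.jsonl"))
```

Run `write_sample_data()` from the project folder before starting the app; a page such as /dataset/pr_detail/101 should then show the record together with its related issue.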
7. Next Steps and Enhancements
This app provides a basic structure for data visualization. You can extend it further with features like:
- Implement client-side table search with DataTables.js.
- Add a JSON API endpoint for each dataset (/api/<dataset_name>) to return filtered data for chart rendering.
- Cache computed aggregations so charts load quickly.
- Improve error handling and add data validation.
- Enhance the UI with more Bootstrap components.
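For the caching idea in the list above, the standard library already goes a long way, because the data is loaded once at startup and never changes afterwards. Here is a minimal sketch using functools.lru_cache; `RECORDS` and `repo_counts` are stand-in names with made-up values, not part of the app. Flask-Caching, listed as optional in requirements.txt, is the alternative if you need expiry times or a shared cache.

```python
from collections import Counter
from functools import lru_cache

# Stand-in for the records loaded at startup (hypothetical sample values).
RECORDS = (
    {"repo": "example/alpha"},
    {"repo": "example/alpha"},
    {"repo": "example/beta"},
)


@lru_cache(maxsize=128)
def repo_counts(repo_filter=None):
    """Return ((repo, count), ...) sorted by count; computed once per filter value."""
    if repo_filter is None:
        rows = RECORDS
    else:
        rows = [r for r in RECORDS if r.get("repo") == repo_filter]
    counts = Counter(r.get("repo", "N/A") for r in rows)
    # A tuple is returned so callers cannot mutate the cached result.
    return tuple(counts.most_common(10))
```

Repeated requests with the same filter reuse the stored result. If you ever reload the data while the app is running, call `repo_counts.cache_clear()` so stale counts are not served.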
And there you have it! You've built a Flask application to process and visualize data. Feel free to modify and expand upon this foundation, and if you have any questions or need further assistance, don't hesitate to ask. Happy coding!