top of page

Ideal County for New Barbershop in MN

Timeframe

1 week

Tools

Microsoft Excel, Python

Role

Data Analyst: Web-scraped and analyzed data using Python and Excel.
Dashboard Developer: Created a dashboard in Excel to present findings.

Objectives:

-Showcase my web scraping and data analysis skills by helping a client identify the best county in Minnesota to open a new barbershop.

-The analysis focused on per capita income, median household income, median family income, and population size to provide data-driven insights for informed decision-making.

Problem

The client needed to identify a suitable county in Minnesota to establish a new barbershop. Key factors included per capita income, median household income, median family income, and population size. The challenge was to extract, analyze, and visualize this data to find the best location.

Solution

Web Scraping:

-Extracted data from the Wikipedia page: https://en.wikipedia.org/wiki/List_of_Minnesota_locations_by_per_capita_income using Python (BeautifulSoup and requests libraries).

-Processed and cleaned the data for analysis.

Data Analysis:

-Imported the cleaned data into Excel.

-Created pivot tables and charts to correlate per capita income with median household income, median family income, and population size.

PROCESS
Data Collection
Data Cleaning
Feature Engineering
Dashboard Creation
Data
Collection
Python

For this project, I used Python to scrape data from a Wikipedia page on per capita income for various locations in Minnesota. The web scraping process involved using libraries such as BeautifulSoup and requests to collect and structure the data, which was then analyzed in Excel.

 

I designed my web scraping script to accurately process the HTML content, ensuring that the collected data was clean and ready for analysis. This careful approach allowed for precise insights to support the decision-making process.

screencapture-file-Users-erickphasy1-Downloads-Barbershop-html-2024-07-18-14_01_53.png

Here is a screenshot of my Jupyter Notebook used for web scraping and data analysis. This notebook demonstrates the process of extracting data from a Wikipedia page listing per capita income for Minnesota locations, and structuring the data using Python libraries.

Description

Import Libraries: Imported necessary libraries including requests, BeautifulSoup, and pandas.

Fetch Web Page: Sent a request to the Wikipedia page to fetch the HTML content.

Parse HTML Content: Used BeautifulSoup to parse the HTML content and locate the relevant table.

Extract Data: Extracted table headers and rows, cleaned the data, and stored it in a pandas DataFrame.

Print Data: Printed the extracted data for verification.

Save Data: Saved the cleaned data to a CSV file for further analysis.

The notebook successfully demonstrates the ability to perform web scraping and data analysis, providing data-driven insights to support decision-making processes.

Data
Cleaning
Excel

After collecting the data using Python, the next step was to clean in Excel to ensure accuracy and readability for analysis and dashboard creation.

Screenshot 2024-07-22 at 2.37.38 PM.png

Data Cleaning Steps:

Remove Duplicates: Checked for and removed any duplicate entries to ensure each county was represented only once.​

Rename Columns to Camel Case: Changed column names to camelCase for consistency and ease of use in the dashboard. This also helps avoid naming conflicts and makes the data more readable.

Remove Non-County Rows: Identified and removed three rows that were not actual counties, ensuring only relevant data was included.

Adjust Formatting: Adjusted the formatting to ensure consistency, such as aligning text and removing unnecessary spaces.

What I Learned

Correcting spelling errors and applying naming conventions improve data consistency, reducing confusion and ensuring uniformity.

Accurate data review and classification are vital for effective data analysis and visualization.

Feature engineering and preparing data for visualization are crucial steps in solving business problems and enhancing analysis.

Feature Engineering 
Excel

After cleaning the data, I added an Income Category column to classify counties based on median household income:

  • High: >$100,000

  • Average: $50,000-$100,000

  • Low: <$50,000

Screenshot 2024-07-22 at 3.33.26 PM.png

Analysis

-Identified the top 10 counties based on per capita income to focus on regions with higher individual prosperity.

-Evaluated these counties within the context of the Income Category to see how many fall into the Average bracket.

Comparative Analysis:

Used the new Income Category column to compare economic indicators across different categories, focusing on counties that fall into the Average income category but have high per capita income to identify potential locations with balanced economic prosperity.

What I Learned

This enhanced data was crucial for creating insightful visualizations and making informed recommendations in the dashboard. By categorizing counties and focusing on those within the Average income bracket with high per capita income, we could pinpoint locations with potential for balanced economic growth and suitability for the new barbershop.

Dashboard Creation
Excel

To visualize the economic indicators, I created a dashboard. The dashboard provides a clear, comparative view of the counties, highlighting those with the most favorable economic conditions for establishing a new barbershop.

What I Learned from Creating the Dashboard

Data Cleaning: Ensuring accurate and clean data is crucial for reliable analysis and visualization.

Text Hierarchy: Organizing information with clear headings and subheadings improves readability.

Accessibility: Choosing color schemes and fonts that comply with WCAG guidelines makes the dashboard usable for a wider audience.

Effective Visualization: Selecting appropriate charts, like heatmaps, to clearly represent complex data.

Attention to Detail: Meticulous formatting and consistent styling enhance the overall clarity and professionalism of the dashboard.

REFLECTION

What I Learned So Far

Versatility in Tools

Whether using Excel, Tableau, or any other data visualization tool, it's crucial to ensure the dashboard is clear and easily interpretable. The goal is to convey insights effectively, regardless of the platform used.

Importance of Clean Data:

Accurate data cleaning is fundamental for reliable analysis and visualization. Ensuring data consistency, removing duplicates, and correctly categorizing information are essential steps that directly impact the quality of the analysis.

Effective Visualization Techniques:

Selecting appropriate visualization methods, such as heatmaps for complex data, enhances understanding and decision-making. Clear and accessible visualizations are critical for communicating findings to stakeholders.

Web Scraping Efficiency:

Using web scraping techniques to collect data can be highly efficient, but it requires careful planning and validation to ensure the data's accuracy and relevance. This project reinforced the value of web scraping as a powerful tool for data collection when traditional methods are not feasible.

What I Would Have Done Differently

In hindsight, I realized that dedicating each page or sheet to a single graph would have been more effective. Consolidating all visualizations onto one sheet made it congested and harder to interpret. By spreading the graphs across multiple sheets, I could have enhanced clarity and focus for each visualization, making the dashboard more user-friendly.

MOVING FORWARD

Expanding Data Collection and Analysis

Having identified the optimal counties for a new barbershop based on economic indicators, the next steps involve deeper analysis within these regions. We should consider additional factors such as the cost of building a barbershop, local regulations, and market competition. This comprehensive approach will provide a more holistic view, ensuring well-informed decision-making for the client.

By focusing on these aspects, we can further refine our recommendations and support the client's goal of establishing a successful barbershop in the most suitable county.

© 2021 Erick Phasy Portfolio 
bottom of page