Data Analytics Complete Course (2025–26): Beginner to Pro Script
A complete data analytics course that takes you from beginner to pro through structured lessons, industry tools, and real projects, so you can master practical data skills in 2025–26.
Data Analytics Complete Course (2025–26)
Course Overview and Learning Objectives
What is Data Analytics?
Definition:
Data Analytics is the practice of examining raw data to discover patterns, draw conclusions, and support decision-making. It combines statistics, programming, and business acumen to transform data into actionable insights.
Course Structure
This course will take you from a complete beginner to a job-ready data analyst in 12 weeks. You will learn through structured lessons, hands-on projects, and real-world applications.
Prerequisite: Basic computer literacy. No prior analytics or programming experience required.
Learning Outcomes
By completing this course, you will be able to:
Understand data analytics fundamentals and concepts
Use Excel for data analysis and visualization
Write SQL queries to extract and transform data
Use Python for data manipulation and analysis
Create compelling data visualizations
Develop analytics dashboards
Conduct exploratory data analysis
Build predictive models
Present insights to stakeholders
Execute end-to-end analytics projects
WEEK 1-2: FOUNDATIONS OF DATA ANALYTICS
Module 1: Data Analytics Fundamentals
Lesson 1.1: Introduction to Data Analytics
Key Concepts:
Data
Raw facts and observations. Examples: Sales transactions, customer demographics, website clicks.
Information
Data processed into meaningful context. Example: "Sales increased 15% in Q3 among 25-35 age group in North India."
Insights
Conclusions drawn from information that drive decisions. Example: "We should target 25-35 age group with Q4 campaign in North India for maximum ROI."
The Analytics Process:
Raw Data → Collection → Cleaning → Analysis → Visualization →
Insights → Action → Business Impact
Lesson 1.2: Types of Analytics
Descriptive Analytics - "What Happened?"
Analyzing historical data to understand patterns.
Example: "Average monthly sales over past 12 months"
Tools: Excel, Power BI, Tableau
Outcome: Reports, dashboards, summaries
Diagnostic Analytics - "Why Did It Happen?"
Investigating reasons behind outcomes.
Example: "Why did sales increase 15% in Q3?"
Techniques: Correlation analysis, trend analysis, root cause analysis
Tools: SQL, Python, statistical software
Predictive Analytics - "What Will Happen?"
Using historical data to forecast future outcomes.
Example: "Predict next month's sales based on historical trends"
Techniques: Machine learning, regression, time series forecasting
Tools: Python, R, specialized ML platforms
Prescriptive Analytics - "What Should We Do?"
Recommending actions to achieve desired outcomes.
Example: "Allocate 40% budget to North India, 35% to South, 25% to East for maximum projected revenue"
Statistical Functions
=SUM(A1:A10) // Add values
=AVERAGE(A1:A10) // Calculate average
=COUNT(A1:A10) // Count cells with numbers
=MIN(A1:A10) // Find minimum
=MAX(A1:A10) // Find maximum
Logical Functions
=IF(A1>100, "High", "Low") // Conditional logic
=AND(A1>50, B1<100) // All conditions true?
=OR(A1=1, A1=2, A1=3) // Any condition true?
Text Functions
=LEN(A1) // Length of text
=UPPER(A1) // Convert to uppercase
=CONCATENATE(A1, " ", B1) // Combine text
=LEFT(A1, 3) // First 3 characters
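For reference, the Excel functions above have direct equivalents in pandas, which this course introduces in Week 7-8. A minimal sketch using invented values in place of cells A1:A10:

```python
import pandas as pd

s = pd.Series([120, 85, 47, 230, 95])  # stand-in for cells A1:A10

# =SUM / =AVERAGE / =COUNT / =MIN / =MAX
print(s.sum(), s.mean(), s.count(), s.min(), s.max())

# =IF(A1>100, "High", "Low") applied to every value
labels = s.apply(lambda v: "High" if v > 100 else "Low")

# Text helpers: =LEN, =UPPER, =LEFT(...,3)
names = pd.Series(["alice", "bob"])
print(names.str.len().tolist())    # lengths of each string
print(names.str.upper().tolist())  # uppercase versions
print(names.str[:3].tolist())      # first 3 characters
```

The mapping is not one-to-one (pandas works on whole columns at once rather than single cells), but the same summary statistics and text operations carry over.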
Lesson 4.3: Data Cleaning
Common Data Quality Issues:
Missing Values
Identify: Look for blank cells
Handle: Delete, fill with mean/median, or leave as is, depending on context
Duplicate Records
Identify: Sort and visually inspect, or Data > Remove Duplicates
Handle: Remove duplicates or investigate root cause
Inconsistent Formatting
Standardize text cases (UPPER, LOWER, PROPER functions)
Consistent date formats
Consistent number formats
Outliers
Identify: Values significantly different from others
Verify: Confirm if genuine or data error
Handle: Keep if genuine, remove if error
Module 5: Pivot Tables and Summarization
Lesson 5.1: Creating Pivot Tables
What is a Pivot Table?
A pivot table summarizes a large dataset into a compact, meaningful view by rotating (pivoting) its dimensions.
Example Scenario:
Raw data: Sales transactions with Date, Region, Product, Amount
Pivot table shows: Revenue by Region and Product
Steps to Create:
Select data range
Insert > PivotTable
Drag fields to Rows, Columns, Values
Customize as needed
Lesson 5.2: Aggregation Functions in Pivot Tables
SUM: Total of values
AVERAGE: Mean value
COUNT: Number of items
MIN/MAX: Smallest/largest value
STDEV: Standard deviation
Practical Example:
Data: Sales by Salesperson, Month, Product
Pivot Table Structure:
─────────────────────────────────────────
Jan Feb Mar Total
─────────────────────────────────────────
John $5,000 $6,000 $7,000 $18,000
Jane $4,500 $5,500 $6,200 $16,200
Bob $3,200 $3,800 $4,100 $11,100
─────────────────────────────────────────
Total $12,700 $15,300 $17,300 $45,300
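The same summarization can be reproduced in pandas (covered in Week 7-8) with `pivot_table`. A sketch using a few invented rows consistent with the table above:

```python
import pandas as pd

# A handful of raw sales transactions (invented values)
sales = pd.DataFrame({
    "Salesperson": ["John", "John", "Jane", "Bob"],
    "Month": ["Jan", "Feb", "Jan", "Jan"],
    "Amount": [5000, 6000, 4500, 3200],
})

# Rows = Salesperson, Columns = Month, Values = sum of Amount,
# with row/column totals added via margins
pivot = sales.pivot_table(index="Salesperson", columns="Month",
                          values="Amount", aggfunc="sum",
                          margins=True, margins_name="Total")
print(pivot)
```

The `aggfunc` parameter accepts the same aggregations listed above ("sum", "mean", "count", "min", "max", "std").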
WEEK 5-6: SQL FOR DATA ANALYSIS
Module 6: SQL Fundamentals
Lesson 6.1: Introduction to SQL
What is SQL?
SQL (Structured Query Language) is the standard language for querying and manipulating data in relational databases.
Why SQL?
Access large datasets efficiently
Extract specific data subsets
Perform complex calculations
Join data from multiple tables
Lesson 6.2: Basic SELECT Queries
Selecting All Data:
sql
SELECT * FROM customers;
Selecting Specific Columns:
sql
SELECT name, email, age FROM customers;
Filtering with WHERE:
sql
SELECT * FROM customers
WHERE age > 25 AND region = 'North India';
Logical Operators:
AND: Both conditions true
OR: At least one condition true
NOT: Condition not true
IN: Value in list
BETWEEN: Value in range
LIKE: Pattern matching
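These operators can be tried out locally with Python's built-in sqlite3 module, with no database server needed. A self-contained sketch with an invented customers table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, age INTEGER, region TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    ("Asha", 28, "North India"),
    ("Bilal", 22, "South India"),
    ("Chen", 35, "North India"),
])

# IN, BETWEEN and LIKE combined with AND
rows = conn.execute("""
    SELECT name FROM customers
    WHERE region IN ('North India', 'East India')
      AND age BETWEEN 25 AND 40
      AND name LIKE 'A%'
""").fetchall()
print(rows)  # only Asha satisfies all three conditions
```

`LIKE 'A%'` matches any value starting with "A"; `%` stands for any run of characters and `_` for a single character.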
Lesson 6.3: Aggregation and Grouping
Aggregate Functions:
sql
SELECT
    COUNT(*) AS total_customers,
    AVG(purchase_amount) AS avg_purchase,
    SUM(purchase_amount) AS total_revenue,
    MAX(purchase_amount) AS max_purchase
FROM orders;
GROUP BY - Aggregate by Category:
sql
SELECT
    region,
    COUNT(*) AS customer_count,
    SUM(purchase_amount) AS total_sales,
    AVG(purchase_amount) AS avg_sale
FROM orders
GROUP BY region;
HAVING - Filter Groups:
sql
SELECT
    product_category,
    COUNT(*) AS sales_count
FROM orders
GROUP BY product_category
HAVING COUNT(*) > 50;
Module 7: Advanced SQL
Lesson 7.1: Joins
INNER JOIN - Common Records
sql
SELECT
o.order_id,
c.customer_name,
o.purchase_amount
FROM orders o
INNER JOIN customers c
ON o.customer_id = c.customer_id;
LEFT JOIN - All from Left Table
sql
SELECT
    c.customer_name,
    COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o
    ON c.customer_id = o.customer_id
GROUP BY c.customer_name;
Lesson 7.2: Subqueries
Subquery in WHERE Clause:
sql
SELECT * FROM customers
WHERE customer_id IN (
    SELECT customer_id FROM orders
    WHERE purchase_amount > 10000
);
Subquery in FROM Clause:
sql
SELECT
    region,
    avg_purchase
FROM (
    SELECT
        region,
        AVG(purchase_amount) AS avg_purchase
    FROM orders
    GROUP BY region
) AS regional_avg
WHERE avg_purchase > 5000;
Lesson 7.3: Window Functions
Running Total:
sql
SELECT
    date,
    amount,
    SUM(amount) OVER (ORDER BY date) AS running_total
FROM sales
ORDER BY date;
Rank Within Groups:
sql
SELECT
    salesperson,
    sales,
    RANK() OVER (ORDER BY sales DESC) AS sales_rank
FROM sales_performance;
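SQLite (version 3.25 and later, bundled with modern Python) supports window functions, so the running-total query above can be tried locally. A sketch with an invented sales table; the column is named `day` here rather than `date` to avoid clashing with SQL's built-in date keyword:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [
    ("2025-01-01", 100),
    ("2025-01-02", 150),
    ("2025-01-03", 50),
])

# Running total: each row's amount plus everything before it in day order
rows = conn.execute("""
    SELECT day, amount,
           SUM(amount) OVER (ORDER BY day) AS running_total
    FROM sales
    ORDER BY day
""").fetchall()
for row in rows:
    print(row)
```

Unlike GROUP BY, the window function keeps every row and attaches the aggregate alongside it.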
WEEK 7-8: PYTHON FOR DATA ANALYSIS
Module 8: Python Basics for Analytics
Lesson 8.1: Python Setup and Libraries
Essential Libraries:
python
# Data manipulation
import pandas as pd
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Statistical analysis
from scipy import stats
import statsmodels.api as sm

# Machine learning (later weeks)
from sklearn import preprocessing, ensemble, metrics
Lesson 8.2: Working with Data in Pandas
Creating DataFrames:
python
# From dictionary
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'Salary': [50000, 60000, 75000]}
df = pd.DataFrame(data)

# From CSV
df = pd.read_csv('customers.csv')

# From Excel
df = pd.read_excel('sales_data.xlsx')
Exploring Data:
python
# First/last rows
df.head()
df.tail()

# Dataset info
df.info()        # Data types, missing values
df.describe()    # Statistical summary
df.shape         # Rows and columns

# Check for missing values
df.isnull().sum()
Data Cleaning:
python
# Remove duplicates
df = df.drop_duplicates()

# Handle missing values
df = df.dropna()                                # Remove rows with NaN
df['Age'] = df['Age'].fillna(df['Age'].mean())  # Fill with mean

# Remove outliers (values > 3 std devs)
df = df[np.abs(stats.zscore(df['Salary'])) < 3]

# Data type conversion
df['Date'] = pd.to_datetime(df['Date'])
df['Age'] = df['Age'].astype(int)
Filtering and Selection:
python
# Filter rows
high_earners = df[df['Salary'] > 60000]

# Multiple conditions
result = df[(df['Age'] > 25) & (df['Salary'] < 80000)]

# Select columns
selected = df[['Name', 'Salary']]

# Select by location
df.loc[0]                       # First row
df.iloc[0:5]                    # First 5 rows
df.loc[df['Age'] > 30, 'Name']  # Names of people over 30
Grouping and Aggregation:
python
# Group by single column
by_region = df.groupby('Region')['Salary'].mean()

# Multiple aggregations
summary = df.groupby('Department').agg({
    'Salary': ['mean', 'min', 'max'],
    'Age': 'mean',
    'Name': 'count'
})

# Group by multiple columns
by_dept_region = df.groupby(['Department', 'Region'])['Salary'].sum()
Step 4: Multivariate Analysis
Analyze relationships between variables.
python
# Correlation between continuous variables
df[['Age', 'Salary', 'Experience']].corr()

# Relationship between categorical and continuous
df.groupby('Region')['Salary'].mean()
sns.boxplot(x='Region', y='Salary', data=df)
Step 5: Anomalies and Outliers
Identify unusual patterns.
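A common way to flag outliers is the interquartile range (IQR) rule: values more than 1.5 IQRs outside the middle 50% of the data are suspect. A minimal sketch on a small invented salary frame (column name is illustrative):

```python
import pandas as pd

demo = pd.DataFrame({"Salary": [48000, 52000, 50000, 55000, 51000, 250000]})

q1 = demo["Salary"].quantile(0.25)
q3 = demo["Salary"].quantile(0.75)
iqr = q3 - q1

# Flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
mask = (demo["Salary"] < q1 - 1.5 * iqr) | (demo["Salary"] > q3 + 1.5 * iqr)
print(demo[mask])  # the 250000 row stands out
```

As the Excel data-cleaning lesson noted, verify whether a flagged value is a genuine observation or a data-entry error before removing it.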
Problem: Predict employee salary based on years of experience
python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error
import numpy as np

# Prepare data
X = df[['Years_Experience']].values
y = df['Salary'].values

# Split into training (80%) and testing (20%)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Train model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate
r2 = r2_score(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"R² Score: {r2:.2f}")  # 0 to 1; higher is better
print(f"RMSE: ${rmse:.2f}")
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
import pandas as pd

# Prepare features and target
X = df[['Age', 'Tenure_Months', 'Monthly_Spend', 'Support_Tickets']]
y = df['Churned']  # 0 = No, 1 = Yes

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2)

# Train model
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2%}")  # Percentage correct

# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)  # [[TN FP]   True Negatives & False Positives
           #  [FN TP]]  False Negatives & True Positives
WEEK 12: CAPSTONE PROJECT & CAREER PATHWAYS
Module 14: Capstone Analytics Project
Project: Retail Sales Analysis and Prediction
Scenario:
You're hired as a data analyst for an online retail company. Your task: analyze sales patterns, identify high-value customers, forecast revenue, and recommend strategies.
Dataset Provided:
Columns: Date, Product, Region, Customer_Type,
Units_Sold, Unit_Price, Total_Sales, Customer_ID
Rows: 100,000+ transactions over 2 years
Part 1: Exploratory Analysis (Week 12, Days 1-2)
python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load data
df = pd.read_csv('retail_sales.csv')
df['Date'] = pd.to_datetime(df['Date'])

# Basic exploration
print(f"Dataset shape: {df.shape}")
print(f"\nMissing values:\n{df.isnull().sum()}")
print(f"\nData types:\n{df.dtypes}")

# Statistical summary
print(df.describe())

# Analysis
print("\nTop 5 Products by Revenue:")
print(df.groupby('Product')['Total_Sales'].sum()
        .sort_values(ascending=False).head())
print("\nSales by Region:")
print(df.groupby('Region')['Total_Sales'].sum()
        .sort_values(ascending=False))

# Visualizations
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Sales trend over time
daily_sales = df.groupby('Date')['Total_Sales'].sum()
axes[0, 0].plot(daily_sales)
axes[0, 0].set_title('Daily Sales Trend')

# Sales by region
df.groupby('Region')['Total_Sales'].sum().plot(kind='bar', ax=axes[0, 1])
axes[0, 1].set_title('Sales by Region')

# Product performance
df.groupby('Product')['Units_Sold'].sum().plot(kind='barh', ax=axes[1, 0])
axes[1, 0].set_title('Units Sold by Product')

# Customer type distribution
df.groupby('Customer_Type')['Total_Sales'].mean().plot(kind='bar', ax=axes[1, 1])
axes[1, 1].set_title('Average Sales by Customer Type')

plt.tight_layout()
plt.show()
FAQ 1
Q: What will I learn in this data analytics course?
A: You will learn data analytics fundamentals, Excel, SQL, Python, data visualization, dashboards, predictive analytics, and real-world project workflows.
FAQ 2
Q: Is this data analytics course suitable for beginners?
A: Yes. This course script is designed for beginners and gradually advances to professional-level data analytics skills.
FAQ 3
Q: What tools are covered in this data analytics course?
A: The course covers Excel, SQL, Python, data visualization tools, dashboards, and analytics techniques used in real business scenarios.
FAQ 4
Q: Can this course help me start a career in data analytics?
A: Yes. The course focuses on practical skills, structured learning, and projects that align with entry-level and mid-level data analytics roles.