Data Analytics: Complete Course Script - From Beginner to Professional
Comprehensive data analytics course covering fundamentals, tools, techniques, real-world projects, and career pathways. Learn skills to become a professional data analyst in 12 weeks.
Course Overview and Learning Objectives
What is Data Analytics?
Definition: Data Analytics is the practice of examining raw data to discover patterns, draw conclusions, and support decision-making. It combines statistics, programming, and business acumen to transform data into actionable insights.
Course Structure
This comprehensive course is designed to take you from complete beginner to job-ready data analyst in 12 weeks through structured learning, hands-on projects, and real-world applications.
Prerequisite: Basic computer literacy. No prior analytics or programming experience required.
Learning Outcomes
By completing this course, you will be able to:
- Understand data analytics fundamentals and concepts
- Use Excel for data analysis and visualization
- Write SQL queries to extract and transform data
- Use Python for data manipulation and analysis
- Create compelling data visualizations
- Develop analytics dashboards
- Conduct exploratory data analysis
- Build predictive models
- Present insights to stakeholders
- Execute end-to-end analytics projects
WEEK 1-2: FOUNDATIONS OF DATA ANALYTICS
Module 1: Data Analytics Fundamentals
Lesson 1.1: Introduction to Data Analytics
Key Concepts:
Data: Raw facts and observations. Examples: Sales transactions, customer demographics, website clicks.
Information: Data processed into meaningful context. Example: "Sales increased 15% in Q3 among the 25-35 age group in North India."
Insights: Conclusions drawn from information that drive decisions. Example: "We should target the 25-35 age group with the Q4 campaign in North India for maximum ROI."
The Analytics Process:
Raw Data → Collection → Cleaning → Analysis → Visualization → Insights → Action → Business Impact
Lesson 1.2: Types of Analytics
Descriptive Analytics - "What Happened?" Analyzing historical data to understand patterns.
Example: "Average monthly sales over past 12 months"
Tools: Excel, Power BI, Tableau
Outcome: Reports, dashboards, summaries
Diagnostic Analytics - "Why Did It Happen?" Investigating reasons behind outcomes.
Example: "Why did sales increase 15% in Q3?"
Techniques: Correlation analysis, trend analysis, root cause analysis
Tools: SQL, Python, statistical software
Predictive Analytics - "What Will Happen?" Using historical data to forecast future outcomes.
Example: "Predict next month's sales based on historical trends"
Techniques: Machine learning, regression, time series forecasting
Tools: Python, R, specialized ML platforms
Prescriptive Analytics - "What Should We Do?" Recommending actions to achieve desired outcomes.
Example: "Allocate 40% budget to North India, 35% to South, 25% to East for maximum projected revenue"
Techniques: Optimization, simulation, decision analysis
Module 2: Data Types and Structures
Lesson 2.1: Data Types
Structured Data: Organized in tables with rows and columns. Example: Customer database with columns (Name, Age, Email, Purchase Amount).
Unstructured Data: No predefined structure. Examples: Text documents, images, videos, audio.
Semi-Structured Data: Mix of structured and unstructured. Examples: JSON, XML, log files.
Quantitative (Numerical) Data: Measurable quantities. Examples: Age, salary, temperature, revenue.
Categorical (Qualitative) Data: Non-numeric categories. Examples: Color, gender, product category, region.
Lesson 2.2: Data Distribution
Normal Distribution: Bell-shaped curve; most data points near the mean. Many natural phenomena follow a normal distribution.
Skewed Distribution: Asymmetrical. Positive skew (tail right), negative skew (tail left).
Bimodal Distribution: Two peaks. Suggests two distinct groups in the data.
Understanding distributions helps identify:
- Outliers and anomalies
- Appropriate statistical tests
- Data transformation needs
Module 3: Key Analytics Concepts
Lesson 3.1: Statistical Foundations
Mean (Average): Sum of values divided by count.
Mean = (Sum of all values) / (Number of values)
Example: (10 + 20 + 30 + 40 + 50) / 5 = 30
Median: Middle value when data is sorted. Less affected by outliers.
Example: 10, 20, 30, 40, 50
Median = 30 (middle value)
Mode: Most frequently occurring value.
Example: 10, 20, 20, 30, 30, 30
Mode = 30 (occurs 3 times)
Standard Deviation: Measures spread/variability of data. High SD = data spread out; low SD = data clustered.
Variance: Square of the standard deviation. Measures how far data points are from the mean.
Correlation: Relationship between two variables. Ranges from -1 to +1.
Correlation = 1: Perfect positive relationship
Correlation = 0: No relationship
Correlation = -1: Perfect negative relationship
Lesson 3.2: Sampling and Bias
Sampling: Selecting a subset of data for analysis. Used when the full dataset is too large.
Sample Size: Larger samples more accurately represent the population. Balance between precision and resources.
Bias: Systematic error skewing results. Types: Selection bias, measurement bias, response bias.
Mitigation: Random sampling, stratified sampling, careful data collection.
WEEK 3-4: EXCEL FOR DATA ANALYSIS
Module 4: Excel Fundamentals and Data Preparation
Lesson 4.1: Excel Basics
Spreadsheet Structure: Rows (horizontal), columns (vertical), cells (intersections).
Data Entry Best Practices:
- One data point per cell
- Consistent formatting
- No merged cells in data ranges
- Column headers in first row
- No blank rows/columns within data
Lesson 4.2: Formulas and Functions
Mathematical Functions
=SUM(A1:A10) // Add values
=AVERAGE(A1:A10) // Calculate average
=COUNT(A1:A10) // Count cells with numbers
=MIN(A1:A10) // Find minimum
=MAX(A1:A10) // Find maximum
Logical Functions
=IF(A1>100, "High", "Low") // Conditional logic
=AND(A1>50, B1<100) // All conditions true?
=OR(A1=1, A1=2, A1=3) // Any condition true?
Text Functions
=LEN(A1) // Length of text
=UPPER(A1) // Convert to uppercase
=CONCATENATE(A1, " ", B1) // Combine text
=LEFT(A1, 3) // First 3 characters
Lesson 4.3: Data Cleaning
Common Data Quality Issues:
Missing Values
- Identify: Look for blank cells
- Handle: Delete, fill with mean/median, or leave as is, depending on context
Duplicate Records
- Identify: Sort and visually inspect, or Data > Remove Duplicates
- Handle: Remove duplicates or investigate root cause
Inconsistent Formatting
- Standardize text cases (UPPER, LOWER, PROPER functions)
- Consistent date formats
- Consistent number formats
Outliers
- Identify: Values significantly different from others
- Verify: Confirm if genuine or data error
- Handle: Keep if genuine, remove if error
Module 5: Pivot Tables and Summarization
Lesson 5.1: Creating Pivot Tables
What is a Pivot Table? Summarizes large datasets into meaningful views by rotating (pivoting) data dimensions.
Example Scenario: Raw data: Sales transactions with Date, Region, Product, Amount
Pivot table shows: Revenue by Region and Product
Steps to Create:
- Select data range
- Insert > Pivot Table
- Drag fields to Rows, Columns, Values
- Customize as needed
Lesson 5.2: Aggregation Functions in Pivot Tables
SUM: Total of values
AVERAGE: Mean value
COUNT: Number of items
MIN/MAX: Smallest/largest value
STDEV: Standard deviation
Practical Example:
Data: Sales by Salesperson, Month, Product
Pivot Table Structure:
─────────────────────────────────────────
        Jan       Feb       Mar       Total
─────────────────────────────────────────
John    $5,000    $6,000    $7,000    $18,000
Jane    $4,500    $5,500    $6,200    $16,200
Bob     $3,200    $3,800    $4,100    $11,100
─────────────────────────────────────────
Total   $12,700   $15,300   $17,300   $45,300
─────────────────────────────────────────
WEEK 5-6: SQL FOR DATA ANALYSIS
Module 6: SQL Fundamentals
Lesson 6.1: Introduction to SQL
What is SQL? Structured Query Language. Universal language for interacting with databases.
Why SQL?
- Access large datasets efficiently
- Extract specific data subsets
- Perform complex calculations
- Join data from multiple tables
Lesson 6.2: Basic SELECT Queries
Selecting All Data:
SELECT * FROM customers;
Selecting Specific Columns:
SELECT name, email, age FROM customers;
Filtering with WHERE:
SELECT * FROM customers
WHERE age > 25 AND region = 'North India';
Logical Operators:
AND: Both conditions true
OR: At least one condition true
NOT: Condition not true
IN: Value in list
BETWEEN: Value in range
LIKE: Pattern matching
Lesson 6.3: Aggregation and Grouping
Aggregate Functions:
SELECT
COUNT(*) as total_customers,
AVG(purchase_amount) as avg_purchase,
SUM(purchase_amount) as total_revenue,
MAX(purchase_amount) as max_purchase
FROM orders;
GROUP BY - Aggregate by Category:
SELECT
region,
COUNT(*) as customer_count,
SUM(purchase_amount) as total_sales,
AVG(purchase_amount) as avg_sale
FROM orders
GROUP BY region;
HAVING - Filter Groups:
SELECT
product_category,
COUNT(*) as sales_count
FROM orders
GROUP BY product_category
HAVING COUNT(*) > 50;
Module 7: Advanced SQL
Lesson 7.1: Joins
INNER JOIN - Common Records
SELECT
o.order_id,
c.customer_name,
o.purchase_amount
FROM orders o
INNER JOIN customers c
ON o.customer_id = c.customer_id;
LEFT JOIN - All from Left Table
SELECT
c.customer_name,
COUNT(o.order_id) as order_count
FROM customers c
LEFT JOIN orders o
ON c.customer_id = o.customer_id
GROUP BY c.customer_name;
Lesson 7.2: Subqueries
Subquery in WHERE Clause:
SELECT * FROM customers
WHERE customer_id IN (
SELECT customer_id FROM orders
WHERE purchase_amount > 10000
);
Subquery in FROM Clause:
SELECT
region,
avg_purchase
FROM (
SELECT
region,
AVG(purchase_amount) as avg_purchase
FROM orders
GROUP BY region
) as regional_avg
WHERE avg_purchase > 5000;
Lesson 7.3: Window Functions
Running Total:
SELECT
date,
amount,
SUM(amount) OVER (
ORDER BY date
) as running_total
FROM sales
ORDER BY date;
Ranking Salespeople:
SELECT
salesperson,
sales,
RANK() OVER (ORDER BY sales DESC) as sales_rank
FROM sales_performance;
WEEK 7-8: PYTHON FOR DATA ANALYSIS
Module 8: Python Basics for Analytics
Lesson 8.1: Python Setup and Libraries
Essential Libraries:
# Data manipulation
import pandas as pd
import numpy as np
# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Statistical analysis
from scipy import stats
import statsmodels.api as sm
# Machine learning (later weeks)
from sklearn import preprocessing, ensemble, metrics
Lesson 8.2: Working with Data in Pandas
Creating DataFrames:
# From dictionary
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'Salary': [50000, 60000, 75000]
}
df = pd.DataFrame(data)
# From CSV
df = pd.read_csv('customers.csv')
# From Excel
df = pd.read_excel('sales_data.xlsx')
Exploring Data:
# First/last rows
df.head()
df.tail()
# Dataset info
df.info() # Data types, missing values
df.describe() # Statistical summary
df.shape # Rows and columns
# Check for missing values
df.isnull().sum()
Data Cleaning:
# Remove duplicates
df = df.drop_duplicates()
# Handle missing values
df = df.dropna() # Remove rows with NaN
df['Age'] = df['Age'].fillna(df['Age'].mean()) # Fill with mean
# Remove outliers (values > 3 std devs)
df = df[np.abs(stats.zscore(df['Salary'])) < 3]
# Data type conversion
df['Date'] = pd.to_datetime(df['Date'])
df['Age'] = df['Age'].astype(int)
Filtering and Selection:
# Filter rows
high_earners = df[df['Salary'] > 60000]
# Multiple conditions
result = df[(df['Age'] > 25) & (df['Salary'] < 80000)]
# Select columns
selected = df[['Name', 'Salary']]
# Select by location
df.loc[0] # First row
df.iloc[0:5] # First 5 rows
df.loc[df['Age'] > 30, 'Name'] # Names of people over 30
Module 9: Data Analysis with Python
Lesson 9.1: Descriptive Statistics
# Mean, median, mode
df['Salary'].mean()
df['Salary'].median()
df['Salary'].mode()
# Spread
df['Salary'].std() # Standard deviation
df['Salary'].var() # Variance
df['Salary'].min()
df['Salary'].max()
# Percentiles
df['Salary'].quantile(0.25) # 25th percentile
df['Salary'].quantile(0.75) # 75th percentile
Lesson 9.2: Grouping and Aggregation
# Group by single column
by_region = df.groupby('Region')['Salary'].mean()
# Multiple aggregations
summary = df.groupby('Department').agg({
'Salary': ['mean', 'min', 'max'],
'Age': 'mean',
'Name': 'count'
})
# Group by multiple columns
by_dept_region = df.groupby(['Department', 'Region'])['Salary'].sum()
Lesson 9.3: Merging and Joining Data
# Merge DataFrames
df_merged = pd.merge(df_customers, df_orders,
on='customer_id',
how='inner')
# Concatenate
df_combined = pd.concat([df1, df2], axis=0)
# Join
df_result = df1.join(df2, on='key')
Lesson 9.4: Correlation and Statistical Testing
# Correlation matrix (numeric_only avoids errors on text columns)
correlation = df.corr(numeric_only=True)
# Correlation with specific column
df.corr(numeric_only=True)['Salary'].sort_values(ascending=False)
# Statistical test - t-test (comparing two groups)
from scipy import stats
group1 = df[df['Region'] == 'North']['Salary']
group2 = df[df['Region'] == 'South']['Salary']
t_stat, p_value = stats.ttest_ind(group1, group2)
print(f"T-statistic: {t_stat}, P-value: {p_value}")
if p_value < 0.05:
    print("Significant difference between regions")
WEEK 8-9: DATA VISUALIZATION
Module 10: Visualization Fundamentals
Lesson 10.1: Visualization Principles
Choose Right Chart Type:
Line Chart - Show trends over time
Example: Sales Over 12 Months (ascending trend)
Bar Chart - Compare categories
Example: Sales by Region (North > South > East)
Pie Chart - Show parts of whole
Example: Market Share: Company A 40%, B 35%, C 25%
Scatter Plot - Show relationship between two variables
Example: Age vs Salary (positive correlation)
Histogram - Show distribution
Example: Customer age distribution (bell curve)
Lesson 10.2: Visualization with Python
Matplotlib - Basic Plotting:
import matplotlib.pyplot as plt
# Line chart
plt.figure(figsize=(10, 6))
plt.plot(df['Month'], df['Sales'], marker='o')
plt.xlabel('Month')
plt.ylabel('Sales ($)')
plt.title('Monthly Sales Trend')
plt.grid(True)
plt.show()
# Bar chart
plt.bar(df['Region'], df['Total_Sales'])
plt.xlabel('Region')
plt.ylabel('Sales ($)')
plt.title('Sales by Region')
plt.show()
# Histogram
plt.hist(df['Age'], bins=20)
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Age Distribution')
plt.show()
Seaborn - Advanced Visualization:
import seaborn as sns
# Heatmap showing correlations
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
# Box plot by category
sns.boxplot(x='Region', y='Salary', data=df)
plt.title('Salary Distribution by Region')
plt.show()
# Scatter plot with colors
sns.scatterplot(x='Age', y='Salary', hue='Region', data=df)
plt.title('Age vs Salary by Region')
plt.show()
Module 11: Business Intelligence Dashboards
Lesson 11.1: Dashboard Design Principles
KPI (Key Performance Indicator) Selection: Choose 3-5 most important metrics for target audience.
Visual Hierarchy: Most important metrics prominent; supporting details secondary.
Color Usage:
- Green = Good
- Yellow = Warning
- Red = Alert
- Avoid rainbow colors (accessibility)
Interactivity: Filters, drill-downs, date ranges for exploration.
Lesson 11.2: Building Dashboards with Power BI
Dashboard Components:
┌─────────────────────────────────────┐
│ Dashboard: Monthly Performance │
├─────────────────────────────────────┤
│ Revenue: $2.5M Growth: +15% │
│ Customers: 15K Churn: 2.3% │
├────────────────────┬────────────────┤
│ Revenue Trend │ Sales by Region│
│ (Line Chart) │ (Pie Chart) │
├────────────────────┼────────────────┤
│ Top Products │ Forecast │
│ (Bar Chart) │ (Area Chart) │
└────────────────────┴────────────────┘
WEEK 10: EXPLORATORY DATA ANALYSIS (EDA)
Module 12: Conducting Comprehensive EDA
Lesson 12.1: EDA Framework
Step 1: Understand Data Context
- What does data represent?
- How was data collected?
- What are obvious limitations?
Step 2: Explore Structure
- Rows and columns count
- Data types
- Missing values percentage (see the pandas sketch below)
Step 3: Univariate Analysis. Analyze individual variables.
# Numerical variables
df['Age'].describe()
plt.hist(df['Age'], bins=30)
# Categorical variables
df['Region'].value_counts()
df['Region'].value_counts().plot(kind='bar')
Step 4: Multivariate Analysis. Analyze relationships between variables.
# Correlation between continuous variables
df[['Age', 'Salary', 'Experience']].corr()
# Relationship between categorical and continuous
df.groupby('Region')['Salary'].mean()
sns.boxplot(x='Region', y='Salary', data=df)
Step 5: Anomalies and Outliers. Identify unusual patterns.
# Identify outliers
Q1 = df['Salary'].quantile(0.25)
Q3 = df['Salary'].quantile(0.75)
IQR = Q3 - Q1
outliers = df[(df['Salary'] < Q1 - 1.5*IQR) |
(df['Salary'] > Q3 + 1.5*IQR)]
# Flag suspicious patterns
df[df['Age'] > 100] # Unrealistic ages
df[df['Salary'] < 0] # Negative salaries
WEEK 11: PREDICTIVE ANALYTICS
Module 13: Introduction to Predictive Modeling
Lesson 13.1: Regression Analysis
Linear Regression - Predicting Continuous Values
Problem: Predict employee salary based on years of experience
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error
import numpy as np
# Prepare data
X = df[['Years_Experience']].values
y = df['Salary'].values
# Split into training (80%) and testing (20%)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Train model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate
r2 = r2_score(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"R² Score: {r2:.2f}") # 0 to 1; higher is better
print(f"RMSE: ${rmse:.2f}")Interpretation:
- R² = 0.85: Model explains 85% of salary variation
- RMSE = $5,000: Average prediction error is $5,000
Lesson 13.2: Classification - Predicting Categories
Problem: Predict if customer will churn (leave)
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
import pandas as pd
# Prepare features and target
X = df[['Age', 'Tenure_Months', 'Monthly_Spend', 'Support_Tickets']]
y = df['Churned'] # 0 = No, 1 = Yes
# Split data
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2
)
# Train model
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
# Evaluate
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2%}") # Percentage correct
# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)
# [[TN FP] True Negatives & False Positives
# [FN TP]] False Negatives & True Positives
WEEK 12: CAPSTONE PROJECT & CAREER PATHWAYS
Module 14: Capstone Analytics Project
Project: Retail Sales Analysis and Prediction
Scenario: You're hired as a data analyst for an online retail company. Your task: analyze sales patterns, identify high-value customers, forecast revenue, and recommend strategies.
Dataset Provided:
Columns: Date, Product, Region, Customer_Type,
Units_Sold, Unit_Price, Total_Sales, Customer_ID
Rows: 100,000+ transactions over 2 years
Part 1: Exploratory Analysis (Week 12, Days 1-2)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Load data
df = pd.read_csv('retail_sales.csv')
df['Date'] = pd.to_datetime(df['Date'])
# Basic exploration
print(f"Dataset shape: {df.shape}")
print(f"\nMissing values:\n{df.isnull().sum()}")
print(f"\nData types:\n{df.dtypes}")
# Statistical summary
print(df.describe())
# Analysis
print("\nTop 5 Products by Revenue:")
print(df.groupby('Product')['Total_Sales'].sum().sort_values(ascending=False).head())
print("\nSales by Region:")
print(df.groupby('Region')['Total_Sales'].sum().sort_values(ascending=False))
# Visualizations
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
# Sales trend over time
daily_sales = df.groupby('Date')['Total_Sales'].sum()
axes[0, 0].plot(daily_sales)
axes[0, 0].set_title('Daily Sales Trend')
# Sales by region
df.groupby('Region')['Total_Sales'].sum().plot(kind='bar', ax=axes[0, 1])
axes[0, 1].set_title('Sales by Region')
# Product performance
df.groupby('Product')['Units_Sold'].sum().plot(kind='barh', ax=axes[1, 0])
axes[1, 0].set_title('Units Sold by Product')
# Customer type distribution
df.groupby('Customer_Type')['Total_Sales'].mean().plot(kind='bar', ax=axes[1, 1])
axes[1, 1].set_title('Average Sales by Customer Type')
plt.tight_layout()
plt.show()
Part 2: Customer Segmentation (Days 3-4)
# RFM Analysis: Recency, Frequency, Monetary
from datetime import timedelta
reference_date = df['Date'].max() + timedelta(days=1)
rfm = df.groupby('Customer_ID').agg(
    Recency=('Date', lambda x: (reference_date - x.max()).days),  # days since last purchase
    Frequency=('Date', 'count'),                                  # transactions per customer
    Monetary=('Total_Sales', 'sum'),                              # total spend
)
# Score customers (1=worst, 5=best)
rfm['R_Score'] = pd.qcut(rfm['Recency'], 5, labels=[5,4,3,2,1])
rfm['F_Score'] = pd.qcut(rfm['Frequency'].rank(method='first'), 5, labels=[1,2,3,4,5])
rfm['M_Score'] = pd.qcut(rfm['Monetary'], 5, labels=[1,2,3,4,5])
rfm['RFM_Score'] = rfm['R_Score'].astype(int) + rfm['F_Score'].astype(int) + rfm['M_Score'].astype(int)
# Segment customers
def segment(score):
    if score >= 13: return 'Champions'
    elif score >= 10: return 'Loyal'
    elif score >= 7: return 'At Risk'
    else: return 'Lost'
rfm['Segment'] = rfm['RFM_Score'].apply(segment)
print("\nCustomer Segments:")
print(rfm['Segment'].value_counts())
Part 3: Revenue Forecasting (Days 5-6)
from sklearn.linear_model import LinearRegression
from datetime import timedelta
import numpy as np
# Prepare time series data
daily_revenue = df.groupby('Date')['Total_Sales'].sum().reset_index()
daily_revenue['Days_Since_Start'] = (daily_revenue['Date'] - daily_revenue['Date'].min()).dt.days
# Split into train/test
train_size = int(len(daily_revenue) * 0.8)
train = daily_revenue[:train_size]
test = daily_revenue[train_size:]
# Train model
X_train = train['Days_Since_Start'].values.reshape(-1, 1)
y_train = train['Total_Sales'].values
model = LinearRegression()
model.fit(X_train, y_train)
# Forecast next 30 days
future_days = np.arange(daily_revenue['Days_Since_Start'].max() + 1,
daily_revenue['Days_Since_Start'].max() + 31).reshape(-1, 1)
forecast = model.predict(future_days)
print(f"30-Day Revenue Forecast: ${forecast.sum():,.2f}")
print(f"Average Daily Forecast: ${forecast.mean():,.2f}")
# Visualize
plt.figure(figsize=(12, 6))
plt.plot(daily_revenue['Date'], daily_revenue['Total_Sales'], label='Actual')
future_dates = [daily_revenue['Date'].max() + timedelta(days=i) for i in range(1, 31)]
plt.plot(future_dates, forecast, label='Forecast', linestyle='--')
plt.xlabel('Date')
plt.ylabel('Daily Revenue ($)')
plt.title('Revenue Forecast')
plt.legend()
plt.show()
Part 4: Insights and Recommendations (Day 7)
Present findings:
- Top Performing Products: Product A and C drive 60% of revenue
- High-Value Customers: Top 20% of customers generate 80% of revenue (verified in the sketch after this list)
- Growth Opportunity: Region South underperforming; has potential
- Forecast: Expected 8% revenue growth next month
- Recommendations:
- Expand Product A in Region South
- Launch loyalty program for Champions segment
- Implement retention campaign for At Risk customers
- Optimize pricing in high-margin products
CAREER PATHWAYS AND ADVANCED TOPICS
Module 15: Career Development
Lesson 15.1: Data Analyst Career Paths
Entry-Level Analyst
- 0-2 years experience
- Focus: Data collection, cleaning, basic reporting
- Salary: ₹3-6 lakhs annually
- Skills: Excel, SQL, basic Python/R
Mid-Level Analyst
- 2-5 years experience
- Focus: Advanced analysis, dashboards, project leadership
- Salary: ₹6-12 lakhs
- Skills: SQL, Python, Power BI