Advanced Hypothesis Testing in Python
Introduction to Two-Sample Tests
Two-sample hypothesis testing lets us compare a statistic, such as a mean or a proportion, between two groups in our data. Let's explore the concept through both theory and practical applications.
Business Context and Real-World Applications
Consider these practical scenarios where two-sample tests are valuable:
- A/B Testing in E-commerce:
  - Comparing conversion rates between two website designs
  - Analyzing customer spending across different marketing campaigns
  - Evaluating user engagement metrics between mobile and desktop users
- HR Analytics:
  - Comparing salaries between different departments
  - Analyzing performance metrics between remote and office workers
  - Evaluating training program effectiveness
Let's implement these concepts using Python.
T-Tests Implementation
Two-Sample Independent T-Test
import pandas as pd
import numpy as np
from scipy.stats import t
import seaborn as sns
import matplotlib.pyplot as plt
def perform_two_sample_ttest(data, group_column, value_column, group1, group2, alpha=0.05):
    """
    Performs a two-sample (Welch's) t-test with detailed analysis and visualization.
    Parameters:
    -----------
    data : pandas DataFrame
        The dataset containing the groups and values
    group_column : str
        Name of the column containing group labels
    value_column : str
        Name of the column containing the values to compare
    group1, group2 : str
        Names of the groups to compare
    alpha : float
        Significance level
    """
    # Extract the two groups, dropping missing values so the counts match the data used
    sample1 = data.loc[data[group_column] == group1, value_column].dropna()
    sample2 = data.loc[data[group_column] == group2, value_column].dropna()
    # Calculate basic statistics
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = sample1.mean(), sample2.mean()
    var1, var2 = sample1.var(ddof=1), sample2.var(ddof=1)
    # Calculate t-statistic with the unpooled (Welch) standard error
    se = np.sqrt(var1/n1 + var2/n2)
    t_stat = (mean1 - mean2) / se
    # Degrees of freedom from the Welch-Satterthwaite approximation, which
    # matches the unpooled standard error (n1 + n2 - 2 applies only to a pooled variance)
    df = (var1/n1 + var2/n2)**2 / ((var1/n1)**2 / (n1 - 1) + (var2/n2)**2 / (n2 - 1))
    # Calculate p-value (two-tailed test); sf avoids precision loss from 1 - cdf
    p_value = 2 * t.sf(abs(t_stat), df)
    # Create visualization
    plt.figure(figsize=(10, 6))
    # Create violin plots for the two groups being compared only
    sns.violinplot(data=data, x=group_column, y=value_column, order=[group1, group2])
    plt.title(f'Distribution Comparison: {group1} vs {group2}')
    # Collect results
    results = {
        'Group 1 Mean': mean1,
        'Group 2 Mean': mean2,
        'Mean Difference': mean1 - mean2,
        'T-statistic': t_stat,
        'P-value': p_value,
        'Significant': p_value < alpha
    }
    return results, plt.gcf()
# Example usage with Stack Overflow data
stack_overflow = pd.read_csv('stack_overflow_data.csv')
results, fig = perform_two_sample_ttest(
    stack_overflow,
    'age_first_code_cut',
    'converted_comp',
    'child',
    'adult',
    alpha=0.05
)
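A hand-rolled two-sample test is easy to get subtly wrong, so it can help to check it against a library routine. The sketch below computes Welch's t-test by hand and compares it with `scipy.stats.ttest_ind(..., equal_var=False)`. The helper name `welch_t_test` and the synthetic data are assumptions made for illustration; they are not taken from the Stack Overflow dataset.

```python
import numpy as np
from scipy import stats


def welch_t_test(sample1, sample2):
    """Two-sided Welch's t-test computed by hand; returns (t_stat, df, p_value)."""
    x1 = np.asarray(sample1, dtype=float)
    x2 = np.asarray(sample2, dtype=float)
    n1, n2 = len(x1), len(x2)
    # Squared standard error contributed by each group
    v1 = x1.var(ddof=1) / n1
    v2 = x2.var(ddof=1) / n2
    t_stat = (x1.mean() - x2.mean()) / np.sqrt(v1 + v2)
    # Welch-Satterthwaite degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    return t_stat, df, p_value


# Synthetic groups with different sizes and variances (illustrative only)
rng = np.random.default_rng(0)
group_a = rng.normal(loc=50.0, scale=5.0, size=40)
group_b = rng.normal(loc=53.0, scale=12.0, size=60)

t_manual, df_manual, p_manual = welch_t_test(group_a, group_b)
t_scipy, p_scipy = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"manual: t={t_manual:.4f}, df={df_manual:.2f}, p={p_manual:.4f}")
print(f"scipy:  t={t_scipy:.4f}, p={p_scipy:.4f}")
```

The two sets of numbers should agree to floating-point precision. Passing `equal_var=True` instead gives the pooled-variance Student's test, which uses `n1 + n2 - 2` degrees of freedom.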
Paired T-Test Implementation
Paired t-tests are used when we have matched pairs of observations. Here's a comprehensive implementation:
def perform_paired_ttest(data, before_col, after_col, alpha=0.05):
    """
    Performs a paired t-test with visualization and detailed analysis.
    Parameters:
    -----------
    data : pandas DataFrame
        Dataset containing before and after measurements
    before_col, after_col : str
        Column names for before and after measurements
    alpha : float
        Significance level
    """
    # Calculate differences, dropping pairs with a missing value so that
    # n counts complete pairs only
    differences = (data[after_col] - data[before_col]).dropna()
    # Basic statistics
    mean_diff = differences.mean()
    std_diff = differences.std(ddof=1)
    n = len(differences)
    # Calculate t-statistic
    t_stat = mean_diff / (std_diff / np.sqrt(n))
    # Degrees of freedom
    df = n - 1
    # Calculate p-value (two-tailed); sf avoids precision loss from 1 - cdf
    p_value = 2 * t.sf(abs(t_stat), df)
    # Create visualizations
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
    # Before-After plot with the y = x line for reference
    ax1.scatter(data[before_col], data[after_col])
    min_val = min(data[before_col].min(), data[after_col].min())
    max_val = max(data[before_col].max(), data[after_col].max())
    ax1.plot([min_val, max_val], [min_val, max_val], 'r--')
    ax1.set_xlabel('Before')
    ax1.set_ylabel('After')
    ax1.set_title('Before vs After Measurements')
    # Differences histogram
    sns.histplot(differences, kde=True, ax=ax2)
    ax2.axvline(x=0, color='r', linestyle='--')
    ax2.set_title('Distribution of Differences')
    results = {
        'Mean Difference': mean_diff,
        'Standard Deviation of Differences': std_diff,
        'T-statistic': t_stat,
        'P-value': p_value,
        'Significant': p_value < alpha,
        'Confidence Interval': t.interval(1-alpha, df,
                                          loc=mean_diff,
                                          scale=std_diff/np.sqrt(n))
    }
    return results, fig
# Example with Republican voting data
repub_votes = pd.read_csv('republican_votes.csv')
results, fig = perform_paired_ttest(
    repub_votes,
    'repub_percent_08',
    'repub_percent_12',
    alpha=0.05
)
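A paired t-test is equivalent to a one-sample t-test on the pairwise differences against zero. The following sketch illustrates that equivalence by comparing a manual calculation with `scipy.stats.ttest_rel` and `scipy.stats.ttest_1samp`. The before/after values are synthetic and are not drawn from the voting data.

```python
import numpy as np
from scipy import stats

# Synthetic paired measurements (illustrative only)
rng = np.random.default_rng(1)
before = rng.normal(loc=60.0, scale=8.0, size=30)
after = before + rng.normal(loc=1.5, scale=3.0, size=30)

# Manual paired t-test on the differences
diff = after - before
n = diff.size
t_manual = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_manual), n - 1)

# Library equivalents: paired test, and one-sample test on the differences
t_rel, p_rel = stats.ttest_rel(after, before)
t_one, p_one = stats.ttest_1samp(diff, 0.0)

print(f"manual:      t={t_manual:.4f}, p={p_manual:.4f}")
print(f"ttest_rel:   t={t_rel:.4f}, p={p_rel:.4f}")
print(f"ttest_1samp: t={t_one:.4f}, p={p_one:.4f}")
```

All three lines should report the same statistic and p-value. Note that the order of the arguments to `ttest_rel` sets the sign of the statistic, so it is written here as `after, before` to match `after - before`.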
ANOVA Testing
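ANOVA (Analysis of Variance) tests whether the means of three or more groups differ. A significant result says only that at least one group differs from the others, so the implementation below also runs pairwise post-hoc tests: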
def perform_anova_analysis(data, group_column, value_column, alpha=0.05):
    """
    Performs one-way ANOVA with visualization and post-hoc analysis.
    Parameters:
    -----------
    data : pandas DataFrame
        Dataset containing groups and values
    group_column : str
        Column name containing group labels
    value_column : str
        Column name containing values to compare
    alpha : float
        Significance level
    """
    import pingouin as pg
    # Perform ANOVA
    anova_results = pg.anova(data=data,
                             dv=value_column,
                             between=group_column)
    # Perform pairwise t-tests with Bonferroni correction
    posthoc = pg.pairwise_tests(data=data,
                                dv=value_column,
                                between=group_column,
                                padjust='bonf')
    # Create visualizations
    plt.figure(figsize=(12, 6))
    # Box plot
    sns.boxplot(data=data, x=group_column, y=value_column)
    plt.xticks(rotation=45)
    plt.title('Distribution by Group')
    # Add statistical annotations
    if anova_results['p-unc'].iloc[0] < alpha:
        plt.text(0.02, 0.98, 'Significant differences detected',
                 transform=plt.gca().transAxes,
                 verticalalignment='top',
                 color='red')
    return {
        'anova_results': anova_results,
        'posthoc_tests': posthoc,
        'plot': plt.gcf()
    }
# Example with job satisfaction data
results = perform_anova_analysis(
    stack_overflow,
    'job_sat',
    'converted_comp',
    alpha=0.05
)
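To see what the one-way ANOVA F-statistic measures, it can be computed directly as the ratio of between-group to within-group mean squares. The sketch below does this with NumPy and compares the result with `scipy.stats.f_oneway`. The helper name `one_way_anova` and the three synthetic groups are assumptions for illustration and do not use the job satisfaction data.

```python
import numpy as np
from scipy import stats


def one_way_anova(groups):
    """One-way ANOVA by hand; returns (f_stat, p_value)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                      # number of groups
    n_total = sum(g.size for g in groups)
    grand_mean = np.concatenate(groups).mean()
    # Between-group sum of squares: spread of the group means around the grand mean
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    f_stat = ms_between / ms_within
    p_value = stats.f.sf(f_stat, k - 1, n_total - k)
    return f_stat, p_value


# Synthetic groups (illustrative only)
rng = np.random.default_rng(2)
samples = [
    rng.normal(loc=10.0, scale=2.0, size=25),
    rng.normal(loc=11.0, scale=2.0, size=30),
    rng.normal(loc=13.0, scale=2.0, size=20),
]

f_manual, p_manual = one_way_anova(samples)
f_scipy, p_scipy = stats.f_oneway(*samples)

print(f"manual: F={f_manual:.4f}, p={p_manual:.6f}")
print(f"scipy:  F={f_scipy:.4f}, p={p_scipy:.6f}")
```

A large F means the group means vary more than the within-group noise would explain. It does not say which groups differ, which is why a post-hoc step follows a significant result.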
Statistical Power Analysis
Understanding statistical power is crucial for designing effective hypothesis tests:
def calculate_power_analysis(data1, data2, alpha=0.05, n_simulations=1000):
    """
    Estimates statistical power through bootstrap simulation.
    Each group is resampled with replacement, and power is estimated as the
    share of resamples in which the t-test is significant at level alpha.
    Parameters:
    -----------
    data1, data2 : array-like
        The two groups to compare
    alpha : float
        Significance level
    n_simulations : int
        Number of simulations to run
    """
    from scipy import stats
    # Calculate effect size (Cohen's d) using sample variances (ddof=1)
    effect_size = (np.mean(data1) - np.mean(data2)) / \
        np.sqrt((np.var(data1, ddof=1) + np.var(data2, ddof=1)) / 2)
    # Simulate tests
    significant_tests = 0
    for _ in range(n_simulations):
        # Resample with replacement
        sample1 = np.random.choice(data1, size=len(data1), replace=True)
        sample2 = np.random.choice(data2, size=len(data2), replace=True)
        # Perform t-test
        _, p_value = stats.ttest_ind(sample1, sample2)
        if p_value < alpha:
            significant_tests += 1
    power = significant_tests / n_simulations
    return {
        'effect_size': effect_size,
        'power': power,
        'n_simulations': n_simulations
    }
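Simulation estimates power from data already collected. When planning a study, power can instead be computed analytically from an assumed effect size. The sketch below does this for a two-sided, equal-variance two-sample t-test using the noncentral t distribution (`scipy.stats.nct`). The function names `analytic_power` and `required_n_per_group` are hypothetical helpers introduced here, not part of the code above.

```python
import numpy as np
from scipy import stats


def analytic_power(effect_size, n1, n2, alpha=0.05):
    """Power of a two-sided, equal-variance two-sample t-test for a given Cohen's d."""
    df = n1 + n2 - 2
    # Noncentrality parameter: expected t-statistic under the alternative
    nc = effect_size * np.sqrt(n1 * n2 / (n1 + n2))
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # Probability that |T| exceeds the critical value when T is noncentral t
    return stats.nct.sf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)


def required_n_per_group(effect_size, target_power=0.8, alpha=0.05, max_n=100_000):
    """Smallest equal group size whose power reaches target_power (linear search)."""
    for n in range(2, max_n + 1):
        if analytic_power(effect_size, n, n, alpha) >= target_power:
            return n
    raise ValueError("target power not reached within max_n")


power_medium = analytic_power(effect_size=0.5, n1=64, n2=64)
n_needed = required_n_per_group(effect_size=0.5, target_power=0.8)

print(f"Power for d=0.5 with 64 per group: {power_medium:.3f}")
print(f"Group size needed for 80% power at d=0.5: {n_needed}")
```

The effect size passed in is a planning assumption. Plugging in the effect observed in the same data yields "observed power", which adds little beyond the p-value itself.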
Best Practices and Guidelines
- Choosing the Right Test
  - Use paired t-tests when observations are naturally paired
  - Use independent t-tests when comparing unrelated groups
  - Use ANOVA when comparing more than two groups
  - Consider non-parametric alternatives when assumptions are violated
- Sample Size Considerations
  - Larger sample sizes increase statistical power
  - Use power analysis to determine required sample size
  - Consider practical significance alongside statistical significance
- Assumptions Checking
  - Normality of distributions
  - Homogeneity of variances
  - Independence of observations
- Multiple Testing
  - Apply appropriate corrections (e.g., Bonferroni) when performing multiple tests
  - Consider family-wise error rate
  - Be cautious of data dredging
- Reporting Results
  - Always report effect sizes alongside p-values
  - Include confidence intervals
  - Provide clear visualizations
  - Document all decisions and assumptions
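The assumption checks and the non-parametric fallback listed above can be sketched with SciPy alone. The example below uses the Shapiro-Wilk test for normality, Levene's test for equal variances, and the Mann-Whitney U test as the fallback. The function name `check_assumptions_and_test` is hypothetical, and choosing a test from preliminary checks like this is a rough heuristic rather than a fixed rule. Plots and domain knowledge should inform the choice as well.

```python
import numpy as np
from scipy import stats


def check_assumptions_and_test(sample1, sample2, alpha=0.05):
    """Run basic assumption checks, then pick a two-sample test accordingly."""
    x1 = np.asarray(sample1, dtype=float)
    x2 = np.asarray(sample2, dtype=float)

    # Normality of each group (Shapiro-Wilk): a small p-value suggests non-normality
    normal1 = bool(stats.shapiro(x1).pvalue >= alpha)
    normal2 = bool(stats.shapiro(x2).pvalue >= alpha)
    # Homogeneity of variances (Levene): a small p-value suggests unequal variances
    equal_var = bool(stats.levene(x1, x2).pvalue >= alpha)

    if normal1 and normal2:
        test_name = "Student's t-test" if equal_var else "Welch's t-test"
        result = stats.ttest_ind(x1, x2, equal_var=equal_var)
    else:
        # Rank-based alternative that does not assume normality
        test_name = "Mann-Whitney U"
        result = stats.mannwhitneyu(x1, x2, alternative="two-sided")

    return {
        "normal": (normal1, normal2),
        "equal_variance": equal_var,
        "test": test_name,
        "statistic": float(result.statistic),
        "p_value": float(result.pvalue),
    }


# Strongly skewed synthetic data (illustrative only), so normality is rejected
rng = np.random.default_rng(3)
skewed_a = rng.exponential(scale=1.0, size=200)
skewed_b = rng.exponential(scale=1.5, size=200)

report = check_assumptions_and_test(skewed_a, skewed_b)
print(report["test"], round(report["p_value"], 4))
```

Independence of observations cannot be tested this way; it follows from how the data were collected.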
Real-World Applications
E-commerce Example
# Example: Analyzing customer spending between mobile and desktop users
def analyze_platform_spending(data):
    """
    Analyzes spending patterns between mobile and desktop users.
    """
    results, fig = perform_two_sample_ttest(
        data,
        'platform',
        'spending',
        'mobile',
        'desktop'
    )
    # Additional business metrics
    # Note: calculate_roi is not defined in this guide; it is assumed to be
    # provided elsewhere in your codebase
    roi_mobile = calculate_roi(data[data['platform'] == 'mobile'])
    roi_desktop = calculate_roi(data[data['platform'] == 'desktop'])
    return {
        'statistical_results': results,
        'visualization': fig,
        'business_metrics': {
            'mobile_roi': roi_mobile,
            'desktop_roi': roi_desktop
        }
    }
HR Analytics Example
# Example: Analyzing salary differences across departments
def analyze_salary_equity(data):
    """
    Performs comprehensive salary analysis across departments.
    """
    # Perform ANOVA
    anova_results = perform_anova_analysis(
        data,
        'department',
        'salary'
    )
    # Additional equity metrics
    # Note: analyze_gender_pay_gap and analyze_experience_impact are not defined
    # in this guide; they are assumed to be provided elsewhere in your codebase
    gender_analysis = analyze_gender_pay_gap(data)
    experience_analysis = analyze_experience_impact(data)
    return {
        'anova_results': anova_results,
        'equity_metrics': {
            'gender_analysis': gender_analysis,
            'experience_impact': experience_analysis
        }
    }
This guide has covered independent and paired t-tests, one-way ANOVA with post-hoc comparisons, and simulation-based power analysis in Python, with examples drawn from e-commerce and HR analytics. The functions are meant as starting points to adapt to your own data and assumptions.