Published on 2025-06-26T05:27:11Z

What is Pearson Correlation? Examples & Applications in Analytics

Pearson correlation is a statistical measure that quantifies the linear relationship between two continuous variables. In analytics, it’s commonly used to understand how two metrics, such as pageviews and session duration, move together. The coefficient (r) ranges from -1 (perfect negative correlation) to +1 (perfect positive correlation), with 0 indicating no linear relationship.

This metric helps analysts identify feature associations, select variables for predictive models, and interpret A/B test outcomes. Interpreting it reliably assumes the two variables are linearly related, roughly normally distributed (which matters most when testing r for significance), and free of extreme outliers. While powerful, Pearson correlation can be misleading for nonlinear relationships or skewed distributions, so it’s important to validate these assumptions before drawing conclusions.

Illustration of Pearson correlation

Pearson correlation

A measure of the linear relationship between two metrics, ranging from -1 to +1, that indicates the strength and direction of their association in analytics data.

Why Pearson Correlation Matters in Analytics

Understanding how metrics relate is key to data-driven decision making. Pearson correlation provides a simple yet powerful way to assess linear relationships between continuous variables in web and product analytics.

Measuring linear relationships

Pearson correlation quantifies the strength and direction of a linear association between two metrics, for example, session duration and pageviews per user.

Typical use cases

Use cases include evaluating how changes in one metric (like average session duration) associate with another (like bounce rate) or identifying correlated features for predictive modeling.

Calculating Pearson Correlation

Learn the formula behind Pearson correlation, the steps to compute it manually, and how to leverage code or spreadsheet tools for practical calculation.

Pearson formula

The coefficient r is calculated as the covariance of X and Y divided by the product of their standard deviations: r = cov(X, Y) / (σX · σY).
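
To make the formula concrete, here is a minimal sketch of the manual calculation in plain Python. The helper name `pearson_r` and the five sample sessions are assumptions made for illustration, not part of any analytics tool.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of xs and ys over the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations; the 1/(n-1) factors cancel in the ratio
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Made-up data: pageviews and session duration (seconds) for five sessions
pageviews = [1, 2, 3, 4, 5]
session_duration = [30, 45, 70, 85, 120]
r = pearson_r(pageviews, session_duration)
print(f"Pearson r: {r:.3f}")
```

Because the same normalizing factor appears in the covariance and in both standard deviations, the sketch works directly with sums of deviations rather than dividing each term by n - 1.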

Code example in Python

Use Python’s pandas library to compute correlation on exported analytics data.

Python example:

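import pandas as pd

# Load exported analytics data; the file name here is a placeholder
df = pd.read_csv('analytics_export.csv')
# Assumes df has numeric 'pageviews' and 'session_duration' columns
r = df['pageviews'].corr(df['session_duration'])  # method='pearson' is the default
print('Pearson r:', r)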

Implementing with GA4 and PlainSignal

Extract data from popular analytics platforms like GA4 and PlainSignal to calculate Pearson correlation in your preferred environment.

GA4 via BigQuery

Export GA4 data to BigQuery and execute SQL to compute correlation between metrics.

BigQuery SQL:

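-- The raw GA4 export has one row per event, so aggregate per user first
WITH per_user AS (
  SELECT
    user_pseudo_id,
    COUNT(*) AS event_count,
    SUM((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec')) AS engagement_time_msec
  FROM `project.dataset.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20250101' AND '20250601'
  GROUP BY user_pseudo_id
)
SELECT CORR(event_count, engagement_time_msec) AS pearson_r
FROM per_user;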

PlainSignal cookie-free data

Embed PlainSignal’s script to collect pageview and engagement metrics, then calculate correlation externally.

Embedding PlainSignal script:

<link rel="preconnect" href="//eu.plainsignal.com/" crossorigin />
<script defer data-do="yourwebsitedomain.com" data-id="0GQV1xmtzQQ" data-api="//eu.plainsignal.com" src="//cdn.plainsignal.com/PlainSignal-min.js"></script>

Limitations and Best Practices

Pearson correlation is sensitive to its assumptions and data quality. Follow best practices to ensure accurate interpretation.

Assumptions to check

Data should be linearly related, approximately normally distributed, and free from significant outliers.
- Linearity:
  Inspect scatter plots for a linear pattern.
- Normality:
  Assess distributions using histograms or normality tests like Shapiro-Wilk.
- Outliers:
  Identify and handle outliers, as they can heavily skew the correlation coefficient.

Interpreting values

Values close to +1 or -1 indicate strong linear relationships; values near 0 suggest weak or no linear association.
