What Data Does MMM Need?
The three data layers MMM requires (4P, media, external variables), collection methods, and data security principles. Most companies already have sufficient data.
You already know that Marketing Mix Modeling (MMM) is effective for analyzing channel-level ROI and optimizing budgets. One practical question remains.
"Does our company have the data to actually do this analysis?"
This is the first question companies ask when they consider adopting MMM, and in most cases the answer is "you already have enough." The data MMM requires isn't exotic: sales data accumulated in your ERP, advertising spend reports from media agencies, and publicly available external indicators. Most companies already possess this data or can easily obtain it.
This article covers:
- The three data layers MMM uses and how to collect each
- How to integrate cross-platform data into a unified analysis framework
- Data security and sensitive information handling principles

The 3 Data Layers MMM Uses
MMM uses three layers of data to explain sales variation. Each layer captures different factors that influence revenue.
Layer 1: 4P Data — The Business Foundation
This layer covers the marketing 4Ps (Product, Price, Place, Promotion) and reflects internal business activities.
| Category | Data Items | Primary Source |
|---|---|---|
| **Product** | SKU-level sales, volume, new product launch dates | ERP, POS |
| **Price** | List price, actual selling price, discount rate, promotional pricing | ERP, commerce platforms |
| **Place** | Store-level sales, distribution channel mix, new store openings | ERP, distributor reports |
| **Promotion** | Promotion period, discount depth, scope, bundle composition | ERP, marketing team records |
The core source for this data is the ERP. Most companies already have years of transaction data accumulated there, which can be extracted and transformed into derived variables in MMM input format.
For example, analyzing "promotion effect" requires not just a simple discount rate but a combined metric of discount depth × duration × applicable SKU scope. Creating these combinations from raw ERP data is the essence of data preprocessing.
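As a minimal sketch of such a combined metric, the snippet below multiplies discount depth, the share of the week the promotion ran, and the share of SKUs it covered. The field names and the purely multiplicative form are illustrative assumptions, not a prescribed formula.

```python
from dataclasses import dataclass


@dataclass
class WeeklyPromo:
    """One promotion as it applies to a single week (hypothetical schema)."""
    discount_rate: float  # discount depth, e.g. 0.20 for 20% off
    active_days: int      # days in the week the promotion ran (0-7)
    skus_covered: int     # number of SKUs included in the promotion
    skus_total: int       # number of SKUs on sale that week


def promo_intensity(p: WeeklyPromo) -> float:
    """Depth x duration x scope, with duration and scope as 0-1 fractions."""
    duration = p.active_days / 7
    scope = p.skus_covered / p.skus_total
    return p.discount_rate * duration * scope


# Same discount and SKU scope, different duration.
full_week = WeeklyPromo(discount_rate=0.20, active_days=7, skus_covered=50, skus_total=100)
weekend = WeeklyPromo(discount_rate=0.20, active_days=2, skus_covered=50, skus_total=100)

print(round(promo_intensity(full_week), 4))  # 0.1
print(round(promo_intensity(weekend), 4))
```

Two promotions with the same headline discount rate end up with different intensities, which is the information a plain discount-rate column would lose.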
Layer 2: Media Data — The Complete Picture of Marketing Investment
This layer records the cost and exposure of marketing investments by channel. It is the core input for estimating channel-level ROI in MMM.
| Channel Type | Data Items | Collection Method |
|---|---|---|
| **Offline** | TV GRP, radio spots, print placements, OOH exposure | Media rep reports, CSV upload |
| **Online** | Impressions, clicks, cost (CPM/CPC), conversions | API auto-collection (Meta, Google, Naver, etc.) |
| **New Media** | Influencer campaigns, content exposure, sponsorship costs | Agency reports, manual entry |
| **BTL** | Sampling volume, event participants, experience group size | Marketing team records, CSV |
Digital channels support automated collection via API. Major platforms like Meta Ads, Google Ads, and Naver Search Ads provide standardized APIs, enabling automatic retrieval of impression, click, and cost data on a daily or weekly basis.
Offline channels require more manual work. For TV, GRP reports from media representatives are used; for OOH, exposure estimates from media companies are uploaded as CSV files.
What matters here is cross-platform data integration. Each platform uses different metric names and units: TV reports GRP, digital reports impressions, and influencer campaigns report view counts. Connecting these differently measured data points into a single analysis framework requires a marketing taxonomy.
A taxonomy is a system that classifies channels, campaigns, and creatives into a consistent hierarchy. It creates linkages like "this TV ad and this digital campaign are part of the same brand campaign," transforming fragmented data into a format ready for integrated analysis.
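One way to picture a taxonomy is as a lookup table from each platform's own campaign names to a shared hierarchy. The sketch below assumes a hypothetical three-row dataset and made-up campaign names; the column names (`GRP`, `Impressions`, `View Count`) simply mirror the metrics mentioned above.

```python
from collections import defaultdict

# Hypothetical taxonomy: (platform, platform campaign name) -> unified hierarchy.
TAXONOMY = {
    ("tv", "Spring Brand Film 30s"): {
        "brand_campaign": "spring_launch", "channel": "TV", "unit": "grp"},
    ("meta", "spring_launch_video_kr"): {
        "brand_campaign": "spring_launch", "channel": "Paid Social", "unit": "impressions"},
    ("influencer", "Spring Haul Collab"): {
        "brand_campaign": "spring_launch", "channel": "Influencer", "unit": "views"},
}

# Raw rows as they might arrive from each source, with platform-specific metric names.
raw_rows = [
    {"platform": "tv", "campaign": "Spring Brand Film 30s",
     "week": "2024-03-04", "GRP": 120.0, "cost": 50000},
    {"platform": "meta", "campaign": "spring_launch_video_kr",
     "week": "2024-03-04", "Impressions": 800000, "cost": 12000},
    {"platform": "influencer", "campaign": "Spring Haul Collab",
     "week": "2024-03-04", "View Count": 150000, "cost": 8000},
]

# Which raw column holds the exposure metric for each unified unit.
EXPOSURE_COLUMN = {"grp": "GRP", "impressions": "Impressions", "views": "View Count"}


def normalize(row):
    """Map one platform-specific row onto the shared taxonomy."""
    tax = TAXONOMY[(row["platform"], row["campaign"])]
    return {
        "week": row["week"],
        "brand_campaign": tax["brand_campaign"],
        "channel": tax["channel"],
        "exposure": float(row[EXPOSURE_COLUMN[tax["unit"]]]),
        "exposure_unit": tax["unit"],  # units are kept, not mixed
        "cost": float(row["cost"]),
    }


unified = [normalize(r) for r in raw_rows]

# Cost is comparable across channels, so it can be summed per brand campaign.
cost_by_campaign = defaultdict(float)
for r in unified:
    cost_by_campaign[r["brand_campaign"]] += r["cost"]

print(dict(cost_by_campaign))  # {'spring_launch': 70000.0}
```

Exposure stays in each channel's native unit and is only labeled, since GRP, impressions, and views are not directly addable. Cost shares one unit, so it can be rolled up the hierarchy.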
Layer 3: External Variables — Influences Beyond Marketing
These are external factors that affect sales independently of your own marketing activities. Controlling for them in MMM is essential to estimating marketing's incremental effect accurately.
| Category | Data Items | Collection Method |
|---|---|---|
| **Seasonality** | Holidays, seasonal events, day-of-week effects | Calendar-based auto-generation |
| **Economy** | Consumer price index, exchange rates, interest rates | Public APIs (Bank of Korea, Statistics Korea) |
| **Events** | Weather, sports events, social issues | Weather API, news crawling |
| **Competition** | Competitor promotions, new launches, price changes | Commerce crawling, search trends, buzz analysis |
Most external variables can be automatically collected via public APIs. Seasonality is auto-generated from calendars, economic indicators come from central bank and statistics bureau APIs, and weather data comes from meteorological service APIs.
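Calendar-based generation needs no external data source at all. The sketch below builds one feature row per week using only the Python standard library. The holiday set is a hypothetical placeholder, and the sine/cosine pair is one common way to encode annual seasonality, assumed here for illustration.

```python
from datetime import date, timedelta
import math

# Hypothetical holiday list; a real project would load an official calendar.
HOLIDAYS = {date(2024, 1, 1), date(2024, 12, 25)}


def weekly_calendar_features(start: date, n_weeks: int):
    """Build one feature row per week; `start` is assumed to be a Monday."""
    rows = []
    for i in range(n_weeks):
        week_start = start + timedelta(weeks=i)
        days = [week_start + timedelta(days=d) for d in range(7)]
        week_of_year = week_start.isocalendar()[1]
        # Approximates the year as 52 weeks; ISO years can have 53.
        angle = 2 * math.pi * week_of_year / 52
        rows.append({
            "week_start": week_start.isoformat(),
            "holiday_days": sum(d in HOLIDAYS for d in days),
            "month": week_start.month,
            "season_sin": math.sin(angle),
            "season_cos": math.cos(angle),
        })
    return rows


features = weekly_calendar_features(date(2024, 1, 1), 52)
print(features[0]["week_start"], features[0]["holiday_days"])  # 2024-01-01 1
```

Economic indicators and weather would be joined onto the same weekly rows by `week_start` once fetched from their respective public sources.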
Competitor data is harder to obtain directly, so proxy indicators are used instead: competitor pricing and promotion history crawled from commerce sites, or search trend data and social buzz volume as proxies for competitor activity. Even without direct revenue data, these proxies are generally sufficient to reflect competitive dynamics in the model.
Data Security: Safe Collection, Management, and Disposal
Providing corporate data externally for MMM analysis naturally raises security concerns. MadMatics strictly adheres to the following principles.
No PII (Personally Identifiable Information)
MMM uses aggregated data: weekly sales totals, channel-level ad spend aggregates, and monthly exposure volumes. PII such as individual customer names, contact details, or purchase histories is neither collected nor needed.
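To illustrate why PII never needs to leave the source system, the sketch below rolls hypothetical transaction rows up to weekly totals. The `customer_id` field is included only to show that the aggregation never reads it.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical transaction rows as they might sit in an ERP export.
transactions = [
    {"date": date(2024, 3, 4), "customer_id": "C001", "amount": 120.0},
    {"date": date(2024, 3, 6), "customer_id": "C002", "amount": 80.0},
    {"date": date(2024, 3, 12), "customer_id": "C001", "amount": 50.0},
]


def week_start(d: date) -> date:
    """Monday of the week containing `d`."""
    return d - timedelta(days=d.weekday())


weekly_sales = defaultdict(float)
for t in transactions:
    # Only the date and amount are used; customer_id is never read.
    weekly_sales[week_start(t["date"])] += t["amount"]

for week, total in sorted(weekly_sales.items()):
    print(week.isoformat(), total)
# 2024-03-04 200.0
# 2024-03-11 50.0
```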
Scale Transformation for Sensitive Business Data
Sensitive business figures like absolute revenue and advertising budgets can be rescaled before they are shared. For example, converting actual sales to an index hides the original scale while preserving the same relative variation, so relative findings such as each channel's share of contribution are unaffected.
| Security Measure | Description |
|---|---|
| **No PII collection** | Personally identifiable information is never collected, stored, or processed |
| **Scale transformation** | Sensitive figures indexed to de-identify original scale |
| **Purpose limitation** | Collected data cannot be used beyond MMM analysis |
| **Secure disposal** | Complete deletion following agreed procedures after project completion |
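The scale transformation can be sketched as a simple indexing step. The example below assumes hypothetical weekly sales figures and indexes them to the first week; any constant known only to the data owner would work the same way.

```python
def to_index(values, base=100.0):
    """Rescale a series so its first value equals `base` (assumes values[0] != 0)."""
    factor = base / values[0]
    return [v * factor for v in values]


def pct_changes(values):
    """Period-over-period relative change."""
    return [(b - a) / a for a, b in zip(values, values[1:])]


# Hypothetical weekly sales in the company's own currency.
actual_sales = [2_400_000, 2_640_000, 2_280_000, 3_000_000]
indexed_sales = to_index(actual_sales)

print([round(v, 1) for v in indexed_sales])  # [100.0, 110.0, 95.0, 125.0]

# Week-over-week changes are the same before and after indexing.
same_pattern = all(
    abs(a - b) < 1e-9
    for a, b in zip(pct_changes(actual_sales), pct_changes(indexed_sales))
)
print(same_pattern)  # True
```

Because the transformation is a multiplication by a single constant, the data owner can convert any indexed result back to real currency while the analyst never sees the original figures.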
Data security is as important as analysis quality. The concern that "handing over data feels risky" is entirely valid, and addressing that concern is an analytics partner's first responsibility.
Data Scarcity Is a Misconception
In summary, MMM requires data across three layers:
- 4P Data: Business fundamentals extractable and reprocessable from ERP
- Media Data: Digital via API auto-collection, offline via CSV upload, cross-platform integration through taxonomy
- External Variables: Auto-collected via public APIs, competitor data leveraged through proxy indicators
Most companies already possess 70–80% of this data. What's lacking isn't data itself, but the collection, cleansing, and integration system that connects fragmented data into a unified analysis framework.
MadMatics has built end-to-end data infrastructure including data collection templates, API integration pipelines, and taxonomy design. Rather than worrying that "we can't do MMM because we lack data," the first step is to discuss together "which data should we organize first?"
If you need the full process from data collection to analysis to budget optimization, MadMatics Action MMM is ready to help you get started.