Shopify + Salesforce + HubSpot: A Practical Guide to Unified Customer Data

William Flaiz • January 14, 2026

You've got three platforms. Each one holds a piece of your customer puzzle. Shopify knows what they bought. Salesforce tracks the sales conversations. HubSpot manages the marketing touches.


And none of them agree on who "John Smith" actually is.


This is the reality for most growing businesses. The tools work great individually. But getting them to share a single, accurate view of your customer? That's where things get messy.



Here's a practical guide to unifying customer data across these three platforms—without writing custom code or hiring a data engineer.

Diagram showing data flow from Shopify, Salesforce, and HubSpot to a verified user profile.

Why These Three Systems Fight Each Other

Before diving into solutions, it helps to understand why the problem exists in the first place.



Each platform was built with different priorities.


Shopify cares about transactions. A customer is defined by their email at checkout. Maybe their shipping address. It doesn't care much about company hierarchies or lead scoring. Someone buys something, Shopify captures the sale.


Salesforce lives in a different world entirely. Contacts belong to Accounts. Accounts have hierarchies. Opportunities tie to specific people who influence purchase decisions. The whole structure assumes complex B2B sales cycles.


HubSpot sits somewhere in between. Contacts have properties. Those contacts can belong to companies. Marketing campaigns create new contacts constantly—webinar signups, ebook downloads, demo requests. Volume matters here.


Three different philosophies. Three different data models. One customer trying to exist in all of them simultaneously.


The Schema Conflicts You'll Actually Hit

Let's get specific about what goes wrong.


Email as identifier (sounds simple, isn't)

  • Shopify:  customer.email
  • Salesforce:  Contact.Email
  • HubSpot:  email


Easy match, right? Until someone uses their work email in HubSpot, personal email in Shopify, and their assistant's email got entered in Salesforce. Same person. Three different identities.


Name field variations

  • Shopify stores first_name and last_name separately. Clean, predictable.
  • Salesforce has FirstName and LastName, plus Suffix, MiddleName, and Salutation. More fields means more opportunities for inconsistency.
  • HubSpot uses firstname and lastname (lowercase, no underscore). It also has hs_full_name that sometimes gets populated, sometimes doesn't.


Phone number formatting

  • Shopify:  +1 (555) 123-4567
  • Salesforce:  555.123.4567
  • HubSpot:  5551234567


Same number. Completely different strings. A naive merge will create three records for one customer.
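To make the phone problem concrete, here is a minimal sketch that collapses all three formats above into one string. It assumes US/Canada numbers only, and the function name to_e164 is made up for illustration. A production pipeline would want a proper phone-parsing library for international numbers.

```python
import re
from typing import Optional

def to_e164(raw: str, default_country_code: str = "1") -> Optional[str]:
    """Return a +1XXXXXXXXXX string, or None if the input doesn't look valid.

    Assumption: every number is a 10-digit US/Canada number, with or
    without a leading country code.
    """
    digits = re.sub(r"\D", "", raw)          # drop everything except digits
    if len(digits) == 10:                    # no country code present
        digits = default_country_code + digits
    if len(digits) == 11 and digits.startswith(default_country_code):
        return "+" + digits
    return None                              # leave odd values for manual review

for raw in ["+1 (555) 123-4567", "555.123.4567", "5551234567"]:
    print(raw, "->", to_e164(raw))
```

Returning None instead of guessing keeps malformed numbers visible, so they can be fixed by hand rather than silently merged.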


Address components

  • Shopify breaks addresses into address1, address2, city, province, country, and zip.
  • Salesforce has MailingStreet, MailingCity, MailingState, MailingCountry, and MailingPostalCode on the Contact, plus a separate set of "Other Address" fields. Billing and shipping addresses live on the Account.
  • HubSpot stores address, city, state, country, and zip. Similar to Shopify, but the field names don't match.


Merging address data manually means mapping every field. Miss one, lose data.
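One way to avoid the "miss one, lose data" problem is to have the mapping report whatever it didn't map. The sketch below uses the field names listed above. The unified names (street, street2, state, postal_code) and the unify_address helper are my own placeholders, and real export column headers may differ from the API field names.

```python
# Source field -> unified field, per platform. Unified names are illustrative.
ADDRESS_FIELD_MAP = {
    "shopify": {
        "address1": "street", "address2": "street2", "city": "city",
        "province": "state", "country": "country", "zip": "postal_code",
    },
    "salesforce": {
        "MailingStreet": "street", "MailingCity": "city",
        "MailingState": "state", "MailingCountry": "country",
        "MailingPostalCode": "postal_code",
    },
    "hubspot": {
        "address": "street", "city": "city", "state": "state",
        "country": "country", "zip": "postal_code",
    },
}

def unify_address(source, record):
    """Rename known address fields and list the ones that were not mapped."""
    mapping = ADDRESS_FIELD_MAP[source]
    unified = {target: record[src] for src, target in mapping.items() if src in record}
    unmapped = sorted(key for key in record if key not in mapping)
    return unified, unmapped

unified, unmapped = unify_address(
    "salesforce", {"MailingCity": "Austin", "OtherCity": "Dallas"}
)
print(unified)    # mapped fields
print(unmapped)   # fields that would otherwise be dropped silently
```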

Identity Resolution: Finding the Same Person Across Platforms

This is the core challenge. You have records from three systems. Some represent the same person. Some don't. How do you figure out which is which?


Method 1: Exact email matching

The simplest approach. Match records where emails are identical.


Works well when: Customers use consistent emails everywhere. B2B contexts where corporate emails are standard. Clean, maintained databases.


Falls apart when: People use multiple email addresses. Personal vs. work email situations. Typos in email entry. Partner or assistant emails entered instead of the actual contact.


Email matching will find maybe 60-70% of your true duplicates if you're lucky. It's a start, not a solution.
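If you're curious what exact email matching looks like under the hood, it's roughly this. A minimal sketch, assuming each record is a dict with an "email" key and a "source" label (both names are my own).

```python
from collections import defaultdict

def group_by_email(records):
    """Group records that share the same email, ignoring case and whitespace."""
    groups = defaultdict(list)
    for rec in records:
        email = (rec.get("email") or "").strip().lower()
        if email:                      # skip records with no email at all
            groups[email].append(rec)
    # Only groups with more than one record are duplicate candidates.
    return {email: recs for email, recs in groups.items() if len(recs) > 1}

records = [
    {"source": "shopify", "email": "John@gmail.com"},
    {"source": "hubspot", "email": "john@gmail.com "},
    {"source": "salesforce", "email": "jsmith@acme.com"},
]
print(group_by_email(records))
```

Notice the Salesforce record never joins the group. That's the gap the next two methods try to close.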


Method 2: Fuzzy name + company matching

When emails don't match, look at name and company combinations.


"Jon Smith at Acme Corp" and "Jonathan Smith at ACME Corporation" are probably the same person. Traditional string matching won't catch this. You need fuzzy matching that understands "Jon" and "Jonathan" are related. That "Corp" and "Corporation" mean the same thing.


This approach catches another 15-20% of duplicates that exact matching misses. But it also introduces false positives. "John Smith at Acme" and "John Smith at Acme Tools" might be different people entirely.
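Here's a rough sketch of the idea using only Python's standard library. The nickname table, the suffix list, and the 0.9 threshold are all assumptions for illustration. Real tools ship much larger lookup tables and tune the threshold against reviewed matches.

```python
import re
from difflib import SequenceMatcher

# Tiny illustrative lookup tables; a real list would be far longer.
NICKNAMES = {"jon": "jonathan", "bill": "william", "bob": "robert"}
COMPANY_SUFFIXES = {"corp", "corporation", "inc", "incorporated", "llc", "ltd"}

def normalize_name(name):
    tokens = re.findall(r"[a-z]+", name.lower())
    return " ".join(NICKNAMES.get(tok, tok) for tok in tokens)

def normalize_company(company):
    tokens = re.findall(r"[a-z0-9]+", company.lower())
    return " ".join(tok for tok in tokens if tok not in COMPANY_SUFFIXES)

def similarity(a, b):
    if not a or not b:                 # never match on two empty values
        return 0.0
    return SequenceMatcher(None, a, b).ratio()

def is_probable_match(rec_a, rec_b, threshold=0.9):
    """Both the name AND the company must clear the threshold."""
    name_score = similarity(normalize_name(rec_a["name"]), normalize_name(rec_b["name"]))
    company_score = similarity(
        normalize_company(rec_a["company"]), normalize_company(rec_b["company"])
    )
    return min(name_score, company_score) >= threshold

a = {"name": "Jon Smith", "company": "Acme Corp"}
b = {"name": "Jonathan Smith", "company": "ACME Corporation"}
c = {"name": "John Smith", "company": "Acme"}
d = {"name": "John Smith", "company": "Acme Tools"}
print(is_probable_match(a, b), is_probable_match(c, d))
```

Requiring both scores to pass is what keeps "Acme" and "Acme Tools" apart. Loosen the threshold and that false positive comes back.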


Method 3: Semantic similarity

The most sophisticated approach uses AI to understand meaning, not just strings.


Instead of comparing characters, semantic matching compares the overall meaning of records. It considers multiple fields together—name, company, email domain, phone area code, location. A record that matches on three of five fields might score higher than one that matches perfectly on just email.


This is how modern data cleaning tools find the duplicates humans miss. And it's the only reliable method when dealing with messy, real-world data from multiple sources.
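A true semantic matcher relies on a trained model, which is beyond a short example. What can be sketched is the multi-field scoring idea described above: several weaker signals adding up to more than one strong one. The weights and field names below are invented for illustration and are not how any particular product scores records.

```python
# Illustrative weights only; they sum to 1.0.
WEIGHTS = {
    "name": 0.30, "company": 0.25, "email_domain": 0.20,
    "area_code": 0.10, "city": 0.15,
}

def features(rec):
    """Reduce a record to the five comparable signals."""
    email = rec.get("email", "").lower()
    digits = "".join(ch for ch in rec.get("phone", "") if ch.isdigit())
    return {
        "name": rec.get("name", "").strip().lower(),
        "company": rec.get("company", "").strip().lower(),
        "email_domain": email.split("@")[-1] if "@" in email else "",
        # Assumes 10-digit national numbers, optionally prefixed by a country code.
        "area_code": digits[-10:-7] if len(digits) >= 10 else "",
        "city": rec.get("city", "").strip().lower(),
    }

def match_score(rec_a, rec_b):
    """Sum the weights of every non-empty field that agrees exactly."""
    fa, fb = features(rec_a), features(rec_b)
    return sum(w for field, w in WEIGHTS.items() if fa[field] and fa[field] == fb[field])

base = {"name": "Jonathan Smith", "company": "Acme Corp",
        "email": "jsmith@acme.com", "phone": "+15551234567", "city": "Austin"}
multi_field = {"name": "Jonathan Smith", "company": "Acme Corp",
               "email": "john@gmail.com", "phone": "555.123.4567", "city": "Austin"}
email_only = {"name": "Pat Lee", "company": "", "email": "jsmith@acme.com"}

print(match_score(base, multi_field), match_score(base, email_only))
```

In this toy version each field still has to match exactly. The difference with a semantic approach is that the per-field comparison itself becomes fuzzy and learned, rather than a string equality check.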

A Practical Merge Workflow

Here's a step-by-step process that works without custom code.


Step 1: Export your data

Pull customer/contact data from all three platforms.

From Shopify: Admin → Customers → Export → CSV

From Salesforce: Reports → Contacts → Export (or Data Export if you have bulk access)

From HubSpot: Contacts → Export → All contacts

You'll end up with three files. Different columns, different formats, same underlying people (hopefully).


Step 2: Decide on your master source

Before merging, choose which system wins when data conflicts. This matters more than you'd think.


If Salesforce is your CRM of record for the sales team, make it the master for company and contact relationship data.


If HubSpot is running your marketing, it should be authoritative for email preferences and subscription status.

If Shopify tracks purchases, it's the master for transaction history and lifetime value.


You can't have three masters. Pick one primary source per field type.



Step 3: Map your fields

Create a mapping document. Here's what it might look like for basic contact info:

Unified Field | Shopify Source | Salesforce Source | HubSpot Source
--------------|----------------|-------------------|---------------
email         | customer.email | Contact.Email     | email
first_name    | first_name     | FirstName         | firstname
last_name     | last_name      | LastName          | lastname
phone         | phone          | Phone             | phone
company       | (none)         | Account.Name      | company

Notice Shopify doesn't have a company field at all. That's a gap you'll need to fill from another source.
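For anyone who prefers a script to a spreadsheet, the same mapping document can be expressed as a small lookup table. This is a sketch under assumptions: the column names come from the mapping above, real export headers will likely differ, and load_unified is a hypothetical helper.

```python
import csv
import io

# Unified field -> source column. None marks a gap in that platform.
FIELD_MAP = {
    "shopify": {"email": "customer.email", "first_name": "first_name",
                "last_name": "last_name", "phone": "phone", "company": None},
    "salesforce": {"email": "Contact.Email", "first_name": "FirstName",
                   "last_name": "LastName", "phone": "Phone",
                   "company": "Account.Name"},
    "hubspot": {"email": "email", "first_name": "firstname",
                "last_name": "lastname", "phone": "phone", "company": "company"},
}

def load_unified(source, csv_text):
    """Read CSV text from one platform and return rows in the unified schema."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        unified = {"source": source}   # remember where each row came from
        for target, column in FIELD_MAP[source].items():
            unified[target] = row.get(column) if column else None
        rows.append(unified)
    return rows

shopify_csv = "customer.email,first_name,last_name,phone\njohn@gmail.com,John,Smith,5551234567\n"
print(load_unified("shopify", shopify_csv))
```

Keeping the source label on every row pays off later, when conflicts need to be resolved by platform.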


Step 4: Standardize formats before matching

This step gets skipped way too often. Before trying to find duplicates, normalize your data.

  • Phone numbers should all follow the same format. E.164 international format (+15551234567) works across all three platforms and eliminates formatting discrepancies.
  • Email addresses should be lowercase. "John@Company.com" and "john@company.com" should match.
  • Names should have consistent capitalization. "JOHN SMITH" and "John Smith" should merge, not create duplicates.
  • Dates need a standard format. Shopify uses ISO dates. Salesforce might have MM/DD/YYYY. Pick one, convert everything.
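The email, name, and date rules above fit in a few lines. A minimal sketch, assuming dates arrive as either ISO (YYYY-MM-DD) or MM/DD/YYYY. The function names are my own.

```python
from datetime import datetime

# Assumption: only these two date formats appear in the exports.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")

def normalize_email(email):
    return email.strip().lower()

def normalize_name(name):
    # Simple capitalization; names like "McDonald" or "O'Brien" need manual care.
    return " ".join(part.capitalize() for part in name.split())

def normalize_date(value):
    """Convert a known date format to ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {value!r}")

print(normalize_email("John@Company.com"))
print(normalize_name("JOHN SMITH"))
print(normalize_date("01/14/2026"))
```

Raising an error on unknown date formats is deliberate. A date that silently parses wrong is harder to spot than one that fails loudly.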


Step 5: Run duplicate detection

With standardized data, find your matches.

  • Start with exact email matching. That catches the obvious duplicates.
  • Then run fuzzy matching on name + company for records without email matches.
  • Finally, use semantic similarity for the remaining unmatched records.


Review the suggested matches before merging. Automated matching is smart, but human review catches edge cases.
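One detail worth sketching: matches come back as pairs, but a customer can span three or more records. If record A matches C on email and B matches C on name, all three belong together. The snippet below groups matched pairs into clusters with a small union-find. The record ID format is hypothetical.

```python
def cluster_matches(record_ids, matched_pairs):
    """Group record IDs so that any chain of matches ends up in one cluster."""
    parent = {rid: rid for rid in record_ids}

    def find(rid):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a    # merge the two clusters

    clusters = {}
    for rid in record_ids:
        clusters.setdefault(find(rid), []).append(rid)
    return list(clusters.values())

ids = ["shopify:1", "salesforce:7", "hubspot:3", "hubspot:9"]
pairs = [("shopify:1", "hubspot:3"),      # exact email match
         ("salesforce:7", "hubspot:3")]   # fuzzy name match
print(cluster_matches(ids, pairs))
```

Chained matches are also where errors compound, so clusters that were joined by a fuzzy link deserve the closest look during review.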


Step 6: Merge and resolve conflicts

When you find a match, combine the records using your master source hierarchy.

  • Record A (Shopify): email = john@gmail.com, name = John Smith
  • Record B (Salesforce): email = jsmith@acme.com, name = Jonathan Smith, company = Acme Corp
  • Record C (HubSpot): email = john@gmail.com, name = Jon Smith

Merged record (using Salesforce as name master, preserving all emails):

  • Primary email: jsmith@acme.com (work)
  • Secondary email: john@gmail.com (personal)
  • Name: Jonathan Smith
  • Company: Acme Corp


All three platform identities now point to one unified customer record.
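The merge above can be expressed as a per-field priority list. This is a sketch of one way to do it, not a prescribed method: the FIELD_PRIORITY table and merge_records helper are hypothetical, and the priority order is just the one used in this example.

```python
# For each field, which source wins first. Order is an assumption for this example.
FIELD_PRIORITY = {
    "name": ["salesforce", "hubspot", "shopify"],
    "company": ["salesforce", "hubspot", "shopify"],
    "email": ["salesforce", "hubspot", "shopify"],
}

def merge_records(records_by_source):
    """Pick each field from the highest-priority source that has a value."""
    merged = {}
    for field, priority in FIELD_PRIORITY.items():
        for source in priority:
            value = records_by_source.get(source, {}).get(field)
            if value:
                merged[field] = value
                break
    # Keep every other email seen, so no identity is thrown away.
    seen = []
    for source in FIELD_PRIORITY["email"]:
        email = records_by_source.get(source, {}).get("email")
        if email and email != merged.get("email") and email not in seen:
            seen.append(email)
    merged["secondary_emails"] = seen
    return merged

records = {
    "shopify": {"email": "john@gmail.com", "name": "John Smith"},
    "salesforce": {"email": "jsmith@acme.com", "name": "Jonathan Smith",
                   "company": "Acme Corp"},
    "hubspot": {"email": "john@gmail.com", "name": "Jon Smith"},
}
print(merge_records(records))
```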

Field Mapping Examples That Actually Work

Here are practical mappings for common scenarios.



B2B SaaS company mapping:

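Purpose            | Shopify     | Salesforce              | HubSpot                         | Notes
-------------------|-------------|-------------------------|---------------------------------|--------------------------
Primary identifier | email       | Email                   | email                           | Match on all three
Revenue data       | total_spent | Total Revenue (formula) | hs_lifecyclestage_customer_date | Shopify is authoritative
Engagement score   | (none)      | Lead Score              | HubSpot Score                   | HubSpot is authoritative
Sales stage        | (none)      | Opportunity.Stage       | lifecyclestage                  | Salesforce is authoritative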

E-commerce company mapping:

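Purpose       | Shopify           | Salesforce                   | HubSpot                   | Notes
--------------|-------------------|------------------------------|---------------------------|-------------------------
Order count   | orders_count      | (none)                       | number_of_orders          | Shopify is authoritative
Last purchase | last_order_date   | (none)                       | recent_conversion_date    | Shopify is authoritative
Email opt-in  | accepts_marketing | HasOptedOutOfEmail (inverse) | hs_email_optout (inverse) | HubSpot is authoritative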
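The "(inverse)" notes on the opt-in row are easy to get backwards, so here's a small sketch. Shopify stores an opt-in flag while the other two store opt-out flags. HubSpot wins when it has a value, per the mapping. The fallback order after HubSpot is my assumption, as is the function name.

```python
from typing import Optional

def unified_opt_in(
    shopify_accepts_marketing: Optional[bool],
    salesforce_opted_out: Optional[bool],
    hubspot_opted_out: Optional[bool],
) -> Optional[bool]:
    """Return True if the contact is opted in, None if no system knows."""
    if hubspot_opted_out is not None:        # authoritative source
        return not hubspot_opted_out         # opt-out flag, so invert it
    if salesforce_opted_out is not None:     # assumed fallback order
        return not salesforce_opted_out
    return shopify_accepts_marketing         # already an opt-in flag

# HubSpot says opted out, Shopify says accepts marketing: HubSpot wins.
print(unified_opt_in(True, None, True))
```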

Testing Your Merged Data

Before pushing unified data back to any system, validate it.


Sample check (quick validation)

Pull 50 random merged records. Manually verify 10 of them against the source systems. If more than 1 has errors, your process needs adjustment.
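If your merged data is in a list, pulling that sample takes a few lines. A sketch with hypothetical function names, using the numbers from the check above (50 pulled, 10 verified, more than 1 error means trouble).

```python
import random

def pick_audit_sample(merged_records, pool_size=50, verify_count=10, seed=None):
    """Pull a random pool of records, then return the ones to verify by hand."""
    rng = random.Random(seed)          # a seed makes the sample repeatable
    pool = rng.sample(merged_records, min(pool_size, len(merged_records)))
    return pool[:verify_count]         # the pool is already in random order

def process_needs_adjustment(error_count, max_errors=1):
    """More than one bad record in the verified sample is a warning sign."""
    return error_count > max_errors

records = [{"id": i} for i in range(200)]
sample = pick_audit_sample(records, seed=42)
print([rec["id"] for rec in sample])
print(process_needs_adjustment(error_count=2))
```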


Edge case review

Look specifically at:

  • Records that matched on fuzzy criteria (not exact email)
  • Customers with multiple email addresses
  • High-value customers where errors cost more
  • Recently created records (most likely to have issues)


Duplicate count comparison

If you started with 10,000 records across three systems and ended with 8,500 unified records, that's a 15% deduplication rate. Reasonable for moderately clean data.


If you're seeing 40%+ deduplication, either your data was really messy or your matching is too aggressive. Review the matches before proceeding.
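The arithmetic behind those numbers, as a quick sketch. The 40% threshold comes from the rule of thumb above, and the function names are my own.

```python
def dedup_rate(source_total, unified_total):
    """Share of source records that were merged away."""
    if source_total <= 0:
        raise ValueError("source_total must be positive")
    return (source_total - unified_total) / source_total

def needs_review(rate, threshold=0.40):
    """Flag runs where the matching may be too aggressive."""
    return rate >= threshold

rate = dedup_rate(10_000, 8_500)
print(f"{rate:.0%}", needs_review(rate))
```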


When Things Go Wrong: Rollback Planning

Always keep your original exports. Don't delete them after merging.


Before pushing merged data back to any platform, document:

  • What data existed before the merge
  • What changes you're making
  • How to reverse those changes if needed


Most platforms don't have a true "undo" for bulk data changes. Your rollback plan is reimporting the original data and manually fixing any records that got touched.


This is tedious. Which is why you validate before pushing.
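One lightweight way to document "what existed before" and "how to reverse it" is a per-record change log. This is a sketch with hypothetical helpers, and it treats a missing field and an empty (None) field as the same thing.

```python
def build_change_log(before, after):
    """List every field whose value differs between the two versions."""
    fields = sorted(set(before) | set(after))
    return [
        {"field": f, "old": before.get(f), "new": after.get(f)}
        for f in fields
        if before.get(f) != after.get(f)
    ]

def revert(record, change_log):
    """Apply the logged old values to get the original record back."""
    restored = dict(record)
    for change in change_log:
        if change["old"] is None:
            restored.pop(change["field"], None)   # field did not exist before
        else:
            restored[change["field"]] = change["old"]
    return restored

before = {"email": "john@gmail.com", "name": "John Smith"}
after = {"email": "jsmith@acme.com", "name": "Jonathan Smith", "company": "Acme Corp"}

log = build_change_log(before, after)
print(log)
print(revert(after, log) == before)
```

Save the log next to your original exports. It won't give you a one-click undo, but it turns the rollback into a lookup instead of a guessing game.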


The Faster Path

Everything I've described works. It's also time-consuming. Manual exports, spreadsheet mapping, careful review—it adds up to hours of work for a few thousand records. Days for larger datasets.


That's exactly why we built CleanSmart.


Upload your exports. CleanSmart handles the standardization, runs semantic duplicate detection across all three files, and shows you proposed matches before anything changes. You review, approve, and download a unified dataset.


The manual process takes 4-8 hours for a mid-sized dataset. CleanSmart does it in minutes.

Ready to unify your customer data?


Upload your Shopify, Salesforce, and HubSpot exports with our Business plan and see exactly how many duplicates are hiding in your data.

Start cleaning for free →

William Flaiz is a digital transformation executive and former Novartis Executive Director who has led consolidation initiatives saving enterprises over $200M in operational costs. He holds MIT's Applied Generative AI certification and specializes in helping pharmaceutical and healthcare companies align MarTech with customer-centric objectives. Connect with him on LinkedIn or at williamflaiz.com.
