Phone Number Formatting
Why Your CRM Has 47 Versions of the Same Number
I once watched a sales rep spend twenty minutes trying to figure out why a contact wasn't in the CRM. The contact was there—four times, actually. Same person, same phone number, stored as (555) 867-5309, 555-867-5309, 5558675309, and +1 555 867 5309.
Four records. One human. Zero way for the system to know they were duplicates.
This happens in every CRM I've ever seen. Phone numbers are deceptively simple—just digits, right? But the ways people write them down, the ways systems store them, and the ways imports mangle them create a mess that compounds over time. And unlike misspelled names, which humans can usually puzzle out, phone format variations break automated matching completely.

The Chaos Is Real
Pull a random sample of 100 contacts from your CRM right now. I'll bet you find at least a dozen different phone formats.
Here's what I typically see:
| Format | Example | Source |
|---|---|---|
| Parentheses + hyphen | (555) 867-5309 | Manual entry, US convention |
| Hyphens only | 555-867-5309 | Manual entry, alternative style |
| Dots | 555.867.5309 | Some web forms, design preference |
| Spaces | 555 867 5309 | European convention, some imports |
| No separators | 5558675309 | System exports, API data |
| Country code, no plus | 1 555 867 5309 | International imports |
| Country code with plus | +1 555 867 5309 | E.164 with spaces added (closest to correct; strict E.164 is +15558675309) |
| Country code, no spaces | 15558675309 | Some APIs, telephony systems |
That's eight formats for the same number. And I haven't even gotten into the weird stuff: leading zeros that Excel ate, extension suffixes (x123, ext. 123, #123), parenthetical notes ("ask for Jim"), and the occasional letter that someone typed because their keyboard was in the wrong mode.
Every format variation is another potential duplicate your system won't catch.
Why This Happens
Three culprits, mostly.
Manual entry with no validation. Your web form asks for a phone number and accepts whatever someone types. Some people use parentheses because that's how they learned it. Others use dots because it looks cleaner. International visitors add their country code; domestic users don't. The form doesn't care. It just stores the string.
Imports from different sources. Your marketing list from the trade show uses one format. Your sales team's LinkedIn exports use another. The customer data from your acquired company uses a third. When these merge into your CRM, you get format chaos layered on top of whatever chaos was already there.
Copy-paste from anywhere. Someone copies a number from an email signature, a website, a PDF. Each source has its own formatting conventions. The number lands in your CRM exactly as copied, formatting quirks and all.
None of this is malicious. People aren't trying to create duplicates or break your reports. They're just entering data the way that makes sense to them, and no system is enforcing consistency.
E.164: The Format That Should Rule Them All
There's actually a standard for phone numbers. It's called E.164, and it looks like this: +15558675309.
The rules are simple. Start with a plus sign. Follow with the country code (1 for US/Canada). Then the full national number with no spaces, hyphens, or other separators. Maximum 15 digits total, counting the country code but not the plus sign.
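Those rules are compact enough to check mechanically. Here is a minimal sketch in Python; the names `E164_PATTERN` and `looks_like_e164` are made up for illustration. It only tests the shape of the string, not whether the number is actually assigned to anyone.

```python
import re

# E.164 shape: a plus sign, a non-zero first digit, then 1 to 14 more digits
# (15 digits maximum, country code included).
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def looks_like_e164(value: str) -> bool:
    """Return True if the string has the E.164 shape. Not proof the number exists."""
    return E164_PATTERN.fullmatch(value) is not None
```

A check like this is useful as a guard on the storage column: anything that fails it was never standardized.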
E.164 exists because telephone systems need unambiguous routing. When your phone dials +44 20 7946 0958, the plus sign tells the system to expect a country code. The 44 routes to the UK. The rest gets handled by UK telephone infrastructure. No guessing, no regional assumptions.
For database purposes, E.164 is ideal because it's completely consistent. Every number follows the same pattern. Comparison is trivial—two numbers match if and only if their E.164 representations are identical. No fuzzy matching required.
The downside? Nobody actually types phone numbers this way. It's ugly. It's unfamiliar. Asking users to enter +1 before their area code creates friction and confusion. So we're stuck with a gap between how humans write numbers and how databases should store them.
The solution: let people enter numbers however they want, then convert to E.164 behind the scenes.
What Standardization Actually Looks Like
Here's the transformation CleanSmart's AutoFormat applies:
| Input (raw) | Output (E.164) |
|---|---|
| (555) 867-5309 | +15558675309 |
| 555.867.5309 | +15558675309 |
| 1-555-867-5309 | +15558675309 |
| +1 (555)867-5309 | +15558675309 |
| 5558675309 | +15558675309 |
All five inputs become the same output. Now your deduplication actually works. Your matching algorithms can do exact comparison instead of fuzzy guessing. Your reports don't count the same customer multiple times.
The conversion requires knowing (or assuming) the country. A 10-digit number in a US company's CRM is almost certainly a US number, so AutoFormat adds +1. For explicitly international numbers—those starting with + or containing country codes—the system preserves the original country.
You don't lose the original format, either. CleanSmart keeps the raw input in a separate field so you can see what was actually entered. The standardized version is for matching and storage; the original is for reference.
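To make the idea concrete, here is a rough sketch of that kind of conversion in Python. It is not CleanSmart's implementation; `PhoneRecord` and `to_e164_us_default` are hypothetical names, and the logic assumes a US-centric dataset where a bare 10-digit number means country code 1. Real-world data deserves a dedicated library such as `phonenumbers`, the Python port of Google's libphonenumber.

```python
import re
from typing import NamedTuple, Optional


class PhoneRecord(NamedTuple):
    raw: str             # exactly what was entered, kept for reference
    e164: Optional[str]  # canonical form, or None if it could not be parsed


def to_e164_us_default(raw: str) -> PhoneRecord:
    """Naive normalizer. Assumption: bare 10-digit inputs are US/Canada numbers."""
    text = raw.strip()
    digits = re.sub(r"\D", "", text)  # drop everything that is not a digit

    if text.startswith("+") and 1 < len(digits) <= 15:
        # Explicit international prefix: keep the country code as entered.
        return PhoneRecord(raw, "+" + digits)
    if len(digits) == 10:
        # Bare national number: assume the default country code 1.
        return PhoneRecord(raw, "+1" + digits)
    if len(digits) == 11 and digits.startswith("1"):
        # Country code 1 is present, only the plus sign is missing.
        return PhoneRecord(raw, "+" + digits)
    return PhoneRecord(raw, None)  # flag for review instead of guessing


inputs = ["(555) 867-5309", "555.867.5309", "1-555-867-5309",
          "+1 (555)867-5309", "5558675309"]
canonical = {to_e164_us_default(value).e164 for value in inputs}
```

Because all five inputs collapse to one canonical value, `canonical` ends up holding a single string, which is what makes exact-match deduplication possible.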
Edge Cases That Break Simple Solutions
Phone standardization sounds straightforward until you hit the edge cases.
Extensions. Business numbers often have extensions: 555-867-5309 x1234. E.164 doesn't handle extensions—they're a PBX feature, not a telephone network feature. The solution is to store extensions separately. Strip them during standardization, preserve them in a dedicated field.
Country code ambiguity. The number 020 7946 0958 could be a London number (missing the +44) or something else entirely. Read as a UK number, it becomes +442079460958, with the leading trunk 0 dropped. Without context, you're guessing. If your data is primarily from one country, assume that country. If it's international, you might need a lookup based on lead source or address data.
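Short codes and service numbers. 911 and 411 aren't valid E.164 numbers: they are short codes with no country code and no full national number. A number typed as 1-800-FLOWERS isn't valid as entered either, because of the letters. These need special handling: either flag them as non-standard or apply conversion rules specific to their type.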
Landlines vs. mobile. In some countries, mobile and landline numbers have different length requirements or prefix patterns. A number that looks valid as a mobile might be impossible as a landline. Full validation requires knowing which type you're dealing with.
Vanity numbers. 1-800-CONTACTS contains letters. Technically, you can convert to digits (1-800-266-8228), but you might want to preserve the memorable version for display purposes.
The point isn't that standardization is impossible—it's that naive regex solutions will break on real-world data. You need a library that understands phone number conventions, not a find-and-replace.
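A few of the edge cases above (extensions, short codes, vanity letters) can be handled as a pre-processing pass before normalization. The sketch below is illustrative only: the function names are hypothetical, the short-code set is a tiny sample, and the extension pattern covers just the x123, ext. 123, and #123 styles.

```python
import re
from typing import Optional, Tuple

# Trailing extension written as x123, ext. 123, ext 123, or #123.
EXTENSION_PATTERN = re.compile(r"\s*(?:ext\.?|x|#)\s*(\d+)\s*$", re.IGNORECASE)

# Standard telephone keypad: ABC=2, DEF=3, GHI=4, JKL=5, MNO=6, PQRS=7, TUV=8, WXYZ=9.
KEYPAD = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    "22233344455566677778889999",
)

SHORT_CODES = {"911", "411"}  # illustrative subset, not a complete list


def split_extension(raw: str) -> Tuple[str, Optional[str]]:
    """Separate the main number from a trailing extension, if there is one."""
    match = EXTENSION_PATTERN.search(raw)
    if match is None:
        return raw.strip(), None
    return raw[: match.start()].strip(), match.group(1)


def vanity_to_digits(raw: str) -> str:
    """Translate keypad letters to digits. Call this after split_extension,
    otherwise the letters in 'ext' or 'x' would be translated too."""
    return raw.upper().translate(KEYPAD)


def is_short_code(raw: str) -> bool:
    """True if the digits alone form one of the known short codes."""
    return re.sub(r"\D", "", raw) in SHORT_CODES
```

With these helpers, `split_extension("555-867-5309 x1234")` yields the main number and `"1234"` separately, and `vanity_to_digits("1-800-FLOWERS")` gives `"1-800-3569377"`. Vanity words longer than seven letters produce extra digits, so the result still needs a length check before it is treated as a full number.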

Prevention: Stop the Chaos at the Source
Standardizing existing data is cleanup. Preventing future chaos is where you actually win.
Input masking. Guide users toward consistent formats with visual hints. Show placeholder text like "(555) 555-5555" so they know what you expect. Auto-format as they type if your form library supports it.
Validation on entry. Check that the number has the right digit count and plausible structure before accepting it. Reject obviously invalid entries (too short, too long, impossible area codes) rather than letting garbage into your database.
Standardize on save. Whatever format the user enters, convert to E.164 when you store it. Keep the display format pretty for humans; keep the storage format consistent for machines.
Country detection. If you know the user's location (from IP, from their profile, from the form context), use it to infer country code. Don't make US users type +1 if you know they're in the US.
The goal is invisible consistency. Users enter numbers naturally; the system handles standardization automatically. No training required, no "please use this format" messages that everyone ignores anyway.
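Validation on entry and standardize-on-save can live in one small function that runs when the form is submitted. This is a sketch under a stated assumption: the form only accepts US/Canada numbers, so it checks the North American numbering rules (10 digits, area code and exchange each starting with 2 through 9). The name `nanp_entry_to_e164` is hypothetical.

```python
import re
from typing import Optional


def nanp_entry_to_e164(entered: str) -> Optional[str]:
    """Return the E.164 string to store, or None if the entry should be rejected.

    Assumption: every entry is meant to be a US/Canada number.
    """
    digits = re.sub(r"\D", "", entered)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # strip a leading country code 1
    if len(digits) != 10:
        return None  # too short or too long
    area_code, exchange = digits[:3], digits[3:6]
    if area_code[0] in "01" or exchange[0] in "01":
        return None  # impossible area code or exchange
    return "+1" + digits
```

The user can type `(555) 867-5309` or `1 555 867 5309`; either way the stored value is `+15558675309`, and an entry like `055-867-5309` is rejected before it reaches the database.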
Fix It Once
Upload your contact list to CleanSmart. AutoFormat will standardize every phone number to E.164, flag the ones that can't be parsed, and show you exactly what changed. Your duplicates become visible. Your matching starts working. And the next import won't make things worse.
Should I store the formatted version or the E.164 version?
Both, ideally. Store E.164 as your canonical version—use it for matching, deduplication, and any programmatic comparison. Store the original or a nicely formatted display version for human-facing contexts. If you can only pick one, pick E.164. You can always format it for display later; you can't reliably reconstruct E.164 from a formatted string without the conversion logic.
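Going from E.164 back to a display format is the easy direction. As a small illustration, assuming you only need the familiar US style and are happy to show other countries' numbers unchanged (`format_us_display` is a made-up name):

```python
def format_us_display(e164: str) -> str:
    """Render a +1 E.164 number as (XXX) XXX-XXXX; return anything else unchanged."""
    if e164.startswith("+1") and len(e164) == 12 and e164[1:].isdigit():
        national = e164[2:]
        return f"({national[:3]}) {national[3:6]}-{national[6:]}"
    return e164
```

So `format_us_display("+15558675309")` gives `(555) 867-5309`, while a UK number such as `+442079460958` passes through untouched.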
What about phone numbers that are clearly fake?
Standardization doesn't validate authenticity. A number like 555-555-5555 will convert to valid E.164 format even though it's obviously placeholder data. For fake detection, you need additional checks: known fictional ranges (555-0100 through 555-0199 in the US), repeated digit patterns, numbers that fail carrier lookup. Standardization and validation are separate problems: solve standardization first, then layer on validation.
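Two of those checks are cheap enough to run on every record once the number is already in E.164. The sketch below is a heuristic, not a carrier lookup: it covers US/Canada numbers only, `looks_like_placeholder` is a hypothetical name, and the 555-0100 through 555-0199 rule relies on that line range being set aside for fictional use.

```python
def looks_like_placeholder(e164: str) -> bool:
    """Heuristic check for obviously fake +1 numbers. Not a carrier lookup."""
    if not (e164.startswith("+1") and len(e164) == 12 and e164[1:].isdigit()):
        return False  # only US/Canada numbers are covered in this sketch
    national = e164[2:]
    if len(set(national)) == 1:
        return True  # all ten digits identical, such as 555-555-5555
    if national[3:6] == "555" and "0100" <= national[6:] <= "0199":
        return True  # 555-01XX line numbers are reserved for fiction
    return False
```

A hit from this function is a reason to flag the record, not to delete it; a miss says nothing about whether the number is real.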
How do I handle international numbers when I don't know the country?
If the number starts with + or 00 (common international dialing prefix), you can usually parse the country code. If it doesn't, you're making assumptions. The safest approach: flag numbers without clear country indicators for manual review, or use the contact's address/location data to infer country. Guessing wrong creates its own problems—a UK number stored with a US country code is effectively corrupted.
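The "flag it rather than guess" approach might look like the following sketch. The function name `explicit_international` is hypothetical. It treats a leading + or 00 as an explicit country indicator and returns None for everything else so the record can be routed to review. It does not handle every convention: 00 is not the international prefix everywhere, and forms like +44 (0)20 would need extra cleanup.

```python
import re
from typing import Optional

E164_SHAPE = re.compile(r"\+[1-9]\d{1,14}")


def explicit_international(raw: str) -> Optional[str]:
    """Return '+<digits>' when the input carries a + or 00 prefix, else None."""
    text = raw.strip()
    digits = re.sub(r"\D", "", text)
    if text.startswith("+"):
        candidate = "+" + digits
    elif text.startswith("00"):
        candidate = "+" + digits[2:]  # swap the 00 dialing prefix for a plus
    else:
        return None  # no country indicator: flag for review, do not guess
    return candidate if E164_SHAPE.fullmatch(candidate) else None
```

Both `+44 20 7946 0958` and `0044 20 7946 0958` come back as `+442079460958`, while `020 7946 0958` returns None and waits for address or lead-source data to settle the country.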
William Flaiz is a digital transformation executive and former Novartis Executive Director who has led consolidation initiatives saving enterprises over $200M in operational costs. He holds MIT's Applied Generative AI certification and specializes in helping pharmaceutical and healthcare companies align MarTech with customer-centric objectives. Connect with him on LinkedIn or at williamflaiz.com.











