Analytics | 2025-08-20 | 9 min

How to Implement User Identification Across Anonymous and Known Sessions

Users are anonymous before signup and identified after. Here's how to stitch these sessions together for complete journey analytics.

A visitor arrives at your website for the first time. Your analytics tool assigns them an anonymous ID, something like anon_7f3a2b9c. They browse three pages, read a blog post, and leave. Two days later, they return on a different device, get a new anonymous ID anon_e4d1f8a0, visit the pricing page, and leave again. A week later, they sign up for a trial using their email. Now they are user_12345. Your analytics shows three separate people who each did a fraction of the journey. The real journey, a single person moving from awareness to consideration to trial across seven touchpoints over nine days, is invisible.

User identification is the system that connects anonymous activity to known identities, merges sessions across devices and time, and creates a unified view of each person's complete journey. Without it, your funnel metrics are wrong (because the top is inflated with duplicate anonymous visitors), your attribution is broken (because you cannot see the first touch that brought a user who converted weeks later), and your cohort analyses are unreliable (because you are comparing partial journeys to complete ones). Getting user identification right is not a nice-to-have. It is a prerequisite for every other analysis being accurate.

TL;DR
  • Anonymous tracking creates a fragmented view of user journeys. Identity resolution stitches those fragments into a complete picture.
  • The identify call is the critical moment. When a user logs in, signs up, or submits a form, merge their anonymous history into their known profile.
  • Cross-device identity requires a server-side identity graph. Client-side cookies alone cannot solve it.
  • Privacy regulations (GDPR, CCPA) define what you can track and how long you can store it. Build your identity system with compliance as a constraint, not an afterthought.

The Identity Problem in Modern Analytics

The identity problem exists because the web was not designed for persistent user tracking. HTTP is stateless. Browsers are privacy-focused. Users switch between devices, clear cookies, use incognito mode, and block trackers. The average B2B buyer uses 3.2 devices during a purchase journey and may take 30-90 days to convert. Without an identity system, that single buyer looks like 5-10 different visitors in your analytics.

Why Cookies Are Not Enough

First-party cookies remain the primary mechanism for recognizing a returning visitor within a single browser. When a user visits your site, a cookie is set with an anonymous identifier. On subsequent visits from the same browser, the cookie is read and the visits are connected. This works for simple same-device, same-browser tracking. It fails in every other scenario. Safari's Intelligent Tracking Prevention (ITP) limits first-party cookies set by JavaScript to 7 days (or 24 hours in some cases). Firefox Enhanced Tracking Protection blocks known trackers by default. Chrome long planned to deprecate third-party cookies, although Google has since backed away from a full phase-out. Even without browser restrictions, cookies are device-specific: a user who visits on their laptop and their phone has two separate cookie identities. And cookies are browser-specific: a user who switches from Chrome to Safari on the same device appears as two people.

The Scope of the Problem

For a B2B SaaS company with 100,000 monthly unique visitors, the true number of unique people might be 60,000-70,000 after accounting for multi-device and multi-session duplication. That means 30-40% of your "unique visitors" are duplicates. Your top-of-funnel metrics are inflated, your conversion rates are deflated, and your attribution models are crediting the wrong touchpoints because they cannot see the full journey.
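To make the inflation concrete, here is a small worked calculation. The visitor figures come from the paragraph above (using 65,000 as the midpoint of the range); the signup count is a hypothetical number chosen only for illustration.

```python
def duplicate_share(reported_visitors: int, true_people: int) -> float:
    """Fraction of reported 'unique visitors' that are duplicates of real people."""
    return (reported_visitors - true_people) / reported_visitors


reported_visitors = 100_000  # monthly "unique visitors" from the example above
true_people = 65_000         # midpoint of the 60,000-70,000 range
signups = 1_300              # hypothetical, for illustration only

dup = duplicate_share(reported_visitors, true_people)
rate_per_visitor = signups / reported_visitors
rate_per_person = signups / true_people

print(f"duplicates: {dup:.0%}")                                    # duplicates: 35%
print(f"conversion per reported visitor: {rate_per_visitor:.1%}")  # 1.3%
print(f"conversion per real person: {rate_per_person:.1%}")        # 2.0%
```

The same number of signups looks like a 1.3% conversion rate against the inflated denominator and 2.0% against the real one.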

  • 3.2 average devices used per B2B buyer during purchase journey
  • 30-40% of unique visitors are duplicates from multi-device browsing
  • 7 days: Safari ITP limit for JavaScript-set first-party cookies

Source: Salesforce Connected Consumer Report, WebKit ITP documentation

The Identity Resolution Architecture

A complete identity resolution system has four layers. Each layer adds a level of sophistication and accuracy to your user identification.

Identity Resolution Layers

1. Anonymous Tracking

Every visitor gets a persistent anonymous ID via first-party cookie. This ID tracks all activity within a single browser: page views, clicks, scroll depth, time on site. The anonymous profile accumulates behavioral data even before the user is known. Set the cookie server-side so that it is not subject to ITP's 7-day cap on JavaScript-set cookies and can keep the expiry you assign.

2. The Identify Call

When a user provides identifying information (email, account ID, phone number), fire an identify call that links their anonymous ID to their known identity. This single call retroactively connects all prior anonymous activity to the known user. It is the most important instrumentation in your entire analytics implementation.

3. Identity Graph

Maintain a server-side graph that maps all identifiers (anonymous IDs, emails, account IDs, device fingerprints) to a single canonical user profile. When a new identifier is associated with a known user, merge the histories. The graph enables cross-device and cross-session identity resolution.

4. Enrichment and Segmentation

Once the user is identified, enrich their profile with firmographic data (company, industry, size), behavioral data (feature usage, engagement score), and lifecycle data (trial stage, plan type, renewal date). This enriched profile powers personalization, scoring, and advanced segmentation.

Implementing Anonymous Tracking

Anonymous tracking is the foundation. Done well, it gives you rich behavioral data on every visitor from the moment they arrive. Done poorly, it creates gaps in your data that cascade through every downstream analysis.

Generating the Anonymous ID

Generate a UUID (v4) on the user's first visit and store it in a first-party cookie. The cookie should have the following attributes: SameSite=Lax to reduce CSRF exposure while allowing normal navigation, Secure to ensure transmission only over HTTPS, no HttpOnly flag so your JavaScript analytics library can read it (HttpOnly is a flag you omit, not a value you set to false; server-side setting of the cookie is still preferred), and a Max-Age of 13 months (the maximum useful duration given annual consent renewal).
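As a minimal sketch of those attributes, the following uses Python's standard library to generate the ID and build the Set-Cookie value. The cookie name anon_id and the 395-day approximation of 13 months are assumptions for illustration.

```python
import uuid
from http.cookies import SimpleCookie

# Roughly 13 months, expressed in seconds (395 days is an assumption).
THIRTEEN_MONTHS_SECONDS = 395 * 24 * 60 * 60


def build_anonymous_cookie(name: str = "anon_id") -> tuple[str, str]:
    """Return (anonymous_id, Set-Cookie header value) for a first-time visitor."""
    anon_id = str(uuid.uuid4())  # random UUID, version 4
    cookie = SimpleCookie()
    cookie[name] = anon_id
    morsel = cookie[name]
    morsel["path"] = "/"
    morsel["max-age"] = THIRTEEN_MONTHS_SECONDS
    morsel["samesite"] = "Lax"
    morsel["secure"] = True
    # HttpOnly is deliberately left unset so a JavaScript library can read the ID.
    return anon_id, morsel.OutputString()


anon_id, header_value = build_anonymous_cookie()
print(header_value)
```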

Server-Side vs. Client-Side Cookie Setting

There is a critical difference between cookies set by JavaScript (document.cookie) and cookies set by the server (Set-Cookie header). Safari's ITP treats them differently: JavaScript-set cookies are capped at 7 days, while cookies set by your own server in an HTTP response are not subject to that cap and can keep the 13-month expiry you assign. If a significant portion of your traffic uses Safari (30-40% is typical for B2B), setting your anonymous ID cookie server-side is essential. This means your analytics tracking endpoint must set the cookie in the HTTP response header, not in the browser via JavaScript.
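One way to picture a tracking endpoint that sets the cookie in the response header is a small WSGI application. This is a sketch under assumptions: the cookie name, the 395-day lifetime, and the choice to re-send the cookie on every response (which refreshes its expiry) are illustrative, not a prescribed design.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "anon_id"         # hypothetical cookie name
MAX_AGE = 395 * 24 * 60 * 60    # roughly 13 months, in seconds


def tracking_app(environ, start_response):
    """WSGI endpoint: reuse the incoming anonymous ID or issue a new one."""
    incoming = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    if COOKIE_NAME in incoming:
        anon_id = incoming[COOKIE_NAME].value
    else:
        anon_id = str(uuid.uuid4())

    # The cookie travels in the Set-Cookie response header, not document.cookie.
    outgoing = SimpleCookie()
    outgoing[COOKIE_NAME] = anon_id
    outgoing[COOKIE_NAME]["path"] = "/"
    outgoing[COOKIE_NAME]["max-age"] = MAX_AGE
    outgoing[COOKIE_NAME]["samesite"] = "Lax"
    outgoing[COOKIE_NAME]["secure"] = True

    headers = [
        ("Content-Type", "text/plain"),
        ("Set-Cookie", outgoing[COOKIE_NAME].OutputString()),
    ]
    start_response("200 OK", headers)
    return [b"ok"]
```

For a local trial, the app can be served with wsgiref.simple_server.make_server from the standard library; in production it would sit behind your own first-party domain over HTTPS.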

What to Track Anonymously

Anonymous visitors generate valuable behavioral signals. Track the following:
  • Page views: full URL, referrer, and UTM parameters.
  • Content engagement: scroll depth (25%, 50%, 75%, 100%), time on page, and interactions with key elements (pricing toggles, feature tabs, demo request buttons).
  • Navigation patterns: the sequence of pages visited, entry and exit pages, and the number of sessions before conversion.
  • Technical context: device type, browser, operating system, viewport size, and geographic region (from IP, without storing the IP itself).

All of this data sits in the anonymous profile, waiting to be merged into a known identity.

The First-Touch Preservation Problem
The most important anonymous data to preserve is the first touch: the original source, medium, campaign, and landing page that brought the user to your site for the very first time. If a user first arrives via an organic search result, then returns via a retargeting ad, then converts via a direct visit, the first touch (organic search) is the most difficult to preserve and the most valuable for attribution. Store first-touch data in both the anonymous profile and in a separate long-lived cookie so it survives across sessions. When the anonymous profile merges into a known identity, the first-touch data transfers with it.
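
The rule "write first touch once, never overwrite it" can be sketched as follows. The Touch and AnonymousProfile types and their field names are hypothetical; the example replays the organic, retargeting, direct sequence described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Touch:
    source: str
    medium: str
    campaign: Optional[str]
    landing_page: str


@dataclass
class AnonymousProfile:
    anonymous_id: str
    first_touch: Optional[Touch] = None
    last_touch: Optional[Touch] = None

    def record_touch(self, touch: Touch) -> None:
        # First touch is written exactly once and never overwritten.
        if self.first_touch is None:
            self.first_touch = touch
        self.last_touch = touch


profile = AnonymousProfile("anon_7f3a2b9c")
profile.record_touch(Touch("google", "organic", None, "/blog/identity"))
profile.record_touch(Touch("adnetwork", "retargeting", "q3-retarget", "/pricing"))
profile.record_touch(Touch("direct", "none", None, "/signup"))

print(profile.first_touch.medium)  # organic
print(profile.last_touch.source)   # direct
```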

The Identify Call: The Critical Moment

The identify call is the single most important event in your analytics implementation. It is the moment when an anonymous visitor becomes a known user, and everything you tracked anonymously becomes attributable to a real person. Getting this right means understanding when to fire it, what data to include, and how it triggers the merge.

When to Fire the Identify Call

Fire the identify call at every moment a user provides or confirms their identity. The most common triggers are: account creation (the user signs up), login (the user authenticates on any device), form submission (the user provides their email in a lead form, newsletter signup, or content download), OAuth callback (the user authenticates via Google, GitHub, or SSO), and email click-through (the user clicks a link in an email that contains their identifier in the URL). Each of these events represents an opportunity to link anonymous behavior to a known identity. Missing any of them creates gaps in your identity graph. The login event is particularly important because it fires on every session start for returning users, ensuring that cross-device activity is continuously merged.

What Data to Include

The identify call should include the user's canonical identifier (usually a user ID from your database) and a set of traits that describe the user. Required traits include email, creation date, and plan type. Recommended traits include company name, company size, industry, role/title, and lifecycle stage. Optional traits include phone number, timezone, preferred language, and referral source. The traits in the identify call become user properties in your analytics tool, enabling segmentation and filtering. Critically, include the anonymous ID in the identify call so the analytics platform can merge the anonymous history into the known profile.

| Trigger Event | User ID Source | Anonymous Merge | Priority |
| --- | --- | --- | --- |
| Account creation | Database user ID | Merge all prior anonymous events | Critical |
| Login | Database user ID | Merge current session anonymous events | Critical |
| Form submission | Email (or matched user ID) | Merge all prior anonymous events | High |
| Email click-through | Email token in URL | Merge current device anonymous events | High |
| OAuth/SSO callback | Matched or created user ID | Merge all prior anonymous events | Critical |
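
A minimal sketch of an identify payload that enforces the required traits and always carries the anonymous ID. The payload shape and the trait keys (email, created_at, plan) are assumptions for illustration and do not match any specific vendor's schema.

```python
from typing import Any, Dict

# Required traits named in the text: email, creation date, plan type.
REQUIRED_TRAITS = ("email", "created_at", "plan")


def build_identify_payload(
    user_id: str, anonymous_id: str, traits: Dict[str, Any]
) -> Dict[str, Any]:
    """Validate inputs and return a payload ready to send to an analytics backend."""
    if not user_id:
        raise ValueError("user_id is required")
    if not anonymous_id:
        raise ValueError("anonymous_id is required so prior history can be merged")
    missing = [name for name in REQUIRED_TRAITS if not traits.get(name)]
    if missing:
        raise ValueError("missing required traits: " + ", ".join(missing))
    return {
        "type": "identify",
        "user_id": user_id,
        "anonymous_id": anonymous_id,
        "traits": dict(traits),
    }


payload = build_identify_payload(
    "user_12345",
    "anon_7f3a2b9c",
    {
        "email": "ada@example.com",
        "created_at": "2025-08-20",
        "plan": "trial",
        "company": "Example Co",  # recommended, not required
    },
)
print(payload["anonymous_id"])  # anon_7f3a2b9c
```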

The Alias vs. Identify Distinction

Some analytics platforms (including Kissmetrics, Mixpanel, and Segment) distinguish between identify and alias. The identify call sets traits on a user profile and associates the current session with a known user. The alias call permanently merges two user identities into one. You typically call alias once (when an anonymous user first becomes known) and identify on every subsequent session. Getting the sequence wrong can create orphaned profiles or duplicate users. The correct sequence is: (1) track anonymous events, (2) user signs up or logs in, (3) call alias to merge anonymous ID into known user ID, (4) call identify with user traits, (5) continue tracking events under the known user ID. On subsequent logins, skip alias and only call identify, because the user is already known.
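
The five-step sequence can be modeled with a toy in-memory client. InMemoryAnalytics and on_authenticated are hypothetical names, not a real SDK; the point is the ordering (alias once, identify every time), not the API.

```python
from typing import Dict, List, Set, Tuple


class InMemoryAnalytics:
    """Hypothetical stand-in for an analytics client; not a real SDK."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, str]] = []  # (distinct_id, event name)
        self.aliases: Dict[str, str] = {}        # anonymous_id -> user_id
        self.aliased_users: Set[str] = set()
        self.traits: Dict[str, dict] = {}

    def track(self, distinct_id: str, event: str) -> None:
        self.events.append((distinct_id, event))

    def alias(self, anonymous_id: str, user_id: str) -> None:
        self.aliases[anonymous_id] = user_id
        self.aliased_users.add(user_id)

    def identify(self, user_id: str, traits: dict) -> None:
        self.traits.setdefault(user_id, {}).update(traits)

    def events_for(self, user_id: str) -> List[str]:
        # Resolve each event's ID through the alias map before comparing.
        return [
            name
            for distinct_id, name in self.events
            if self.aliases.get(distinct_id, distinct_id) == user_id
        ]


def on_authenticated(
    analytics: InMemoryAnalytics, anonymous_id: str, user_id: str, traits: dict
) -> None:
    # Alias only the first time this user becomes known; identify every time.
    if user_id not in analytics.aliased_users:
        analytics.alias(anonymous_id, user_id)
    analytics.identify(user_id, traits)


analytics = InMemoryAnalytics()
analytics.track("anon_7f3a2b9c", "Viewed Pricing")                             # step 1
on_authenticated(analytics, "anon_7f3a2b9c", "user_12345", {"plan": "trial"})  # steps 2-4
analytics.track("user_12345", "Created Project")                               # step 5
on_authenticated(analytics, "anon_e4d1f8a0", "user_12345", {"plan": "trial"})  # later login

print(analytics.events_for("user_12345"))  # ['Viewed Pricing', 'Created Project']
```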

Building the Identity Graph

The identity graph is the server-side data structure that maps all known identifiers to a single canonical user. It is what enables cross-device, cross-session, and cross-platform identity resolution. Without it, identity merges are limited to what your analytics tool supports natively.

Graph Structure

At its simplest, the identity graph is a table with three columns: identifier_type (anonymous_id, email, user_id, phone, device_id), identifier_value (the actual value), and canonical_user_id (the single user ID this identifier maps to). When a new identifier is associated with a known user, a row is added. When two previously separate users are discovered to be the same person (for example, when an email used in a form matches an existing user's email), the graph merges the two canonical IDs into one and updates all rows.
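
As a rough sketch of that three-column table, here is a version backed by SQLite from Python's standard library. The table name, helper names, and sample identifiers are assumptions; a production graph would also need audit history and concurrency handling.

```python
import sqlite3
from typing import Optional


def create_graph(conn: sqlite3.Connection) -> None:
    conn.execute(
        """
        CREATE TABLE identity_graph (
            identifier_type   TEXT NOT NULL,
            identifier_value  TEXT NOT NULL,
            canonical_user_id TEXT NOT NULL,
            PRIMARY KEY (identifier_type, identifier_value)
        )
        """
    )


def link(conn: sqlite3.Connection, id_type: str, value: str, canonical: str) -> None:
    """Add a row mapping one identifier to a canonical user."""
    conn.execute(
        "INSERT OR REPLACE INTO identity_graph VALUES (?, ?, ?)",
        (id_type, value, canonical),
    )


def resolve(conn: sqlite3.Connection, id_type: str, value: str) -> Optional[str]:
    row = conn.execute(
        "SELECT canonical_user_id FROM identity_graph "
        "WHERE identifier_type = ? AND identifier_value = ?",
        (id_type, value),
    ).fetchone()
    return row[0] if row else None


def merge_users(conn: sqlite3.Connection, keep: str, absorb: str) -> None:
    """Repoint every identifier of `absorb` at `keep`."""
    conn.execute(
        "UPDATE identity_graph SET canonical_user_id = ? WHERE canonical_user_id = ?",
        (keep, absorb),
    )


conn = sqlite3.connect(":memory:")
create_graph(conn)
link(conn, "anonymous_id", "anon_7f3a2b9c", "user_12345")
link(conn, "user_id", "12345", "user_12345")
link(conn, "email", "ada@example.com", "user_67890")  # lead form created a second profile
merge_users(conn, keep="user_12345", absorb="user_67890")

print(resolve(conn, "email", "ada@example.com"))  # user_12345
```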

Deterministic vs. Probabilistic Matching

Deterministic matching uses explicit identifiers (email, user ID, phone number) to connect profiles. It is highly accurate but limited to moments when users provide identifying information. Probabilistic matching uses statistical signals (IP address, device fingerprint, browser characteristics, login timing patterns) to infer that two anonymous profiles likely belong to the same person. It has broader coverage but lower accuracy and raises additional privacy concerns.

For most B2B SaaS analytics, deterministic matching is sufficient and preferable. Your users log in to your product, submit forms, and click emails. These events provide enough identity signals to resolve the majority of cross-device journeys. Probabilistic matching adds marginal coverage at the cost of complexity and privacy risk. Save it for high-traffic consumer applications where a much smaller share of visitors ever identify themselves.

Handling Merge Conflicts

Merge conflicts occur when two identifiers that should map to the same user are currently mapped to different canonical IDs. The most common scenario: a user signs up with their personal email, then later logs in with their work email via SSO. The identity graph now has two separate users who are actually one person. Resolving this requires a merge operation that combines both profiles into one canonical user, preserving the event history from both.

The merge decision rules matter. Merge when: the same email appears in two profiles, the same phone number appears in two profiles, or a user explicitly links two accounts. Do not merge when: the same IP address appears in two profiles (shared offices), the same device fingerprint appears in two profiles (shared devices), or similar behavioral patterns appear in two profiles (correlation is not identity). Overly aggressive merging creates composite profiles that represent multiple people, which is worse than having duplicate profiles.
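
These rules can be expressed as a small predicate. The profile representation (a mapping from identifier type to a set of values) and the function names are assumptions for illustration.

```python
from typing import Dict, Set

Profile = Dict[str, Set[str]]  # identifier type -> set of observed values

# Only explicit identifiers justify a merge under the rules above.
DETERMINISTIC_TYPES = {"email", "phone"}


def shared_identifier_types(a: Profile, b: Profile) -> Set[str]:
    """Identifier types for which the two profiles share at least one value."""
    return {t for t in a.keys() & b.keys() if a[t] & b[t]}


def should_merge(a: Profile, b: Profile, explicitly_linked: bool = False) -> bool:
    if explicitly_linked:  # the user linked the two accounts themselves
        return True
    return bool(shared_identifier_types(a, b) & DETERMINISTIC_TYPES)


colleague_a = {"ip": {"203.0.113.7"}, "email": {"ana@example.com"}}
colleague_b = {"ip": {"203.0.113.7"}, "email": {"ben@example.com"}}
personal = {"email": {"ada@home.example"}, "phone": {"+15550100"}}
work = {"email": {"ada@work.example"}, "phone": {"+15550100"}}

print(should_merge(colleague_a, colleague_b))  # False: a shared office IP is not identity
print(should_merge(personal, work))            # True: same phone number
```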

The Shared Device Problem
In shared device environments (kiosks, family computers, shared work computers), multiple people use the same browser and the same cookie. If your identity system is not careful, it will merge the activity of different people into one profile. The safeguard is to require explicit authentication before merging: never merge anonymous activity into a known profile based solely on cookie matching. Require a login, form submission, or other authenticated action. If a user logs out, generate a new anonymous ID for the next session rather than continuing to use the previous user's profile.

Cross-Device Identity Resolution

Cross-device resolution is the most challenging aspect of user identification because there is no shared state between devices. A laptop cookie and a phone cookie have no connection to each other. The only link is the user themselves, specifically the moment they authenticate on each device.

The Login-Based Approach

The most reliable cross-device strategy is simple: when a user logs in on any device, fire the identify call with their canonical user ID. The analytics platform maps that device's anonymous ID to the known user, merging the anonymous activity into the user's profile. If a user browses your marketing site on their phone (anonymous), then logs in to your product on their laptop (identified), then later logs in on their phone (identified on the previously anonymous device), the phone's anonymous activity merges into their profile on the third interaction. The key requirement is that your product must prompt login on every device. For SaaS products with authenticated sessions, this happens naturally. For marketing sites that do not require authentication, the gap between anonymous and identified is wider and harder to close.

Email as the Cross-Device Bridge

For marketing sites where login is not applicable, email is the best cross-device identifier. When a user submits a form or clicks a link in an email, their email address connects that device's anonymous activity to their known profile. Design your email links to include an encrypted identifier token. When the user clicks through, the landing page reads the token and fires an identify call, merging that device's anonymous history into the known profile. This approach requires careful implementation to avoid security issues: tokens should be single-use, time-limited, and encrypted. Never include raw email addresses or user IDs in URLs.
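
One way to get single-use, time-limited links without putting any user data in the URL is sketched below. Note the substitution: instead of an encrypted payload, this uses an opaque random token that is looked up server-side, so the URL carries nothing to decrypt. EmailLinkTokens, its in-memory store, and the 7-day lifetime are hypothetical.

```python
import secrets
import time
from typing import Callable, Dict, Optional, Tuple


class EmailLinkTokens:
    """Opaque, single-use, time-limited tokens mapped to user IDs server-side."""

    def __init__(
        self,
        ttl_seconds: float = 7 * 24 * 3600,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[str, float]] = {}  # token -> (user_id, expiry)

    def issue(self, user_id: str) -> str:
        token = secrets.token_urlsafe(32)  # random; reveals nothing about the user
        self._store[token] = (user_id, self._clock() + self._ttl)
        return token

    def redeem(self, token: str) -> Optional[str]:
        # pop() removes the token on first use, which makes it single-use.
        entry = self._store.pop(token, None)
        if entry is None:
            return None
        user_id, expires_at = entry
        if self._clock() > expires_at:
            return None
        return user_id


tokens = EmailLinkTokens()
token = tokens.issue("user_12345")
print(tokens.redeem(token))  # user_12345
print(tokens.redeem(token))  # None, because the token was already used
```

The landing page would call redeem with the token from the URL and fire identify only when it returns a user ID.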

The Coverage Gap

No identity system achieves 100% coverage. Some users never log in, never submit a form, and never click an email. They remain permanently anonymous. For B2B SaaS, the typical resolution rate (percentage of visitors eventually identified) ranges from 15-30% for marketing sites and 85-95% for product applications. The 70-85% of marketing visitors who remain anonymous still provide value through aggregate analysis: traffic patterns, content performance, and campaign attribution at the channel level rather than the user level. Accept the coverage gap rather than resorting to invasive tracking techniques that violate privacy norms.

Implementation Patterns by Analytics Platform

Each analytics platform implements identity resolution differently. Understanding the specific mechanics of your platform prevents implementation mistakes that are difficult to fix after the fact.

Kissmetrics

Kissmetrics was designed around identity from its inception. The identify method accepts any string as an identity (email is recommended). When called, it permanently aliases the current anonymous ID to the provided identity. All past and future events tracked under the anonymous ID are attributed to the identified user. Kissmetrics supports multiple aliases per user, so a user can be identified by email, user ID, and phone number, and all three resolve to the same profile. The platform automatically handles the merge when any known alias is detected.

Mixpanel

Mixpanel uses a two-step process: identify and alias. The identify method sets the distinct_id for future events. The alias method creates a permanent link between two distinct IDs. The critical rule in Mixpanel is that alias should only be called once per user, typically at account creation. Calling alias multiple times can create identity loops. For returning users who log in, call identify only. Mixpanel's simplified ID management (available since 2023) reduces this complexity by handling merges automatically, but understanding the underlying mechanics remains important for debugging identity issues.

Segment

Segment acts as an identity resolution layer between your application and your downstream analytics tools. Its identify call translates into the appropriate identify/alias calls for each connected destination. Segment's identity graph maintains a mapping of anonymous IDs to user IDs and handles the merge logic for each destination's specific requirements. This abstraction is valuable if you use multiple analytics tools because you implement identity resolution once in Segment and it propagates correctly to Mixpanel, Amplitude, Kissmetrics, and your data warehouse.

GA4

GA4's identity model is the weakest of the major analytics platforms. It supports three identity spaces: User-ID (your own identifier, set via gtag('set', {"user_id": "12345"})), Google Signals (cross-device matching for signed-in Google users), and Device-ID (cookie-based, the default). GA4 uses a blended identity model that prioritizes User-ID when available, falls back to Google Signals, and then to Device-ID. However, its retroactive merging is limited: when a User-ID is first set, GA4 can associate earlier events from the same session with that user, but anonymous activity from previous sessions remains attributed to the device, not the user. This is a fundamental limitation for journey analysis and attribution.

Privacy and Compliance Constraints

User identification operates within a tightening regulatory framework. GDPR, CCPA/CPRA, ePrivacy Directive, and emerging state-level privacy laws all impose constraints on how you collect, store, and use personal identifiers. Building your identity system with compliance as a foundational constraint, not a bolt-on, is both legally required and strategically sound.

Consent Before Identification

Under GDPR, you need a lawful basis for processing personal data. For analytics, this is typically either legitimate interest (for basic, necessary analytics) or consent (for behavioral tracking and profiling). In practice, this means you must obtain consent before setting non-essential cookies, before firing the identify call with personal data, and before enriching user profiles with third-party data. Your consent management platform (OneTrust, Cookiebot, or similar) must integrate with your analytics implementation so that tracking only activates after consent is granted. For users who do not consent, you can still collect aggregate, non-identifying analytics but cannot build individual user profiles.
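
A minimal sketch of gating profile-level tracking on consent while still counting aggregate events. The Consent enum and ConsentAwareTracker are hypothetical; a real integration would read the consent state from your consent management platform, and what counts as permissible aggregate collection is a legal question this sketch does not settle.

```python
from collections import Counter
from enum import Enum
from typing import List, Tuple


class Consent(Enum):
    GRANTED = "granted"
    DENIED = "denied"
    UNKNOWN = "unknown"  # banner not answered yet; treated like DENIED


class ConsentAwareTracker:
    def __init__(self) -> None:
        self.aggregate_counts: Counter = Counter()        # no identifiers stored
        self.profile_events: List[Tuple[str, str]] = []   # (anonymous_id, event)

    def track(self, consent: Consent, anonymous_id: str, event: str) -> None:
        # Aggregate count only: the event name, never the visitor's ID.
        self.aggregate_counts[event] += 1
        # Individual-level history is stored only after consent is granted.
        if consent is Consent.GRANTED:
            self.profile_events.append((anonymous_id, event))


tracker = ConsentAwareTracker()
tracker.track(Consent.UNKNOWN, "anon_7f3a2b9c", "Viewed Pricing")
tracker.track(Consent.DENIED, "anon_e4d1f8a0", "Viewed Pricing")
tracker.track(Consent.GRANTED, "anon_11aa22bb", "Viewed Pricing")

print(tracker.aggregate_counts["Viewed Pricing"])  # 3
print(tracker.profile_events)                      # [('anon_11aa22bb', 'Viewed Pricing')]
```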

Data Retention and Right to Deletion

GDPR's right to erasure means you must be able to delete all data associated with a specific user when requested. In the context of identity resolution, this means your identity graph must support complete user deletion: removing the canonical profile, all associated identifiers, and all event history linked to those identifiers. If your analytics platform and your identity graph and your data warehouse all have separate copies of user data, a deletion request must propagate to all three. Design for this from the start. Retroactively adding deletion capability to an identity system that was not designed for it is painful and error-prone.
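
The three things a deletion must remove (canonical profile, associated identifiers, linked event history) can be sketched against a single in-memory store. The Stores structure and erase_user are hypothetical; in practice the same operation has to be repeated, and verified, in every system that holds a copy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Identifier = Tuple[str, str]  # (identifier_type, identifier_value)


@dataclass
class Stores:
    profiles: Dict[str, dict] = field(default_factory=dict)          # canonical_user_id -> traits
    identifiers: Dict[Identifier, str] = field(default_factory=dict)  # identifier -> canonical_user_id
    events: List[dict] = field(default_factory=list)                  # each has an "identifier" key


def erase_user(stores: Stores, canonical_user_id: str) -> Dict[str, int]:
    """Remove the profile, every identifier mapped to it, and their events."""
    owned = {k for k, v in stores.identifiers.items() if v == canonical_user_id}
    events_before = len(stores.events)
    stores.events = [e for e in stores.events if e["identifier"] not in owned]
    for key in owned:
        del stores.identifiers[key]
    stores.profiles.pop(canonical_user_id, None)
    return {"identifiers": len(owned), "events": events_before - len(stores.events)}


stores = Stores(
    profiles={"user_12345": {"plan": "trial"}, "user_999": {"plan": "pro"}},
    identifiers={
        ("anonymous_id", "anon_7f3a2b9c"): "user_12345",
        ("email", "ada@example.com"): "user_12345",
        ("email", "ben@example.com"): "user_999",
    },
    events=[
        {"identifier": ("anonymous_id", "anon_7f3a2b9c"), "name": "Viewed Pricing"},
        {"identifier": ("email", "ada@example.com"), "name": "Submitted Form"},
        {"identifier": ("email", "ben@example.com"), "name": "Logged In"},
    ],
)

print(erase_user(stores, "user_12345"))  # {'identifiers': 2, 'events': 2}
```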

Minimization and Purpose Limitation

Collect only the identifiers you actually need. If your identity system works on email and user ID, do not also collect phone numbers, device fingerprints, and IP addresses "just in case." Each additional identifier increases your compliance surface and your breach exposure. Document the purpose for each identifier you collect: for example, email is used for identity resolution and communication, and user ID is used for cross-session tracking and product analytics. Review the list annually and remove identifiers that no longer serve their documented purpose.

Common Implementation Mistakes

Identity implementation mistakes are uniquely costly because they corrupt historical data. A tracking event that misfires can be fixed and re-sent. An identity merge that connects the wrong profiles is difficult or impossible to undo. Avoid these common mistakes.

Identifying Before Authentication

Calling identify with a user ID before the user has actually authenticated can merge the wrong anonymous profile into the wrong user. This happens when developers pre-populate user IDs from cached data or URL parameters without verifying authentication state. The rule is: only fire identify after a confirmed authentication event (successful login response, valid session token, confirmed account creation). Never identify based on unverified inputs.

Not Resetting After Logout

When a user logs out, you must reset the analytics identity. If you do not, the next person who uses that browser will have their activity attributed to the previous user's profile. For shared devices, this creates wildly inaccurate user profiles. On logout, call your analytics platform's reset method (Mixpanel's mixpanel.reset(), Kissmetrics' _kmq.push(['clearIdentity'])) and generate a new anonymous ID. The new session should be tracked as anonymous until the next user authenticates.
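
Independent of any vendor's reset method, the behavior to aim for looks roughly like this. AnalyticsSession is a hypothetical model of client state, not a real SDK class.

```python
import uuid
from typing import Optional


class AnalyticsSession:
    """Hypothetical model of the identity state an analytics client keeps."""

    def __init__(self) -> None:
        self.anonymous_id: str = str(uuid.uuid4())
        self.user_id: Optional[str] = None

    def login(self, user_id: str) -> None:
        self.user_id = user_id

    def logout(self) -> None:
        # Drop the known identity AND rotate the anonymous ID, so the next
        # person on this browser starts from a clean, unlinked profile.
        self.user_id = None
        self.anonymous_id = str(uuid.uuid4())

    def distinct_id(self) -> str:
        return self.user_id or self.anonymous_id


session = AnalyticsSession()
before_login = session.anonymous_id
session.login("user_12345")
session.logout()

print(session.distinct_id() != "user_12345")  # True
print(session.anonymous_id != before_login)   # True
```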

Using Mutable Identifiers

Using a mutable value as the canonical user ID creates identity fragmentation over time. Email addresses change. Usernames change. Phone numbers change. If your identity graph is keyed on email and a user changes their email, they effectively become a new person in your analytics. The canonical user ID must be an immutable, system-generated identifier: a database primary key, a UUID, or a similarly permanent value. Use email as a secondary identifier for merging and lookup, but never as the canonical ID.

Calling Alias Multiple Times

On platforms that distinguish between alias and identify (Mixpanel, Segment), calling alias more than once per user can create circular references or merge unrelated profiles. Alias should be called exactly once, at the moment an anonymous user first becomes known (account creation). All subsequent sessions should use identify only. Implement a server-side check that prevents alias from being called for a user who has already been aliased.

Key Takeaways

  1. Set anonymous ID cookies server-side to bypass Safari ITP's 7-day limit on JavaScript-set cookies.
  2. Fire the identify call at every authentication event: signup, login, form submission, OAuth callback, and email click-through.
  3. Use an immutable system-generated ID (database primary key) as the canonical user ID. Never use email or other mutable values.
  4. Build a server-side identity graph that maps all identifiers to a single canonical user for cross-device resolution.
  5. Prefer deterministic matching (explicit identifiers) over probabilistic matching (fingerprints, IP). It is more accurate and more compliant.
  6. Reset the analytics identity on logout to prevent profile contamination on shared devices.
  7. Design for GDPR deletion from the start. You must be able to remove all data associated with a user across all systems.
  8. Accept the coverage gap. 15-30% resolution on marketing sites is normal. Do not resort to invasive tracking to close the gap.

User identification is not a feature you configure once and forget. It is an ongoing system that requires instrumentation at every authentication point, governance over merge rules, compliance with evolving privacy regulations, and continuous monitoring for accuracy. The payoff is equally continuous: every analysis you run, from funnel conversion to cohort retention to attribution modeling, is only as accurate as your identity resolution. When one user appears as three anonymous visitors, that user is counted three times at the top of your funnel, and the conversion rate you calculate is understated accordingly. Getting identity right does not make your analytics impressive. It makes your analytics accurate. And accuracy is the foundation that everything else depends on.
