Concepts

Simbee provides a set of composable primitives that your application assembles to build personalization, discovery, and analytics features. This guide explains each primitive independently, then shows how they combine.

Vocabulary

Vocabulary is your domain's language. It defines the taxonomy that Simbee uses to understand what your users care about. Vocabulary consists of two structures:

Tags

Tags are flat labels that describe atomic interests. They have no hierarchy — each tag is an independent concept. Users accumulate affinity toward tags through their behavior (signals). Tags are the finest-grained unit of preference in the system.

Examples by domain

  • Creator platform: jazz, photography, digital-art, cooking
  • Learning app: python, machine-learning, statistics, algebra
  • Marketplace: vintage, handmade, electronics, organic
  • Professional network: rust, distributed-systems, product-management

Topics

Topics are broader categories that group related tags. They provide a second level of aggregation — Simbee computes affinity at both the tag level and the topic level, giving you fine-grained and coarse-grained preference data simultaneously.

Examples by domain

  • Creator platform: Music, Visual Arts, Lifestyle
  • Learning app: Computer Science, Mathematics, Data Science
  • Marketplace: Fashion, Home & Garden, Technology

Designing your vocabulary

Start small. You can add tags and topics at any time without breaking existing data — affinities are computed incrementally. A good starting vocabulary has 10–30 tags grouped under 3–8 topics. Avoid creating tags that are too specific (they won't accumulate enough signal) or too broad (they won't differentiate users).

Vocabulary is configured via POST /api/v1/config/signal_types (for signal type taxonomy) and through the tag/topic affinities that signals create. See the Getting Started guide for a hands-on walkthrough.

Signals

Signals are behavioral events that express user intent. Every meaningful user action — a like, a purchase, a page view, a course completion — is captured as a signal. Signals are the raw input to the affinity computation pipeline.

Anatomy of a signal

Each signal connects a user to a target (another user, a piece of content, or a vocabulary item) with a type and strength:

Field            Description
signal_type_id   The kind of behavior (e.g. "like", "purchase", "view")
target_id        What the user interacted with
target_type      The kind of target ("user", "content", "tag", etc.)
strength         Numeric weight (0.0–1.0). Defaults based on signal type config.
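
As a rough illustration, a signal can be modeled as a small record carrying the four fields above. The Signal class and its to_payload helper below are a hypothetical client-side sketch, not part of Simbee's API; only the field names and the 0.0–1.0 strength range come from this guide. Leaving strength unset is assumed to mean "use the signal type's default".

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class Signal:
    """Hypothetical client-side model of a signal (field names from this guide)."""
    signal_type_id: str
    target_id: str
    target_type: str
    strength: Optional[float] = None  # assumption: None = fall back to the type's default

    def __post_init__(self) -> None:
        # Reject strengths outside the documented 0.0-1.0 range.
        if self.strength is not None and not 0.0 <= self.strength <= 1.0:
            raise ValueError("strength must be between 0.0 and 1.0")

    def to_payload(self) -> dict:
        # Drop unset fields so a server-side default could apply.
        return {k: v for k, v in asdict(self).items() if v is not None}


like = Signal(signal_type_id="like", target_id="post-123", target_type="content")
purchase = Signal("purchase", "item-9", "content", strength=1.0)
```

Validating on the client keeps malformed signals out of a batch before anything is sent; the target IDs shown are made up.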

Signal types

Signal types let you define what behaviors mean in your domain. Each type has a default strength and a decay strategy that controls how the signal's influence fades over time.

The same primitive, different meanings

  • Social app: "like" (0.3), "follow" (0.7), "message" (0.9)
  • Marketplace: "view" (0.1), "save" (0.4), "purchase" (1.0)
  • Learning: "enroll" (0.5), "complete_lesson" (0.7), "earn_certificate" (1.0)
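
To make "a default strength and a decay strategy" concrete, here is a minimal sketch that assumes exponential half-life decay. This guide does not say which decay strategies Simbee supports, so SignalType, half_life_days, and decayed_strength are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalType:
    """Hypothetical signal type: a default strength plus a half-life (assumed decay model)."""
    name: str
    default_strength: float
    half_life_days: float


def decayed_strength(signal_type: SignalType, age_days: float) -> float:
    """Strength remaining after age_days under exponential half-life decay."""
    return signal_type.default_strength * 0.5 ** (age_days / signal_type.half_life_days)


view = SignalType("view", default_strength=0.1, half_life_days=7)
purchase = SignalType("purchase", default_strength=1.0, half_life_days=90)

print(decayed_strength(view, 7))       # half of 0.1 after one half-life
print(decayed_strength(purchase, 90))  # half of 1.0 after one half-life
```

Under this assumed model, a weak, short-lived signal such as a view fades within weeks, while a strong one such as a purchase keeps influencing the profile for months.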

Batch ingestion

For high-volume applications, use POST /api/v1/signal_batches to submit up to 1,000 signals per call. Batch signals are processed asynchronously — the API returns immediately with a batch ID that you can poll for completion status.
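
Because each call accepts at most 1,000 signals, a client typically has to split a larger backlog before submitting it. The helper below is a sketch of that splitting step only; chunk_signals is a hypothetical name, and the HTTP call and status polling are left out because their request and response shapes are not described here.

```python
from typing import Iterator, List, Sequence

MAX_BATCH_SIZE = 1000  # per-call limit stated in this guide


def chunk_signals(signals: Sequence[dict], size: int = MAX_BATCH_SIZE) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` signals."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(signals), size):
        yield list(signals[start:start + size])


signals = [
    {"signal_type_id": "view", "target_id": f"item-{i}", "target_type": "content"}
    for i in range(2500)
]
batches = list(chunk_signals(signals))
print([len(b) for b in batches])  # [1000, 1000, 500]
```

Each yielded list would become the body of one POST /api/v1/signal_batches request.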

Affinities

Affinities are computed relationship strengths between users and vocabulary items. They are the output of the signal processing pipeline — Simbee aggregates a user's signals, applies decay and normalization, and produces numeric affinity scores for each tag and topic the user has interacted with.
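
The aggregate, decay, and normalize steps can be sketched as follows. This is an illustration under stated assumptions, not Simbee's actual formula: it assumes exponential half-life decay, sums decayed strengths per tag, and normalizes by the strongest tag. The tuple shape and the compute_affinities name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each signal is (tag, strength, age_days); this tuple shape is illustrative only.
SignalRow = Tuple[str, float, float]


def compute_affinities(signals: Iterable[SignalRow], half_life_days: float = 30.0) -> Dict[str, float]:
    """Sum decayed strengths per tag, then scale so the strongest tag is 1.0."""
    totals: Dict[str, float] = defaultdict(float)
    for tag, strength, age_days in signals:
        totals[tag] += strength * 0.5 ** (age_days / half_life_days)
    peak = max(totals.values(), default=0.0)
    if peak == 0.0:
        return {}  # no usable signal yet
    return {tag: total / peak for tag, total in totals.items()}


affinities = compute_affinities([
    ("jazz", 0.7, 0.0),
    ("jazz", 0.3, 30.0),   # one half-life old, so it contributes 0.15
    ("cooking", 0.3, 0.0),
])
```

Here jazz accumulates 0.85 and becomes the reference point at 1.0, while cooking lands at 0.3 / 0.85 relative to it.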

Tag affinities vs. topic affinities

Tag affinities are fine-grained: "this user has 0.82 affinity toward jazz." Topic affinities are aggregated: "this user has 0.71 affinity toward Music." Both are available via the API. Use tag affinities for precise recommendations and topic affinities for broad categorization.
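
One way to picture the relationship between the two levels is a roll-up from tags to topics. The sketch below assumes a simple average over each topic's tags; this guide does not state how Simbee aggregates, and the tag-to-topic mapping is a hypothetical one built from the creator-platform examples.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# Hypothetical tag-to-topic mapping based on the creator-platform examples.
TAG_TO_TOPIC = {
    "jazz": "Music",
    "photography": "Visual Arts",
    "digital-art": "Visual Arts",
    "cooking": "Lifestyle",
}


def topic_affinities(tag_affinities: Dict[str, float]) -> Dict[str, float]:
    """Average the affinities of each topic's tags (assumed roll-up rule)."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for tag, score in tag_affinities.items():
        topic = TAG_TO_TOPIC.get(tag)
        if topic is not None:  # ignore tags that belong to no topic
            grouped[topic].append(score)
    return {topic: mean(scores) for topic, scores in grouped.items()}


topics = topic_affinities({"jazz": 0.82, "photography": 0.6, "digital-art": 0.2})
```

A user with one strong and one weak Visual Arts tag ends up with a moderate topic score, which is why topic affinities suit broad categorization better than precise recommendations.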

Affinity summary

The GET /api/v1/users/{external_id}/affinity/summary endpoint returns a user's complete affinity profile — all tag and topic scores in a single response. This is the primary input for scoring and feed ranking.

Explicit vs. computed

Signals are explicit — the user did something. Affinities are computed — Simbee inferred a preference. This distinction matters for consent and transparency: you can always explain why a user has a particular affinity by tracing it back to their signals.

Scoring

Scoring determines how Simbee ranks users relative to each other. When a user requests a feed or match results, Simbee evaluates every candidate against a scoring formula and returns them in ranked order.

Scoring presets

Presets are named scoring configurations that weight different factors. Simbee ships with built-in presets, and you can create custom ones. The same user pool produces different rankings with different presets — this is how you build different experiences without changing your data.

Preset examples

  • affinity_match: Weight shared interests heavily. Good for "people like you" features.
  • diversity: Penalize too-similar results. Good for discovery and exploration.
  • recency: Boost recently active users. Good for real-time feeds.
  • engagement: Weight signal volume and frequency. Good for "popular" rankings.
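
The idea that one user pool yields different rankings under different presets can be sketched as a weighted sum. Everything below is illustrative: the factor names, weights, and candidate values are invented, and the real scoring dimensions are whatever the scoring schema endpoint reports.

```python
from typing import Dict, List

# Hypothetical factor weights; real presets and dimensions come from the scoring schema.
PRESETS: Dict[str, Dict[str, float]] = {
    "affinity_match": {"shared_interests": 0.8, "recency": 0.1, "engagement": 0.1},
    "recency": {"shared_interests": 0.2, "recency": 0.7, "engagement": 0.1},
}

# Per-candidate factor values in [0, 1], invented for the example.
CANDIDATES: Dict[str, Dict[str, float]] = {
    "alice": {"shared_interests": 0.9, "recency": 0.2, "engagement": 0.5},
    "bob": {"shared_interests": 0.4, "recency": 0.9, "engagement": 0.5},
}


def rank(preset: str) -> List[str]:
    """Order candidates by the weighted sum of their factors, highest first."""
    weights = PRESETS[preset]

    def score(name: str) -> float:
        return sum(weights[f] * CANDIDATES[name][f] for f in weights)

    return sorted(CANDIDATES, key=score, reverse=True)


print(rank("affinity_match"))  # ['alice', 'bob']
print(rank("recency"))         # ['bob', 'alice']
```

The candidate data never changes between the two calls; only the weights do, which is the point of presets.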

Custom scoring

Use GET /api/v1/config/scoring/schema to see all available scoring dimensions and their valid ranges, then POST /api/v1/config/scoring to save a custom configuration. Your custom config is used by default for all feed and match requests, or you can override per-request.

Clustering

Clustering automatically groups users into segments based on their affinity profiles. Simbee uses HDBSCAN (a density-based clustering algorithm) to discover natural groups without requiring you to specify the number of clusters in advance.

How clusters form

The clustering pipeline runs periodically (triggered via API or on a schedule). It takes each user's affinity vector, reduces its dimensionality with SVD, and groups users whose reduced vectors are close together. Each cluster gets a label derived from the dominant affinities of its members.

Using clusters

Clusters are useful for:

  • Segmentation analytics: Understand your user base as groups, not individuals.
  • Campaign targeting: Target campaigns at specific clusters.
  • Group formation: Use cluster membership to create cohorts, teams, or communities.
  • Anomaly detection: Users who don't fit any cluster may need attention.

Each user has a cluster_id and cluster_confidence on their profile. List clusters via GET /api/v1/clusters and view members via GET /api/v1/clusters/{id}/members.
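
A short sketch of how cluster_id and cluster_confidence might be used together for cohort building and anomaly detection. It assumes unclustered users have no cluster_id (None here) and that 0.5 is a sensible confidence cutoff; both are assumptions, and segment_users is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (external_id, cluster_id, cluster_confidence); None is assumed to mean "unclustered".
UserRow = Tuple[str, Optional[str], float]


def segment_users(
    users: List[UserRow], min_confidence: float = 0.5
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split users into confident cluster cohorts and a list of outliers."""
    cohorts: Dict[str, List[str]] = defaultdict(list)
    outliers: List[str] = []
    for external_id, cluster_id, confidence in users:
        if cluster_id is None or confidence < min_confidence:
            outliers.append(external_id)  # candidates for a closer look
        else:
            cohorts[cluster_id].append(external_id)
    return dict(cohorts), outliers


cohorts, outliers = segment_users([
    ("u1", "c-music", 0.91),
    ("u2", "c-music", 0.35),  # assigned, but with low confidence
    ("u3", None, 0.0),        # fits no cluster
    ("u4", "c-tech", 0.77),
])
```

Treating low-confidence members as outliers keeps cohorts tight; lowering min_confidence trades precision for coverage.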

Campaigns

Campaigns deliver targeted content to users based on criteria you define. A campaign has a budget, targeting rules, and a set of content items. Simbee manages impression tracking, budget enforcement, and per-user frequency capping.

Campaign lifecycle

  1. Create — Define name, budget, targeting criteria, and optional date range.
  2. Add items — Attach content items that will be shown to targeted users.
  3. Activate — Campaign items begin appearing in users' feeds.
  4. Monitor — Track spend, impressions, and engagement via analytics.
  5. Pause / complete — Manually pause or let budget exhaustion complete the campaign.
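
The lifecycle above can be sketched as a small state machine. The state names, the Campaign class, and the rule that spend is clamped to the budget are illustrative assumptions; the only behavior taken from this guide is that a campaign can be paused manually and is completed when its budget is exhausted.

```python
from dataclasses import dataclass

# Allowed transitions, following the lifecycle steps; state names are illustrative.
TRANSITIONS = {
    "draft": {"active"},
    "active": {"paused", "completed"},
    "paused": {"active", "completed"},
    "completed": set(),
}


@dataclass
class Campaign:
    name: str
    budget: float
    spent: float = 0.0
    state: str = "draft"

    def transition(self, new_state: str) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.state = new_state

    def record_spend(self, amount: float) -> None:
        """Add spend while active; exhausting the budget completes the campaign."""
        if self.state != "active":
            raise ValueError("only active campaigns can spend")
        self.spent = min(self.budget, self.spent + amount)
        if self.spent >= self.budget:
            self.transition("completed")


campaign = Campaign("spring-launch", budget=10.0)
campaign.transition("active")
campaign.record_spend(4.0)
campaign.transition("paused")   # manual pause
campaign.transition("active")   # resume
campaign.record_spend(6.0)      # budget exhausted, so the campaign completes
```

Making "completed" a terminal state means a finished campaign cannot silently start spending again.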

Beyond promotion

Campaigns aren't just for advertising. Use them for onboarding sequences (show tutorial content to new users), A/B testing (serve different content to different segments), content distribution (ensure editorial picks reach the right audience), or seasonal events (time-bounded content with automatic expiry).

Feed

The feed is a personalized, ranked stream of content and users tailored to each user's affinity profile. It combines scored results with active campaign items and applies consent layer filtering.

Ranked feed

GET /api/v1/users/{external_id}/feed/ranked returns a cursor-paginated feed ranked by the active scoring configuration. Each page returns a set of results with a cursor for the next page.
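
Walking a cursor-paginated feed usually looks like the loop below. The response keys "results" and "next_cursor" are assumed names, not documented fields, and the page fetcher is injected so the sketch runs without a network call; in a real client it would wrap a request to the ranked feed endpoint.

```python
from typing import Callable, Dict, Iterator, Optional

# A page fetcher takes a cursor (None for the first page) and returns a dict.
# The "results" and "next_cursor" keys are assumed names, not documented fields.
FetchPage = Callable[[Optional[str]], Dict]


def iter_feed(fetch_page: FetchPage, max_pages: int = 10) -> Iterator[dict]:
    """Yield feed items page by page until the cursor runs out or max_pages is hit."""
    cursor: Optional[str] = None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        yield from page.get("results", [])
        cursor = page.get("next_cursor")
        if not cursor:
            break


# Stand-in for an HTTP call to the ranked feed endpoint.
FAKE_PAGES = {
    None: {"results": [{"id": "a"}, {"id": "b"}], "next_cursor": "c1"},
    "c1": {"results": [{"id": "c"}], "next_cursor": None},
}

items = list(iter_feed(lambda cursor: FAKE_PAGES[cursor]))
print([item["id"] for item in items])  # ['a', 'b', 'c']
```

The max_pages guard bounds how far a single caller can scroll, which is useful when the feed is effectively endless.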

Feed vs. matches

The feed is a general-purpose discovery endpoint: it surfaces content and users relevant to the viewer. Matches (GET /api/v1/users/{external_id}/matches) are consent-layer-scoped, bidirectional compatibility rankings. Use feeds for browsing and discovery. Use matches when both parties must opt in.
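
The difference can be illustrated with a toy model of mutual opt-in. This is a sketch under assumptions: it treats opt-in as simple set membership and takes the weaker of the two directional scores as the bidirectional score. Neither rule is documented here, and mutual_matches is a hypothetical function.

```python
from typing import Dict, List, Set, Tuple


def mutual_matches(
    viewer: str, opted_in: Set[str], scores: Dict[Tuple[str, str], float]
) -> List[Tuple[str, float]]:
    """Rank opted-in candidates by the weaker of the two directional scores."""
    if viewer not in opted_in:
        return []  # a viewer who has not opted in gets no matches
    ranked = []
    for candidate in opted_in - {viewer}:
        forward = scores.get((viewer, candidate), 0.0)
        backward = scores.get((candidate, viewer), 0.0)
        ranked.append((candidate, min(forward, backward)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


scores = {
    ("ana", "ben"): 0.9, ("ben", "ana"): 0.4,
    ("ana", "cy"): 0.6, ("cy", "ana"): 0.7,
    ("ana", "dee"): 0.95,  # dee has not opted in, so this score is never used
}
print(mutual_matches("ana", {"ana", "ben", "cy"}, scores))  # [('cy', 0.6), ('ben', 0.4)]
```

A feed would happily surface dee to ana on the strength of the one-way score; the match list does not, because both sides must participate.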

Analytics

Simbee aggregates behavioral data into queryable analytics endpoints. Use them to build dashboards, monitor engagement, and understand how your users interact with your platform.

Available insights

Endpoint                 What it shows
/analytics/overview      Totals and last-24h activity (users, signals, matches, impressions)
/analytics/signals       Signal volume by type, time period, and trend
/analytics/affinities    Affinity distribution and coverage across your user base
/analytics/clustering    Cluster sizes, distribution, and assignment coverage
/analytics/campaigns     Campaign performance: impressions, spend, engagement rates
/analytics/vocabulary    Tag and topic usage across signals and affinities
/analytics/growth        User and signal growth over time

All analytics endpoints are scoped to your tenant automatically via the JWT. Time-series data uses hourly and daily rollups for efficient querying.
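
For intuition about what an hourly or daily rollup is, the sketch below counts events per time bucket by truncating timestamps. It illustrates the general technique only; how Simbee stores or computes its rollups is not described in this guide, and the rollup function is hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable


def rollup(timestamps: Iterable[datetime], granularity: str = "hour") -> Counter:
    """Count events per hour or per day by truncating each timestamp."""
    counts: Counter = Counter()
    for ts in timestamps:
        if granularity == "hour":
            bucket = ts.replace(minute=0, second=0, microsecond=0)
        elif granularity == "day":
            bucket = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        else:
            raise ValueError("granularity must be 'hour' or 'day'")
        counts[bucket] += 1
    return counts


events = [
    datetime(2024, 5, 1, 9, 5, tzinfo=timezone.utc),
    datetime(2024, 5, 1, 9, 50, tzinfo=timezone.utc),
    datetime(2024, 5, 1, 17, 30, tzinfo=timezone.utc),
    datetime(2024, 5, 2, 8, 0, tzinfo=timezone.utc),
]
hourly = rollup(events, "hour")
daily = rollup(events, "day")
```

Querying a handful of pre-aggregated buckets is far cheaper than scanning every raw signal, which is what makes rollups efficient for dashboards.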

Composition patterns

The primitives above are independent building blocks. The value of Simbee comes from composing them. Here are common patterns that show how the same primitives serve different applications.

Recommendation engine

Vocabulary + Signals + Affinities + Feed

Define your content taxonomy as vocabulary. Record user interactions as signals. Let affinities build preference profiles. Serve the ranked feed as personalized recommendations. No matching or consent layers needed.

Community matching

Vocabulary + Signals + Affinities + Consent Layers + Scoring + Matches

Users declare interests (signals against tags). Consent layers control who is discoverable for matching. Scoring determines compatibility. Match results connect compatible users who have both opted in.

Marketplace discovery

Scoring + Campaigns + Clustering + Feed + Analytics

Score sellers by relevance to each buyer. Use campaigns for promoted listings with budget caps. Cluster buyers into segments for targeted promotions. Feed surfaces the best results. Analytics measures conversion.

User segmentation

Vocabulary + Signals + Affinities + Clustering + Analytics

Use Simbee purely for behavioral analytics. Ingest signals, let clustering discover natural user segments, and query analytics for segment-level insights. No feed, matching, or campaigns needed.

Content distribution

Campaigns + Feed + Signals + Analytics

Create campaigns for editorial picks or sponsored content. Items appear in user feeds based on targeting criteria. Track engagement via signals. Measure reach and effectiveness through analytics.

Engagement analytics

Signals + Analytics + Webhooks

Use Simbee as a behavioral event pipeline. Ingest all user actions as signals. Query aggregate analytics for dashboards. Subscribe to webhooks for real-time alerting on engagement patterns. No personalization needed.

Key takeaway: Matching is one composition among many, not the default use case. Most applications use a subset of Simbee's primitives. Start with the smallest set that solves your problem and add more as your product evolves.

Next steps

  • Getting Started — Hands-on tutorial that walks through each primitive with real API calls.
  • Authentication — Set up API keys and JWT tokens.
  • API Reference — Full endpoint reference with request/response schemas.