Concepts
Simbee provides a set of composable primitives that your application assembles to build personalization, discovery, and analytics features. This guide explains each primitive independently, then shows how they combine.
Vocabulary
Vocabulary is your domain's language. It defines the taxonomy that Simbee uses to understand what your users care about. Vocabulary consists of two structures:
Tags
Tags are flat labels that describe atomic interests. They have no hierarchy — each tag is an independent concept. Users accumulate affinity toward tags through their behavior (signals). Tags are the finest-grained unit of preference in the system.
Examples by domain
- Creator platform: jazz, photography, digital-art, cooking
- Learning app: python, machine-learning, statistics, algebra
- Marketplace: vintage, handmade, electronics, organic
- Professional network: rust, distributed-systems, product-management
Topics
Topics are broader categories that group related tags. They provide a second level of aggregation — Simbee computes affinity at both the tag level and the topic level, giving you fine-grained and coarse-grained preference data simultaneously.
Examples by domain
- Creator platform: Music, Visual Arts, Lifestyle
- Learning app: Computer Science, Mathematics, Data Science
- Marketplace: Fashion, Home & Garden, Technology
Designing your vocabulary
Start small. You can add tags and topics at any time without breaking existing data — affinities are computed incrementally. A good starting vocabulary has 10–30 tags grouped under 3–8 topics. Avoid creating tags that are too specific (they won't accumulate enough signal) or too broad (they won't differentiate users).
Vocabulary is configured in two places: the signal type taxonomy via POST /api/v1/config/signal_types, and the tags and topics themselves through the affinities that signals create. See the Getting Started guide for a hands-on walkthrough.
Signals
Signals are behavioral events that express user intent. Every meaningful user action — a like, a purchase, a page view, a course completion — is captured as a signal. Signals are the raw input to the affinity computation pipeline.
Anatomy of a signal
Each signal connects a user to a target (another user, a piece of content, or a vocabulary item) with a type and strength:
| Field | Description |
|---|---|
| signal_type_id | The kind of behavior (e.g. "like", "purchase", "view") |
| target_id | What the user interacted with |
| target_type | The kind of target ("user", "content", "tag", etc.) |
| strength | Numeric weight (0.0–1.0). Defaults based on signal type config. |
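As a minimal sketch of the fields above, the following Python models a signal and checks the documented strength range before it is serialized. The Signal class, the to_payload helper, and the "post_123" target are hypothetical names for illustration; the exact request body Simbee expects is not shown in this guide, so treat the payload shape as an assumption.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Signal:
    """One behavioral event, mirroring the four fields in the table above."""
    signal_type_id: str               # e.g. "like", "purchase", "view"
    target_id: str                    # what the user interacted with
    target_type: str                  # e.g. "user", "content", "tag"
    strength: Optional[float] = None  # None: rely on the signal type's default

    def __post_init__(self) -> None:
        # The documented range for strength is 0.0 to 1.0 inclusive.
        if self.strength is not None and not 0.0 <= self.strength <= 1.0:
            raise ValueError(f"strength must be within 0.0-1.0, got {self.strength}")

    def to_payload(self) -> dict:
        """Return a JSON-ready dict, leaving out strength when it is unset."""
        return {key: value for key, value in asdict(self).items() if value is not None}


# "post_123" is a made-up content ID used only for this example.
like = Signal(signal_type_id="like", target_id="post_123", target_type="content")
print(like.to_payload())
```

Omitting strength here leaves the decision to the signal type's configured default, which keeps client code free of hard-coded weights.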
Signal types
Signal types let you define what behaviors mean in your domain. Each type has a default strength and a decay strategy that controls how the signal's influence fades over time.
The same primitive, different meanings
- Social app: "like" (0.3), "follow" (0.7), "message" (0.9)
- Marketplace: "view" (0.1), "save" (0.4), "purchase" (1.0)
- Learning: "enroll" (0.5), "complete_lesson" (0.7), "earn_certificate" (1.0)
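To make the idea of a decay strategy concrete, here is a small sketch that applies exponential half-life decay to the social-app defaults listed above. This guide does not say which decay strategies Simbee supports, so the exponential form, the decayed_strength function, and the 30-day half-life are assumptions chosen for illustration.

```python
# Default strengths from the social-app example above.
SOCIAL_DEFAULTS = {"like": 0.3, "follow": 0.7, "message": 0.9}


def decayed_strength(strength: float, age_days: float, half_life_days: float) -> float:
    """Exponential decay: the signal's weight halves every half_life_days."""
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    if age_days < 0:
        raise ValueError("age_days must not be negative")
    return strength * 0.5 ** (age_days / half_life_days)


# A "follow" recorded today versus one recorded 30 and 60 days ago,
# using a hypothetical 30-day half-life.
for age in (0, 30, 60):
    print(age, decayed_strength(SOCIAL_DEFAULTS["follow"], age, half_life_days=30))
```

A shorter half-life makes the profile react faster to new behavior; a longer one keeps older interests around.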
Batch ingestion
For high-volume applications, use POST /api/v1/signal_batches to submit up to 1,000 signals per call. Batch signals are processed asynchronously — the API returns immediately with a batch ID that you can poll for completion status.
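Because each call accepts at most 1,000 signals, a client with a larger backlog has to split it first. The sketch below shows only that splitting step; chunk_signals is a hypothetical helper, and the actual HTTP request and the polling of the returned batch ID are left to whatever client you use.

```python
from typing import Iterator, List, Sequence

MAX_BATCH_SIZE = 1000  # documented per-call limit for POST /api/v1/signal_batches


def chunk_signals(signals: Sequence[dict], size: int = MAX_BATCH_SIZE) -> Iterator[List[dict]]:
    """Yield consecutive slices of signals, each no longer than size."""
    if not 1 <= size <= MAX_BATCH_SIZE:
        raise ValueError(f"size must be between 1 and {MAX_BATCH_SIZE}")
    for start in range(0, len(signals), size):
        yield list(signals[start:start + size])


# 2,500 placeholder signals become three batches.
backlog = [{"signal_type_id": "view", "target_id": f"item_{i}", "target_type": "content"}
           for i in range(2500)]
batch_sizes = [len(batch) for batch in chunk_signals(backlog)]
print(batch_sizes)
```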
Affinities
Affinities are computed relationship strengths between users and vocabulary items. They are the output of the signal processing pipeline — Simbee aggregates a user's signals, applies decay and normalization, and produces numeric affinity scores for each tag and topic the user has interacted with.
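The aggregate, decay, and normalize steps can be pictured with a toy computation. This is not Simbee's actual pipeline: the half-life decay, the divide-by-maximum normalization, and the compute_affinities name are all assumptions, chosen only to show how raw signals could turn into scores between 0 and 1.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each signal is (tag, strength, age_in_days); a simplified stand-in for real signals.
TagSignal = Tuple[str, float, float]


def compute_affinities(signals: Iterable[TagSignal], half_life_days: float = 30.0) -> Dict[str, float]:
    """Sum decayed strengths per tag, then scale so the strongest tag scores 1.0."""
    totals: Dict[str, float] = defaultdict(float)
    for tag, strength, age_days in signals:
        # Aggregate: older signals contribute less (assumed exponential decay).
        totals[tag] += strength * 0.5 ** (age_days / half_life_days)
    if not totals:
        return {}
    peak = max(totals.values())
    if peak == 0:
        return {tag: 0.0 for tag in totals}
    # Normalize: divide by the largest total (one possible scheme among many).
    return {tag: total / peak for tag, total in totals.items()}


history = [("jazz", 1.0, 0), ("jazz", 1.0, 30), ("cooking", 0.3, 0)]
affinities = compute_affinities(history)
print(affinities)
```

Here jazz ends up as the reference tag at 1.0 and cooking scores relative to it.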
Tag affinities vs. topic affinities
Tag affinities are fine-grained: "this user has 0.82 affinity toward jazz." Topic affinities are aggregated: "this user has 0.71 affinity toward Music." Both are available via the API. Use tag affinities for precise recommendations and topic affinities for broad categorization.
Affinity summary
The GET /api/v1/users/{external_id}/affinity/summary endpoint returns a user's complete affinity profile — all tag and topic scores in a single response. This is the primary input for scoring and feed ranking.
Explicit vs. computed
Signals are explicit — the user did something. Affinities are computed — Simbee inferred a preference. This distinction matters for consent and transparency: you can always explain why a user has a particular affinity by tracing it back to their signals.
Scoring
Scoring determines how Simbee ranks users relative to each other. When a user requests a feed or match results, Simbee evaluates every candidate against a scoring formula and returns them in ranked order.
Scoring presets
Presets are named scoring configurations that weight different factors. Simbee ships with built-in presets, and you can create custom ones. The same user pool produces different rankings with different presets — this is how you build different experiences without changing your data.
Preset examples
- affinity_match: Weight shared interests heavily. Good for "people like you" features.
- diversity: Penalize too-similar results. Good for discovery and exploration.
- recency: Boost recently active users. Good for real-time feeds.
- engagement: Weight signal volume and frequency. Good for "popular" rankings.
Custom scoring
Use GET /api/v1/config/scoring/schema to list the available scoring dimensions and their valid ranges, then POST /api/v1/config/scoring to save a custom configuration. The saved configuration becomes the default for all feed and match requests; you can still override it per request.
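One way to think about a scoring configuration is as a set of weights over per-candidate factors. The sketch below ranks the same two candidates under two weightings to show how the order can flip. The dimension names shared_affinity and recency, the weights, and the weighted-sum formula are hypothetical; the real dimensions come from the schema endpoint.

```python
from typing import Dict, List, Tuple


def score_candidate(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum over the dimensions named in weights; missing features count as 0."""
    return sum(weight * features.get(name, 0.0) for name, weight in weights.items())


def rank(candidates: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (candidate_id, score) pairs, highest score first."""
    scored = [(cid, score_candidate(features, weights)) for cid, features in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidates and dimensions, for illustration only.
candidates = {
    "ana": {"shared_affinity": 0.9, "recency": 0.2},
    "ben": {"shared_affinity": 0.4, "recency": 0.9},
}
affinity_weights = {"shared_affinity": 1.0, "recency": 0.1}
recency_weights = {"shared_affinity": 0.1, "recency": 1.0}

by_affinity = [cid for cid, _ in rank(candidates, affinity_weights)]
by_recency = [cid for cid, _ in rank(candidates, recency_weights)]
print(by_affinity, by_recency)
```

The candidate data never changes between the two calls; only the weights do, which is the same idea behind switching presets.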
Clustering
Clustering automatically groups users into segments based on their affinity profiles. Simbee uses HDBSCAN (a density-based clustering algorithm) to discover natural groups without requiring you to specify the number of clusters in advance.
How clusters form
The clustering pipeline runs periodically (triggered via API or on a schedule). It takes each user's affinity vector, reduces its dimensionality with SVD, and groups users whose reduced vectors are close together. Each cluster gets a label derived from the dominant affinities of its members.
Using clusters
Clusters are useful for:
- Segmentation analytics: Understand your user base as groups, not individuals.
- Campaign targeting: Target campaigns at specific clusters.
- Group formation: Use cluster membership to create cohorts, teams, or communities.
- Anomaly detection: Users who don't fit any cluster may need attention.
Each user has a cluster_id and cluster_confidence on their profile. List clusters via GET /api/v1/clusters and view members via GET /api/v1/clusters/{id}/members.
Consent layers
Consent layers scope what data is visible and matchable. They let you build different interaction tiers where users explicitly opt into levels of discoverability.
How they work
A consent layer is a named boundary. Users grant consent to specific layers, and only users who share a consent layer can be matched or see each other's data in that context. Affinities, scoring, and feed results are all filtered by consent layer when one is specified.
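The filtering rule can be sketched as a set-membership check: a candidate is eligible in a layer only if both the viewer and the candidate have granted consent to it. The visible_candidates function and the in-memory consents mapping are hypothetical stand-ins for what Simbee does server-side.

```python
from typing import Dict, List, Optional, Set


def visible_candidates(
    viewer: str,
    candidates: List[str],
    consents: Dict[str, Set[str]],
    layer: Optional[str] = None,
) -> List[str]:
    """Return the candidates the viewer may see in the given consent layer."""
    others = [c for c in candidates if c != viewer]
    if layer is None:
        # No layer specified: nothing is filtered.
        return others
    if layer not in consents.get(viewer, set()):
        # The viewer has not opted in, so nobody is shared with them in this layer.
        return []
    return [c for c in others if layer in consents.get(c, set())]


# Hypothetical users: ana and ben opted into "matching", cy did not.
consents = {"ana": {"matching"}, "ben": {"matching"}, "cy": set()}
users = ["ana", "ben", "cy"]
print(visible_candidates("ana", users, consents, layer="matching"))
print(visible_candidates("ana", users, consents))
```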
When you need consent layers
- Public profile + private matching: Users are visible publicly but only matchable if they opt into a "matching" layer.
- Tiered access: A "basic" layer for browsing and a "premium" layer for direct messages.
- Feature-gated discovery: Different parts of your app use different consent contexts.
If your application doesn't need consent scoping, you don't need to configure consent layers — all data is visible by default. Consent layers are configured via POST /api/v1/config/consent_layers and user consent is managed via POST /api/v1/clients/{client_id}/users/{user_id}/consents.
Campaigns
Campaigns deliver targeted content to users based on criteria you define. A campaign has a budget, targeting rules, and a set of content items. Simbee manages impression tracking, budget enforcement, and per-user frequency capping.
Campaign lifecycle
- Create — Define name, budget, targeting criteria, and optional date range.
- Add items — Attach content items that will be shown to targeted users.
- Activate — Campaign items begin appearing in users' feeds.
- Monitor — Track spend, impressions, and engagement via analytics.
- Pause / complete — Manually pause or let budget exhaustion complete the campaign.
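The lifecycle above behaves like a small state machine. The following sketch models it with hypothetical status names (draft, active, paused, completed) and a record_spend method that completes the campaign once the budget is used up. Simbee's real status values and budget accounting may differ.

```python
from dataclasses import dataclass

# Allowed transitions between the assumed status names.
ALLOWED = {
    "draft": {"active"},
    "active": {"paused", "completed"},
    "paused": {"active", "completed"},
    "completed": set(),
}


@dataclass
class Campaign:
    name: str
    budget: float
    spent: float = 0.0
    status: str = "draft"

    def transition(self, new_status: str) -> None:
        """Move to new_status, rejecting transitions the lifecycle does not allow."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"cannot go from {self.status} to {new_status}")
        self.status = new_status

    def record_spend(self, amount: float) -> None:
        """Add spend while active; exhausting the budget completes the campaign."""
        if self.status != "active":
            raise ValueError("only active campaigns can spend")
        self.spent = min(self.budget, self.spent + amount)  # never exceed the budget
        if self.spent >= self.budget:
            self.transition("completed")


campaign = Campaign(name="spring-picks", budget=10.0)
campaign.transition("active")
campaign.record_spend(4.0)
campaign.record_spend(6.0)
print(campaign.status, campaign.spent)
```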
Beyond promotion
Campaigns aren't just for advertising. Use them for onboarding sequences (show tutorial content to new users), A/B testing (serve different content to different segments), content distribution (ensure editorial picks reach the right audience), or seasonal events (time-bounded content with automatic expiry).
Feed
The feed is a personalized, ranked stream of content and users tailored to each user's affinity profile. It combines scored results with active campaign items and applies consent layer filtering.
Ranked feed
GET /api/v1/users/{external_id}/feed/ranked returns a cursor-paginated feed ranked by the active scoring configuration. Each page returns a set of results with a cursor for the next page.
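Cursor pagination usually comes down to a loop that passes the last cursor back until none is returned. The sketch below shows that loop against a fake page source so it runs without network access. The response field names results and next_cursor are assumptions, as is the iter_feed helper; check the API Reference for the real response schema.

```python
from typing import Callable, Iterator, Optional


def iter_feed(fetch_page: Callable[[Optional[str]], dict], max_pages: int = 100) -> Iterator[dict]:
    """Yield feed items page by page until no cursor is returned or max_pages is hit."""
    cursor: Optional[str] = None
    for _ in range(max_pages):
        page = fetch_page(cursor)            # first call passes None: start of the feed
        yield from page.get("results", [])
        cursor = page.get("next_cursor")     # assumed field name
        if not cursor:
            break


# A stand-in for the HTTP call, keyed by cursor value.
FAKE_PAGES = {
    None: {"results": [{"id": "a"}, {"id": "b"}], "next_cursor": "p2"},
    "p2": {"results": [{"id": "c"}], "next_cursor": None},
}


def fake_fetch(cursor: Optional[str]) -> dict:
    return FAKE_PAGES[cursor]


ids = [item["id"] for item in iter_feed(fake_fetch)]
print(ids)
```

The max_pages guard is a defensive choice so that a misbehaving cursor cannot cause an endless loop.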
Feed vs. matches
The feed is a general-purpose discovery endpoint: it surfaces content and users relevant to the viewer. Matches (GET /api/v1/users/{external_id}/matches) are consent-layer-scoped, bidirectional compatibility rankings. Use feeds for browsing and discovery; use matches when both parties must opt in.
Analytics
Simbee aggregates behavioral data into queryable analytics endpoints. Use them to build dashboards, monitor engagement, and understand how your users interact with your platform.
Available insights
| Endpoint | What it shows |
|---|---|
| /analytics/overview | Totals and last-24h activity (users, signals, matches, impressions) |
| /analytics/signals | Signal volume by type, time period, and trend |
| /analytics/affinities | Affinity distribution and coverage across your user base |
| /analytics/clustering | Cluster sizes, distribution, and assignment coverage |
| /analytics/campaigns | Campaign performance: impressions, spend, engagement rates |
| /analytics/vocabulary | Tag and topic usage across signals and affinities |
| /analytics/growth | User and signal growth over time |
All analytics endpoints are scoped to your tenant automatically via the JWT. Time-series data uses hourly and daily rollups for efficient querying.
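For readers unfamiliar with rollups, the idea is to pre-aggregate raw events into fixed time buckets so queries read a handful of counters instead of every event. The sketch below is a generic illustration of hourly and daily bucketing, not Simbee's storage layer; hourly_rollup and daily_rollup are hypothetical names.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable


def hourly_rollup(timestamps: Iterable[datetime]) -> Counter:
    """Count events per hour by truncating each timestamp to the start of its hour."""
    return Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)


def daily_rollup(hourly: Counter) -> Counter:
    """Fold hourly buckets into per-day totals."""
    daily: Counter = Counter()
    for hour, count in hourly.items():
        daily[hour.date()] += count
    return daily


# Made-up event times: two in the 09:00 hour, one at 17:00, one the next day.
events = [
    datetime(2024, 5, 1, 9, 5, tzinfo=timezone.utc),
    datetime(2024, 5, 1, 9, 40, tzinfo=timezone.utc),
    datetime(2024, 5, 1, 17, 0, tzinfo=timezone.utc),
    datetime(2024, 5, 2, 8, 30, tzinfo=timezone.utc),
]
hourly = hourly_rollup(events)
daily = daily_rollup(hourly)
print(dict(daily))
```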
Composition patterns
The primitives above are independent building blocks. The value of Simbee comes from composing them. Here are common patterns that show how the same primitives serve different applications.
Recommendation engine
Vocabulary + Signals + Affinities + Feed
Define your content taxonomy as vocabulary. Record user interactions as signals. Let affinities build preference profiles. Serve the ranked feed as personalized recommendations. No matching or consent layers needed.
Community matching
Vocabulary + Signals + Affinities + Consent Layers + Scoring + Matches
Users declare interests (signals against tags). Consent layers control who is discoverable for matching. Scoring determines compatibility. Match results connect compatible users who have both opted in.
Marketplace discovery
Scoring + Campaigns + Clustering + Feed + Analytics
Score sellers by relevance to each buyer. Use campaigns for promoted listings with budget caps. Cluster buyers into segments for targeted promotions. Feed surfaces the best results. Analytics measures conversion.
User segmentation
Vocabulary + Signals + Clustering + Analytics
Use Simbee purely for behavioral analytics. Ingest signals, let clustering discover natural user segments, and query analytics for segment-level insights. No feed, matching, or campaigns needed.
Content distribution
Campaigns + Feed + Signals + Analytics
Create campaigns for editorial picks or sponsored content. Items appear in user feeds based on targeting criteria. Track engagement via signals. Measure reach and effectiveness through analytics.
Engagement analytics
Signals + Analytics + Webhooks
Use Simbee as a behavioral event pipeline. Ingest all user actions as signals. Query aggregate analytics for dashboards. Subscribe to webhooks for real-time alerting on engagement patterns. No personalization needed.
Next steps
- Getting Started — Hands-on tutorial that walks through each primitive with real API calls.
- Authentication — Set up API keys and JWT tokens.
- API Reference — Full endpoint reference with request/response schemas.