Privacy and Anonymity

The platform prioritizes contributor privacy. Participants are not required to reveal their real-world identity. Contributors are assigned anonymized worker codes to prevent identity linkage across campaigns. The current implementation provides basic anonymization, with enhanced privacy features planned.
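As an illustration of how campaign-scoped worker codes can prevent linkage, the sketch below derives a different code for the same account in each campaign using a keyed hash. This is a minimal sketch under stated assumptions, not the platform's actual scheme: the function name `worker_code`, the `platform_secret` key, and the `w_` prefix are all hypothetical.

```python
import hashlib
import hmac


def worker_code(platform_secret: bytes, account_id: str, campaign_id: str) -> str:
    """Derive a campaign-scoped pseudonymous worker code (hypothetical scheme).

    Assumption: the platform holds `platform_secret` privately, so nobody else
    can recompute codes or link them across campaigns.
    """
    # Length-prefix the account ID so ("ab", "c") and ("a", "bc") cannot collide.
    message = f"{len(account_id)}:{account_id}|{campaign_id}".encode("utf-8")
    digest = hmac.new(platform_secret, message, hashlib.sha256).hexdigest()
    return "w_" + digest[:16]


secret = b"demo-secret-not-for-production"
code_a = worker_code(secret, "account-123", "campaign-A")
code_b = worker_code(secret, "account-123", "campaign-B")
```

The same account gets a stable code within one campaign, while the codes for two campaigns share no visible relationship without the secret.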

Privacy layers diagram

Pseudonymous participation: no real-world identity required
Eligibility proofs (optional): ZK credential checks for cohort access
Content privacy: encrypted tasks + encrypted contributions
Delivery privacy: private enterprise delivery / access control

Design Principles

The platform follows core privacy principles:

Sensitive content is stored securely. Prompts, attachments, and outputs are managed with access controls.
Participation doesn't require real-world identity. Eligibility is enforced through declared routing rules such as reputation, Account Trust/runtime policy, and cohort proofs.
Metadata collection is minimized for privacy protection.

Because metadata can leak identity, it must be handled carefully: campaigns explicitly define which metadata hooks are enabled.

Privacy Modes

Campaigns choose a privacy mode in the configuration. It determines how tasks are distributed, what is visible to participants, what metadata is collected, and how outputs are delivered.

Standard Mode

Suitable for non-sensitive studies and general evaluation tasks. The characteristics are:

Task prompts and context are delivered off-chain to contributors and validators.
Outputs are stored off-chain and delivered through the Distribution layer.
The protocol records on-chain only campaign commitments, outcomes, and settlement receipts.
Minimal metadata hooks are enabled to support anti-fraud measures.

Standard mode doesn’t require identity disclosure. It may also allow a broader catalogue of campaigns and higher transparency for aggregated results.

Protocol Roadmap: The sections below describe the full privacy design planned for the Lioth protocol. Confidential modes, watermarking, confidentiality bonds, private cohort proofs, encrypted storage with PSOs, key custody modes, delivery receipts, and on-chain licensing references are not yet active in the current platform. They are documented here to define the target protocol architecture.

Confidential Mode

Confidential mode is for campaigns where prompts, context, or outputs contain sensitive information (enterprise IP, unreleased products, internal policies). It is designed to reduce leakage risk. It can’t guarantee that a human will never leak anything, but the goal is to make leakage unlikely, limited in impact, and enforceable.

Confidential mode protects the prompt and attached materials, the identity of the customer or enterprise (when needed), the outputs and deliverables, and the linkability of participants across campaigns. The core idea is to reduce exposure, restrict access, and enforce consequences through several mechanisms:

Minimize what is shown: redact names, remove identifiers, and split work into smaller pieces.
Restrict who can access the work: confidential tasks can be limited to vetted cohorts, using declared eligibility rules, reputation, Account Trust/runtime policy, and cohort proofs. This increases accountability.
Make leaks traceable when possible: campaigns can include per-participant watermarking or canary tokens so that leaked text can be linked back to the source.
Make leaks costly: high-sensitivity campaigns can require a confidentiality bond. If a leak is proven through arbitration, the bond can be slashed.
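The traceability mechanism above can be sketched with per-participant canary tokens. This is only an illustrative sketch: the helper names (`canary_token`, `embed_canary`, `trace_leak`), the `campaign_key`, and the "Internal reference" wording are assumptions, and a real watermark would likely be harder to spot and strip than a visible token.

```python
import hashlib
import hmac


def canary_token(campaign_key: bytes, worker_code: str) -> str:
    """Derive a token unique to one participant in one campaign (hypothetical)."""
    digest = hmac.new(campaign_key, worker_code.encode("utf-8"), hashlib.sha256)
    return "ref-" + digest.hexdigest()[:12]


def embed_canary(prompt: str, token: str) -> str:
    """Attach the token to the task text shown to a single participant."""
    return f"{prompt}\n\nInternal reference: {token}"


def trace_leak(leaked_text: str, campaign_key: bytes, worker_codes: list) -> list:
    """Return the worker codes whose canary token appears in the leaked text."""
    return [
        code for code in worker_codes
        if canary_token(campaign_key, code) in leaked_text
    ]


key = b"campaign-demo-key"
workers = ["w_aaa", "w_bbb", "w_ccc"]
shown_to_bbb = embed_canary("Describe feature X.", canary_token(key, "w_bbb"))
suspects = trace_leak(shown_to_bbb, key, workers)
```

A match is evidence for arbitration rather than proof on its own, since a leaker can remove or swap a visible token.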

Confidential Tiers

The protocol supports three confidentiality tiers, so requesters can choose the right trade-off between scale and risk:

Tier 1 (Confidential Public): used when the task can be safely crowdsourced after redaction. Typical controls include redaction, abstraction, and task splitting.
Tier 2 (Confidential Vetted): used when contributors must see sensitive context. Typical controls include reputation gating, cohort restriction, watermarking, and an optional bond.
Tier 3 (Enterprise Restricted): used when leakage risk must be extremely low. Typical controls include enterprise-approved cohorts or enterprise-hosted task environments, with Lioth providing verification receipts and settlement.

Recommended Campaign Fields

privacy_tier: public_redacted, vetted, enterprise_restricted
allowed_context: redacted, full, enterprise_hosted
cohort_requirements: eligibility rules, trust/readiness policy, and cohort proofs
watermarking: on, off
dispute_rules: dispute window + evidence requirements + arbitration quorum
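The recommended fields can be sketched as a small typed configuration with validation. The field names and allowed values come from the list above; the class names, the `DisputeRules` sub-fields, and the two consistency rules in `validate` are assumptions made for illustration, not documented protocol behavior.

```python
from dataclasses import dataclass, field
from typing import Optional

PRIVACY_TIERS = {"public_redacted", "vetted", "enterprise_restricted"}
ALLOWED_CONTEXTS = {"redacted", "full", "enterprise_hosted"}


@dataclass
class DisputeRules:
    # Hypothetical units and shapes for the three documented components.
    dispute_window_hours: int
    evidence_requirements: list
    arbitration_quorum: int


@dataclass
class CampaignPrivacyConfig:
    privacy_tier: str
    allowed_context: str
    cohort_requirements: list = field(default_factory=list)
    watermarking: bool = False
    dispute_rules: Optional[DisputeRules] = None

    def validate(self) -> list:
        """Return a list of problems; an empty list means the config passes."""
        errors = []
        if self.privacy_tier not in PRIVACY_TIERS:
            errors.append(f"unknown privacy_tier: {self.privacy_tier}")
        if self.allowed_context not in ALLOWED_CONTEXTS:
            errors.append(f"unknown allowed_context: {self.allowed_context}")
        # Assumed rule: crowdsourced tier only ever sees redacted context.
        if self.privacy_tier == "public_redacted" and self.allowed_context != "redacted":
            errors.append("public_redacted campaigns should only expose redacted context")
        # Assumed rule: gated tiers must declare who may access the work.
        if self.privacy_tier in {"vetted", "enterprise_restricted"} and not self.cohort_requirements:
            errors.append("vetted and enterprise_restricted campaigns need cohort_requirements")
        return errors


good = CampaignPrivacyConfig(
    privacy_tier="vetted",
    allowed_context="full",
    cohort_requirements=["passed screening task"],
    watermarking=True,
    dispute_rules=DisputeRules(72, ["leaked text sample"], 3),
)
bad = CampaignPrivacyConfig(privacy_tier="public_redacted", allowed_context="full")
```

Returning a list of errors instead of raising lets a campaign editor show every problem at once.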

Private Cohort Mode

This mode adds cohort gating to either standard or confidential mode. In private cohort mode:

Contributors must present an eligibility proof to receive tasks. Proofs can encode attributes such as region, language proficiency, professional background, or prior screening completion.
Proofs are designed to avoid revealing identity. The requester learns that the participant satisfies the cohort condition, not who they are.

In this way, the mode supports workflows like screening funnels, longitudinal studies, and targeted recruitment while preserving participant anonymity.

Data Minimization and Metadata Hooks

Metadata hooks are optional and campaign-based. They exist to support verification quality and fraud detection, but they are constrained because metadata can be identifying. The protocol defines a minimal default metadata set:

Completion time bucket
Task spec version
Client version
Task variant identifier
Optional cohort proof reference

Additional telemetry may be enabled only when a campaign explicitly opts in and privacy mode allows it. It may include copy-paste indicators, interaction event counts, or keystroke timing statistics. These signals should be used as risk indicators for audit triggers, not as standalone proofs. If enabled, they should be processed and stored off-chain.
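A minimal sketch of default metadata minimization follows. The five default fields mirror the list above, but the snake_case key names, the bucket boundaries, and the `minimize_metadata` helper are assumptions chosen for illustration.

```python
# Key names are hypothetical spellings of the documented default metadata set.
DEFAULT_METADATA_FIELDS = frozenset({
    "completion_time_bucket",
    "task_spec_version",
    "client_version",
    "task_variant_id",
    "cohort_proof_ref",
})


def completion_time_bucket(seconds: float) -> str:
    """Coarsen an exact duration into a bucket (boundaries are assumed)."""
    edges = [(30, "<30s"), (120, "30s-2m"), (600, "2m-10m")]
    for upper, label in edges:
        if seconds < upper:
            return label
    return ">=10m"


def minimize_metadata(raw: dict, opted_in: frozenset = frozenset()) -> dict:
    """Keep only default fields plus telemetry the campaign explicitly enabled."""
    allowed = DEFAULT_METADATA_FIELDS | opted_in
    return {key: value for key, value in raw.items() if key in allowed}


raw_event = {
    "completion_time_bucket": completion_time_bucket(45.2),
    "client_version": "1.4.0",
    "ip_address": "203.0.113.7",   # identifying, never in the default set
    "paste_event_count": 2,        # only kept when the campaign opts in
}
default_view = minimize_metadata(raw_event)
opt_in_view = minimize_metadata(raw_event, frozenset({"paste_event_count"}))
```

Bucketing the completion time before storage means the exact duration never needs to be retained, which narrows what the field can reveal about an individual.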

Proof of Eligibility without Identity Disclosure

Campaigns often need targeted cohorts. The protocol supports this using privacy-preserving proofs rather than identity disclosure. A requester can specify cohort requirements such as:

Passed a screening task
Language proficiency tier
Domain expertise credential
Geographic region
Longitudinal continuity

Participants can satisfy these requirements by presenting proofs. A proof reveals only the required statement, not personal information. The protocol verifies the proof and routes the task accordingly. This mechanism supports screening funnels and recontact workflows while reducing the incentive for identity farming and data harvesting.
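The routing step can be sketched with attestations that carry only a cohort statement and a campaign-scoped worker code. Note the loud assumption: this is not a zero-knowledge proof. It uses an HMAC tag as a stand-in for an issuer credential, so anyone holding `issuer_key` could forge attestations; a real deployment would presumably use the ZK credential checks the protocol describes. All names here are hypothetical.

```python
import hashlib
import hmac
import json


def _payload(worker_code: str, statement: str) -> bytes:
    body = {"worker_code": worker_code, "statement": statement}
    return json.dumps(body, sort_keys=True).encode("utf-8")


def issue_attestation(issuer_key: bytes, worker_code: str, statement: str) -> dict:
    """Issue a statement-only attestation (HMAC stands in for a real credential)."""
    tag = hmac.new(issuer_key, _payload(worker_code, statement), hashlib.sha256)
    return {"worker_code": worker_code, "statement": statement, "tag": tag.hexdigest()}


def verify_attestation(issuer_key: bytes, attestation: dict) -> bool:
    expected = hmac.new(
        issuer_key,
        _payload(attestation["worker_code"], attestation["statement"]),
        hashlib.sha256,
    ).hexdigest()
    return hmac.compare_digest(expected, attestation["tag"])


def is_eligible(issuer_key: bytes, required: set, attestations: list, worker_code: str) -> bool:
    """Route a task only if every required statement has a valid attestation."""
    proven = {
        a["statement"]
        for a in attestations
        if a["worker_code"] == worker_code and verify_attestation(issuer_key, a)
    }
    return required <= proven


issuer_key = b"issuer-demo-key"
proofs = [
    issue_attestation(issuer_key, "w_bbb", "passed_screening"),
    issue_attestation(issuer_key, "w_bbb", "language_tier>=B2"),
]
```

The requester-facing check sees statements such as `passed_screening`, never a name, document, or account identifier.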

Storage, Encryption and Delivery

All task content and outputs are stored off-chain. The protocol treats storage and delivery as part of the Distribution Layer, and confidentiality is mandatory.

Data Objects (Off-Chain, Content-Addressed)

The Distribution Layer operates on a small set of standardized objects:

Task Payload: the encrypted submission payload produced by a contributor (raw output + required fields).
Validated Output: the accepted/rejected result reference + any allowed structured fields for delivery.
Bundle: one or more outputs packaged into the requester’s required format (JSON, CSV, Parquet, report, file set, etc.).
Manifest: a machine-readable index of a bundle (object IDs, schema/version, counts, hashes, size, timestamps).
Delivery Receipt: a signed event proving that a specific manifest/bundle was prepared and delivered under a specific campaign configuration hash (without revealing contents).
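To make the Manifest object concrete, the sketch below indexes a bundle by object ID, hash, and size, then hashes the manifest itself so it can be committed to without revealing contents. The field names and the JSON canonicalization (sorted keys, compact separators) are assumptions; the source only lists what a manifest contains.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_manifest(campaign_id: str, schema_version: str, objects: dict, created_at: str) -> dict:
    """Index a bundle of (already encrypted) objects by ID, hash, and size."""
    entries = [
        {"object_id": object_id, "sha256": sha256_hex(blob), "size": len(blob)}
        for object_id, blob in sorted(objects.items())
    ]
    return {
        "campaign_id": campaign_id,
        "schema_version": schema_version,
        "count": len(entries),
        "objects": entries,
        "created_at": created_at,
    }


def manifest_hash(manifest: dict) -> str:
    """Hash a canonical JSON encoding so the same manifest always hashes the same."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return sha256_hex(canonical.encode("utf-8"))


bundle = {"obj-2": b"ciphertext-beta", "obj-1": b"ciphertext-alpha"}
manifest = build_manifest("campaign-A", "1.0", bundle, "2025-01-01T00:00:00Z")
digest = manifest_hash(manifest)
```

Because the manifest records only hashes and sizes of encrypted objects, a recipient can check that nothing was swapped or dropped without the manifest exposing any plaintext.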

Encryption Requirements (At Rest + in Transit)

At minimum:

Encrypted at rest: all payloads, bundles, datasets and manifests are stored encrypted.
End-to-end encryption to requesters: delivery uses an encryption envelope so only entitled requesters (and authorized organizational keys, if enabled) can access the data.
Key separation: encryption keys for content are distinct from identity/reputation keys and are not committed on-chain.
Campaigns may require replicated encrypted storage across multiple PSOs (tier-defined replication factor) to reduce availability risk.
Delivery integrity may be verified via bundle/manifest hash commitments and signed delivery receipts, independent of which PSO served delivery.

Key Custody Modes (Campaign Parameter)

Each campaign declares a key custody mode inside its Delivery Configuration:

Requester-managed keys (default): requesters provide or generate a public key (or an organizational key set) used to encrypt delivery envelopes. The protocol never needs requester private keys.
PSO-managed ephemeral keys (optional): a Delivery PSO generates per-bundle encryption keys and encrypts them to requester keys. The PSO never receives requester secrets and cannot decrypt after envelope creation.
Enterprise-hosted environment (confidential/enterprise tier): delivery, storage, and decryption occur inside an enterprise-controlled environment. The protocol provides verification receipts and settlement only. It is recommended when policy requires data residency or internal-only handling.

Entitlements and Access Control

Access control is enforced through entitlements linked to campaign mode and licensing:

Direct work mode: entitlements typically belong to the requester identity (and optionally a requester organizational group).
Dataset mode: entitlements reflect licensing constraints (public/private, permitted usage, resale rules). Entitlement checks happen off-chain, while licensing references for dataset campaigns can be recorded on-chain.
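An off-chain entitlement check for the two modes might look like the sketch below. The `Entitlement` shape, the mode strings, and the `can_access` function are hypothetical; the source states only that direct work entitlements belong to the requester and that dataset entitlements reflect licensing constraints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entitlement:
    campaign_id: str
    holder_id: str                      # requester or organizational group ID
    mode: str                           # "direct_work" or "dataset" (assumed values)
    permitted_uses: frozenset = frozenset()


def can_access(entitlements: list, holder_id: str, campaign_id: str, intended_use=None) -> bool:
    """Return True if any entitlement grants this holder access to the campaign."""
    for entitlement in entitlements:
        if entitlement.campaign_id != campaign_id or entitlement.holder_id != holder_id:
            continue
        if entitlement.mode == "direct_work":
            return True  # the requester owns the deliverable outright
        if entitlement.mode == "dataset" and intended_use in entitlement.permitted_uses:
            return True  # the license must cover the declared use
    return False


grants = [
    Entitlement("campaign-A", "requester-1", "direct_work"),
    Entitlement("dataset-7", "requester-2", "dataset", frozenset({"research"})),
]
```

Checking the declared use against the license keeps resale or usage restrictions enforceable at delivery time rather than after the fact.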

Delivery Modes (Campaign Parameter)

Delivery configuration specifies one or more supported delivery paths:

Private delivery (default): requester-only encrypted delivery (download/API/webhook).
Subscription feed: incremental bundle updates over time (dataset updates, continuous labeling streams).
Public distribution (opt-in): listing of dataset artifacts with licensing references (content still off-chain).

Delivery Receipts and Integrity Commitments

For each delivered bundle or dataset artifact, the Distribution Layer produces a Delivery Receipt (signed service event) containing:

Campaign identifier + campaign configuration hash
Bundle/manifest identifier(s) and bundle/manifest hash
Delivery mode and endpoint type (without revealing secrets)
Entitlement reference (e.g. requester ID, license ID reference)
Timestamp(s) and PSO signature

A commitment to the Delivery Receipt may be recorded on-chain. This enables independent verification that a specific bundle hash was delivered under a specific configuration, without publishing sensitive content.
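The receipt and its commitment can be sketched as follows. Two assumptions are worth flagging: the HMAC tag is only a stand-in for the PSO signature (a real receipt would use an asymmetric signature so third parties can verify it without the signing key), and the field names and canonical JSON encoding are hypothetical.

```python
import hashlib
import hmac
import json


def _canonical(obj: dict) -> bytes:
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_receipt(pso_key: bytes, fields: dict) -> dict:
    """Attach a tag over the receipt fields (HMAC stands in for a PSO signature)."""
    signature = hmac.new(pso_key, _canonical(fields), hashlib.sha256).hexdigest()
    return {**fields, "pso_signature": signature}


def verify_receipt(pso_key: bytes, receipt: dict) -> bool:
    fields = {k: v for k, v in receipt.items() if k != "pso_signature"}
    expected = hmac.new(pso_key, _canonical(fields), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt.get("pso_signature", ""))


def receipt_commitment(receipt: dict) -> str:
    """Hash of the full receipt; this digest is what could be recorded on-chain."""
    return hashlib.sha256(_canonical(receipt)).hexdigest()


pso_key = b"pso-demo-key"
receipt = make_receipt(pso_key, {
    "campaign_id": "campaign-A",
    "config_hash": "c0ffee",          # placeholder campaign configuration hash
    "manifest_hash": "deadbeef",      # placeholder bundle/manifest hash
    "delivery_mode": "private",
    "endpoint_type": "webhook",
    "entitlement_ref": "requester-1",
    "timestamp": "2025-01-01T00:00:00Z",
})
commitment = receipt_commitment(receipt)
```

Only the commitment needs to leave the off-chain system; anyone later shown the receipt can recompute the hash and compare it with the recorded value.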

Retention, Deletion, and Audit Logs (Off-Chain)

Each campaign declares retention and logging parameters:

Retention window: how long encrypted payloads/bundles/manifests are kept by Distribution PSOs.
Deletion semantics: PSOs can delete stored encrypted objects after retention. The protocol can’t guarantee deletion of copies downloaded by requesters.
Audit logs: delivery logs are available off-chain for enterprise audit needs, including which delivery receipts were issued and when.
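A retention sweep on a Distribution PSO could be sketched like this. The function names and the day-based retention window are assumptions; the source does not specify units or a deletion API.

```python
from datetime import datetime, timedelta, timezone


def is_past_retention(stored_at: datetime, retention_days: int, now: datetime) -> bool:
    """True once the retention window for an object has fully elapsed."""
    return now >= stored_at + timedelta(days=retention_days)


def split_by_retention(stored: dict, retention_days: int, now: datetime):
    """Partition object IDs into those to keep and those eligible for deletion."""
    keep, delete = {}, []
    for object_id, stored_at in stored.items():
        if is_past_retention(stored_at, retention_days, now):
            delete.append(object_id)
        else:
            keep[object_id] = stored_at
    return keep, delete


now = datetime(2025, 3, 1, tzinfo=timezone.utc)
stored = {
    "bundle-1": datetime(2025, 1, 1, tzinfo=timezone.utc),
    "bundle-2": datetime(2025, 2, 20, tzinfo=timezone.utc),
}
keep, delete = split_by_retention(stored, retention_days=30, now=now)
```

This only covers copies the PSO holds. As noted above, copies already downloaded by requesters are outside the protocol's control.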

On-Chain Records

The protocol records only the information required to provide settlement finality and independent auditability. On-chain records include:

Campaign identifier and configuration commitment hash.
Escrow funding commitment and settlement receipts.
Task outcomes (accepted, rejected, audited) as state transitions.
Validator quorum summary and finality proofs.
Reputation updates and slashing events, when applicable.
Dataset provenance (source-of-origin) proofs and licensing references for dataset campaigns.
Licensing commitments (dataset campaigns): a Dataset Licensing Record commitment reference, binding dataset_artifact_hash ↔ license_terms_hash, plus issuer signature(s). This enables third parties to verify which licensing terms governed distribution without revealing the full license document on-chain.
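The binding between `dataset_artifact_hash` and `license_terms_hash` can be illustrated with a single hash over both values. The two hash names come from the text above; the `licensing_commitment` function, the delimiter, and the use of SHA-256 are assumptions, and the issuer signature is omitted from this sketch.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def licensing_commitment(dataset_artifact_hash: str, license_terms_hash: str) -> str:
    """Bind an artifact hash to a license hash in one commitment value.

    Both inputs are hex digests, so "|" cannot appear inside either of them.
    """
    return sha256_hex(f"{dataset_artifact_hash}|{license_terms_hash}".encode("utf-8"))


def verify_license(commitment: str, artifact: bytes, license_text: bytes) -> bool:
    """A third party holding both documents recomputes and compares the commitment."""
    return licensing_commitment(sha256_hex(artifact), sha256_hex(license_text)) == commitment


artifact = b"encrypted dataset bytes"
license_text = b"Research use only. No resale."
on_chain_value = licensing_commitment(sha256_hex(artifact), sha256_hex(license_text))
```

Only the commitment is published, so the license document stays off-chain while remaining checkable by anyone who is given a copy.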

Off-Chain Storage

Off-chain data includes:

Full task prompts, attachments, and external materials.
Raw outputs and deliverable payloads.
Detailed validator rationales and review materials.
Fraud-signal computation artifacts such as embeddings and similarity indices.
Dataset artifacts and QA report details.
Private delivery endpoints and access control metadata.

Threats and Privacy Risk Boundaries

The protocol treats the following actions as privacy risks:

Collecting direct identifiers (names, phone numbers).
Storing raw outputs on-chain.
Collecting high-resolution telemetry by default.
Allowing cross-campaign linkability through shared metadata.

To reduce these risks, the protocol requires:

Campaign-level declaration of privacy mode and metadata hooks.
Default minimization of telemetry.
Confidentiality-safe delivery pipelines.
Separation between on-chain commitments and off-chain content.