
Chroma

This page guides you through the process of setting up the Chroma destination connector.

Features

Feature | Supported? (Yes/No) | Notes
Full Refresh Sync | Yes |
Incremental - Append Sync | Yes |
Incremental - Append + Deduped | Yes |

Output Schema

Data from all source streams is written to a single stream, stored in one Chroma collection. You choose the collection name; Airbyte validates it and corrects it if needed.
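
To illustrate what name validation can involve, here is a minimal sketch of a checker. The rules encoded below (3 to 63 characters, lowercase alphanumeric at both ends, no consecutive dots, not an IPv4 address) are an assumption paraphrased from Chroma's documented naming constraints and may differ between Chroma versions. The function name is hypothetical, and this is not Airbyte's actual validation or correction logic.

```python
import ipaddress
import re

# Assumed rule set; verify against the Chroma version you run.
_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def is_valid_collection_name(name: str) -> bool:
    """Return True if `name` satisfies the assumed Chroma naming rules."""
    # 3-63 characters, alphanumeric first and last character.
    if not _NAME_PATTERN.fullmatch(name):
        return False
    # No two consecutive dots.
    if ".." in name:
        return False
    # Must not parse as an IPv4 address.
    try:
        ipaddress.IPv4Address(name)
    except ValueError:
        return True
    return False
```

Checking a name with a function like this before saving the connector configuration makes it easier to predict whether the name will be stored as entered.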

For each record, a UUID string is generated and used as the document ID. The embeddings produced by the configured embedding method are stored as embeddings, data in the text fields is stored as documents, and data in the metadata fields is stored as metadata.
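
The mapping described above can be sketched as follows. This is an illustration only: the function name `to_chroma_entry`, the way multiple text fields are joined into one document, and the shape of the returned dictionary are assumptions, not the connector's implementation.

```python
import uuid
from typing import Any


def to_chroma_entry(
    record: dict[str, Any],
    text_fields: list[str],
    metadata_fields: list[str],
    embedding: list[float],
) -> dict[str, Any]:
    """Split one source record into the four parts stored in Chroma."""
    # Assumption: text fields are joined as "name: value" lines.
    document = "\n".join(
        f"{field}: {record[field]}" for field in text_fields if field in record
    )
    # Metadata fields are copied through unchanged.
    metadata = {f: record[f] for f in metadata_fields if f in record}
    return {
        "id": str(uuid.uuid4()),  # fresh UUID string per record
        "embedding": embedding,
        "document": document,
        "metadata": metadata,
    }
```

The four keys line up with the `ids`, `embeddings`, `documents`, and `metadatas` arguments that Chroma's `collection.add` accepts.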

Requirements

To use the Chroma destination, you'll need:

  • An account with API access for OpenAI or Cohere, depending on which embedding method you want to use. No account is needed if you use the default Chroma embedding function.
  • A Chroma DB instance (client/server mode or persistent mode)
  • Credentials (for client/server mode)
  • A local file path (for persistent mode)
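
To check that a Chroma instance is reachable before configuring the connector, a short script using the `chromadb` Python package can help. The sketch below assumes `chromadb` is installed. The `connect` helper and its mode names are hypothetical, and credentials are left out because the authentication settings depend on the Chroma version.

```python
from typing import Optional


def connect(
    mode: str,
    *,
    host: str = "localhost",
    port: int = 8000,
    path: Optional[str] = None,
):
    """Return a Chroma client for the given mode (hypothetical helper)."""
    if mode not in ("client_server", "persistent"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "persistent" and path is None:
        raise ValueError("persistent mode requires a path")

    # Imported lazily so the argument checks work without chromadb installed.
    import chromadb

    if mode == "client_server":
        # Talks to a running Chroma server over HTTP.
        return chromadb.HttpClient(host=host, port=port)
    # Opens (or creates) an on-disk database at `path`.
    return chromadb.PersistentClient(path=path)
```

For example, `connect("client_server").heartbeat()` should return a timestamp if a server is listening on `localhost:8000`.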

Configure Network Access

Make sure your Chroma database can be accessed by HeroPixel. If your database is within a VPC, you may need to allow access from HeroPixel IPs.

Set up the Chroma Destination in Airbyte

You should now have all the requirements needed to configure Chroma as a destination in the UI. You'll need the following information to configure the Chroma destination:

  • (Required) Text fields to embed
  • (Optional) Text splitter: options for configuring the chunking process, which is provided by the Langchain Python library.
  • (Required) Fields to store as metadata
  • (Required) Collection: the name of the collection in Chroma DB where your data is stored
  • (Required) Authentication method
    • For client/server mode
      • Host, for example localhost
      • Port, for example 8000
      • Username (Optional)
      • Password (Optional)
    • For persistent mode
      • Path: the path to the local database file. The path must be absolute and prefixed with /local.
  • (Optional) Embedding
    • OpenAI API key if using OpenAI for embedding
    • Cohere API key if using Cohere for embedding
    • Embedding Field name and Embedding dimensions if getting the embeddings from stream records
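
The settings above can be checked before you enter them in the UI. The sketch below assumes a hypothetical dictionary layout: the key names (`text_fields`, `metadata_fields`, `collection_name`, `auth`, `mode`) are illustrative and are not the connector's actual configuration schema. It encodes only the rules stated in the list: the required settings, and the /local prefix for persistent mode.

```python
from pathlib import PurePosixPath
from typing import Any


def check_config(config: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a hypothetical config dict."""
    errors = []
    # Settings marked (Required) in the list above.
    for key in ("text_fields", "metadata_fields", "collection_name", "auth"):
        if not config.get(key):
            errors.append(f"missing required setting: {key}")

    auth = config.get("auth") or {}
    mode = auth.get("mode")
    if mode == "client_server":
        if not auth.get("host") or not auth.get("port"):
            errors.append("client/server mode needs host and port")
    elif mode == "persistent":
        # The path must be absolute and its first component must be "local".
        path = PurePosixPath(auth.get("path") or "")
        if path.parts[:2] != ("/", "local"):
            errors.append("persistent path must be absolute and start with /local")
    elif auth:
        errors.append(f"unknown auth mode: {mode!r}")
    return errors
```

An empty result means the settings satisfy the stated rules. It does not guarantee that the Chroma instance itself is reachable.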