PGVector Integration – Effortlessly Sync Apify Data to PostgreSQL
Intro:
The PGVector Integration from Apify.com lets users transfer selected data from Apify Actors directly into a PostgreSQL database that has the PGVector extension enabled. It is especially useful for similarity-based retrieval: text is stored as vector embeddings, so it can be looked up by meaning rather than by exact keyword match.
🔍 What Is PGVector Integration?
PGVector Integration is an Apify Actor that transfers data from other Apify Actors to PostgreSQL databases. It processes the data, computes text embeddings, and incrementally updates records. It is particularly useful for developers and data scientists who want to add semantic search or retrieval-augmented generation (RAG) to their applications, since the embeddings are stored next to the rest of the application data in PostgreSQL.
✨ Features
- Direct Data Transfer: Seamlessly transfer data from Apify Actors to PostgreSQL.
- Text Embeddings: Compute embeddings for text data using providers like OpenAI or Cohere.
- Incremental Updates: Update only changed data to minimize processing and storage costs.
- Chunk Processing: Optionally split large text data into manageable chunks before storage.
- Flexible Configuration: Users can easily customize settings for their specific needs, including update strategies.
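To make the Chunk Processing feature concrete, here is a minimal sketch of splitting text into overlapping chunks. It is an illustration only: the function name `split_into_chunks` is hypothetical, it cuts on raw character positions, and the Actor's actual splitter may behave differently (for example by preferring paragraph or sentence boundaries).

```python
def split_into_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters, so a sentence cut at
    a chunk boundary still appears in full context in one of the two chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in split_into_chunks("abcdefghij", chunk_size=4, chunk_overlap=1):
        print(chunk)
```

With a larger overlap, each chunk repeats more of its neighbor, which tends to help retrieval quality but increases the number of embeddings that have to be computed and stored.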
🛠️ How to Use It
Step-by-step tutorial:
- Go to the tool’s page: PGVector Integration
- Click “Try for free” or “Run actor”.
- Fill in the required input fields:
  - postgresSqlConnectionStr: Connection string for your PostgreSQL database.
  - postgresCollectionName: Name of the collection in PostgreSQL where data will be stored.
  - embeddingsApiKey: API key for the embeddings provider (e.g., OpenAI).
  - datasetFields: The dataset fields to be stored.
- Optional parameters such as chunkSize, chunkOverlap, and update strategies can also be configured.
- Click “Run” and wait for results.
- Download results or send to a webhook if necessary.
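The same steps can also be scripted instead of clicked through. The sketch below assembles the input from the field names listed in the steps and checks it before anything is sent. The helper names `build_run_input` and `run_actor` are hypothetical, the default Actor ID is an assumption that should be replaced with the ID shown on the Actor's page, and `run_actor` relies on the separately installed `apify-client` package.

```python
import json
from typing import Optional


def build_run_input(
    connection_str: str,
    collection_name: str,
    embeddings_api_key: str,
    dataset_fields: list[str],
    chunk_size: Optional[int] = None,
    chunk_overlap: int = 0,
) -> dict:
    """Assemble the Actor input from the fields named in the steps above."""
    required = {
        "postgresSqlConnectionStr": connection_str,
        "postgresCollectionName": collection_name,
        "embeddingsApiKey": embeddings_api_key,
    }
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError(f"missing required input fields: {', '.join(missing)}")
    if not dataset_fields:
        raise ValueError("datasetFields must name at least one field")

    run_input = dict(required, datasetFields=list(dataset_fields))
    if chunk_size is not None:
        if not 0 <= chunk_overlap < chunk_size:
            raise ValueError("chunkOverlap must be >= 0 and smaller than chunkSize")
        # Chunking is optional; these keys are only sent when a size is given.
        run_input.update(performChunking=True, chunkSize=chunk_size, chunkOverlap=chunk_overlap)
    return run_input


def run_actor(api_token: str, run_input: dict,
              actor_id: str = "apify/pgvector-integration") -> Optional[dict]:
    """Start the Actor and wait for the run to finish.

    Assumption: actor_id is a guess; copy the real ID from the Actor's page.
    Requires `pip install apify-client`.
    """
    from apify_client import ApifyClient  # lazy import: build_run_input stays dependency-free

    return ApifyClient(api_token).actor(actor_id).call(run_input=run_input)


if __name__ == "__main__":
    example = build_run_input(
        "postgresql://postgres:password@localhost:5432/apify",
        "apify-collection",
        "YOUR-OPENAI-API-KEY",
        ["text"],
        chunk_size=2000,
        chunk_overlap=200,
    )
    print(json.dumps(example, indent=2))
```

Validating the input locally is a design choice rather than a requirement: it surfaces a missing connection string or an impossible overlap before a paid run is started.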
🧪 Sample Input (JSON)
```json
{
  "postgresSqlConnectionStr": "postgresql://postgres:password@localhost:5432/apify",
  "postgresCollectionName": "apify-collection",
  "embeddingsApiKey": "YOUR-OPENAI-API-KEY",
  "datasetFields": ["text"],
  "dataUpdatesStrategy": "deltaUpdates",
  "dataUpdatePrimaryDatasetFields": ["url"],
  "expiredObjectDeletionPeriodDays": 30,
  "performChunking": true,
  "chunkSize": 2000,
  "chunkOverlap": 200
}
```
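The deltaUpdates strategy in the sample input can be pictured roughly as follows. This is a conceptual sketch under stated assumptions, not the Actor's actual implementation: it assumes each item is identified by its dataUpdatePrimaryDatasetFields (here url), that a content checksum decides whether an item changed, and that items not seen for expiredObjectDeletionPeriodDays are removed. All function names and the shape of the `stored` dictionary are hypothetical.

```python
import hashlib
from datetime import datetime, timedelta


def item_key(item: dict, primary_fields: list[str]) -> str:
    """Identify an item by its primary fields, e.g. its URL."""
    return "|".join(str(item.get(field, "")) for field in primary_fields)


def checksum(item: dict, dataset_fields: list[str]) -> str:
    """Hash only the stored fields, so unrelated changes do not trigger re-embedding."""
    payload = "\n".join(str(item.get(field, "")) for field in dataset_fields)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def plan_delta_update(items, stored, primary_fields, dataset_fields, now, expiry_days):
    """Classify items against previously stored state.

    stored maps item key -> {"checksum": str, "last_seen": datetime}.
    Returns the keys to (re-)embed, the keys left untouched, and the keys to delete.
    """
    to_embed, unchanged, seen = [], [], set()
    for item in items:
        key = item_key(item, primary_fields)
        seen.add(key)
        previous = stored.get(key)
        if previous is None or previous["checksum"] != checksum(item, dataset_fields):
            to_embed.append(key)   # new or changed: embeddings must be recomputed
        else:
            unchanged.append(key)  # identical content: no embedding cost

    cutoff = now - timedelta(days=expiry_days)
    expired = [
        key for key, record in stored.items()
        if key not in seen and record["last_seen"] < cutoff
    ]
    return {"to_embed": to_embed, "unchanged": unchanged, "expired": expired}
```

The point of the classification is cost: only the keys in `to_embed` need a call to the embeddings provider, which is where incremental updates save money on repeated crawls of mostly unchanged pages.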
📤 Output Data (Fields)
- Stored vectors and embeddings associated with the input dataset.
- Metadata for each entry, including timestamps and identifiers.
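Once vectors are stored, retrieval comes down to ranking rows by their distance to a query embedding. In PostgreSQL this is typically done with pgvector's `<=>` cosine-distance operator in an ORDER BY clause; the pure-Python sketch below only illustrates what that ranking computes. The row layout (`id`, `embedding`) and the function names are assumptions for the example, not the Actor's actual table schema.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance: 0 for the same direction, 1 for orthogonal vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def nearest(query: list[float], rows: list[dict], k: int) -> list[dict]:
    """Return the k rows whose embeddings are closest to the query embedding."""
    return sorted(rows, key=lambda row: cosine_distance(query, row["embedding"]))[:k]


if __name__ == "__main__":
    rows = [
        {"id": "doc-1", "embedding": [1.0, 0.0]},
        {"id": "doc-2", "embedding": [0.0, 1.0]},
        {"id": "doc-3", "embedding": [1.0, 1.0]},
    ]
    for row in nearest([1.0, 0.0], rows, k=2):
        print(row["id"])
```

Real embeddings have hundreds or thousands of dimensions, and the database can use an index to avoid scanning every row, but the ordering criterion is the same.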
💰 Pricing
This Actor is priced at $0.05 per request. Apify also offers a free tier for users to test the functionality.
👨‍💻 Built By
The PGVector Integration is developed by the team at Apify.com.
✅ Final Thoughts
This integration is perfect for developers and data scientists looking to enhance their data management capabilities with advanced text processing. Its ability to perform incremental updates and manage large datasets efficiently makes it a must-try for anyone working with large volumes of textual data or looking to optimize their PostgreSQL databases.
🔗 Try the Actor Now
👉 PGVector Integration