Today, we're excited to announce that Pinecone Assistant is generally available (GA) for all users. Developers of all skill levels have already created thousands of their own knowledgeable AI assistants across diverse use cases with Pinecone Assistant (e.g., financial analysis, legal discovery, and compliance assistants). Now, we’ve made it even easier to upload your documents, ask questions, and receive accurate, grounded responses. Accelerate time to value by creating and deploying production-grade solutions in minutes, knowing under the hood your assistants are powered by the same safeguards and benefits as our fully managed vector database.
TL;DR: What’s new
With GA, Pinecone Assistant now includes:
- Optimized interfaces with new chat and context APIs powering chat and agent-based applications
- Custom instructions to tailor your assistant’s behavior and responses to specific use cases or requirements
- New input and output formats with added support for JSON, .md, and .docx files in addition to PDF and .txt
- Region control with options to build in the EU or US
Unlock immediate value for your team – just bring your data
Pinecone Assistant is an API service built to power grounded chat and agent-based applications with precision and ease. Abstracting away the many systems and steps required to build Retrieval Augmented Generation (RAG)-powered applications (e.g., chunking, embedding, file storage, query planning, vector search, model orchestration, reranking, and more), Assistant accelerates RAG development, enabling you to launch knowledgeable production-grade applications in under 30 minutes, regardless of experience.
"Pinecone Assistant has become essential to our generative AI projects, accelerating the time between idea and implementation by 70%. It simplifies complex tasks like document chunking, embedding, and retrieval, letting us focus on outcomes, cut maintenance and scaling costs by 30%, and quickly demonstrate real results to clients." - Mark Kashef, CEO, Prompt Advisers
The underlying serverless architecture, intuitive interface, and built-in evaluation and benchmarking framework make it easy to get started (just upload your raw files via a simple API), quick to experiment and iterate, and effortless to scale and maintain. We’ve optimized the workflow end-to-end to ensure you have access to accurate, grounded information at every step—from document ingestion to query planning and reasoning to response generation. In fact, our benchmarks show Pinecone Assistant delivers up to 12% more accurate results than OpenAI Assistants.
Pinecone Assistant is powered by our fully managed vector database and shares the same safeguards. Your data is encrypted at rest and in transit, never used for training, and can be permanently deleted at any time.
What’s new with Pinecone Assistant:
During public preview, we introduced the Evaluation API, expanded LLM support, and metadata filters for Assistant. We've continued to develop Assistant to further improve the relevance of responses, increase customization capabilities, and expand the ways you can build with it.
Optimized interfaces to bring knowledge to chat and agentic applications
The new Chat API delivers structured, grounded responses with citations in a few simple steps. It supports both streaming and batch modes, allowing citations to be presented in real time or added to the final output. In short, you have control over how references appear. Learn more about how we support citations in our technical deep dive.
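In streaming mode, the response arrives as a sequence of chunks that the client assembles, with citations attached as they come in. Here's a minimal client-side sketch of that pattern; the chunk structure below is illustrative, not the Assistant's actual wire format:

```python
# Illustrative sketch: assembling a streamed chat response client-side.
# The chunk shape ("delta" / "citations" keys) is hypothetical, not the
# actual Assistant wire format.

def assemble_stream(chunks):
    """Concatenate streamed text deltas and collect citations as they arrive."""
    text_parts = []
    citations = []
    for chunk in chunks:
        if "delta" in chunk:
            text_parts.append(chunk["delta"])
        if "citations" in chunk:
            citations.extend(chunk["citations"])
    return "".join(text_parts), citations

# Simulated stream: text deltas followed by a final citations chunk.
stream = [
    {"delta": "Netflix's CFO is "},
    {"delta": "Spencer Neumann."},
    {"citations": [{"file": "netflix-10k.pdf", "page": 3}]},
]
answer, refs = assemble_stream(stream)
print(answer)  # Netflix's CFO is Spencer Neumann.
print(refs)
```

Whether you surface references token-by-token or only in the final output is an application choice; the API supports both presentations.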
The new Context API, the context engine behind Pinecone Assistant, follows the same retrieval and augmentation process as the Chat API—but without the generation step—to deliver structured context (i.e., a collection of the most relevant data for the input query) as a set of expanded chunks with relevancy scores and references.
This makes it a powerful tool for agentic workflows, providing the necessary context to verify source data, prevent hallucinations, and identify the most relevant data for generating precise, reliable responses.
# To use the Python SDK, install the plugin:
# pip install --upgrade pinecone pinecone-plugin-assistant
from pinecone import Pinecone
pc = Pinecone(api_key="YOUR_API_KEY")
assistant = pc.assistant.Assistant(assistant_name="example-assistant")
response = assistant.context(query="Who is the CFO of Netflix?")
for snippet in response.snippets:
    print(snippet)
Learn more and see an example output as a JSON object in our documentation.
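In an agentic workflow, a common next step is to threshold the returned snippets by relevancy score before handing them to a downstream model. A minimal sketch, where the snippet fields (score, content, reference) are stand-ins for the documented output shape:

```python
# Illustrative sketch: keep only high-relevancy context snippets for an agent.
# The snippet fields (score, content, reference) are stand-ins for the
# Context API's documented output shape.

def select_context(snippets, min_score=0.5, max_snippets=3):
    """Sort snippets by relevancy score, keep the top few above a threshold."""
    kept = [s for s in snippets if s["score"] >= min_score]
    kept.sort(key=lambda s: s["score"], reverse=True)
    return kept[:max_snippets]

snippets = [
    {"score": 0.91, "content": "Spencer Neumann is Netflix's CFO.", "reference": "10k.pdf"},
    {"score": 0.42, "content": "Netflix was founded in 1997.", "reference": "history.md"},
    {"score": 0.77, "content": "The CFO reports quarterly earnings.", "reference": "10k.pdf"},
]
for s in select_context(snippets):
    print(s["score"], s["content"])
```

The threshold and cutoff are tuning knobs: a stricter threshold reduces hallucination risk at the cost of occasionally dropping useful context.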
Custom instructions to fine-tune assistants for your use case
In addition to metadata filters, Assistant now supports custom instructions, allowing you to further fine-tune responses to meet your needs. Metadata filters restrict vector search by user, group, or category, while instructions let you tailor responses by providing short descriptions or directives. For example, you can set your assistant to act as a legal expert for authoritative answers or as a customer support agent for troubleshooting and user assistance.
# To use the Python SDK, install the plugin:
# pip install --upgrade pinecone pinecone-plugin-assistant
from pinecone import Pinecone
pc = Pinecone(api_key="YOUR_API_KEY")
assistant = pc.assistant.update_assistant(
    assistant_name="test",
    instructions="Use American English for spelling and grammar."
)
Customize the instructions to reflect your assistant’s role or purpose, for example, “Use American English for spelling and grammar.”
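Metadata filters, meanwhile, use Pinecone's MongoDB-style operator syntax (`$eq`, `$in`, `$and`, and so on). A small sketch of composing one; the field names (`user_id`, `category`) are examples, not required keys:

```python
# Illustrative sketch: composing a Pinecone-style metadata filter with
# MongoDB-like operators. The field names (user_id, category) are examples,
# not required keys.

def user_scope_filter(user_id, categories):
    """Restrict retrieval to one user's documents in the given categories."""
    return {
        "$and": [
            {"user_id": {"$eq": user_id}},
            {"category": {"$in": categories}},
        ]
    }

flt = user_scope_filter("u-123", ["legal", "compliance"])
print(flt)
```

Combining a filter like this with instructions gives you two independent levers: the filter controls *which* documents are searched, and the instructions control *how* the assistant responds.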
Expanded region control and input/output formats
With some recent additions to Assistant, it’s even easier to get started. You can now create an assistant in both the EU and US regions. In addition to PDF and .txt files, Assistant now also supports JSON, .md, and .docx files as inputs, and JSON format as an output. Additional support will be added in the coming months.
# To use the Python SDK, install the plugin:
# pip install --upgrade pinecone pinecone-plugin-assistant
import json
from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message
pc = Pinecone(api_key="YOUR_API_KEY")
assistant = pc.assistant.create_assistant(
    assistant_name="example-assistant",
    region="eu",  # Region to deploy assistant. Options: "us" (default) or "eu".
)
msg = Message(role="user", content="What is the price of a Tesla Model 3? Return it in the following JSON format: {'price': X}")
response = assistant.chat(messages=[msg], json_response=True)
print(json.loads(response.message.content))
Easily configure the region and output parameters for your assistant. This example uses the json_response parameter to instruct the assistant to return a JSON response.
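Since the model is being instructed to emit JSON, it's worth guarding the parse step in application code. A minimal sketch; the fallback behavior is an application choice, not part of the API:

```python
import json

# Illustrative sketch: parse an assistant's JSON-formatted reply defensively.
# The fallback-to-default behavior is an application choice, not part of
# the Assistant API.

def parse_json_reply(content, default=None):
    """Return the parsed JSON payload, or a default if the reply isn't valid JSON."""
    try:
        return json.loads(content)
    except (json.JSONDecodeError, TypeError):
        return default

print(parse_json_reply('{"price": 42000}'))      # {'price': 42000}
print(parse_json_reply("Sorry, I don't know."))  # None
```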
Start building today
Pinecone Assistant is now generally available in US and EU regions for all users. [Add pricing update]. [TBD - Support for Node SDK]
Register for our Pinecone Assistant 101 on-demand webinar, learn more in our deep dive, and start building knowledgeable AI applications in minutes today.