
With Pinecone serverless, you only pay for what you use. The separation of read and write paths inside the DB enables this dynamic cost structure. This article illustrates how Read Units (RUs) work, so you can better understand billing and monitor your usage.

In future articles, we will take a deep dive into RUs’ counterparts: Write Units and Storage.

Jump to the code to explore the information in this article live.

What are Read Units (RUs)?

RUs measure the resources consumed by read operations such as query, fetch, and list.

An example of a query request is sending a query vector (for instance, the embedding of the question “What is a dog?”) to the DB and getting the most similar vectors (and, optionally, their metadata) back. An example of a fetch request is sending a list of vector IDs to the DB to get their associated vectors back. An example of a list request is asking the DB to return up to `n` vector IDs.
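To make the difference between the three read operations concrete, here is a minimal sketch of their semantics using a hypothetical in-memory `ToyNamespace` class. It is not the Pinecone client: the class name, the method names and signatures, and the brute-force cosine scoring are assumptions for illustration only, and the class does not compute RUs.

```python
import math
from typing import Dict, List, Optional, Tuple


class ToyNamespace:
    """Hypothetical in-memory stand-in for a namespace (not the Pinecone client)."""

    def __init__(self) -> None:
        # record ID -> (vector values, optional metadata)
        self._records: Dict[str, Tuple[List[float], Optional[dict]]] = {}

    def upsert(self, record_id: str, values: List[float], metadata: Optional[dict] = None) -> None:
        self._records[record_id] = (list(values), metadata)

    @staticmethod
    def _cosine(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def query(self, vector: List[float], top_k: int = 3, include_metadata: bool = False) -> List[dict]:
        """Similarity search: score every record against the query vector (brute force)."""
        scored = sorted(
            ((rid, self._cosine(vector, vals), meta) for rid, (vals, meta) in self._records.items()),
            key=lambda item: item[1],
            reverse=True,
        )
        matches = []
        for rid, score, meta in scored[:top_k]:
            match = {"id": rid, "score": score}
            if include_metadata:
                match["metadata"] = meta
            matches.append(match)
        return matches

    def fetch(self, ids: List[str]) -> Dict[str, List[float]]:
        """Direct lookup by ID; unknown IDs are skipped and no similarity search is done."""
        return {rid: self._records[rid][0] for rid in ids if rid in self._records}

    def list_ids(self, prefix: str = "", limit: int = 100) -> List[str]:
        """Mirror of a list request: up to `limit` IDs, optionally filtered by prefix."""
        return sorted(rid for rid in self._records if rid.startswith(prefix))[:limit]


ns = ToyNamespace()
ns.upsert("doc-1", [1.0, 0.0], {"text": "dogs"})
ns.upsert("doc-2", [0.0, 1.0], {"text": "cats"})
print(ns.query([0.9, 0.1], top_k=1)[0]["id"])  # doc-1
print(ns.fetch(["doc-2"]))                       # {'doc-2': [0.0, 1.0]}
print(ns.list_ids(prefix="doc-"))                # ['doc-1', 'doc-2']
```

Only `query` compares vectors; `fetch` and `list_ids` are plain lookups, which is why the three operations are worth distinguishing when reasoning about read cost.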

Usage notes per endpoint:

How to inspect RUs

Every read operation will have its associated RUs returned in its response. An example response would look like this:

{'matches': [{'id': 'c1eb2875-bd4e-449d-a059-232edb62c62a',
              'metadata': None,
              'score': 0.0,
              'sparse_values': {'indices': [], 'values': []},
              'values': []}],
 'namespace': '50k',
 'usage': {'read_units': 5}}  # >>> Here are your RUs!

Whatever read operation produced the result above consumed 5 RUs, as reported in the `usage` field ('usage': {'read_units': 5}).
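If you want to track RUs yourself, a small helper can pull the figure out of each response. This is a minimal sketch that assumes the response has already been turned into a plain Python dictionary shaped like the example above; depending on your client version, the response may be an object you need to convert first. The helper names `read_units` and `total_read_units` are hypothetical.

```python
from typing import Any, Iterable, Mapping


def read_units(response: Mapping[str, Any]) -> int:
    """Return the RUs reported in one response dict (0 if the usage field is absent)."""
    usage = response.get("usage") or {}
    return int(usage.get("read_units", 0))


def total_read_units(responses: Iterable[Mapping[str, Any]]) -> int:
    """Sum RUs across several read responses, e.g. for a simple usage log."""
    return sum(read_units(r) for r in responses)


example = {
    "matches": [{"id": "c1eb2875-bd4e-449d-a059-232edb62c62a", "score": 0.0, "values": []}],
    "namespace": "50k",
    "usage": {"read_units": 5},
}
print(read_units(example))                    # 5
print(total_read_units([example, example]))  # 10
```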

See our example notebook to gain a deeper understanding of how different query configurations lead to different RU spend.

To project future overall costs, use our cost calculator.
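For a quick back-of-the-envelope estimate before using the calculator, the arithmetic is simple multiplication. This sketch assumes every query costs roughly the same number of RUs (which only holds while your namespace size stays roughly constant) and that pricing is quoted per million RUs. The function names and the `price_per_million_rus` argument are hypothetical placeholders, not Pinecone's published rates; take the actual price from the calculator or pricing page.

```python
def projected_read_units(queries_per_day: int, rus_per_query: float, days: int = 30) -> float:
    """Rough RU projection, assuming a constant RU cost per query."""
    return queries_per_day * rus_per_query * days


def projected_cost(total_rus: float, price_per_million_rus: float) -> float:
    """Convert RUs to currency; price_per_million_rus is a placeholder you must supply."""
    return total_rus / 1_000_000 * price_per_million_rus


# 10,000 queries a day at 5 RUs each, over a 30-day month.
monthly_rus = projected_read_units(queries_per_day=10_000, rus_per_query=5)
print(monthly_rus)  # 1500000
```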

Sublinear cost growth

While RUs depend on the size of your namespace (the number of vectors, their dimensionality, and the presence of any metadata), they grow sublinearly as the namespace grows.

For example, if it costs 5 RUs to query a namespace of 50k 1536-dimensional vectors, querying a namespace 4x that size does not cost 20 RUs. Instead, it costs approximately 8 RUs. This sublinear growth keeps costs low while allowing you to scale quickly.
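The arithmetic behind this example can be spelled out. A linear model would multiply 5 RUs by 4, while the figure of roughly 8 RUs implies a much flatter curve. The sketch below fits a hypothetical power law, RUs ∝ size^alpha, through those two data points purely to quantify how flat the curve is. It is not Pinecone's billing formula and should not be used to extrapolate real costs.

```python
import math

BASE_RUS = 5.0      # from the example: 50k vectors, 1536 dimensions
SCALED_RUS = 8.0    # approximate figure for a namespace 4x larger
SCALE_FACTOR = 4.0

# What a purely linear cost model would predict.
linear_rus = BASE_RUS * SCALE_FACTOR

# Exponent of a hypothetical power law RUs ~ size**alpha passing through both points.
alpha = math.log(SCALED_RUS / BASE_RUS) / math.log(SCALE_FACTOR)


def power_law_rus(scale: float) -> float:
    """Illustrative only: RUs implied by the fitted exponent (not Pinecone's formula)."""
    return BASE_RUS * scale ** alpha


print(linear_rus)       # 20.0
print(round(alpha, 2))  # 0.34
```

An exponent well below 1 is what "sublinear" means here: a 4x larger namespace costs about 1.6x as much to query, not 4x.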

Pinecone’s specialized indexing and retrieval algorithms enable this sublinear growth by clustering similar vectors together, so that at query time only a subset of the namespace is searched. As a result, RU cost grows more slowly than the total number of vectors in the namespace.
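To see why searching only part of the namespace helps, here is a minimal sketch of cluster-based (IVF-style) search in plain Python. It is a generic illustration, not Pinecone's algorithm: centroids are picked at random from the data with a single assignment pass (no k-means refinement), distances are squared Euclidean, and the names `build_clusters`, `clustered_search`, and `n_probe` are hypothetical. The quantity to watch is `scanned`: only vectors in the `n_probe` clusters nearest to the query are compared with it.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_clusters(vectors: List[Vector], n_clusters: int, seed: int = 0) -> Tuple[List[Vector], List[List[int]]]:
    """Pick n_clusters random vectors as centroids and assign every vector to its nearest one."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, n_clusters)
    clusters: List[List[int]] = [[] for _ in range(n_clusters)]
    for i, v in enumerate(vectors):
        nearest = min(range(n_clusters), key=lambda c: dist2(v, centroids[c]))
        clusters[nearest].append(i)
    return centroids, clusters


def clustered_search(query: Vector, vectors: List[Vector], centroids: List[Vector],
                     clusters: List[List[int]], n_probe: int = 2, top_k: int = 3) -> Tuple[List[int], int]:
    """Scan only the n_probe clusters whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: dist2(query, centroids[c]))[:n_probe]
    candidates = [i for c in probe for i in clusters[c]]
    best = sorted(candidates, key=lambda i: dist2(query, vectors[i]))[:top_k]
    return best, len(candidates)


rng = random.Random(42)
vectors = [[rng.random() for _ in range(8)] for _ in range(2000)]
centroids, clusters = build_clusters(vectors, n_clusters=20)
ids, scanned = clustered_search(vectors[0], vectors, centroids, clusters)
print(f"scanned {scanned} of {len(vectors)} vectors; nearest IDs: {ids}")
```

Because only some clusters are probed, results are approximate; raising `n_probe` scans more vectors in exchange for better recall.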


Let us know what you think at community.pinecone.io.
