Pipeline Serialization
Haystack pipelines can be saved to and loaded from YAML files. iris-haystack fully supports this: both IRISDocumentStore and the companion retrievers implement to_dict() and from_dict().
How it works
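Haystack serializes a pipeline by calling to_dict() on each component. Each component's dictionary records its fully qualified class path (type) and its initialization parameters (init_parameters). The pipeline adds its connections and settings alongside them. On deserialization, from_dict() reconstructs each component from that data.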
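The round trip can be illustrated with a small, stdlib-only sketch. This is not Haystack's code. `REGISTRY`, `register`, `ToyRetriever` and `component_from_dict` are hypothetical names. Like Haystack, the sketch keys each class by its module-qualified name, stores that name as `type` next to `init_parameters`, and looks the class up again on load.

```python
from typing import Any, Dict, Type

# Hypothetical registry mapping "module.ClassName" -> class, loosely modeled
# on how Haystack tracks component classes by their type path.
REGISTRY: Dict[str, Type] = {}


def type_path(cls: Type) -> str:
    return f"{cls.__module__}.{cls.__name__}"


def register(cls: Type) -> Type:
    """Class decorator: record the class under its type path."""
    REGISTRY[type_path(cls)] = cls
    return cls


@register
class ToyRetriever:
    """Hypothetical component; only its init parameters are serialized."""

    def __init__(self, table_name: str = "HaystackDocuments", top_k: int = 10):
        self.table_name = table_name
        self.top_k = top_k

    def to_dict(self) -> Dict[str, Any]:
        return {
            "type": type_path(type(self)),
            "init_parameters": {"table_name": self.table_name, "top_k": self.top_k},
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyRetriever":
        return cls(**data["init_parameters"])


def component_from_dict(data: Dict[str, Any]) -> Any:
    """Find the class by its stored type path and let it rebuild itself."""
    cls = REGISTRY[data["type"]]
    return cls.from_dict(data)


original = ToyRetriever(top_k=5)
data = original.to_dict()
restored = component_from_dict(data)
print(data["init_parameters"])  # {'table_name': 'HaystackDocuments', 'top_k': 5}
print(type(restored).__name__, restored.top_k)  # ToyRetriever 5
```

Only constructor arguments survive the round trip. Any state a component builds after construction, such as a live database connection, has to be recreated when the component is loaded again.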
For IRISDocumentStore, credentials are handled specially:
- The connection_string, username, and password Secret objects are serialized as their env var names (e.g., "IRIS_PASSWORD"), never as resolved values.
- On from_dict(), deserialize_secrets_inplace rebuilds the Secret objects, which then resolve the values from the environment at runtime.
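The secret handling can be pictured with a small stdlib mimic. The dictionary shape matches the env_vars / strict / type entries in the YAML later on this page. However, `EnvSecret` and `deserialize_secrets` are hypothetical stand-ins, not Haystack's `Secret` class or its `deserialize_secrets_inplace` helper.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Tuple


@dataclass(frozen=True)
class EnvSecret:
    """Hypothetical stand-in for an env-var-backed Secret."""

    env_vars: Tuple[str, ...]
    strict: bool = True

    def to_dict(self) -> Dict[str, Any]:
        # Only the variable *names* are written out, never their values.
        return {"env_vars": list(self.env_vars), "strict": self.strict, "type": "env_var"}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "EnvSecret":
        if data.get("type") != "env_var":
            raise ValueError(f"unsupported secret type: {data.get('type')!r}")
        return cls(tuple(data["env_vars"]), data.get("strict", True))


def deserialize_secrets(init_params: Dict[str, Any], keys: Iterable[str]) -> None:
    """Replace serialized secret dicts with EnvSecret objects, in place."""
    for key in keys:
        value = init_params.get(key)
        if isinstance(value, dict):  # absent keys (None) are left alone
            init_params[key] = EnvSecret.from_dict(value)


params = {
    "embedding_dim": 384,
    "password": EnvSecret(("IRIS_PASSWORD",)).to_dict(),
}
print(params["password"])
# {'env_vars': ['IRIS_PASSWORD'], 'strict': True, 'type': 'env_var'}
deserialize_secrets(params, keys=["password", "username"])
print(params["password"])
# EnvSecret(env_vars=('IRIS_PASSWORD',), strict=True)
```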
Saving a pipeline
```python
from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder
from intersystems_iris_haystack.document_stores import IRISDocumentStore
from intersystems_iris_haystack.components.retrievers import IRISEmbeddingRetriever

# Credentials default to env-var Secrets (IRIS_CONNECTION_STRING,
# IRIS_USERNAME, IRIS_PASSWORD), so no password is passed here.
store = IRISDocumentStore(embedding_dim=384)

pipeline = Pipeline()
pipeline.add_component(
    "embedder",
    SentenceTransformersTextEmbedder(
        model="sentence-transformers/all-MiniLM-L6-v2"
    ),
)
pipeline.add_component(
    "retriever",
    IRISEmbeddingRetriever(document_store=store, top_k=5),
)
pipeline.connect("embedder.embedding", "retriever.query_embedding")

# Save to YAML
with open("rag_pipeline.yaml", "w") as f:
    pipeline.dump(f)
```
The generated rag_pipeline.yaml looks like this (the exact top-level keys may vary slightly between Haystack versions):
```yaml
components:
  embedder:
    init_parameters:
      model: sentence-transformers/all-MiniLM-L6-v2
    type: haystack.components.embedders.sentence_transformers_text_embedder.SentenceTransformersTextEmbedder
  retriever:
    init_parameters:
      document_store:
        init_parameters:
          connection_string:
            env_vars:
            - IRIS_CONNECTION_STRING  # env var name, not the value
            strict: true
            type: env_var
          embedding_dim: 384
          password:
            env_vars:
            - IRIS_PASSWORD
            strict: true
            type: env_var
          table_name: HaystackDocuments
          username:
            env_vars:
            - IRIS_USERNAME
            strict: true
            type: env_var
        type: intersystems_iris_haystack.document_stores.document_store.IRISDocumentStore
      filter_policy: replace
      top_k: 5
    type: intersystems_iris_haystack.components.retrievers.embedding_retriever.IRISEmbeddingRetriever
connections:
- receiver: retriever.query_embedding
  sender: embedder.embedding
max_runs_per_component: 100
metadata: {}
```
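Every credential is stored as an env_var entry, so you can list the variables a saved pipeline needs before you load it. The sketch below is a hypothetical helper, not part of iris-haystack. It walks a plain dictionary, such as the result of parsing the YAML above with PyYAML's `yaml.safe_load`, and collects every `env_vars` name. The sample dictionary is an abbreviated copy of that YAML.

```python
from typing import Any, List, Set


def required_env_vars(node: Any) -> List[str]:
    """Return the sorted env var names referenced by env_var secrets in `node`."""
    found: Set[str] = set()

    def walk(value: Any) -> None:
        if isinstance(value, dict):
            # A serialized env-var secret carries type: env_var.
            if value.get("type") == "env_var":
                found.update(value.get("env_vars", []))
            for child in value.values():
                walk(child)
        elif isinstance(value, list):
            for child in value:
                walk(child)

    walk(node)
    return sorted(found)


def env_secret(name: str) -> dict:
    return {"env_vars": [name], "strict": True, "type": "env_var"}


# Abbreviated version of the rag_pipeline.yaml shown above.
pipeline_dict = {
    "components": {
        "retriever": {
            "init_parameters": {
                "document_store": {
                    "init_parameters": {
                        "connection_string": env_secret("IRIS_CONNECTION_STRING"),
                        "embedding_dim": 384,
                        "password": env_secret("IRIS_PASSWORD"),
                        "username": env_secret("IRIS_USERNAME"),
                    },
                },
                "top_k": 5,
            },
        },
    },
}
print(required_env_vars(pipeline_dict))
# ['IRIS_CONNECTION_STRING', 'IRIS_PASSWORD', 'IRIS_USERNAME']
```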
Loading a pipeline
```python
import os

from haystack import Pipeline

# The YAML only names the env vars, so set them before loading.
# Setting them in code is for local demos only; in production, export
# them from your shell or a secrets manager instead.
os.environ["IRIS_CONNECTION_STRING"] = "localhost:1972/USER"
os.environ["IRIS_USERNAME"] = "_system"
os.environ["IRIS_PASSWORD"] = "SYS"

with open("rag_pipeline.yaml") as f:
    pipeline = Pipeline.load(f)

# Use it immediately
result = pipeline.run({"embedder": {"text": "what is vector search?"}})
```
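At run time each rebuilt secret reads its value from the environment. Haystack's env-var secrets try the listed variable names in order. When `strict` is true, they raise an error if none of those variables is set. The function below is a hypothetical, stdlib-only sketch of that behavior. It operates on the serialized dictionary form and takes the environment as a parameter, so you can exercise it without touching `os.environ`.

```python
import os
from typing import Any, Dict, Mapping, Optional


def resolve_env_secret(
    spec: Dict[str, Any], environ: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return the value of the first set variable in spec["env_vars"] (sketch only)."""
    env = os.environ if environ is None else environ
    for name in spec["env_vars"]:
        value = env.get(name)
        if value is not None:
            return value
    if spec.get("strict", True):
        raise ValueError(f"none of these environment variables are set: {spec['env_vars']}")
    return None  # non-strict secrets quietly resolve to None


password_spec = {"env_vars": ["IRIS_PASSWORD"], "strict": True, "type": "env_var"}
fake_env = {"IRIS_PASSWORD": "SYS"}
print(resolve_env_secret(password_spec, fake_env))  # SYS

try:
    resolve_env_secret(password_spec, {})
except ValueError as exc:
    print("resolution failed:", exc)

lenient = {"env_vars": ["IRIS_PASSWORD"], "strict": False, "type": "env_var"}
print(resolve_env_secret(lenient, {}))  # None
```

The strict default is why a pipeline loaded without its credentials fails loudly instead of connecting with an empty password.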
BM25 pipeline example
```python
from haystack import Pipeline
from intersystems_iris_haystack.components.retrievers import IRISBm25Retriever

# `store` is the IRISDocumentStore created in "Saving a pipeline" above.
bm25_pipeline = Pipeline()
bm25_pipeline.add_component(
    "retriever",
    IRISBm25Retriever(document_store=store, top_k=10),
)

with open("bm25_pipeline.yaml", "w") as f:
    bm25_pipeline.dump(f)

with open("bm25_pipeline.yaml") as f:
    restored = Pipeline.load(f)

result = restored.run({"retriever": {"query": "database SQL"}})
```
Security considerations
Never commit rag_pipeline.yaml with hardcoded credentials
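The YAML file is safe to commit as long as every credential is a Secret.from_env_var (the default). The YAML then contains only the env var names, never the resolved password.
Avoid Secret.from_token("my-password") for stores you plan to serialize. Haystack's Secret.to_dict() raises a ValueError for token-based secrets, so pipeline.dump() fails instead of writing the token, and the password would still be hardcoded in your source. Never do this in production.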
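Before committing a serialized pipeline, a quick scan can flag credential fields that are not env-var references. This is a hypothetical helper, not part of iris-haystack or Haystack. The set of key names it treats as credentials (`CREDENTIAL_KEYS`) is an assumption; adjust it for your own components.

```python
from typing import Any, List

# Assumed set of keys that should only ever hold env-var secrets.
CREDENTIAL_KEYS = {"connection_string", "username", "password", "api_key", "token"}


def unsafe_credentials(node: Any, path: str = "") -> List[str]:
    """Return dotted paths of credential keys whose value is not an env_var secret."""
    problems: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            is_env_secret = isinstance(value, dict) and value.get("type") == "env_var"
            if key in CREDENTIAL_KEYS and value is not None and not is_env_secret:
                problems.append(child)
            problems.extend(unsafe_credentials(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            problems.extend(unsafe_credentials(item, f"{path}[{index}]"))
    return problems


store_params = {
    "connection_string": {"env_vars": ["IRIS_CONNECTION_STRING"], "strict": True, "type": "env_var"},
    "username": "_system",  # a plain string that would leak into the YAML
    "password": {"env_vars": ["IRIS_PASSWORD"], "strict": True, "type": "env_var"},
}
print(unsafe_credentials({"document_store": {"init_parameters": store_params}}))
# ['document_store.init_parameters.username']
```

You could run a check like this on the output of `yaml.safe_load` in a pre-commit hook. An empty result means the file only names environment variables.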