
IRISDocumentStore

IRISDocumentStore(*, connection_string: Secret | None = None, username: Secret | None = None, password: Secret | None = None, table_name: str = 'HaystackDocuments', embedding_dim: int = 384, bm25_k1: float = 1.5, bm25_b: float = 0.75, recreate_table: bool = False)

A DocumentStore backed by InterSystems IRIS (https://www.intersystems.com/products/intersystems-iris/).

Uses the native IRIS VECTOR(DOUBLE, N) column type to store embeddings and the VECTOR_COSINE SQL function for semantic similarity search.
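As a rough illustration of the score that VECTOR_COSINE produces for each row, here is a plain-Python cosine similarity. It is only a sketch: IRIS evaluates the function natively in SQL, and the helper name cosine_similarity is hypothetical, not part of this package.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # This sketch refuses zero vectors; IRIS may handle them differently.
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Same direction scores 1.0 regardless of length; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # 0.0
```

Because only the angle matters, embeddings of different magnitudes but similar direction rank equally high.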

Credentials

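Pass the connection string, username, and password as Haystack Secret objects (haystack.utils.Secret). Any parameter that is omitted is read from its environment variable (IRIS_CONNECTION_STRING, IRIS_USERNAME, IRIS_PASSWORD) via Secret.from_env_var, so the recommended approach is to export those variables: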

.. code-block:: bash

    export IRIS_CONNECTION_STRING="localhost:1972/USER"
    export IRIS_USERNAME="_system"
    export IRIS_PASSWORD="SYS"

.. code-block:: python

    from intersystems_iris_haystack.document_stores import IRISDocumentStore
    store = IRISDocumentStore()
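The connection string packs host, port, and namespace into a single value. The helper below is hypothetical (it is not part of intersystems_iris_haystack) and only shows how the host:port/namespace format decomposes; the library's own parsing may differ.

```python
from typing import NamedTuple


class IRISEndpoint(NamedTuple):
    host: str
    port: int
    namespace: str


def parse_connection_string(value: str) -> IRISEndpoint:
    """Split 'host:port/namespace' into its parts (hypothetical helper)."""
    address, sep, namespace = value.partition("/")
    if not sep or not namespace:
        raise ValueError(f"missing '/namespace' in {value!r}")
    # rpartition keeps any ':' inside the host part intact.
    host, sep, port = address.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected 'host:port' before '/', got {address!r}")
    return IRISEndpoint(host=host, port=int(port), namespace=namespace)


print(parse_connection_string("localhost:1972/USER"))
# IRISEndpoint(host='localhost', port=1972, namespace='USER')
```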

Retrievers

Use the companion retrievers for embedding-based and keyword-based search:

.. code-block:: python

    from intersystems_iris_haystack.components.retrievers import (
        IRISEmbeddingRetriever,
        IRISBm25Retriever,
    )

Table schema (created automatically if it doesn't exist)

.. code-block:: sql

    CREATE TABLE SQLUser.<table_name> (
        id        VARCHAR(128)  NOT NULL PRIMARY KEY,
        content   LONGVARCHAR,
        meta      LONGVARCHAR,   -- JSON with sort_keys=True
        score     DOUBLE,
        embedding VECTOR(DOUBLE, <embedding_dim>)
    )

Parameters:

connection_string (Secret | None, default: None)
    IRIS DB-API connection string in the format host:port/namespace. If None, it is resolved from the IRIS_CONNECTION_STRING environment variable.

username (Secret | None, default: None)
    IRIS username. If None, it is resolved from the IRIS_USERNAME environment variable.

password (Secret | None, default: None)
    IRIS password. If None, it is resolved from the IRIS_PASSWORD environment variable.

table_name (str, default: 'HaystackDocuments')
    Name of the SQL table, without schema. The SQLUser schema is prepended automatically.

embedding_dim (int, default: 384)
    Number of dimensions of the embedding vectors. Must match the embedding model used at indexing time.

bm25_k1 (float, default: 1.5)
    BM25 term-frequency saturation parameter (typically 1.2 to 2.0).

bm25_b (float, default: 0.75)
    BM25 length-normalization parameter (0.0 to 1.0).

recreate_table (bool, default: False)
    Drop and re-create the table on initialization. Use with caution: all data in the table is lost.
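For reference, bm25_k1 and bm25_b enter the standard Okapi BM25 score as sketched below. This is a self-contained version of the textbook formula with the common Lucene-style IDF, log(1 + (N - df + 0.5) / (df + 0.5)), and a naive whitespace tokenizer; the store's internal _BM25Index may tokenize and compute IDF differently, and bm25_scores is a hypothetical name.

```python
import math
from collections import Counter


def bm25_scores(query: str, corpus: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document in `corpus` against `query` with Okapi BM25."""
    docs = [text.lower().split() for text in corpus]  # naive whitespace tokenizer
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs if n_docs else 0.0
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # b controls how strongly long documents are penalized (0 = not at all).
            norm = 1 - b + b * len(d) / avgdl
            # k1 caps how much repeated occurrences of a term can add.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores


corpus = ["IRIS stores vectors", "IRIS IRIS IRIS", "Haystack pipelines"]
print(bm25_scores("iris", corpus))
```

Raising bm25_k1 lets the three-fold repetition in the second document count for more; setting bm25_b to 0.0 removes length normalization entirely.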

Raises:

ConnectionError
    If unable to connect to IRIS after _MAX_RETRIES attempts.

Source code in src/intersystems_iris_haystack/document_stores/document_store.py
def __init__(
    self,
    *,
    connection_string: Secret | None = None,
    username: Secret | None = None,
    password: Secret | None = None,
    table_name: str = "HaystackDocuments",
    embedding_dim: int = 384,
    bm25_k1: float = 1.5,
    bm25_b: float = 0.75,
    recreate_table: bool = False,
) -> None:
    self.connection_string = connection_string or Secret.from_env_var("IRIS_CONNECTION_STRING")
    self.username = username or Secret.from_env_var("IRIS_USERNAME")
    self.password = password or Secret.from_env_var("IRIS_PASSWORD")
    self.table_name = table_name
    self.embedding_dim = embedding_dim
    self.bm25_k1 = bm25_k1
    self.bm25_b = bm25_b
    self.recreate_table = recreate_table

    self._conn = None
    self._bm25 = _BM25Index(k1=bm25_k1, b=bm25_b)

    self._connect_with_retry()
    if recreate_table:
        self._drop_table()
    self._create_table_if_not_exists()

count_documents

count_documents() -> int

Return the number of documents in the store.

Returns:

int
    Total document count.

Example

>>> store.count_documents()
5

Source code in src/intersystems_iris_haystack/document_stores/document_store.py
def count_documents(self) -> int:
    """
    Return the number of documents in the store.

    Returns
    -------
    int
        Total document count.

    Example
    -------
    >>> store.count_documents()
    5
    """
    cur = self._cursor()
    try:
        cur.execute(f"SELECT COUNT(*) FROM SQLUser.{self.table_name}")  # noqa: S608
        row = cur.fetchone()
        return int(row[0]) if row else 0
    finally:
        cur.close()

filter_documents

filter_documents(filters: dict[str, Any] | None = None) -> list[Document]

Return documents matching the provided filters.

Filtering is performed in memory after loading all documents from IRIS, so every call reads the entire table, which can be slow for large collections. The in-memory approach ensures compatibility with IRIS Community Edition (which lacks JSON_VALUE as a native SQL function) and supports all Python types and operators defined by the Haystack filter protocol.

Parameters:

filters (dict[str, Any] | None, default: None)
    Filter in the legacy format {"field": value} or in the official Haystack format {"operator": ..., "conditions": ...}. None returns all documents.

Returns:

list[Document]
    Documents satisfying the filter.

Examples:

No filter (all documents)::

    store.filter_documents()

Legacy filter::

    store.filter_documents({"category": "db"})

Integer filter::

    store.filter_documents({"year": 2024})

Official Haystack filter with AND and >=::

    store.filter_documents({
        "operator": "AND",
        "conditions": [
            {"field": "meta.category", "operator": "==", "value": "db"},
            {"field": "meta.year",     "operator": ">=", "value": 2023},
        ],
    })

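Because filtering runs in Python, its semantics are easy to reproduce. The sketch below is a hypothetical, heavily simplified matcher (AND/OR plus ==, !=, >, >=, <, <= on meta.* fields); it is not Haystack's document_matches_filter, which supports more operators and stricter type rules. The hypothetical legacy_to_official shows one way to express a legacy {"field": value} filter in the official format, as an AND of equality conditions on metadata fields.

```python
import operator
from typing import Any

# Simplified comparison table; Haystack also supports "in", "not in", and more.
COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def legacy_to_official(filters: dict[str, Any]) -> dict[str, Any]:
    """Rewrite a legacy {"key": value} filter as an AND of meta equality conditions."""
    return {
        "operator": "AND",
        "conditions": [
            {"field": f"meta.{key}", "operator": "==", "value": value}
            for key, value in filters.items()
        ],
    }


def matches(meta: dict[str, Any], filters: dict[str, Any]) -> bool:
    """Evaluate a simplified official-format filter against a document's meta."""
    if "conditions" in filters:
        results = [matches(meta, cond) for cond in filters["conditions"]]
        if filters["operator"] == "AND":
            return all(results)
        if filters["operator"] == "OR":
            return any(results)
        raise ValueError(f"unsupported logical operator {filters['operator']!r}")
    key = filters["field"].removeprefix("meta.")
    if key not in meta:
        return False  # simplification: a missing field never matches
    return COMPARATORS[filters["operator"]](meta[key], filters["value"])


metas = [
    {"category": "db", "year": 2024},
    {"category": "db", "year": 2021},
    {"category": "ml", "year": 2024},
]
official = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.category", "operator": "==", "value": "db"},
        {"field": "meta.year", "operator": ">=", "value": 2023},
    ],
}
print([m for m in metas if matches(m, official)])
# [{'category': 'db', 'year': 2024}]
print([m for m in metas if matches(m, legacy_to_official({"year": 2024}))])
# [{'category': 'db', 'year': 2024}, {'category': 'ml', 'year': 2024}]
```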
Source code in src/intersystems_iris_haystack/document_stores/document_store.py
def filter_documents(
    self,
    filters: dict[str, Any] | None = None,
) -> list[Document]:
    """
    Return documents matching the provided filters.

    Filtering is performed **in-memory** after loading all documents
    from IRIS.  This ensures compatibility with IRIS Community Edition
    (which lacks ``JSON_VALUE`` as a native SQL function) and supports
    all Python types and operators defined by the Haystack protocol.

    Parameters
    ----------
    filters:
        Filter in legacy format ``{"field": value}`` or in the
        official Haystack format ``{"operator": ..., "conditions": ...}``.
        ``None`` returns all documents.

    Returns
    -------
    list[Document]
        Documents satisfying the filter.

    Examples
    --------
    No filter — all documents::

        store.filter_documents()

    Legacy filter::

        store.filter_documents({"category": "db"})

    Integer filter::

        store.filter_documents({"year": 2024})

    Official Haystack filter with ``AND`` and ``>=``::

        store.filter_documents({
            "operator": "AND",
            "conditions": [
                {"field": "meta.category", "operator": "==", "value": "db"},
                {"field": "meta.year",     "operator": ">=", "value": 2023},
            ],
        })
    """
    cur = self._cursor()
    try:
        cur.execute(
            f"SELECT id, content, meta, score "  # noqa: S608
            f"FROM SQLUser.{self.table_name}"
        )
        rows = cur.fetchall()
    finally:
        cur.close()

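    docs = [self._row_to_document(row) for row in rows]
    if not filters:
        return docs
    if "operator" not in filters and "field" not in filters:
        # Legacy {"key": value} filter: document_matches_filter expects the
        # official format, so translate each pair into an equality
        # condition on ``meta.<key>`` combined with AND.
        filters = {
            "operator": "AND",
            "conditions": [
                {"field": f"meta.{key}", "operator": "==", "value": value}
                for key, value in filters.items()
            ],
        }
    return [d for d in docs if document_matches_filter(filters, d)]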

write_documents

write_documents(documents: list[Document], policy: DuplicatePolicy = DuplicatePolicy.NONE) -> int

Persist documents to IRIS.

Documents with embeddings are stored using TO_VECTOR(?, DOUBLE) for native IRIS vector type conversion. The meta field is serialized with json.dumps(sort_keys=True) to guarantee deterministic key ordering.

Parameters:

documents (list[Document], required)
    List of Document objects (haystack.Document) to persist.

policy (DuplicatePolicy, default: DuplicatePolicy.NONE)
    Duplicate handling policy:

      • NONE (default): treated as FAIL.
      • FAIL: raise DuplicateDocumentError.
      • SKIP: silently ignore duplicate documents.
      • OVERWRITE: replace the existing document.
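The policy handling can be mimicked on an in-memory dict. The sketch below uses a hypothetical Policy enum and DuplicateError class in place of Haystack's DuplicatePolicy and DuplicateDocumentError, and works on a staged copy so that a failure leaves the "table" untouched, loosely mirroring the rollback in write_documents.

```python
from enum import Enum


class Policy(Enum):
    # Hypothetical mirror of Haystack's DuplicatePolicy values.
    NONE = "none"
    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


class DuplicateError(Exception):
    """Hypothetical stand-in for DuplicateDocumentError."""


def write(table: dict[str, str], docs: list[tuple[str, str]], policy: Policy = Policy.NONE) -> int:
    """Apply the duplicate policy the way write_documents does, on a dict 'table'."""
    if policy is Policy.NONE:
        policy = Policy.FAIL  # NONE falls back to FAIL
    staged = dict(table)  # work on a copy so a failure leaves `table` untouched
    written = 0
    for doc_id, content in docs:
        if doc_id in staged:
            if policy is Policy.FAIL:
                raise DuplicateError(f"Document with id '{doc_id}' already exists.")
            if policy is Policy.SKIP:
                continue
        staged[doc_id] = content  # plain insert, or replacement under OVERWRITE
        written += 1
    table.update(staged)  # "commit"
    return written


table = {"a": "old"}
print(write(table, [("a", "new"), ("b", "x")], Policy.SKIP), table)  # 1 {'a': 'old', 'b': 'x'}
print(write(table, [("a", "new")], Policy.OVERWRITE), table)         # 1 {'a': 'new', 'b': 'x'}
```

Note that the return value counts only documents actually written, so skipped duplicates are not included.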

Returns:

int
    Number of documents written.

Raises:

DuplicateDocumentError
    If the effective policy is FAIL (including the default NONE) and a document with the same ID already exists.

ValueError
    If documents contains objects that are not Document instances.

Example

>>> from haystack import Document
>>> from haystack.document_stores.types import DuplicatePolicy
>>> store.write_documents(
...     [Document(content="Hello IRIS!", meta={"lang": "en"})],
...     policy=DuplicatePolicy.OVERWRITE,
... )
1

Source code in src/intersystems_iris_haystack/document_stores/document_store.py
def write_documents(
    self,
    documents: list[Document],
    policy: DuplicatePolicy = DuplicatePolicy.NONE,
) -> int:
    """
    Persist documents to IRIS.

    Documents with embeddings are stored using ``TO_VECTOR(?, DOUBLE)``
    for native IRIS vector type conversion.  The ``meta`` field is
    serialized with ``json.dumps(sort_keys=True)`` to guarantee
    deterministic key ordering.

    Parameters
    ----------
    documents:
        List of :class:`~haystack.Document` objects to persist.
    policy:
        Duplicate handling policy:

        - ``NONE`` *(default)*: treated as ``FAIL``.
        - ``FAIL``: raise :exc:`DuplicateDocumentError`.
        - ``SKIP``: silently ignore duplicate documents.
        - ``OVERWRITE``: replace the existing document.

    Returns
    -------
    int
        Number of documents written.

    Raises
    ------
    DuplicateDocumentError
        If the effective policy is ``FAIL`` (including the default
        ``NONE``) and a duplicate document ID is found.
    ValueError
        If ``documents`` contains non-:class:`~haystack.Document` objects.

    Example
    -------
    >>> from haystack import Document
    >>> from haystack.document_stores.types import DuplicatePolicy
    >>> store.write_documents(
    ...     [Document(content="Hello IRIS!", meta={"lang": "en"})],
    ...     policy=DuplicatePolicy.OVERWRITE,
    ... )
    1
    """
    if policy == DuplicatePolicy.NONE:
        policy = DuplicatePolicy.FAIL

    written = 0
    cur = self._cursor()
    try:
        for doc in documents:
            if not isinstance(doc, Document):
                msg = f"Expected a Document object, got {type(doc).__name__!r}."
                raise ValueError(msg)
            existing = self._get_by_id(doc.id, cur)
            if existing:
                if policy == DuplicatePolicy.FAIL:
                    msg = f"Document with id '{doc.id}' already exists. Use DuplicatePolicy.SKIP or OVERWRITE."
                    raise DuplicateDocumentError(msg)
                if policy == DuplicatePolicy.SKIP:
                    logger.debug("Skipping duplicate document: %s", doc.id)
                    continue
                cur.execute(
                    f"DELETE FROM SQLUser.{self.table_name} WHERE id = ?",  # noqa: S608
                    [doc.id],
                )

            meta_str = json.dumps(doc.meta or {}, ensure_ascii=False, sort_keys=True)
            emb_str = self._embedding_to_str(doc.embedding)

            if emb_str:
                cur.execute(
                    f"INSERT INTO SQLUser.{self.table_name} "  # noqa: S608
                    f"(id, content, meta, score, embedding) "
                    f"VALUES (?, ?, ?, ?, TO_VECTOR(?, DOUBLE))",
                    [doc.id, doc.content or "", meta_str, doc.score, emb_str],
                )
            else:
                cur.execute(
                    f"INSERT INTO SQLUser.{self.table_name} "  # noqa: S608
                    f"(id, content, meta, score) VALUES (?, ?, ?, ?)",
                    [doc.id, doc.content or "", meta_str, doc.score],
                )
            written += 1

        self._conn.commit()
        logger.debug("Wrote %d document(s) to IRIS.", written)
        return written
    except Exception:
        self._conn.rollback()
        raise
    finally:
        cur.close()

delete_documents

delete_documents(document_ids: list[str]) -> None

Delete documents by ID.

An empty list is a no-op, and IDs that do not exist are silently ignored by IRIS, so repeating a call with the same IDs is safe (idempotent).

Parameters:

document_ids (list[str], required)
    List of document IDs to remove.

Example

>>> store.delete_documents(["id-1", "id-2"])

Source code in src/intersystems_iris_haystack/document_stores/document_store.py
def delete_documents(self, document_ids: list[str]) -> None:
    """
    Delete documents by ID.

    Accepts an empty list without error (idempotent). Non-existent
    IDs are silently ignored by IRIS.

    Parameters
    ----------
    document_ids:
        List of document IDs to remove.

    Example
    -------
    >>> store.delete_documents(["id-1", "id-2"])
    """
    if not document_ids:
        return
    placeholders = ",".join(["?"] * len(document_ids))
    cur = self._cursor()
    try:
        cur.execute(
            f"DELETE FROM SQLUser.{self.table_name} "  # noqa: S608
            f"WHERE id IN ({placeholders})",
            document_ids,
        )
        self._conn.commit()
    finally:
        cur.close()

to_dict

to_dict() -> dict[str, Any]

Serialize the store for use in Haystack YAML/JSON pipelines.

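The connection_string, username, and password Secrets are serialized with Secret.to_dict(), following the Haystack secret protocol: only the environment-variable references are stored, never the resolved values. Token-based secrets (Secret.from_token) cannot be serialized and raise an error. recreate_table is always written as False so that restoring a pipeline never drops the table.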

Returns:

dict
    Serializable dictionary.

Example

>>> d = store.to_dict()
>>> d["type"]
'intersystems_iris_haystack.document_stores.document_store.IRISDocumentStore'

Source code in src/intersystems_iris_haystack/document_stores/document_store.py
def to_dict(self) -> dict[str, Any]:
    """
    Serialize the store for use in Haystack YAML/JSON pipelines.

    The password ``Secret`` is serialized according to the Haystack
    secret protocol — the resolved value is **never** included.

    Returns
    -------
    dict
        Serializable dictionary.

    Example
    -------
    >>> d = store.to_dict()
    >>> d["type"]
    'intersystems_iris_haystack.document_stores.document_store.IRISDocumentStore'
    """
    return default_to_dict(
        self,
        connection_string=self.connection_string.to_dict(),
        username=self.username.to_dict(),
        password=self.password.to_dict(),
        table_name=self.table_name,
        embedding_dim=self.embedding_dim,
        bm25_k1=self.bm25_k1,
        bm25_b=self.bm25_b,
        recreate_table=False,  # never re-drop on restore
    )

from_dict classmethod

from_dict(data: dict[str, Any]) -> IRISDocumentStore

Deserialize the store from a dictionary.

Called automatically by Haystack when loading a pipeline from a YAML file.

Parameters:

data (dict[str, Any], required)
    Dictionary in the format produced by to_dict.

Returns:

IRISDocumentStore
    The deserialized document store. Like the constructor, this opens a connection to IRIS.

Source code in src/intersystems_iris_haystack/document_stores/document_store.py
@classmethod
def from_dict(cls, data: dict[str, Any]) -> IRISDocumentStore:
    """
    Deserialize the store from a dictionary.

    Called automatically by Haystack when loading a pipeline from a
    YAML file.

    Parameters
    ----------
    data:
        Dictionary in the format produced by :meth:`to_dict`.

    Returns
    -------
    IRISDocumentStore
    """
    deserialize_secrets_inplace(
        data["init_parameters"],
        keys=["connection_string", "username", "password"],
    )
    return default_from_dict(cls, data)

close

close() -> None

Close the IRIS connection (idempotent).

Source code in src/intersystems_iris_haystack/document_stores/document_store.py
def close(self) -> None:
    """Close the IRIS connection (idempotent)."""
    if self._conn:
        try:
            self._conn.close()
        except Exception:  # noqa: S110
            pass
        # Drop the reference so that later calls are true no-ops.
        self._conn = None
        logger.debug("IRIS connection closed.")