How to create a knowledge base
This guide details the steps involved in creating a knowledge base within your ZBrain account.
Getting started
To begin, log into your ZBrain account.
Once you have successfully logged in, navigate to the dashboard.
Click the 'Create Knowledge Base' button to initiate the process of setting up a new knowledge base.

Uploading documents
You will be directed to a screen where you can either upload a document from your device or import it from a data source.

Data Source Configuration
This page allows users to configure the foundational elements of a new knowledge base.
After uploading or importing the data, you will be prompted to provide a name and description for the chosen file/data.
To upload additional documents to the knowledge base, click the ‘Add More’ button located below the uploaded documents. Select the documents from your device to add them.
Additional features
Document summarization
Enable document summarization by toggling the dedicated switch.
Select an appropriate large language model to perform the summarization process.
This feature creates concise overviews of lengthy documents for easier comprehension.
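To illustrate the idea only (ZBrain performs summarization with the LLM you select, not with code like this), here is a toy extractive summarizer that scores sentences by word frequency and keeps the highest-scoring ones:

```python
import re
from collections import Counter

def summarize(text: str, max_sentences: int = 2) -> str:
    """Toy extractive summarizer: keep the sentences with the highest word-frequency score."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))
    # Rank sentences by the total frequency of the words they contain
    ranked = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in re.findall(r"\w+", s.lower())),
        reverse=True,
    )
    top = ranked[:max_sentences]
    # Emit the chosen sentences in their original order
    return " ".join(s for s in sentences if s in top)

doc = "Vector stores hold embeddings. Embeddings encode meaning. The cat sat."
summary = summarize(doc, max_sentences=1)
```

A production summarizer built on an LLM is abstractive rather than extractive, but the goal is the same: a concise overview of a longer document.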
Automated reasoning policy
Create an automated reasoning policy by activating the feature toggle.
An automated reasoning policy consists of predefined rules, conditions, and variables that guide the system's reasoning process when responding to queries.
It extracts structured data from the knowledge base, applies logical reasoning, and ensures responses are accurate and consistent.
This policy governs how the system interprets information, processes queries, and delivers answers based on established knowledge and logic.
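The structure of such a policy can be sketched as rules over extracted variables. The rule format below is purely illustrative (it is not ZBrain's actual policy schema): each rule names a variable, a condition, a comparison value, and the action to take when the condition holds against facts extracted from the knowledge base.

```python
# Hypothetical rule format: (variable, operator, value, action).
# Facts are key-value pairs extracted from the knowledge base.

def apply_policy(rules, facts):
    """Return the actions of every rule whose condition holds for the given facts."""
    actions = []
    for variable, op, value, action in rules:
        actual = facts.get(variable)
        if actual is None:
            continue  # variable not present in the extracted facts
        holds = (actual == value) if op == "eq" else (actual > value)
        if holds:
            actions.append(action)
    return actions

rules = [
    ("refund_days", "gt", 30, "deny_refund"),
    ("customer_tier", "eq", "gold", "escalate_to_agent"),
]
facts = {"refund_days": 45, "customer_tier": "gold"}
```

Evaluating the rules against the facts yields a deterministic, auditable set of actions, which is what makes rule-based policies consistent across queries.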

Improve efficiency using Flow
Enable the ‘Improve Efficiency Using Flow’ option to streamline and enhance the process of transforming documents into refined knowledge bases.
This feature leverages predefined or custom flows from the Flow Library to automate data extraction and analysis, converting raw documents into structured, actionable insights.
It is essential for users seeking to create efficient knowledge bases by applying standardized data processing techniques such as text extraction, image analysis, and language model-based summarization.
By incorporating flows, you optimize the data refinement process, transforming input data into a well-organized, accessible knowledge base. This enhances the solution’s ability to handle various data formats, including text, images, and structured content, ultimately improving operational efficiency and decision-making.
When you activate the toggle to enable this feature, a button labeled ‘Add a Flow from the Flow Library’ will appear. Clicking this button will open the ‘Add a Flow’ panel.

Types of flows
There are two types of flows available:
ZBrain Flows
ZBrain Flows offer predefined automation solutions for common data processing tasks. Users can choose from the following options:
OCR (Optical Character Recognition)
Purpose: Recognizes and extracts text content from images or documents. This is particularly useful for digitizing physical documents or documents containing non-editable text (e.g., scanned PDFs).
Functionality:
Extracts text from images or scanned documents.
Enables further processing or analysis, such as searching, summarization, or automated reasoning.
Analyze each page as an image using an LLM
Purpose: Treats each document page as an image and processes both the visual and textual content for detailed analysis.
Functionality:
Converts document pages into a digital format using OCR.
Extracted text is analyzed using a Large Language Model (LLM) to:
Derive insights.
Generate summaries.
Classify content based on predefined criteria.
Extract images from the document and evaluate them using an LLM
Purpose: Designed for documents containing images that need to be analyzed.
Functionality:
Extracts images from the document.
Applies an LLM to analyze the images for:
Content recognition.
Pattern or object identification.
Useful for image-heavy documents requiring deeper content understanding.
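Conceptually, a flow is a pipeline of steps, each transforming the document payload before passing it on. The sketch below uses stub functions in place of real OCR and LLM calls (the step names and payload fields are illustrative, not the Flow Library's API):

```python
# Each step takes a document dict and returns an updated copy.

def ocr_step(doc):
    # Stand-in for OCR: promote the scanned text to the working "text" field
    return {**doc, "text": doc.get("scanned_text", "")}

def summarize_step(doc):
    # Stand-in for LLM summarization: keep only the first sentence
    first = doc["text"].split(". ")[0]
    return {**doc, "summary": first}

def run_flow(steps, doc):
    """Run the document through each step of the flow in order."""
    for step in steps:
        doc = step(doc)
    return doc

result = run_flow(
    [ocr_step, summarize_step],
    {"scanned_text": "Invoice 42 overdue. Contact billing."},
)
```

Chaining small, single-purpose steps like this is what lets predefined and custom flows be mixed and reordered per knowledge base.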

Custom flows
The custom flows option allows users to create a flow specifically for data extraction, enabling tailored and advanced automation based on unique workflows and processing needs. Users can click on this option to add a custom flow.

Complete all the required fields and click the ‘Next’ button to proceed to the text data refinement page.
Text data refinement
This page allows configuration of text processing parameters that determine how documents are processed and stored for optimal retrieval and knowledge base creation.
RAG definition
In this step, configure the data retrieval model that determines how your knowledge base will store, index, and retrieve content to generate accurate, context-aware responses to user queries.
ZBrain supports two RAG storage models:
Vector store – default, chunk-and-embedding index for semantic similarity search
Knowledge graph – entity–relationship graph stored in ZBrain’s graph database

Select the one that best matches your data and query needs.
Vector store
Splits each document into chunks, converts the chunks into high-dimensional embeddings, and saves them in a vector database. At query time, ZBrain performs a semantic-similarity search and supplies the matched chunks to the LLM.
Best suited for:
Unstructured text
Rapid prototyping
Cases where relationship reasoning is not critical
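The chunk-embed-search loop can be sketched end to end. This toy version uses bag-of-words counts in place of the dense LLM embeddings a real vector store would use, and cosine similarity for the semantic match:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts (real stores use dense LLM embeddings)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk(text, size=5):
    """Split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

doc = ("the knowledge base stores chunks each chunk becomes an embedding "
       "queries match by similarity")
index = [(c, embed(c)) for c in chunk(doc)]          # index time: chunk + embed
query = embed("chunk embedding similarity")           # query time: embed the query
best = max(index, key=lambda item: cosine(query, item[1]))[0]
```

At query time, the best-matching chunks (not the whole document) are what gets supplied to the LLM as context.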
Knowledge graph
Extracts entities and relationships from every chunk, stores them as nodes and edges, and also embeds the chunk text so you can fall back on vector similarity. The query engine can traverse the graph, run vector search, or perform both operations.
Best suited for: Information with critical inter-entity relationships, such as product-component hierarchies, chronological timelines, or complex organizational structures.
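A minimal sketch of the graph side, assuming a hypothetical schema of entities as nodes and labeled relationships as edges (not ZBrain's internal representation), shows why traversal answers relationship questions that similarity search cannot:

```python
# Edges are (source entity, relationship, destination entity) triples.
edges = [
    ("ProductX", "has_component", "ModuleA"),
    ("ModuleA", "has_component", "SensorB"),
    ("SensorB", "supplied_by", "VendorC"),
]

def neighbors(node):
    """Direct relationships out of a node."""
    return [(rel, dst) for src, rel, dst in edges if src == node]

def reachable(start):
    """All entities reachable from `start` by following edges (depth-first)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, dst in neighbors(node):
            if dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen
```

A question like "which vendors does ProductX ultimately depend on?" is a traversal from `ProductX` across two hops, which a chunk-similarity search alone would likely miss because no single chunk mentions both entities.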
Option 1: Vector store selection (Default option)
If you have selected the vector store option in your RAG definition, you will be able to choose from the four available vector store options listed below:
Pinecone: This option leverages the scalability of Pinecone, a third-party vector indexing service, directly within ZBrain.
Economical: This option utilizes ZBrain's built-in vector store with cost-effective vector engines and keyword indexes for efficient data handling.
Chroma: This option utilizes ChromaDB, a high-performance open-source vector database optimized for applications leveraging large language models. It offers robust support for embeddings, vector search, document storage, full-text search, metadata filtering, and multi-modal capabilities.

Add new connection: To use your own vector store, provide the necessary API key and credentials. You can choose between Pinecone Hosted and Qdrant for vector storage. Input the connection name and enter the API key to establish the connection.
To get an API key from Pinecone Hosted, follow these steps:
Open the Pinecone console and log in to your account.
Select your project from the list of projects.
Navigate to the API Keys tab.
Click ‘Create API Key.’
Enter a name for your API key.
Choose the permissions you want to assign to the API key.
Click ‘Create Key.’
Copy and securely store the generated API key, as you cannot view it again once you close the dialog.
To get an API key from the Qdrant vector database, follow these steps:
Log in to the Qdrant Cloud dashboard.
Go to the cluster detail page.
Navigate to the API keys section.
Click ‘Create’ to generate a new API key.
Configure the permissions for the key if granular access control is enabled.
Click ‘OK’ and copy your API key.
Once you have the API key, enter the environment and index name.
After filling in all the required details, click ‘Add’ to complete the process.
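In code, collecting these fields before submitting the form looks roughly like the sketch below. The field names and the environment variable are illustrative assumptions, not ZBrain's actual form schema; the only real practice shown is keeping the API key out of source code by reading it from the environment:

```python
import os

# Hypothetical field names mirroring the 'Add new connection' form.
REQUIRED = ("connection_name", "api_key", "environment", "index_name")

def build_connection(name, environment, index_name, env_var="VECTOR_STORE_API_KEY"):
    """Assemble the connection details and report any missing required fields."""
    conn = {
        "connection_name": name,
        "api_key": os.environ.get(env_var, ""),  # never hard-code the key
        "environment": environment,
        "index_name": index_name,
    }
    missing = [field for field in REQUIRED if not conn[field]]
    return conn, missing

os.environ["VECTOR_STORE_API_KEY"] = "pc-demo-key"  # normally set outside the program
conn, missing = build_connection("prod-search", "us-east-1", "kb-index")
```

If `missing` is non-empty, the connection attempt should be rejected before any network call, which mirrors the form requiring all fields before 'Add' succeeds.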

Option 2: Knowledge graph selection
Depending on your requirements, ZBrain lets you create a knowledge graph (KG) as an alternative to a traditional vector store. Below is the available graph store option for the knowledge graph:
Economical: This option utilizes ZBrain's built-in graph store with cost-effective engines and keyword indexes for efficient data handling.
File store selection
ZBrain S3 storage: This option utilizes ZBrain's secure and scalable S3 storage for data management. It offers enhanced data management features and precise retrieval results without incurring additional token costs.
Chunk settings
ZBrain provides two distinct chunking approaches for both options:
Automatic: This option is recommended for users unfamiliar with the process. ZBrain will automatically set chunk and preprocessing rules based on best practices.

Custom:
This option enables experienced users to customize configurations, including defining end-of-segment characters, setting chunking rules and lengths, and applying text preprocessing rules.
These rules include replacing consecutive spaces, newlines, and tabs, and removing all URLs and email addresses.
Once you have made your changes, click the 'Confirm & Preview' button to review the results.
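The preprocessing rules described above can be sketched with regular expressions. The patterns below are illustrative (ZBrain's exact rules may differ), but they implement the same two operations: stripping URLs and email addresses, and collapsing runs of spaces, newlines, and tabs:

```python
import re

def preprocess(text, collapse_whitespace=True, strip_urls_emails=True):
    """Apply the custom text preprocessing rules (illustrative regexes)."""
    if strip_urls_emails:
        text = re.sub(r"https?://\S+", "", text)                  # remove URLs
        text = re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", "", text)   # remove emails
    if collapse_whitespace:
        text = re.sub(r"[ \t\n]+", " ", text).strip()             # collapse whitespace runs
    return text

cleaned = preprocess("See https://example.com now\n\nmail me@x.io ok")
```

Cleaning the text before chunking keeps noise like URLs out of the embeddings, which generally improves retrieval quality.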

Retrieval settings
ZBrain offers various retrieval settings to define how users can search and retrieve information from a knowledge base. Here's an overview of the available settings:
For vector store selection
Search type: You can choose from three search types:
Vector search: This method converts queries and text chunks into vector representations (embeddings) and retrieves the chunks most semantically similar to the query.
Full-text search: This method indexes all terms within your documents using an inverted index structure that maps terms to relevant text chunks, allowing users to search and retrieve documents based on keywords.
Hybrid search: This option combines vector search and full-text search. ZBrain performs both searches simultaneously and then reranks the results to prioritize the most relevant documents for the user's query. To utilize hybrid search, you will need to configure a Rerank model API.
Top K: This setting determines the number of most relevant results returned for a user's search query. You can specify the desired number of results (default is 50).
Score threshold: This setting defines the minimum score a result needs to achieve to be included in the search results. You can specify a score between 0.01 and 1 (the default is 0.2).
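The two settings compose in a fixed order: the score threshold filters candidates first, then Top K caps how many survivors are returned. A small sketch using the defaults mentioned above (threshold 0.2; Top K shrunk here so the example stays readable):

```python
def select_results(scored_chunks, top_k=50, score_threshold=0.2):
    """Keep chunks scoring at or above the threshold, then return the best top_k."""
    passing = [c for c in scored_chunks if c[1] >= score_threshold]
    passing.sort(key=lambda c: c[1], reverse=True)
    return passing[:top_k]

hits = [("chunk-a", 0.91), ("chunk-b", 0.15), ("chunk-c", 0.42), ("chunk-d", 0.30)]
top = select_results(hits, top_k=2)
```

Note that a chunk below the threshold is excluded even when fewer than Top K results remain, so a high threshold can return fewer results than Top K allows.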

For knowledge graph selection
Search type: You can choose from five search types:

Naive Mode: Falls back to basic vector similarity on text chunks (no KG traversal).
Best suited for: Quick POCs; content without rich relationships.
Local Mode: This search looks up context-dependent facts about a single entity using low-level keywords.
Best suited for: Ideal for Q&A about a particular policy, product feature, or isolated technical detail.
Global Mode: Emphasizes relationship-based knowledge, traversing edges to reveal broader connections between concepts.
Best suited for: For holistic questions that require networked insights, e.g., “How do X, Y, and Z relate?”
Hybrid Mode: Combines both local and global retrieval, then merges the results.
Best suited for: Complex business questions that need both entity facts and contextual relationships.
Mix Mode: Executes both vector (semantic) and graph retrieval in parallel, drawing from unstructured and structured data, including time metadata.
Best suited for: Multi-layered queries that span different data types or dimensions, such as timelines, comparisons, or multifaceted evaluations.
Top K: This setting determines the number of most relevant results returned for a user's search query. You can specify the desired number of results (default is 50).
Score threshold: This setting defines the minimum score a result needs to achieve to be included in the search results. You can specify a score between 0.01 and 1 (default is 0.2).
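Mix Mode's core idea of running both retrieval paths and merging the results can be sketched as follows. The two retrievers here are stubs returning fixed chunk IDs (real ones would hit the vector index and the graph), so only the merge-and-deduplicate logic is real:

```python
def vector_retrieve(query):
    # Stub for semantic (vector) retrieval
    return ["chunk-timeline-2021", "chunk-overview"]

def graph_retrieve(query):
    # Stub for graph traversal retrieval
    return ["chunk-overview", "chunk-org-structure"]

def mix_retrieve(query):
    """Run both retrievers and merge their results, dropping duplicates."""
    merged, seen = [], set()
    for result in vector_retrieve(query) + graph_retrieve(query):
        if result not in seen:
            seen.add(result)
            merged.append(result)
    return merged
```

Because each path surfaces chunks the other misses (semantic matches vs. relationship matches), the merged list covers multi-layered queries better than either path alone.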
Embedding model
Choose the embedding type that best suits your use case to optimize text representation and improve performance.
Upon selecting a vector store in the RAG definition, the following embedding models are available for use:

The following embedding models are available when knowledge graph is chosen in the RAG definition:

Knowledge Graph LLM (for knowledge graph selection)
Choose the LLM that will perform reasoning over the knowledge graph (default: gpt-4o). The chosen model powers query rewriting, path finding, and answer synthesis.

ZBrain will then display the proposed document and the estimated number of chunks for your review.
Once you have confirmed your selections, click the ‘Next’ button.

Execute and finish
On this screen, review all the details of the knowledge base you have provided earlier. If everything appears accurate, click the ‘Manage Knowledge Base’ button to complete the creation process.

Your newly created knowledge base is now accessible for use within your ZBrain solutions. You can create additional knowledge bases by clicking on the ‘Add’ button or delete existing ones using the ‘Delete’ button.
