How to create a knowledge base?
This guide details the steps involved in creating a knowledge base within your ZBrain account.
To begin, log into your ZBrain account.
Once you have successfully logged in, navigate to the dashboard.
Click the ‘Create Knowledge Base’ button to initiate the process of setting up a new knowledge base.
You will be directed to a screen where you can either upload a document from your device or import it from a data source.
This page allows users to configure the foundational elements of a new knowledge base.
After uploading or importing the data, you will be prompted to provide a name and description for the selected file or data.
To upload additional documents to the knowledge base, click the ‘Add More’ button located below the uploaded documents. Select the documents from your device to add them.
Document summarization
Enable document summarization by toggling the dedicated switch.
Select an appropriate large language model to perform the summarization process.
This feature creates concise overviews of lengthy documents for easier comprehension.
Automated reasoning policy
Create an automated reasoning policy by activating the feature toggle.
An automated reasoning policy consists of predefined rules, conditions, and variables that guide the system's reasoning process when responding to queries.
It extracts structured data from the knowledge base, applies logical reasoning, and ensures responses are accurate and consistent.
This policy governs how the system interprets information, processes queries, and delivers answers based on established knowledge and logic.
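To make the idea of rules, conditions, and variables more concrete, here is a minimal sketch in Python. It is only an illustration: the `Rule` class, the `evaluate_policy` function, and the sample leave-policy values are hypothetical and do not represent ZBrain's internal policy format.

```python
from dataclasses import dataclass
from typing import Any, Callable


# Hypothetical structure for illustration; not ZBrain's policy format.
@dataclass
class Rule:
    name: str
    condition: Callable[[dict[str, Any]], bool]  # test over the extracted variables
    conclusion: str                               # statement asserted when the condition holds


def evaluate_policy(rules: list[Rule], variables: dict[str, Any]) -> list[str]:
    """Return the conclusion of every rule whose condition holds."""
    return [rule.conclusion for rule in rules if rule.condition(variables)]


# Variables as they might be extracted from a leave-policy document (made-up values).
variables = {"tenure_years": 3, "employment_type": "full_time"}

rules = [
    Rule(
        name="extended_leave",
        condition=lambda v: v["employment_type"] == "full_time" and v["tenure_years"] >= 2,
        conclusion="Eligible for extended leave",
    ),
    Rule(
        name="probation",
        condition=lambda v: v["tenure_years"] < 1,
        conclusion="Still in probation period",
    ),
]

conclusions = evaluate_policy(rules, variables)
print(conclusions)
```

Because every answer is derived from the same explicit rules and variables, two queries about the same facts reach the same conclusion, which is the consistency benefit described above.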
Improve efficiency using Flow
Enable the ‘Improve Efficiency Using Flow’ option to streamline and enhance the process of transforming documents into refined knowledge bases.
This feature leverages predefined or custom flows from the Flow Library to automate data extraction and analysis, converting raw documents into structured, actionable insights.
It is essential for users seeking to create efficient knowledge bases by applying standardized data processing techniques such as text extraction, image analysis, and language model-based summarization.
By incorporating flows, you optimize the data refinement process, transforming input data into a well-organized, accessible knowledge base. This enhances the solution’s ability to handle various data formats, including text, images, and structured content, ultimately improving operational efficiency and decision-making.
When you activate the toggle to enable this feature, a button labeled ‘Add a Flow from the Flow Library’ will appear. Clicking this button will open the ‘Add a Flow’ panel.
Types of flows
There are two types of flows available:
ZBrain Flows
ZBrain Flows offer predefined automation solutions for common data processing tasks. Users can choose from the following options:
OCR (Optical Character Recognition)
Purpose: Recognizes and extracts text content from images or documents. This is particularly useful for digitizing physical documents or documents containing non-editable text (e.g., scanned PDFs).
Functionality:
Extracts text from images or scanned documents.
Enables further processing or analysis, such as searching, summarization, or automated reasoning.
Analyze each page as an image using an LLM
Purpose: Treats each document page as an image and processes both the visual and textual content for detailed analysis.
Functionality:
Converts document pages into a digital format using OCR.
Analyzes the extracted text using a large language model (LLM) to:
Derive insights.
Generate summaries.
Classify content based on predefined criteria.
Extract images from the document and evaluate them using an LLM
Purpose: Designed for documents containing images that need to be analyzed.
Functionality:
Extracts images from the document.
Applies an LLM to analyze the images for:
Content recognition.
Pattern or object identification.
Useful for image-heavy documents requiring deeper content understanding.
Custom flows
The custom flows option allows users to create a flow specifically for data extraction, enabling tailored and advanced automation based on unique workflows and processing needs. Users can click on this option to add a custom flow.
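Conceptually, a flow is a sequence of processing steps applied to a document. The sketch below illustrates only that idea, assuming hypothetical step functions (`extract_text`, `summarize`) and a plain dictionary as the document record. It does not reflect how ZBrain implements flows, and the summary step is a simple stand-in for an LLM call.

```python
from typing import Callable

# A step takes a document record (a plain dict here) and returns an updated record.
Step = Callable[[dict], dict]


def extract_text(doc: dict) -> dict:
    # Stand-in for OCR or text extraction: the "raw" field is already text here.
    return {**doc, "text": doc["raw"].strip()}


def summarize(doc: dict) -> dict:
    # Stand-in for an LLM summary: keep only the first sentence.
    first_sentence = doc["text"].split(".")[0].strip() + "."
    return {**doc, "summary": first_sentence}


def run_flow(steps: list[Step], doc: dict) -> dict:
    """Apply each step in order, passing the updated record along."""
    for step in steps:
        doc = step(doc)
    return doc


result = run_flow(
    [extract_text, summarize],
    {"raw": "  Invoices are due in 30 days. Late fees apply.  "},
)
print(result["summary"])
```

Keeping each step as an independent function is what lets a custom flow reorder, remove, or replace steps to match a particular workflow.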
Complete all the required fields and click the ‘Next’ button to proceed to the text data refinement page.
This page allows configuration of text processing parameters that determine how documents are processed and stored for optimal retrieval and knowledge base creation.
ZBrain provides two distinct chunking approaches:
Automatic: This option is recommended for users unfamiliar with the process. ZBrain will automatically set chunk and preprocessing rules based on best practices.
Custom:
This option enables experienced users to customize configurations, including defining end-of-segment characters, setting chunking rules and lengths, and applying text preprocessing rules.
These rules include replacing consecutive spaces, newlines, and tabs, and removing all URLs and email addresses.
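The custom rules described above can be sketched in Python as follows. This is a minimal illustration, not ZBrain's implementation: the function names `preprocess` and `chunk`, the regular expressions, and the `separator` and `max_length` parameters are assumptions chosen for the example.

```python
import re

# Simplified patterns for illustration; production-grade URL/email matching is more involved.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def preprocess(text: str) -> str:
    """Remove URLs and email addresses, then collapse repeated whitespace."""
    text = URL_RE.sub("", text)
    text = EMAIL_RE.sub("", text)
    text = re.sub(r"[ \t]+", " ", text)   # runs of spaces and tabs -> one space
    text = re.sub(r"\n{2,}", "\n", text)  # runs of newlines -> one newline
    return text.strip()


def chunk(text: str, separator: str = "\n", max_length: int = 500) -> list[str]:
    """Split on the end-of-segment character, then pack segments into chunks."""
    chunks: list[str] = []
    current = ""
    for segment in text.split(separator):
        segment = segment.strip()
        if not segment:
            continue
        candidate = f"{current}{separator}{segment}" if current else segment
        if len(candidate) <= max_length:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # Hard-split a single segment that is longer than the chunk length.
        while len(segment) > max_length:
            chunks.append(segment[:max_length])
            segment = segment[max_length:]
        current = segment
    if current:
        chunks.append(current)
    return chunks


raw = "Visit   https://example.com\tfor details.\n\n\nContact admin@example.com today.\nThanks."
clean = preprocess(raw)
chunks = chunk(clean, separator="\n", max_length=40)
print(chunks)
```

A smaller `max_length` produces more, shorter chunks; `len(chunks)` corresponds to the kind of chunk count you would review in a preview.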
Once you have made your changes, click the ‘Confirm & Preview’ button to review the results.
ZBrain allows you to select a vector store for storing and indexing your text data for efficient retrieval. Here are the available vector store options:
Pinecone: This option leverages the scalability of Pinecone, a third-party vector indexing service, directly within ZBrain.
Economical: This option utilizes ZBrain's built-in vector store with cost-effective vector engines and keyword indexes for efficient data handling.
Chroma: This option utilizes ChromaDB, a high-performance open-source vector database optimized for applications leveraging large language models. It offers robust support for embeddings, vector search, document storage, full-text search, metadata filtering, and multi-modal capabilities.
Add new connection: To use your own vector store, provide the necessary API key and credentials. You can choose either Pinecone Hosted or Qdrant for vector storage. Enter a connection name and the API key to establish the connection.
To get an API key from Pinecone Hosted, follow these steps:
Open the Pinecone console and log in to your account.
Select your project from the list of projects.
Navigate to the API Keys tab.
Click ‘Create API Key.’
Enter a name for your API key.
Choose the permissions you want to assign to the API key.
Click ‘Create Key.’
Copy and securely store the generated API key; you cannot view it again once you close the dialog.
To get an API key from the Qdrant vector database, follow these steps:
Log in to the Qdrant Cloud dashboard.
Go to the cluster detail page.
Navigate to the API keys section.
Click ‘Create’ to generate a new API key.
Configure the permissions for the key if granular access control is enabled.
Click ‘OK’ and copy your API key.
Once you have the API key, enter the environment and index name.
After filling in all the required details, click ‘Add’ to complete the process.
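If you keep connection details in your own scripts or notes before entering them in ZBrain, it helps to validate them and avoid exposing the API key. The sketch below is a hypothetical helper, not a ZBrain API: the `VectorStoreConnection` class, its field names, and the `VECTOR_STORE_API_KEY` environment variable are assumptions made for illustration.

```python
import os
from dataclasses import dataclass

# Providers named in this guide for custom connections.
SUPPORTED_PROVIDERS = {"pinecone", "qdrant"}


@dataclass(frozen=True)
class VectorStoreConnection:
    """Hypothetical record of the fields requested by the 'Add new connection' form."""
    name: str
    provider: str
    api_key: str
    environment: str
    index_name: str

    def __post_init__(self) -> None:
        if self.provider not in SUPPORTED_PROVIDERS:
            raise ValueError(f"Unsupported provider: {self.provider!r}")
        missing = [field for field, value in vars(self).items() if not value]
        if missing:
            raise ValueError(f"Missing required fields: {', '.join(missing)}")

    def masked_key(self) -> str:
        # Show only the last four characters so the key is safe to log.
        return "*" * max(len(self.api_key) - 4, 0) + self.api_key[-4:]


# Read the key from an environment variable rather than hard-coding it.
# The fallback value is a placeholder, not a real key.
conn = VectorStoreConnection(
    name="docs-index",
    provider="pinecone",
    api_key=os.environ.get("VECTOR_STORE_API_KEY", "pc-demo-key-1234"),
    environment="us-east-1",
    index_name="knowledge-base",
)
print(conn.masked_key())
```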
ZBrain S3 storage: This option utilizes ZBrain's secure and scalable S3 storage for data management. It offers enhanced data management features and precise retrieval results without incurring additional token costs.
ZBrain offers various retrieval settings to define how users can search and retrieve information from a knowledge base. Here's an overview of the available settings:
Search type: You can choose between three search types:
Vector search: This method represents the query and the text chunks as vectors (embeddings) and retrieves the chunks whose vectors are most similar to the query's.
Full-text search: This method indexes all terms within your documents, using an inverted index structure to map terms to relevant text chunks, so users can search and retrieve documents based on keywords.
Hybrid search: This option combines vector search and full-text search. ZBrain performs both searches simultaneously and then reranks the results to prioritize the most relevant documents for the user's query. To utilize hybrid search, you will need to configure a Rerank model API.
Top K: This setting determines the number of most relevant results returned for a user's search query. You can specify the desired number of results (default is 50).
Score threshold: This setting defines the minimum score a result needs to achieve to be included in the search results. You can specify a score between 0 and 1 (default is 0.2).
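The interaction between search type, Top K, and score threshold can be sketched as follows. This is a simplified illustration under stated assumptions: the embeddings are toy two-dimensional vectors, the keyword score is a plain term-overlap ratio, and the two scores are merged with a weighted sum (`alpha`) rather than a Rerank model. The function names are hypothetical and are not part of ZBrain.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear in the text."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def hybrid_search(
    query: str,
    query_vec: list[float],
    chunks: list[tuple[str, list[float]]],
    top_k: int = 50,               # default from this guide
    score_threshold: float = 0.2,  # default from this guide
    alpha: float = 0.5,            # assumed weight of the vector score
) -> list[tuple[float, str]]:
    scored = []
    for text, vec in chunks:
        score = alpha * cosine_similarity(query_vec, vec) + (1 - alpha) * keyword_score(query, text)
        if score >= score_threshold:  # drop results below the threshold
            scored.append((round(score, 3), text))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]             # keep only the K best results


# Toy chunks with made-up embeddings.
chunks = [
    ("refund policy for orders", [1.0, 0.0]),
    ("shipping times by region", [0.0, 1.0]),
    ("refund request form", [0.6, 0.8]),
]
results = hybrid_search("refund policy", [1.0, 0.0], chunks, top_k=2)
print(results)
```

Raising the score threshold trims weak matches before Top K is applied, so a query can return fewer than K results when few chunks are relevant.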
Choose the embedding type that best suits your use case to optimize text representation and improve performance.
ZBrain then displays a preview of the document and the estimated number of chunks for your review.
Once you have confirmed your selections, click the ‘Next’ button.
On this screen, review all the knowledge base details you provided earlier. If everything is accurate, click the ‘Manage Knowledge Base’ button to complete the creation process.
Your newly created knowledge base is now accessible for use within your ZBrain solutions. You can create additional knowledge bases by clicking on the ‘Add’ button or delete existing ones using the ‘Delete’ button.