From: ELPA Syncer
Subject: [elpa] externals/elisa fbfe3b4ae1 86/98: Merge pull request #12 from s-kostyaev/semantic-split
Date: Wed, 17 Jul 2024 18:58:07 -0400 (EDT)
branch: externals/elisa
commit fbfe3b4ae18210d890bf07ee01a2fa4adaf91404
Merge: c03baded1e 7460059992
Author: Sergey Kostyaev <s-kostyaev@users.noreply.github.com>
Commit: GitHub <noreply@github.com>
Merge pull request #12 from s-kostyaev/semantic-split
Add semantic splitting, hybrid search reranker, web search and files support
---
.github/workflows/melpa.yml | 4 +-
README.org | 315 +++++++++--
elisa.el | 1276 +++++++++++++++++++++++++++++++++++++++----
3 files changed, 1432 insertions(+), 163 deletions(-)
diff --git a/.github/workflows/melpa.yml b/.github/workflows/melpa.yml
index bc265616ee..3cdd1bd812 100644
--- a/.github/workflows/melpa.yml
+++ b/.github/workflows/melpa.yml
@@ -22,10 +22,12 @@ jobs:
matrix:
emacs_version:
- 29.2
+ - 29.3
+ - 29.4
ignore_warnings:
- false
warnings_as_errors:
- - false
+ - true
check:
- melpa
file:
diff --git a/README.org b/README.org
index 0491105d2b..97b795d292 100644
--- a/README.org
+++ b/README.org
@@ -3,22 +3,37 @@
[[http://www.gnu.org/licenses/gpl-3.0.txt][file:https://img.shields.io/badge/license-GPL_3-green.svg]]
[[https://melpa.org/#/elisa][file:https://melpa.org/packages/elisa-badge.svg]]
-ELISA (Emacs Lisp Information System Assistant) is a project
-designed to help Emacs users quickly find answers to their
-questions related to Emacs and Emacs Lisp. Utilizing the powerful
-Ellama package, ELISA provides accurate and relevant responses to
-user queries, enhancing productivity and efficiency in the Emacs
-environment. By integrating links to the Emacs info manual after
-answering a question, ELISA ensures that users have easy access to
-additional information on the topic, making it an essential tool
-for both beginners and advanced Emacs users.
-
-ELISA creates index from info manuals. When you send message to
-~elisa-chat~ it search to semantically similar info nodes in index,
-get first ~elisa-limit~ nodes, add it to context and send your message
-to llm. LLM generates answer to your message based on provided
-context. You can read not only answer generated by llm, but also info
-manuals by provided links.
+ELISA (Emacs Lisp Information System Assistant) is a system designed
+to provide informative answers to user queries by leveraging a
+Retrieval Augmented Generation (RAG) approach.
+
+*** Data Sources and Processing
+
+ELISA can access and process information from multiple sources,
+including:
+
++ *Local Files:* ELISA can analyze text content within local files,
+ enabling it to retrieve information specific to a user's projects or
+ documents.
++ *Info manuals:* ELISA has access to the comprehensive Emacs info
+ manuals covering Emacs itself, Emacs Lisp, and various Emacs Lisp
+ packages.
++ *Web Search:* ELISA integrates web search capabilities to provide
+ access to a vast pool of publicly available information.
+
+*** RAG Methodology
+
+ELISA implements a RAG framework to process and respond to queries. This
+involves:
+
+1. *Data Parsing:* Input data is parsed and organized into structured
+ collections for efficient retrieval.
+2. *Contextual Analysis:* When a query is received, ELISA analyzes the
+ context within relevant data collections to identify passages
+ containing potentially useful information.
+3. *Response Generation:* ELISA synthesizes a response based on the
+ identified contextual quotes, aiming to provide a comprehensive and
+ accurate answer to the user's question.
** Installation
@@ -27,20 +42,16 @@ You need emacs 29.2 or newer to use this package.
This package now on [[https://melpa.org/#/getting-started][MELPA]] and you can
just ~M-x~ ~package-install~
~elisa~.
-*** Alternative method
-
-You can use ~package-vc~ to install ELISA:
-
-#+begin_src emacs-lisp
- (package-vc-install "https://github.com/s-kostyaev/elisa")
-#+end_src
-
*** System dependencies
+**** Sqlite extensions
+
Then you need to download ~sqlite-vss~. You can do it manually from
https://github.com/asg017/sqlite-vss/releases or by calling ~M-x~
~elisa-download-sqlite-vss~.
+**** Large language models
+
You can use this package with different llm providers. By default it
uses [[https://github.com/jmorganca/ollama][ollama]] provider both for
embeddings and chat. If you ok with it,
you need to install [[https://github.com/jmorganca/ollama][ollama]] and pull
used models:
@@ -59,6 +70,28 @@ example:
- other [[https://ollama.com/library][models]] or
[[https://github.com/ahyatt/llm?tab=readme-ov-file#setting-up-providers][providers]]
- [[https://github.com/ollama/ollama?tab=readme-ov-file#create-a-model][create
your own model]]
+I prefer these models:
+
+#+begin_src shell
+ ollama pull gemma2:9b-instruct-q6_K
+ ollama pull chatfire/bge-m3:q8_0
+#+end_src
+
+**** Reranker
+
+Reranker is disabled by default to reduce the number of system
+dependencies, but it significantly improves the quality of retrieval
+and answers. You can find installation instructions [[https://github.com/s-kostyaev/reranker][here]].
+Recommended.
+
+**** Web search provider
+
+By default [[https://duckduckgo.com][duckduckgo]] is used for web search, but I prefer [[https://github.com/searxng/searxng][searxng]]. The
+simplest way to use searxng is [[https://github.com/searxng/searxng-docker][docker]]. You need to enable the json format
+in [[https://docs.searxng.org/admin/settings/settings_search.html#settings-search][settings]].
+
+**** Parse info manuals
+
Create index for builtin, external or all info manuals by one of this
commands:
- ~elisa-async-parse-builtin-manuals~
@@ -71,10 +104,9 @@ This can take some time.
*** elisa-chat
-Entrypoint. Makes similarity search in index, add semantically similar
-info nodes into context and query llm for prompt. Uses ~ellama~ under
-the hood. Call one of parse manuals functions to create index before
-use it.
+Entry point. Performs a hybrid search in the enabled collections, adds
+the found quotes to the context and queries the llm with the prompt.
+Uses ~ellama~ under the hood.
*** elisa-download-sqlite-vss
@@ -86,7 +118,7 @@ Parse builtin emacs info manuals asyncronously. Can take
long time.
*** elisa-async-parse-external-manuals
-Parse external emacs info manuals asyncronously.
+Parse external emacs info manuals asynchronously. Can take a long time.
*** elisa-async-parse-all-manuals
@@ -95,30 +127,227 @@ Parse all emacs info manuals asyncronously.
One of parse functions should be called before ~elisa-chat~ to create
index.
+*** elisa-web-search
+
+Search the web and answer the user query based on the information found.
+
+**** How it works
+
+Search the web for user query. Create new collection with user query
+as name. Parse web pages to this new collection. Search in this
+collection. Add related information to context. Ask llm to answer user
+query based on provided context.
+
+*** elisa-async-parse-directory
+
+Parse a directory as a new collection. Can take a long time. Works
+asynchronously and incrementally.
+
+*** elisa-reparse-current-collection
+
+Incrementally reparse the current directory collection.
+Does nothing if the buffer's file is not inside one of the existing collections.
+Works asynchronously.
+
+*** elisa-create-empty-collection
+
+Create new empty collection.
+
+*** elisa-add-file-to-collection
+
+Add file to collection.
+
+*** elisa-add-webpage-to-collection
+
+Add webpage to collection.
+
+*** elisa-enable-collection
+
+Enable collection for ~elisa-chat~.
+
+*** elisa-disable-collection
+
+Disable collection.
+
+*** elisa-disable-all-collections
+
+Disable all collections.
+
+*** elisa-remove-collection
+
+Removes collection and all its data from index.
+
** Configuration
-Example configuration. With default installation you don't need it.
+Example configuration.
#+begin_src emacs-lisp
(use-package elisa
:init
(setopt elisa-limit 5)
+ ;; reranker increases answer quality significantly
+ (setopt elisa-reranker-enabled t)
(require 'llm-ollama)
- (setopt elisa-embeddings-provider (make-llm-ollama :embedding-model
"nomic-embed-text"))
- (setopt elisa-chat-provider (make-llm-ollama
- :chat-model "sskostyaev/openchat:8k-rag"
- :embedding-model "nomic-embed-text")))
+ ;; gemma 2 works very well in my use cases
+ ;; it also boasts strong multilingual capabilities
+ (setopt elisa-chat-provider
+ (make-llm-ollama
+ :chat-model "gemma2:9b-instruct-q6_K"
+ :embedding-model "chatfire/bge-m3:q8_0"
+ ;; set context window to 8k
+ :default-chat-non-standard-params '(("num_ctx" . 8192))))
+ ;; this embedding model has strong multilingual capabilities
+ (setopt elisa-embeddings-provider (make-llm-ollama :embedding-model "chatfire/bge-m3:q8_0"))
+ :config
+ ;; searxng works better than duckduckgo in my tests
+ (setopt elisa-web-search-function 'elisa-search-searxng))
#+end_src
-The following variables can be customized for ELISA:
-- ~elisa-embeddings-provider~: Embeddings provider to generate
- embeddings.
-- ~elisa-chat-provider~: Chat provider.
-- ~elisa-db-directory~: Directory for elisa database.
-- ~elisa-limit~: Count info nodes to pass into llm context for answer.
-- ~elisa-find-executable~: Path to find executable.
-- ~elisa-tar-executable~: Path to tar executable.
-- ~elisa-sqlite-vss-version~: Sqlite VSS version.
+*** ELISA Custom Variables
+
+**** General Settings
+
++ *~elisa-embeddings-provider~*:
+ * Description: LLM provider to generate embeddings for text.
+ * Default: ~(make-llm-ollama :embedding-model "nomic-embed-text")~
+
++ *~elisa-chat-provider~*:
+ * Description: LLM provider used for chat interactions.
+ * Default: ~(make-llm-ollama :chat-model "sskostyaev/openchat:8k-rag" :embedding-model
+ "nomic-embed-text")~
+
++ *~elisa-db-directory~*:
+ * Type: Directory
+ * Description: Specifies the directory where ELISA stores its database.
+ * Default: ~(file-name-concat user-emacs-directory "elisa")~ (within your Emacs config
+ directory)
+
++ *~elisa-limit~*:
+ * Type: Integer
+ * Description: Controls the number of quotes passed to the LLM context for generating an
+ answer.
+ * Default: 5
+
++ *~elisa-find-executable~*:
+ * Type: String
+ * Description: Path to the ~find~ command executable. Used for locating files.
+ * Default: ~"find"~
+
+**** File System and Database Management
+
++ *~elisa-tar-executable~*:
+ * Type: String
+ * Description: Path to the ~tar~ command executable. Used for archiving files.
+ * Default: ~"tar"~
+
++ *~elisa-sqlite-vss-version~*:
+ * Type: String
+ * Description: Version of the SQLite VSS extension.
+
++ *~elisa-sqlite-vss-path~*:
+ * Type: File path
+ * Description: Path to the SQLite VSS extension file.
+
++ *~elisa-sqlite-vector-path~*:
+ * Type: File path
+ * Description: Path to the SQLite Vector extension file.
+
+**** Text Processing and Semantic Splitting
+
++ *~elisa-semantic-split-function~*:
+ * Type: Function
+ * Description: Function used to split text into semantically meaningful chunks.
+ * Default: ~elisa-split-by-paragraph~
+
++ *~elisa-prompt-rewriting-enabled~*:
+ * Type: Boolean
+ * Description: Enables or disables prompt rewriting for better retrieving.
+ * Default: ~t~ (enabled)
+
++ *~elisa-chat-prompt-template~*:
+ * Type: String
+ * Description: Template used for constructing the chat prompt.
+
++ *~elisa-rewrite-prompt-template~*:
+ * Type: String
+ * Description: Template used for rewriting prompts for better retrieval.
+
+**** Web Search and Integration
+
++ *~elisa-searxng-url~*:
+ * Type: String
+ * Description: URL of your SearXNG instance.
+ * Default: ~"http://localhost:8080/"~
+
++ *~elisa-pandoc-executable~*:
+ * Type: String
+ * Description: Path to the ~pandoc~ command executable. Used for converting documents to text.
+ * Default: ~"pandoc"~
+
++ *~elisa-webpage-extraction-function~*:
+ * Type: Function
+ * Description: Function used to extract the content from a webpage.
+ * Default: ~elisa-get-webpage-buffer~
+
++ *~elisa-web-search-function~*:
+ * Type: Function
+ * Description: Function responsible for performing web searches using the provided prompt.
+ * Default: ~elisa-search-duckduckgo~
+
++ *~elisa-web-pages-limit~*:
+ * Type: Integer
+ * Description: Maximum number of web pages to parse during a search.
+ * Default: 10
+
+**** Reranking
+
++ *~elisa-breakpoint-threshold-amount~*:
+ * Type: Float
+ * Description: Threshold used for controlling the granularity of semantic splitting.
+ * Default: 0.4
+
++ *~elisa-reranker-enabled~*:
+ * Type: Boolean
+ * Description: Enables or disables reranking, which can improve retrieval quality by ranking
+ retrieved quotes based on relevance.
+ * Default: ~nil~ (not set)
+
++ *~elisa-reranker-url~*:
+ * Type: String
+ * Description: URL of the reranking service.
+ * Default: ~"http://127.0.0.1:8787/"~
+
++ *~elisa-reranker-similarity-threshold~*:
+ * Type: Float
+ * Description: Similarity threshold for reranking. Quotes below this threshold will be filtered
+ out. If not set, all ~elisa-limit~ quotes will be added to the context.
+ * Default: 0
+
++ *~elisa-reranker-limit~*:
+ * Type: Integer
+ * Description: Number of quotes to send to the reranker.
+ * Default: 20
+
+**** File Parsing and Exclusion
+
++ *~elisa-ignore-patterns-files~*:
+ * Type: List of strings
+ * Description: List of file name patterns (e.g., ~.gitignore~) used to ignore files during
+ parsing.
+ * Default: ~(".gitignore" ".ignore" ".rgignore")~
+
++ *~elisa-ignore-invisible-files~*:
+ * Type: Boolean
+ * Description: Toggles whether invisible files and directories should be ignored during
+ parsing.
+ * Default: ~t~ (true)
+
+**** ELISA Chat Collections
+
++ *~elisa-enabled-collections~*:
+ * Type: List of strings
+ * Description: Specifies which collections are enabled for chat interactions.
+ * Default: ~("builtin manuals" "external manuals")~
** Contributions
diff --git a/elisa.el b/elisa.el
index 3869406c81..09cc28975d 100644
--- a/elisa.el
+++ b/elisa.el
@@ -5,8 +5,8 @@
;; Author: Sergey Kostyaev <sskostyaev@gmail.com>
;; URL: http://github.com/s-kostyaev/elisa
;; Keywords: help local tools
-;; Package-Requires: ((emacs "29.2") (ellama "0.8.6") (llm "0.9.1") (async "1.9.8"))
-;; Version: 0.1.4
+;; Package-Requires: ((emacs "29.2") (ellama "0.11.2") (llm "0.9.1") (async "1.9.8") (plz "0.9"))
+;; Version: 1.0.0
;; SPDX-License-Identifier: GPL-3.0-or-later
;; Created: 18th Feb 2024
@@ -25,22 +25,9 @@
;;; Commentary:
;;
-;; ELISA (Emacs Lisp Information System Assistant) is a project
-;; designed to help Emacs users quickly find answers to their
-;; questions related to Emacs and Emacs Lisp. Utilizing the powerful
-;; Ellama package, ELISA provides accurate and relevant responses to
-;; user queries, enhancing productivity and efficiency in the Emacs
-;; environment. By integrating links to the Emacs info manual after
-;; answering a question, ELISA ensures that users have easy access to
-;; additional information on the topic, making it an essential tool
-;; for both beginners and advanced Emacs users.
-;;
-;; ELISA creates index from info manuals. When you send message to
-;; `elisa-chat' it search to semantically similar info nodes in index,
-;; get first `elisa-limit' nodes, add it to context and send your
-;; message to llm. LLM generates answer to your message based on
-;; provided context. You can read not only answer generated by llm,
-;; but also info manuals by provided links.
+;; ELISA (Emacs Lisp Information System Assistant) is a system designed
+;; to provide informative answers to user queries by leveraging a
+;; Retrieval Augmented Generation (RAG) approach.
;;
;;; Code:
@@ -48,12 +35,20 @@
(require 'llm)
(require 'info)
(require 'async)
+(require 'dom)
+(require 'shr)
+(require 'plz)
+(require 'json)
+
+(defgroup elisa nil
+ "RAG implementation for `ellama'."
+ :group 'tools)
(defcustom elisa-embeddings-provider (progn (require 'llm-ollama)
(make-llm-ollama
:embedding-model
"nomic-embed-text"))
"Embeddings provider to generate embeddings."
- :group 'tools
+ :group 'elisa
:type '(sexp :validate 'cl-struct-p))
(defcustom elisa-chat-provider (progn (require 'llm-ollama)
@@ -61,36 +56,151 @@
:chat-model "sskostyaev/openchat:8k-rag"
:embedding-model "nomic-embed-text"))
"Chat provider."
- :group 'tools
+ :group 'elisa
:type '(sexp :validate 'cl-struct-p))
(defcustom elisa-db-directory (file-truename
(file-name-concat
user-emacs-directory "elisa"))
"Directory for elisa database."
- :group 'tools
+ :group 'elisa
:type 'directory)
(defcustom elisa-limit 5
- "Count info nodes to pass into llm context for answer."
- :group 'tools
+ "Count quotes to pass into llm context for answer."
+ :group 'elisa
:type 'integer)
-(defcustom elisa-find-executable (executable-find "find")
+(defcustom elisa-find-executable "find"
"Path to find executable."
- :group 'tools
- :type 'integer)
+ :group 'elisa
+ :type 'string)
-(defcustom elisa-tar-executable (executable-find "tar")
+(defcustom elisa-tar-executable "tar"
"Path to tar executable."
- :group 'tools
- :type 'integer)
+ :group 'elisa
+ :type 'string)
(defcustom elisa-sqlite-vss-version "v0.1.2"
"Sqlite VSS version."
- :group 'tools
+ :group 'elisa
+ :type 'string)
+
+(defcustom elisa-sqlite-vss-path nil
+ "Path to sqlite-vss extension."
+ :group 'elisa
+ :type 'file)
+
+(defcustom elisa-sqlite-vector-path nil
+ "Path to sqlite-vector extension."
+ :group 'elisa
+ :type 'file)
+
+(defcustom elisa-semantic-split-function 'elisa-split-by-paragraph
+ "Function for semantic text split."
+ :group 'elisa
+ :type 'function)
+
+(defcustom elisa-prompt-rewriting-enabled t
+ "Enable prompt rewriting for better retrieving."
+ :group 'elisa
+ :type 'boolean)
+
+(defcustom elisa-chat-prompt-template "Answer user query based on context above. If you can answer it partially do it. Provide list of open questions if any. Say \"not enough data\" if you can't answer user query based on provided context. User query:
+%s"
+ "Chat prompt template."
+ :group 'elisa
+ :type 'string)
+
+(defcustom elisa-rewrite-prompt-template
+ "You are professional search agent. With given context and user
+prompt you need to create new prompt for search. It should be
+concise and useful without additional context. Response with
+prompt only. You should replace all words like 'this' or 'it' to
+its values to make search successful. If user prompt contains
+question your prompt should also be in form of question. For
+example:
+
+- What is pony?
+- Pony is ...
+- How to buy it?
+
+How to buy a pony?
+
+ User prompt:
+%s"
+ "Prompt template for prompt rewriting."
+ :group 'elisa
+ :type 'string)
+
+(defcustom elisa-searxng-url "http://localhost:8080/"
+ "Searxng url for web search. Json format should be enabled for this instance."
+ :group 'elisa
+ :type 'string)
+
+(defcustom elisa-pandoc-executable "pandoc"
+ "Path to pandoc executable."
+ :group 'elisa
+ :type 'string)
+
+(defcustom elisa-webpage-extraction-function 'elisa-get-webpage-buffer
+ "Function to get buffer with webpage content."
+ :group 'elisa
+ :type 'function)
+
+(defcustom elisa-web-search-function 'elisa-search-duckduckgo
+ "Function to search the web.
+Function should get prompt and return list of urls."
+ :group 'elisa
+ :type 'function)
+
+(defcustom elisa-web-pages-limit 10
+ "Limit of web pages to parse during web search."
+ :group 'elisa
+ :type 'integer)
+
+(defcustom elisa-breakpoint-threshold-amount 0.4
+ "Breakpoint threshold amount.
+Increase it to decrease the granularity of semantic splitting."
+ :group 'elisa
+ :type 'float)
+
+(defcustom elisa-reranker-enabled nil
+ "Enable reranker to improve retrieving quality."
+ :group 'elisa
+ :type 'boolean)
+
+(defcustom elisa-reranker-url "http://127.0.0.1:8787/"
+ "Reranker service url."
+ :group 'elisa
+ :type 'string)
+
+(defcustom elisa-reranker-similarity-threshold 0
+ "Reranker similarity threshold.
+If set, all quotes with similarity less than threshold will be filtered out."
+ :group 'elisa
:type 'string)
+(defcustom elisa-reranker-limit 20
+ "Number of quotes to send to the reranker."
+ :group 'elisa
+ :type 'integer)
+
+(defcustom elisa-ignore-patterns-files '(".gitignore" ".ignore" ".rgignore")
+ "Files with patterns to ignore during file parsing."
+ :group 'elisa
+ :type '(repeat string))
+
+(defcustom elisa-ignore-invisible-files t
+ "Ignore invisible files and directories during file parsing."
+ :group 'elisa
+ :type 'boolean)
+
+(defcustom elisa-enabled-collections '("builtin manuals" "external manuals")
+ "Enabled collections for elisa chat."
+ :group 'elisa
+ :type '(repeat string))
+
(defun elisa-sqlite-vss-download-url ()
"Generate sqlite vss download url based on current system."
(cond ((string-equal system-type "darwin")
@@ -112,19 +222,21 @@
(defun elisa--vss-path ()
"Path to vss sqlite extension."
- (let* ((ext (if (string-equal system-type "darwin")
- "dylib"
- "so"))
- (file (format "vss0.%s" ext)))
- (file-name-concat elisa-db-directory file)))
+ (or elisa-sqlite-vss-path
+ (let* ((ext (if (string-equal system-type "darwin")
+ "dylib"
+ "so"))
+ (file (format "vss0.%s" ext)))
+ (file-name-concat elisa-db-directory file))))
(defun elisa--vector-path ()
"Path to vector sqlite extension."
- (let* ((ext (if (string-equal system-type "darwin")
- "dylib"
- "so"))
- (file (format "vector0.%s" ext)))
- (file-name-concat elisa-db-directory file)))
+ (or elisa-sqlite-vector-path
+ (let* ((ext (if (string-equal system-type "darwin")
+ "dylib"
+ "so"))
+ (file (format "vector0.%s" ext)))
+ (file-name-concat elisa-db-directory file))))
;;;###autoload
(defun elisa-download-sqlite-vss ()
@@ -138,7 +250,7 @@
(default-directory elisa-db-directory))
(make-directory elisa-db-directory t)
(url-copy-file (elisa-sqlite-vss-download-url) file-name t)
- (process-lines elisa-tar-executable "-xf" file-name)
+ (process-lines (executable-find elisa-tar-executable) "-xf" file-name)
(delete-file file-name))
(elisa--reopen-db))
@@ -148,17 +260,54 @@
(defun elisa-embeddings-create-table-sql ()
"Generate sql for create embeddings table."
- (format "create virtual table if not exists elisa_embeddings using vss0(embedding(%d));"
+ "drop table if exists elisa_embeddings;")
+
+(defun elisa-data-embeddings-create-table-sql ()
+ "Generate sql for create data embeddings table."
+ (format "create virtual table if not exists data_embeddings using vss0(embedding(%d));"
(elisa-get-embedding-size)))
+(defun elisa-data-fts-create-table-sql ()
+ "Generate sql for create full text search table."
+ "create virtual table if not exists data_fts using fts5(data);")
+
(defun elisa-info-create-table-sql ()
"Generate sql for create info table."
- "create table if not exists info (node text unique);")
+ "drop table if exists info;")
+
+(defun elisa-collections-create-table-sql ()
+ "Generate sql for create collections table."
+ "create table if not exists collections (name text unique);")
+
+(defun elisa-kinds-create-table-sql ()
+ "Generate sql for create kinds table."
+ "create table if not exists kinds (name text unique);")
+
+(defun elisa-fill-kinds-sql ()
+ "Generate sql for fill kinds table."
+ "insert into kinds (name) values ('web'), ('file'), ('info') on conflict do nothing;")
+
+(defun elisa-files-create-table-sql ()
+ "Generate sql for create files table."
+ "create table if not exists files (path text unique, hash text)")
+
+(defun elisa-data-create-table-sql ()
+ "Generate sql for create data table."
+ "create table if not exists data (
+kind_id INTEGER,
+collection_id INTEGER,
+path text,
+hash text,
+data text,
+FOREIGN KEY(kind_id) REFERENCES kinds(rowid),
+FOREIGN KEY(collection_id) REFERENCES collections(rowid)
+);")
(defun elisa--init-db (db)
"Initialize elisa DB."
(if (not (file-exists-p (elisa--vss-path)))
(warn "Please run M-x `elisa-download-sqlite-vss' to use this package")
+ (sqlite-pragma db "PRAGMA journal_mode=WAL;")
(sqlite-load-extension
db
(elisa--vector-path))
@@ -166,7 +315,14 @@
db
(elisa--vss-path))
(sqlite-execute db (elisa-embeddings-create-table-sql))
- (sqlite-execute db (elisa-info-create-table-sql))))
+ (sqlite-execute db (elisa-info-create-table-sql))
+ (sqlite-execute db (elisa-collections-create-table-sql))
+ (sqlite-execute db (elisa-kinds-create-table-sql))
+ (sqlite-execute db (elisa-fill-kinds-sql))
+ (sqlite-execute db (elisa-files-create-table-sql))
+ (sqlite-execute db (elisa-data-create-table-sql))
+ (sqlite-execute db (elisa-data-embeddings-create-table-sql))
+ (sqlite-execute db (elisa-data-fts-create-table-sql))))
(defvar elisa-db (progn
(make-directory elisa-db-directory t)
@@ -181,56 +337,759 @@
(defun elisa-sqlite-escape (s)
"Escape single quotes in S for sqlite."
- (string-replace "'" "''" s))
+ (thread-last
+ s
+ (string-replace "'" "''")
+ (string-replace "\\" "\\\\")
+ (string-replace "\0" "\n")))
+
+(defun elisa-sqlite-format-int-list (ids)
+ "Convert list of integer IDS to sqlite list representation."
+ (format
+ "(%s)"
+ (string-join (mapcar (lambda (id) (format "%d" id)) ids) ", ")))
+
+(defun elisa-sqlite-format-string-list (names)
+ "Convert list of string NAMES to sqlite list representation."
+ (format
+ "(%s)"
+ (string-join (mapcar (lambda (name)
+ (format "'%s'"
+ (elisa-sqlite-escape name))) names) ", ")))
-(defun elisa-parse-info-manual (name)
- "Parse info manual with NAME and save index to database."
+(defun elisa-avg (lst)
+ "Calculate arithmetic average value of LST."
+ (let ((len (length lst))
+ (sum (cl-reduce #'+ lst :initial-value 0.0)))
+ (/ sum len)))
+
+(defun elisa-std-dev (lst)
+ "Calculate standard deviation value of LST."
+ (let ((avg (elisa-avg lst))
+ (len (length lst)))
+ (sqrt (/ (cl-reduce
+ #'+
+ (mapcar
+ (lambda (x) (expt (- x avg) 2))
+ lst))
+ len))))
+
+(defun elisa-calculate-threshold (k distances)
+ "Calculate breakpoint threshold for DISTANCES based on K standard deviations."
+ (+ (elisa-avg distances) (* k (elisa-std-dev distances))))
+
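To illustrate the breakpoint threshold computed above (mean of the distances plus K standard deviations), here is a minimal standalone sketch. The `demo-` names are hypothetical and not part of elisa. The sketch assumes the population standard deviation, which is what `elisa-std-dev` computes.

```emacs-lisp
(require 'cl-lib)

(defun demo-mean (xs)
  "Return the arithmetic mean of the numbers in XS."
  (/ (cl-reduce #'+ xs :initial-value 0.0) (length xs)))

(defun demo-std-dev (xs)
  "Return the population standard deviation of the numbers in XS."
  (let ((mean (demo-mean xs)))
    (sqrt (/ (cl-reduce #'+
                        (mapcar (lambda (x) (expt (- x mean) 2)) xs)
                        :initial-value 0.0)
             (length xs)))))

(defun demo-threshold (k distances)
  "Return the mean of DISTANCES plus K standard deviations."
  (+ (demo-mean distances) (* k (demo-std-dev distances))))

;; Distances between neighbouring chunks; the last one is an outlier.
;; The mean is 0.2 and the standard deviation is about 0.173, so with
;; K = 0.4 the threshold is roughly 0.269 and only 0.5 exceeds it.
(demo-threshold 0.4 '(0.1 0.1 0.1 0.5))
```

With a larger K the threshold rises, fewer distances exceed it, and the resulting chunks become bigger, which matches the documented effect of `elisa-breakpoint-threshold-amount`.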
+(defun elisa-parse-info-manual (name collection-name)
+ "Parse info manual with NAME and save index to COLLECTION-NAME."
(with-temp-buffer
- (info name (current-buffer))
- (let ((continue t))
- (while continue
- (let* ((node-name (concat "(" (file-name-sans-extension
- (file-name-nondirectory Info-current-file))
- ") "
- Info-current-node))
- (content (buffer-substring-no-properties (point-min) (point-max)))
- (embedding (llm-embedding elisa-embeddings-provider content))
- (rowid (progn
- (sqlite-execute elisa-db
+ (ignore-errors
+ (info name (current-buffer))
+ (let ((collection-id (or (caar (sqlite-select
+ elisa-db
+ (format
+ "select rowid from collections where name = '%s';"
+ collection-name)))
+ (progn
+ (sqlite-execute
+ elisa-db
+ (format
+ "insert into collections (name) values ('%s');"
+ collection-name))
+ (caar (sqlite-select
+ elisa-db
(format
- "insert into info values('%s') on conflict do nothing;"
- (elisa-sqlite-escape node-name)))
- (caar
- (sqlite-select
+ "select rowid from collections where name = '%s';"
+ collection-name))))))
+ (kind-id (caar (sqlite-select
+ elisa-db "select rowid from kinds where name = 'info';")))
+ (continue t)
+ (parsed-nodes nil))
+ (while continue
+ (let* ((node-name (concat "(" (file-name-sans-extension
+ (file-name-nondirectory Info-current-file))
+ ") "
+ Info-current-node))
+ (chunks (elisa-split-semantically)))
+ (if (not (cl-find node-name parsed-nodes :test 'string-equal))
+ (progn
+ (mapc
+ (lambda (text)
+ (let* ((hash (secure-hash 'sha256 text))
+ (embedding (llm-embedding elisa-embeddings-provider text))
+ (rowid
+ (if-let ((rowid (caar (sqlite-select
+ elisa-db
+ (format "select rowid from data where kind_id = %s and collection_id = %s and path = '%s' and hash = '%s';"
+ kind-id collection-id
+ (elisa-sqlite-escape node-name) hash)))))
+ nil
+ (sqlite-execute
+ elisa-db
+ (format
+ "insert into data(kind_id, collection_id, path, hash, data) values (%s, %s, '%s', '%s', '%s');"
+ kind-id collection-id
+ (elisa-sqlite-escape node-name) hash (elisa-sqlite-escape text)))
+ (caar (sqlite-select
+ elisa-db
+ (format "select rowid from data where kind_id = %s and collection_id = %s and path = '%s' and hash = '%s';"
+ kind-id collection-id
+ (elisa-sqlite-escape node-name) hash))))))
+ (when rowid
+ (sqlite-execute
elisa-db
- (format "select rowid from info where node='%s';"
- (elisa-sqlite-escape node-name)))))))
- (when (not (caar
- (sqlite-select
- elisa-db
- (format "select rowid from elisa_embeddings where rowid=%s;" rowid))))
+ (format "insert into data_embeddings(rowid, embedding) values (%s, %s);"
+ rowid (elisa-vector-to-sqlite embedding)))
+ (sqlite-execute
+ elisa-db
+ (format "insert into data_fts(rowid, data) values (%s, '%s');"
+ rowid (elisa-sqlite-escape text))))))
+ chunks)
+ (push node-name parsed-nodes)
+ (condition-case nil
+ (funcall-interactively #'Info-forward-node)
+ (error
+ (setq continue nil))))
+ (setq continue nil))))))))
+
+(defun elisa--find-similar (text collections)
+ "Find results similar to TEXT in COLLECTIONS.
+Return sqlite query. For asynchronous execution."
+ (let* ((rowids (flatten-tree
+ (sqlite-select
+ elisa-db
+ (format "select rowid from data where collection_id in
+ (
+SELECT rowid FROM collections WHERE name IN %s
+);"
+ (elisa-sqlite-format-string-list collections)))))
+ (query (format "WITH
+vector_search AS (
+ SELECT rowid, distance
+ FROM data_embeddings
+ WHERE vss_search(embedding, %s)
+ ORDER BY distance ASC
+ LIMIT 40
+),
+semantic_search AS (
+ SELECT rowid, RANK () OVER (ORDER BY distance ASC) AS rank
+ FROM vector_search
+ WHERE rowid IN %s
+ ORDER BY distance ASC
+ LIMIT 20
+),
+keyword_search AS (
+ SELECT rowid, RANK () OVER (ORDER BY bm25(data_fts) ASC) AS rank
+ FROM data_fts
+ WHERE rowid in %s and data_fts MATCH '%s'
+ ORDER BY bm25(data_fts) ASC
+ LIMIT 20
+),
+hybrid_search AS (
+SELECT
+ COALESCE(semantic_search.rowid, keyword_search.rowid) AS rowid,
+ COALESCE(1.0 / (60 + semantic_search.rank), 0.0) +
+ COALESCE(1.0 / (60 + keyword_search.rank), 0.0) AS score
+FROM semantic_search
+FULL OUTER JOIN keyword_search ON semantic_search.rowid = keyword_search.rowid
+ORDER BY score DESC
+LIMIT %d
+)
+SELECT
+ hybrid_search.rowid
+FROM hybrid_search
+;
+"
+ (elisa-vector-to-sqlite
+ (llm-embedding elisa-embeddings-provider text))
+ (elisa-sqlite-format-int-list rowids)
+ (elisa-sqlite-format-int-list rowids)
+ (elisa-fts-query text)
+ (elisa-get-limit))))
+ query))
+
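The `hybrid_search` step of the query above scores each row as the sum of 1/(60 + rank) over the semantic and keyword rankings, a scheme commonly called reciprocal rank fusion. The following is a minimal sketch of that scoring in plain Emacs Lisp, outside of SQL. The function name `demo-rrf` and the sample ids are hypothetical, and the sketch uses list positions as ranks, whereas the query uses the SQL `RANK()` window function, which gives tied rows the same rank.

```emacs-lisp
(require 'cl-lib)

(defun demo-rrf (semantic keyword &optional k)
  "Fuse ranked id lists SEMANTIC and KEYWORD by reciprocal rank fusion.
Each list is ordered best first.  K defaults to 60, the constant
used in the query.  Return an alist of (ID . SCORE) sorted by
descending score."
  (let ((k (or k 60))
        (scores nil))
    (dolist (ranking (list semantic keyword))
      (let ((rank 1))
        (dolist (id ranking)
          (let ((cell (assoc id scores)))
            ;; An id missing from one ranking simply gets no
            ;; contribution from it, like COALESCE(..., 0.0) in SQL.
            (unless cell
              (setq cell (cons id 0.0))
              (push cell scores))
            (setcdr cell (+ (cdr cell) (/ 1.0 (+ k rank)))))
          (setq rank (1+ rank)))))
    (sort scores (lambda (a b) (> (cdr a) (cdr b))))))

;; Id 1 is ranked 1st and 2nd, id 3 is ranked 3rd and 1st,
;; id 2 appears only in the semantic ranking.
(mapcar #'car (demo-rrf '(1 2 3) '(3 1)))
;; => (1 3 2)
```

Rows that appear in both rankings accumulate two contributions, so they tend to outrank rows found by only one of the two searches.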
+(defun elisa-find-similar (text collections on-done)
+ "Find similar to TEXT results in COLLECTIONS.
+Evaluate ON-DONE with result."
+ (message "searching in collected data")
+ (elisa--async-do
+ (lambda () (elisa--find-similar text collections))
+ on-done))
+
+(defun elisa--split-by (func)
+ "Split buffer content to list by FUNC."
+ (let ((pt (point-min))
+ (result nil))
+ (save-excursion
+ (goto-char (point-min))
+ (while (< (point) (point-max))
+ (funcall func)
+ (push (buffer-substring-no-properties pt (point)) result)
+ (setq pt (point)))
+ (nreverse (cl-remove-if #'string-empty-p result)))))
+
+(defun elisa-split-by-sentence ()
+ "Split buffer to list of sentences."
+ (elisa--split-by #'forward-sentence))
+
+(defun elisa-split-by-paragraph ()
+ "Split buffer to list of paragraphs."
+ (elisa--split-by #'forward-paragraph))
+
+(defun elisa-dot-product (v1 v2)
+ "Calculate the dot product of vectors V1 and V2."
+ (let ((result 0))
+ (dotimes (i (length v1))
+ (setq result (+ result (* (aref v1 i) (aref v2 i)))))
+ result))
+
+(defun elisa-magnitude (v)
+ "Calculate magnitude of vector V."
+ (let ((sum 0))
+ (dotimes (i (length v))
+ (setq sum (+ sum (* (aref v i) (aref v i)))))
+ (sqrt sum)))
+
+(defun elisa-cosine-similarity (v1 v2)
+ "Calculate the cosine similarity of V1 and V2.
+The return is a floating point number between -1 and 1, where the
+closer it is to 1, the more similar it is."
+ (let ((dot-product (elisa-dot-product v1 v2))
+ (v1-magnitude (elisa-magnitude v1))
+ (v2-magnitude (elisa-magnitude v2)))
+ (if (and v1-magnitude v2-magnitude)
+ (/ dot-product (* v1-magnitude v2-magnitude))
+ 0)))
+
+(defun elisa-cosine-distance (v1 v2)
+ "Calculate cosine-distance between V1 and V2."
+ (- 1 (elisa-cosine-similarity v1 v2)))
+
+(defun elisa--similarities (list)
+ "Calculate cosine similarities between neighbour elements in LIST."
+ (let ((head (car list))
+ (tail (cdr list))
+ (result nil))
+ (while tail
+ (push (elisa-cosine-similarity head (car tail)) result)
+ (setq head (car tail))
+ (setq tail (cdr tail)))
+ (nreverse result)))
+
+(defun elisa--distances (list)
+ "Calculate cosine distances between neighbour elements in LIST."
+ (let ((head (car list))
+ (tail (cdr list))
+ (result nil))
+ (while tail
+ (push (elisa-cosine-distance head (car tail)) result)
+ (setq head (car tail))
+ (setq tail (cdr tail)))
+ (nreverse result)))
+
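As a worked example of the vector helpers above, the following sketch computes cosine similarity and neighbour distances for small hand-written vectors. The `demo-' names are hypothetical and exist only so the snippet is self-contained. Unlike the patch, the sketch guards explicitly against zero-length vectors by comparing magnitudes with 0.

```elisp
(require 'cl-lib)

(defun demo-dot-product (v1 v2)
  "Return the dot product of vectors V1 and V2."
  (let ((result 0))
    (dotimes (i (length v1))
      (setq result (+ result (* (aref v1 i) (aref v2 i)))))
    result))

(defun demo-cosine-similarity (v1 v2)
  "Return the cosine similarity of V1 and V2, or 0 for a zero vector."
  (let ((m1 (sqrt (demo-dot-product v1 v1)))
        (m2 (sqrt (demo-dot-product v2 v2))))
    (if (and (> m1 0) (> m2 0))
        (/ (demo-dot-product v1 v2) (* m1 m2))
      0)))

(defun demo-distances (vectors)
  "Return cosine distances between neighbouring elements of VECTORS."
  ;; `cl-mapcar' stops at the shorter list, so N vectors give N-1 distances.
  (cl-mapcar (lambda (a b) (- 1 (demo-cosine-similarity a b)))
             vectors (cdr vectors)))
```

Identical directions give a distance of 0.0 and orthogonal ones a distance of 1.0, so (demo-distances (list [1 0] [1 0] [0 1])) returns (0.0 1.0).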
+(defun elisa-split-semantically (&rest args)
+ "Split buffer data semantically.
+ARGS contains keys for fine control.
+
+:function FUNC -- FUNC is a function that splits the buffer into chunks.
+
+:threshold-amount K -- K is a breakpoint threshold amount.
+
+When the distance between neighbouring chunks is not greater than the
+threshold calculated from K, they are packed into a single semantic chunk."
+ (if-let* ((func (or (plist-get args :function) elisa-semantic-split-function))
+ (k (or (plist-get args :threshold-amount) elisa-breakpoint-threshold-amount))
+ (chunks (funcall func))
+ (embeddings (cl-remove-if
+ #'not
+ (mapcar (lambda (s)
+ (when (length> (string-trim s) 0)
+ (llm-embedding elisa-embeddings-provider s)))
+ chunks)))
+ (distances (elisa--distances embeddings))
+ (threshold (elisa-calculate-threshold k distances))
+ (current (car chunks))
+ (tail (cdr chunks)))
+ (let* ((result nil))
+ (mapc
+ (lambda (el)
+ (if (<= el threshold)
+ (setq current (concat current (car tail)))
+ (push current result)
+ (setq current (car tail)))
+ (setq tail (cdr tail)))
+ distances)
+ (push current result)
+ (cl-remove-if
+ #'string-empty-p
+ (mapcar (lambda (s)
+ (if s
+ (string-trim s)
+ ""))
+ (nreverse result))))
+ (list (buffer-substring-no-properties (point-min) (point-max)))))
+
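The merging step of the semantic splitter can be isolated from the embedding calls. The sketch below is a simplified illustration, not the commit's code: `demo-merge-chunks' is a hypothetical name, and the distances and threshold are passed in directly rather than computed from embeddings.

```elisp
(defun demo-merge-chunks (chunks distances threshold)
  "Merge neighbouring CHUNKS whose distance is at most THRESHOLD.
DISTANCES holds one number per neighbouring pair of CHUNKS."
  (let ((current (car chunks))
        (tail (cdr chunks))
        (result nil))
    (dolist (d distances)
      (if (<= d threshold)
          ;; Close enough: extend the chunk being built.
          (setq current (concat current (car tail)))
        ;; Breakpoint: finish the current chunk and start a new one.
        (push current result)
        (setq current (car tail)))
      (setq tail (cdr tail)))
    (push current result)
    (nreverse result)))
```

With chunks ("a" "b" "c" "d"), distances (0.1 0.9 0.2) and a threshold of 0.5, the only breakpoint is between "b" and "c", so the result is ("ab" "cd").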
+(defun elisa--gitignore-to-elisp-regexp (pattern)
+ "Convert a .gitignore PATTERN to an Emacs Lisp regexp."
+ (let ((result "")
+ (i 0)
+ (len (length pattern)))
+ (while (< i len)
+ (let ((char (aref pattern i)))
+ (cond
+ ;; Escape special regex characters
+ ((string-match-p "[.?+*^$(){}\\[\\]\\\\]" (char-to-string char))
+ (setq result (concat result "\\" (char-to-string char))))
+ ;; Handle ** for any number of directories
+ ((and (> len (+ i 1))
+ (char-equal char ?*)
+ (char-equal (aref pattern (+ i 1)) ?*))
+ (setq result (concat result ".*"))
+ (setq i (+ i 1)))
+ ;; Handle * for any number of characters except /
+ ((char-equal char ?*)
+ (setq result (concat result "[^/]*")))
+ ;; Handle ? for a single character except /
+ ((char-equal char ??)
+ (setq result (concat result "[^/]")))
+ ;; Handle negation
+ ((char-equal char ?!)
+ (setq result (concat result "^")))
+ ;; Handle directory separator
+ ((char-equal char ?/)
+ (setq result (concat result "/")))
+ ;; Default case: add the character as is
+ (t
+ (setq result (concat result (char-to-string char))))))
+ (setq i (+ i 1)))
+ ;; prevent false-positive partial matches
+ (concat result "$")))
+
+(defun elisa--read-ignore-file-regexps (directory)
+ "Read ignore patterns from `elisa-ignore-patterns-files' in DIRECTORY."
+ (mapcar #'elisa--gitignore-to-elisp-regexp
+ (flatten-tree
+ (mapcar (lambda (file)
+ (let ((filepath (expand-file-name file directory)))
+ (when (file-exists-p filepath)
+ (with-temp-buffer
+ (insert-file-contents filepath)
+ (split-string (buffer-string) "\n" t)))))
+ elisa-ignore-patterns-files))))
+
+(defun elisa--text-file-p (filename)
+ "Check if FILENAME contains text."
+ (or (when (get-file-buffer filename) t) ;; if the file is already open, assume it is text
+ (with-current-buffer (find-file-noselect filename t t)
+ (prog1
+ ;; if there is null byte in file, file is binary
+ (not (re-search-forward "\0" nil t 1))
+ (kill-buffer)))))
+
+(defun elisa--file-list (directory)
+ "List of files to parse in DIRECTORY."
+ (let ((ignore-regexps (elisa--read-ignore-file-regexps directory)))
+ (when elisa-ignore-invisible-files
+ (push "^\\.[^/]*" ignore-regexps)
+ (push "/\\.[^/]*" ignore-regexps))
+ (seq-filter (lambda (file)
+ (and (not (seq-some (lambda (regexp)
+ (string-match-p regexp file))
+ ignore-regexps))
+ (elisa--text-file-p file)))
+ (directory-files-recursively directory ".*"))))
+
+(defun elisa-parse-file (collection-id path &optional force)
+ "Parse file PATH for COLLECTION-ID.
+When FORCE parse even if already parsed."
+ (let* ((opened (get-file-buffer path))
+ (buf (or opened (find-file-noselect path t t)))
+ (hash (secure-hash 'sha256 buf))
+ (prev-hash (caar (sqlite-select
+ elisa-db
+ (format "select hash from files where path = '%s';"
+ (elisa-sqlite-escape path))))))
+ (when (or force
+ (not prev-hash)
+ (not (string-equal hash prev-hash)))
+ (with-current-buffer buf
+ (let ((chunks (elisa-split-semantically))
+ (old-row-ids
+ (flatten-tree (sqlite-select
+ elisa-db
+ (format "select rowid from data where path = '%s';"
+ (elisa-sqlite-escape path)))))
+ (row-ids nil)
+ (kind-id (caar (sqlite-select
+ elisa-db
+ "select rowid from kinds where name = 'file';"))))
+ ;; remove old data
+ (when prev-hash
(sqlite-execute
elisa-db
- (format "insert into elisa_embeddings(rowid, embedding) values (%s, %s);"
- rowid
- (elisa-vector-to-sqlite embedding))))
- (condition-case nil
- (progn (funcall-interactively #'Info-forward-node)
- (sleep-for 0 100))
- (error
- (setq continue nil))))))))
-
-(defun elisa-find-similar (text)
- "Find similar to TEXT results."
- (let ((embedding (llm-embedding elisa-embeddings-provider text)))
- (flatten-tree
- (sqlite-select
- elisa-db
- (format
- "select * from info where rowid in
-(select rowid from elisa_embeddings where vss_search(embedding,%s) limit %d);"
- (elisa-vector-to-sqlite embedding)
- elisa-limit)))))
+ (format "delete from files where path = '%s';"
+ (elisa-sqlite-escape path))))
+ ;; add new data
+ (mapc
+ (lambda (text)
+ (let* ((hash (secure-hash 'sha256 text))
+ (rowid
+ (if-let ((rowid (caar (sqlite-select
+ elisa-db
+ (format "select rowid from data where kind_id = %s and collection_id = %s and path = '%s' and hash = '%s';"
+ kind-id collection-id
+ (elisa-sqlite-escape path) hash)))))
+ (progn
+ (push rowid row-ids)
+ nil)
+ (sqlite-execute
+ elisa-db
+ (format
+ "insert into data(kind_id, collection_id, path, hash, data) values (%s, %s, '%s', '%s', '%s');"
+ kind-id collection-id
+ (elisa-sqlite-escape path) hash (elisa-sqlite-escape text)))
+ (caar (sqlite-select
+ elisa-db
+ (format "select rowid from data where kind_id = %s and collection_id = %s and path = '%s' and hash = '%s';"
+ kind-id collection-id
+ (elisa-sqlite-escape path) hash))))))
+ (when rowid
+ (sqlite-execute
+ elisa-db
+ (format "insert into data_embeddings(rowid, embedding) values (%s, %s);"
+ rowid (elisa-vector-to-sqlite
+ (llm-embedding elisa-embeddings-provider text))))
+ (sqlite-execute
+ elisa-db
+ (format "insert into data_fts(rowid, data) values (%s, '%s');"
+ rowid (elisa-sqlite-escape text)))
+ (push rowid row-ids))))
+ chunks)
+ ;; remove old data
+ (when row-ids
+ (let ((delete-rows (cl-remove-if (lambda (id)
+ (cl-find id row-ids))
+ old-row-ids)))
+ (elisa--delete-data delete-rows)))
+ ;; save hash to files table
+ (sqlite-execute
+ elisa-db
+ (format "insert into files (path, hash) values ('%s', '%s');"
+ (elisa-sqlite-escape path) hash)))))
+ ;; kill buffer if it was not open before parsing
+ (when (not opened)
+ (kill-buffer buf))))
+
+(defun elisa--delete-data (ids)
+ "Delete data with IDS."
+ (sqlite-execute
+ elisa-db
+ (format "delete from data_fts where rowid in %s;"
+ (elisa-sqlite-format-int-list ids)))
+ (sqlite-execute
+ elisa-db
+ (format "delete from data_embeddings where rowid in %s;"
+ (elisa-sqlite-format-int-list ids)))
+ (sqlite-execute
+ elisa-db
+ (format "delete from data where rowid in %s;"
+ (elisa-sqlite-format-int-list ids))))
+
+(defun elisa-parse-directory (dir)
+ "Parse DIR as new collection synchronously."
+ (setq dir (expand-file-name dir))
+ (let* ((collection-id (progn
+ (sqlite-execute
+ elisa-db
+ (format
+ "insert into collections (name) values ('%s') on conflict do nothing;"
+ (elisa-sqlite-escape dir)))
+ (caar (sqlite-select
+ elisa-db
+ (format
+ "select rowid from collections where name = '%s';"
+ (elisa-sqlite-escape dir))))))
+ (files (elisa--file-list dir))
+ (delete-ids (flatten-tree
+ (sqlite-select
+ elisa-db
+ (format
+ "select rowid from data where collection_id = %d and path not in %s;"
+ collection-id
+ (elisa-sqlite-format-string-list files))))))
+ (elisa--delete-data delete-ids)
+ (mapc (lambda (file)
+ (message "parsing %s" file)
+ (elisa-parse-file collection-id file))
+ files)))
+
+;;;###autoload
+(defun elisa-async-parse-directory (dir)
+ "Parse DIR as new collection asynchronously."
+ (interactive "DSelect directory: ")
+ (elisa--async-do (lambda ()
+ (elisa-parse-directory
+ (expand-file-name dir)))))
+
+(defun elisa-search-duckduckgo (prompt)
+ "Search duckduckgo for PROMPT and return list of urls."
+ (let* ((url (format "https://duckduckgo.com/html/?q=%s" (url-hexify-string prompt)))
+ (buffer-name (plz 'get url :as 'buffer
+ :headers `(("Accept" . ,eww-accept-content-types)
+ ("Accept-Encoding" . "gzip")
+ ("User-Agent" . ,(url-http--user-agent-default-string))))))
+ (with-current-buffer buffer-name
+ (goto-char (point-min))
+ (search-forward "<!DOCTYPE")
+ (beginning-of-line)
+ (cl-remove-if
+ #'string-empty-p
+ (cl-remove-duplicates
+ (mapcar
+ (lambda (el)
+ (when el
+ (string-trim-right
+ (url-unhex-string
+ (cdar (url-parse-args (or (dom-attr el 'href) ""))))
+ "[&\\?].*")))
+ (dom-by-tag
+ (libxml-parse-html-region
+ (point) (point-max))
+ 'a))
+ :test 'string-equal)))))
+
+(defun elisa-search-searxng (prompt)
+ "Search searxng for PROMPT and return list of urls.
+You can customize `elisa-searxng-url' to use a non-local instance."
+ (let ((url (format "%s/search?format=json&q=%s" elisa-searxng-url (url-hexify-string prompt))))
+ (thread-last
+ (plz 'get url :as 'json-read)
+ (alist-get 'results)
+ (mapcar (lambda (el) (alist-get 'url el))))))
+
+(defun elisa-get-webpage-buffer (url)
+ "Get buffer with URL content."
+ (let ((buffer-name (ignore-errors
+ (plz 'get url :as 'buffer
+ :headers `(("Accept" . ,eww-accept-content-types)
+ ("Accept-Encoding" . "gzip")
+ ("User-Agent" . ,(url-http--user-agent-default-string))))))
+ ;; fix one word lines for async execution
+ (shr-use-fonts nil)
+ (shr-width (- ellama-long-lines-length 5)))
+ (when buffer-name
+ (with-current-buffer buffer-name
+ (goto-char (point-min))
+ (or (search-forward "<!DOCTYPE" nil t)
+ (search-forward "<html" nil t))
+ (beginning-of-line)
+ (kill-region (point-min) (point))
+ (ignore-errors
+ (shr-insert-document (libxml-parse-html-region (point-min) (point-max))))
+ (goto-char (point-min))
+ (or (search-forward "<!DOCTYPE" nil t)
+ (search-forward "<html" nil t))
+ (beginning-of-line)
+ (kill-region (point) (point-max))
+ buffer-name))))
+
+(defun elisa-get-webpage-buffer-pandoc (url)
+ "Get buffer with URL content translated to markdown with pandoc."
+ (let ((buffer-name (plz 'get url :as 'buffer)))
+ (with-current-buffer buffer-name
+ (shell-command-on-region
+ (point-min) (point-max)
+ (format "%s -f html --to plain"
+ (executable-find elisa-pandoc-executable))
+ buffer-name t)
+ buffer-name)))
+
+(defun elisa-fts-query (prompt)
+ "Return fts match query for PROMPT."
+ (thread-last
+ prompt
+ (string-trim)
+ (downcase)
+ (string-replace "-" " ")
+ (replace-regexp-in-string "[^[:alnum:] ]+" "")
+ (string-trim)
+ (replace-regexp-in-string "[[:space:]]+" " OR ")))
+
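To make the effect of the query normalisation above concrete, here is a standalone sketch of the same pipeline. `demo-fts-query' is a hypothetical copy so that the snippet runs without elisa loaded. It assumes Emacs 28 or later for `string-replace'.

```elisp
(require 'subr-x)

(defun demo-fts-query (prompt)
  "Turn PROMPT into an OR-joined list of lowercase words."
  (thread-last
    prompt
    (string-trim)
    (downcase)
    ;; Treat hyphenated words as separate terms.
    (string-replace "-" " ")
    ;; Drop punctuation and other non-alphanumeric characters.
    (replace-regexp-in-string "[^[:alnum:] ]+" "")
    (string-trim)
    (replace-regexp-in-string "[[:space:]]+" " OR ")))
```

For example, (demo-fts-query "  How to use-package? ") returns "how OR to OR use OR package", so a full-text match succeeds when any one of the words is present.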
+(defun elisa--rerank-request (prompt ids)
+ "Generate rerank request body for PROMPT and IDS."
+ (let ((docs
+ (mapcar
+ (lambda (row)
+ (let ((id (cl-first row))
+ (text (cl-second row)))
+ `(("id" . ,id) ("text" . ,text))))
+ (sqlite-select
+ elisa-db
+ (format
+ "select rowid, data from data where rowid in %s;"
+ (elisa-sqlite-format-int-list ids))))))
+ (json-encode `(("query" . ,prompt)
+ ("documents" . ,docs)))))
+
+(defun elisa--do-rerank-request (prompt ids)
+ "Call rerank service for PROMPT and IDS."
+ (when ids
+ (seq--into-list
+ (alist-get 'data
+ (plz 'post (format "%s/api/v1/rerank"
+ (string-remove-suffix "/" elisa-reranker-url))
+ :headers `(("Content-Type" . "application/json"))
+ :body-type 'text
+ :body (elisa--rerank-request prompt ids)
+ :as #'json-read)))))
+
+(defun elisa-rerank (prompt ids)
+ "Rerank IDS according to PROMPT and return top `elisa-limit' IDS."
+ (let ((data (elisa--do-rerank-request prompt ids)))
+ (mapcar (lambda (elt)
+ (alist-get 'id elt))
+ (take elisa-limit
+ (if elisa-reranker-similarity-threshold
+ (cl-remove-if (lambda (obj)
+ (< (alist-get 'similarity obj)
+ elisa-reranker-similarity-threshold))
+ data)
+ data)))))
+
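The selection logic in the reranking function above can be exercised without a running reranker service. The sketch below is illustrative only: `demo-top-ids' is a hypothetical name, and the input shape (a list of alists with `id' and `similarity' keys) is inferred from how the patch reads the response. Like the patch, it takes entries in the order given and does not sort them itself.

```elisp
(require 'seq)

(defun demo-top-ids (data limit &optional threshold)
  "Return the ids of the first LIMIT entries of DATA.
DATA is a list of alists with `id' and `similarity' keys.  When
THRESHOLD is non-nil, entries with a lower similarity are dropped first."
  (mapcar (lambda (entry) (alist-get 'id entry))
          (seq-take (if threshold
                        (seq-remove
                         (lambda (entry)
                           (< (alist-get 'similarity entry) threshold))
                         data)
                      data)
                    limit)))

(defvar demo-rerank-data
  '(((id . 1) (similarity . 0.9))
    ((id . 2) (similarity . 0.2))
    ((id . 3) (similarity . 0.7)))
  "Made-up reranker output used only for this example.")
```

With a threshold of 0.5 and a limit of 5, (demo-top-ids demo-rerank-data 5 0.5) returns (1 3). Without a threshold and with a limit of 2 it returns (1 2).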
+(defun elisa-get-limit ()
+ "Limit for elisa hybrid search."
+ (if elisa-reranker-enabled
+ elisa-reranker-limit
+ elisa-limit))
+
+(defun elisa--parse-web-page (collection-id url)
+ "Parse URL into collection with COLLECTION-ID."
+ (let ((kind-id (caar (sqlite-select
+ elisa-db "select rowid from kinds where name = 'web';"))))
+ (message "collecting data from %s" url)
+ (mapc
+ (lambda (chunk)
+ (let* ((hash (secure-hash 'sha256 chunk))
+ (embedding (llm-embedding elisa-embeddings-provider chunk))
+ (rowid
+ (if-let ((rowid (caar (sqlite-select
+ elisa-db
+ (format "select rowid from data where kind_id = %s and collection_id = %s and path = '%s' and hash = '%s';" kind-id collection-id url hash)))))
+ nil
+ (sqlite-execute
+ elisa-db
+ (format
+ "insert into data(kind_id, collection_id, path, hash, data) values (%s, %s, '%s', '%s', '%s');"
+ kind-id collection-id url hash (elisa-sqlite-escape chunk)))
+ (caar (sqlite-select
+ elisa-db
+ (format "select rowid from data where kind_id = %s and collection_id = %s and path = '%s' and hash = '%s';" kind-id collection-id url hash))))))
+ (when rowid
+ (sqlite-execute
+ elisa-db
+ (format "insert into data_embeddings(rowid, embedding) values (%s, %s);"
+ rowid (elisa-vector-to-sqlite embedding)))
+ (sqlite-execute
+ elisa-db
+ (format "insert into data_fts(rowid, data) values (%s, '%s');"
+ rowid (elisa-sqlite-escape chunk))))))
+ (elisa-extact-webpage-chunks url))))
+
+(defun elisa--web-search (prompt)
+ "Search the web for PROMPT.
+Return sqlite query that extracts data for adding to context."
+ (sqlite-execute
+ elisa-db
+ (format
+ "insert into collections (name) values ('%s') on conflict do nothing;"
+ (elisa-sqlite-escape prompt)))
+ (let* ((collection-id (caar (sqlite-select
+ elisa-db
+ (format
+ "select rowid from collections where name = '%s';"
+ (elisa-sqlite-escape prompt)))))
+ (urls (funcall elisa-web-search-function prompt))
+ (collected-pages 0))
+ (mapc (lambda (url)
+ (when (< collected-pages elisa-web-pages-limit)
+ (elisa--parse-web-page collection-id url)
+ (cl-incf collected-pages)))
+ urls)))
+
+(defun elisa--rewrite-prompt (prompt action)
+ "Rewrite PROMPT if `elisa-prompt-rewriting-enabled'.
+Call ACTION with new prompt."
+ (let ((session (and ellama--current-session-id
+ (with-current-buffer (ellama-get-session-buffer
+ ellama--current-session-id)
+ ellama--current-session))))
+ (if (and elisa-prompt-rewriting-enabled
+ ellama--current-session-id
+ (string= (llm-name (ellama-session-provider session))
+ (llm-name elisa-chat-provider)))
+ (with-current-buffer (get-buffer-create (make-temp-name "elisa"))
+ (ellama-stream
+ (format elisa-rewrite-prompt-template prompt)
+ :session session
+ :buffer (current-buffer)
+ :provider elisa-chat-provider
+ :on-done action))
+ (funcall action prompt))))
+
+;;;###autoload
+(defun elisa-web-search (prompt)
+ "Search the web for PROMPT."
+ (interactive "sAsk elisa with web search: ")
+ (elisa--rewrite-prompt prompt #'elisa--web-search-internal))
+
+(defun elisa--web-search-internal (prompt)
+ "Search the web for PROMPT."
+ (message "searching the web")
+ (elisa--async-do
+ (lambda () (elisa--web-search prompt))
+ (lambda (_)
+ (elisa-find-similar
+ prompt (list prompt)
+ (lambda (query) (elisa-retrieve-ask query prompt))))))
+
+(defun elisa-retrieve-ask (query prompt)
+ "Retrieve data with QUERY and ask elisa for PROMPT."
+ (elisa--async-do
+ (lambda () (let* ((raw-ids (flatten-tree (sqlite-select elisa-db query)))
+ (ids (if elisa-reranker-enabled
+ (elisa-rerank prompt raw-ids)
+ (take elisa-limit raw-ids))))
+ (when ids
+ (sqlite-select
+ elisa-db
+ (format
+ "SELECT k.name, d.path, d.data
+FROM data AS d
+LEFT JOIN kinds k ON k.rowid = d.kind_id
+WHERE d.rowid in %s;"
+ (elisa-sqlite-format-int-list ids))))))
+ (lambda (result)
+ (if result (mapc
+ (lambda (row)
+ (when-let ((kind (cl-first row))
+ (path (cl-second row))
+ (text (cl-third row)))
+ (pcase kind
+ ("web"
+ (ellama-context-add-webpage-quote-noninteractive path path text))
+ ("file"
+ (ellama-context-add-file-quote-noninteractive path text))
+ ("info"
+ (ellama-context-add-info-node-quote-noninteractive path text)))))
+ result)
+ (ellama-context-add-text "No related documents found."))
+ (ellama-chat
+ (format elisa-chat-prompt-template prompt)
+ nil :provider elisa-chat-provider))))
+
+(defun elisa--info-valid-p (name)
+ "Return NAME if info is valid."
+ (with-temp-buffer
+ (ignore-errors
+ (info name (current-buffer))
+ name)))
(defun elisa-get-builtin-manuals ()
"Get builtin manual names list."
@@ -245,24 +1104,29 @@
(defun elisa-get-external-manuals ()
"Get external manual names list."
- (seq-uniq
+ (cl-remove-if
+ #'not
(mapcar
- #'file-name-base
- (process-lines
- elisa-find-executable
- (file-truename
- (file-name-concat user-emacs-directory "elpa")) "-name" "*.info"))))
+ #'elisa--info-valid-p
+ (seq-uniq
+ (mapcar
+ #'file-name-base
+ (process-lines
+ (executable-find elisa-find-executable)
+ (file-truename
+ (file-name-concat user-emacs-directory "elpa"))
+ "-name" "*.info"))))))
(defun elisa-parse-builtin-manuals ()
"Parse builtin manuals."
(mapc (lambda (s)
- (ignore-errors (elisa-parse-info-manual s)))
+ (elisa-parse-info-manual s "builtin manuals"))
(elisa-get-builtin-manuals)))
(defun elisa-parse-external-manuals ()
"Parse external manuals."
(mapc (lambda (s)
- (ignore-errors (elisa-parse-info-manual s)))
+ (elisa-parse-info-manual s "external manuals"))
(elisa-get-external-manuals)))
(defun elisa-parse-all-manuals ()
@@ -276,50 +1140,224 @@
(elisa--init-db db)
(setq elisa-db db)))
-(defun elisa--async-do-parse (func)
- "Parse asyncronously with FUNC."
- (async-start `(lambda ()
- ,(async-inject-variables "elisa-embeddings-provider")
- ,(async-inject-variables "elisa-db-directory")
- ,(async-inject-variables "elisa-find-executable")
- ,(async-inject-variables "elisa-tar-executable")
- ,(async-inject-variables "load-path")
- (require 'elisa)
- (,func))
- (lambda (_)
- (sqlite-close elisa-db)
- (elisa--reopen-db)
- (message "%s done."
- func))))
+(defun elisa--async-do (func &optional on-done)
+ "Do FUNC asynchronously.
+Call the ON-DONE callback with the result as an argument once FUNC has finished."
+ (let ((command real-this-command))
+ (async-start `(lambda ()
+ ,(async-inject-variables "elisa-embeddings-provider")
+ ,(async-inject-variables "elisa-db-directory")
+ ,(async-inject-variables "elisa-find-executable")
+ ,(async-inject-variables "elisa-tar-executable")
+ ,(async-inject-variables "elisa-prompt-rewriting-enabled")
+ ,(async-inject-variables "elisa-rewrite-prompt-template")
+ ,(async-inject-variables "elisa-semantic-split-function")
+ ,(async-inject-variables "elisa-webpage-extraction-function")
+ ,(async-inject-variables "elisa-web-search-function")
+ ,(async-inject-variables "elisa-searxng-url")
+ ,(async-inject-variables "elisa-web-pages-limit")
+ ,(async-inject-variables "elisa-breakpoint-threshold-amount")
+ ,(async-inject-variables "elisa-pandoc-executable")
+ ,(async-inject-variables "ellama-long-lines-length")
+ ,(async-inject-variables "elisa-reranker-enabled")
+ ,(async-inject-variables "load-path")
+ ,(async-inject-variables "Info-directory-list")
+ (require 'elisa)
+ (,func))
+ (lambda (res)
+ (sqlite-close elisa-db)
+ (elisa--reopen-db)
+ (when on-done
+ (funcall on-done res))
+ (message "%s done."
+ (or command "async elisa processing"))))))
+
+(defun elisa-extact-webpage-chunks (url)
+ "Extract semantic chunks for webpage fetched from URL."
+ (when-let ((buf (funcall elisa-webpage-extraction-function url)))
+ (with-current-buffer buf
+ (elisa-split-semantically))))
;;;###autoload
(defun elisa-async-parse-builtin-manuals ()
"Parse builtin manuals asyncronously."
(interactive)
(message "Begin parsing builtin manuals.")
- (elisa--async-do-parse 'elisa-parse-builtin-manuals))
+ (elisa--async-do 'elisa-parse-builtin-manuals))
;;;###autoload
(defun elisa-async-parse-external-manuals ()
"Parse external manuals asyncronously."
(interactive)
(message "Begin parsing external manuals.")
- (elisa--async-do-parse 'elisa-parse-external-manuals))
+ (elisa--async-do 'elisa-parse-external-manuals))
;;;###autoload
(defun elisa-async-parse-all-manuals ()
"Parse all manuals asyncronously."
(interactive)
(message "Begin parsing manuals.")
- (elisa--async-do-parse 'elisa-parse-all-manuals))
+ (elisa--async-do 'elisa-parse-all-manuals))
+
+;;;###autoload
+(defun elisa-reparse-current-collection ()
+ "Incrementally reparse current directory collection.
+It does nothing if the buffer's file is not inside one of the existing collections."
+ (interactive)
+ (when-let* ((collections (flatten-tree
+ (sqlite-select
+ elisa-db
+ "select name from collections;")))
+ (dirs (cl-remove-if-not #'file-directory-p collections))
+ (file (buffer-file-name))
+ (collection (cl-find-if (lambda (dir)
+ (file-in-directory-p file dir))
+ dirs)))
+ (elisa-async-parse-directory collection)))
+
+;;;###autoload
+(defun elisa-disable-collection (&optional collection)
+ "Disable COLLECTION."
+ (interactive)
+ (let ((col (or collection
+ (completing-read
+ "Disable collection: "
+ elisa-enabled-collections))))
+ (setq elisa-enabled-collections
+ (cl-remove col elisa-enabled-collections :test #'string=))))
+
+;;;###autoload
+(defun elisa-disble-all-collections ()
+ "Disable all collections."
+ (interactive)
+ (mapc #'elisa-disable-collection elisa-enabled-collections))
+
+;;;###autoload
+(defun elisa-enable-collection (&optional collection)
+ "Enable COLLECTION."
+ (interactive)
+ (let ((col (or collection
+ (completing-read
+ "Enable collection: "
+ (cl-remove-if
+ (lambda (c)
+ (cl-find c elisa-enabled-collections :test #'string=))
+ (flatten-tree
+ (sqlite-select
+ elisa-db
+ "select name from collections;")))))))
+ (push col elisa-enabled-collections)))
+
+;;;###autoload
+(defun elisa-create-empty-collection (&optional collection)
+ "Create new empty COLLECTION."
+ (interactive "sNew collection name: ")
+ (save-window-excursion
+ (sqlite-execute
+ elisa-db
+ (format
+ "insert into collections (name) values ('%s') on conflict do nothing;"
+ (elisa-sqlite-escape collection)))))
+
+;;;###autoload
+(defun elisa-add-file-to-collection (file collection)
+ "Add FILE to COLLECTION."
+ (interactive
+ (list
+ (read-file-name "File: ")
+ (completing-read
+ "Collection: "
+ (flatten-tree
+ (sqlite-select
+ elisa-db
+ "select name from collections;")))))
+ (let ((collection-id (caar (sqlite-select
+ elisa-db
+ (format
+ "select rowid from collections where name = '%s';"
+ (elisa-sqlite-escape collection))))))
+ (elisa--async-do (lambda () (elisa-parse-file collection-id file)))))
+
+;;;###autoload
+(defun elisa-add-webpage-to-collection (url collection)
+ "Add webpage by URL to COLLECTION."
+ (interactive
+ (list
+ (if-let ((url (or (and (fboundp 'thing-at-point) (thing-at-point 'url))
+ (shr-url-at-point nil))))
+ url
+ (read-string "Enter URL you want to summarize: "))
+ (completing-read
+ "Collection: "
+ (flatten-tree
+ (sqlite-select
+ elisa-db
+ "select name from collections;")))))
+ (let ((collection-id (caar (sqlite-select
+ elisa-db
+ (format
+ "select rowid from collections where name = '%s';"
+ (elisa-sqlite-escape collection))))))
+ (elisa--async-do (lambda () (elisa--parse-web-page collection-id url)))))
+
+;;;###autoload
+(defun elisa-remove-collection (&optional collection)
+ "Remove COLLECTION."
+ (interactive)
+ (let* ((col (or collection
+ (completing-read
+ "Remove collection: "
+ (flatten-tree
+ (sqlite-select
+ elisa-db
+ "select name from collections;")))))
+ (collection-id (caar (sqlite-select
+ elisa-db
+ (format
+ "select rowid from collections where name = '%s';"
+ (elisa-sqlite-escape col)))))
+ (delete-ids (flatten-tree
+ (sqlite-select
+ elisa-db
+ (format
+ "select rowid from data where collection_id = %d;"
+ collection-id)))))
+ (elisa-disable-collection col)
+ (when (file-directory-p col)
+ (let ((files
+ (flatten-tree
+ (sqlite-select
+ elisa-db
+ (format
+ "select distinct path from data where collection_id = %d;"
+ collection-id)))))
+ (sqlite-execute
+ elisa-db
+ (format
+ "delete from files where path in %s;"
+ (elisa-sqlite-format-string-list files)))))
+ (elisa--delete-data delete-ids)
+ (sqlite-execute
+ elisa-db
+ (format
+ "delete from collections where rowid = %d;"
+ collection-id))))
+
+(defun elisa--gen-chat (&optional collections)
+ "Generate function for chat with elisa based on COLLECTIONS."
+ (let ((cols (or collections elisa-enabled-collections)))
+ (lambda (prompt)
+ (elisa-find-similar
+ prompt cols
+ (lambda (query) (elisa-retrieve-ask query prompt))))))
;;;###autoload
-(defun elisa-chat (prompt)
- "Send PROMPT to elisa."
+(defun elisa-chat (prompt &optional collections)
+ "Send PROMPT to elisa.
+Find similar quotes in COLLECTIONS and add them to the context."
(interactive "sAsk elisa: ")
- (let ((infos (elisa-find-similar prompt)))
- (mapc #'ellama-context-add-info-node infos)
- (ellama-chat prompt nil :provider elisa-chat-provider)))
+ (let ((cols (or collections elisa-enabled-collections)))
+ (elisa--rewrite-prompt prompt (elisa--gen-chat cols))))
(provide 'elisa)
;;; elisa.el ends here.