emacs-elpa-diffs

From: ELPA Syncer
Subject: [elpa] externals/llm 749e5b6991 1/2: Implement token counting for vertex by querying API
Date: Mon, 30 Oct 2023 00:58:34 -0400 (EDT)

branch: externals/llm
commit 749e5b69917d640a0b3e6081459427e4d339508c
Author: Andrew Hyatt <ahyatt@gmail.com>
Commit: Andrew Hyatt <ahyatt@gmail.com>

    Implement token counting for vertex by querying API
---
 NEWS.org      |  2 ++
 README.org    |  2 +-
 llm-vertex.el | 29 +++++++++++++++++++++++++++++
 3 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/NEWS.org b/NEWS.org
index 6ff8d1322f..0e82d93163 100644
--- a/NEWS.org
+++ b/NEWS.org
@@ -1,3 +1,5 @@
+* Version 0.6
+- Implement token counting for Google Cloud Vertex via their API.
 * Version 0.5
 - Fixes for conversation context storage, requiring clients to handle ongoing conversations slightly differently.
 - Fixes for proper sync request http error code handling.
diff --git a/README.org b/README.org
index 674d0e0e13..b1b9536e76 100644
--- a/README.org
+++ b/README.org
@@ -70,7 +70,7 @@ For all callbacks, the callback will be executed in the buffer the function was
 - ~llm-chat-streaming provider prompt partial-callback response-callback error-callback~:  Similar to ~llm-chat-async~, but request a streaming response.  As the response is built up, ~partial-callback~ is called with all the text retrieved up to the current point.  Finally, ~response-callback~ is called with the complete text.
 - ~llm-embedding provider string~: With the user-chosen ~provider~, send a string and get an embedding, which is a large vector of floating point values.  The embedding represents the semantic meaning of the string, and the vector can be compared against other vectors, where smaller distances between the vectors represent greater semantic similarity.
 - ~llm-embedding-async provider string vector-callback error-callback~: Same as ~llm-embedding~ but this is processed asynchronously. ~vector-callback~ is called with the vector embedding, and, in case of error, ~error-callback~ is called with the same arguments as in ~llm-chat-async~.
-- ~llm-count-tokens provider string~: Count how many tokens are in ~string~.  This may theoretically vary by ~provider~ but typically is always about the same.  This gives an estimate only.
+- ~llm-count-tokens provider string~: Count how many tokens are in ~string~.  This may vary by ~provider~, because some providers implement an API for this, but counts are typically about the same across providers.  If the provider has no API support, this gives only an estimate.
 
   And the following helper functions:
  - ~llm-make-simple-chat-prompt text~: For the common case of just wanting a simple text prompt without the richness that the ~llm-chat-prompt~ struct provides, use this to turn a string into a ~llm-chat-prompt~ that can be passed to the main functions above.
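
As a usage note (editorial, not part of the patch): a minimal sketch of calling the token-counting entry point with a Vertex provider.  The project name is a placeholder, and the :project constructor keyword is assumed from the llm-vertex provider struct that the code below accesses via llm-vertex-project.

(require 'llm)
(require 'llm-vertex)

;; Build a Vertex provider; "my-gcp-project" is a placeholder.
(let ((provider (make-llm-vertex :project "my-gcp-project")))
  ;; With this commit, this queries the countTokens API for Vertex
  ;; providers instead of returning a purely local estimate.
  (message "Token count: %d" (llm-count-tokens provider "Hello, world!")))
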
diff --git a/llm-vertex.el b/llm-vertex.el
index 4ad59ff33e..557507f3e3 100644
--- a/llm-vertex.el
+++ b/llm-vertex.el
@@ -313,6 +313,35 @@ If STREAMING is non-nil, use the URL for the streaming API."
                                 (llm-request-callback-in-buffer buf error-callback 'error
                                                                 (llm-vertex--error-message data))))))
 
+(defun llm-vertex--count-token-url (provider)
+  "Return the URL to use for the Vertex API.
+PROVIDER is the llm provider.
+MODEL "
+  (format "https://%s-aiplatform.googleapis.com/v1beta1/projects/%s/locations/%s/publishers/google/models/%s:countTokens"
+          llm-vertex-gcloud-region
+          (llm-vertex-project provider)
+          llm-vertex-gcloud-region
+          (or (llm-vertex-embedding-model provider) "chat-bison")))
+
+;; Token counts
+;; https://cloud.google.com/vertex-ai/docs/generative-ai/get-token-count
+
+(defun llm-vertex--count-token-request (string)
+  "Create the data payload to count tokens in STRING."
+  `((instances . [((prompt . ,string))])))
+
+(defun llm-vertex--count-tokens-extract-response (response)
+  "Extract the token count from the response."
+  (assoc-default 'totalTokens response))
+
+(cl-defmethod llm-count-tokens ((provider llm-vertex) string)
+  (llm-vertex-refresh-key provider)
+  (llm-vertex--handle-response
+   (llm-request-sync (llm-vertex--count-token-url provider)
+                     :headers `(("Authorization" . ,(format "Bearer %s" (llm-vertex-key provider))))
+                     :data (llm-vertex--count-token-request string))
+   #'llm-vertex--count-tokens-extract-response))
+
 (provide 'llm-vertex)
 
 ;;; llm-vertex.el ends here
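
For reference, a sketch of the round trip the new helpers implement, with illustrative values (the token count is made up, not from a real API call):

;; Payload built for the countTokens endpoint:
;; (llm-vertex--count-token-request "Hello, world!")
;;   => ((instances . [((prompt . "Hello, world!"))]))
;; which serializes to the JSON body:
;;   {"instances": [{"prompt": "Hello, world!"}]}
;;
;; The endpoint replies with JSON along the lines of:
;;   {"totalTokens": 4}
;; which, parsed into an alist, is handled by the extractor:
;; (llm-vertex--count-tokens-extract-response '((totalTokens . 4)))
;;   => 4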


