emacs-tangents


From: Debanjum Singh Solanky
Subject: Re: Collaborative training of Libre LLMs (was: Is ChatGTP SaaSS? (was: [NonGNU ELPA] New package: llm))
Date: Sat, 9 Sep 2023 19:18:44 -0700

 
  > However, if the "patching" technology can only serve a single "patch" +
  > main model, there is a problem. Improving libre neural networks will
  > become difficult, unless people utilize collaborative server to
  > continuously improve a model.

  > Such collaborative server, similar to ChatGPT, will combine "editing"
  > (training) and "consulting" together. And, unlike Wikipedia, these
  > activities are hard to separate.

If the users in this "community" can't move their work outside of a
private "collaborative server", they are in effect prisoners of that
server.  Whoever keeps them stuck there will have power, and that will
tempt per to mistreat them with it.

Unlike traditional software, AI systems rely critically on the usage
data they generate to improve the original model. Might using
copyleft-licensed models be enough to prevent a server owner from
training a better, closed model? That would prevent them from holding
users hostage on their server.

 
  > This raises a moral question about practical ways to improve libre
  > neural networks without falling into SaaSS practices.

From the example above, I conclude it is crucial that people who use a
particular platform to modify and run the model have the feasible
freedom of copying their modified versions off that platform and onto
any other platform that satisfies the specs needed to run these models.

Platform portability alone does not solve the problem of how to improve
libre neural networks in an open, community-guided way.

To collaboratively develop better open models, we'd need the generated
usage data to be publicly shareable. Efforts like Open Assistant
(https://open-assistant.io), which share usage data under CC BY-SA, may
be a good enough solution for this. However, it would fall on the server
owners to obtain explicit user consent and to scrub sensitive information
from the usage data before they could share it publicly without liability.

--
Debanjum Singh Solanky
Founder, Khoj (https://khoj.dev/)
