Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code ge

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code ge

From:	Alex Bennée
Subject:	Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
Date:	Thu, 23 Nov 2023 15:32:44 +0000
User-agent:	mu4e 1.11.25; emacs 29.1

Manos Pitsidianakis <manos.pitsidianakis@linaro.org> writes:

> On Thu, 23 Nov 2023 16:35, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>On Thu, Nov 23, 2023 at 11:40:26AM +0000, Daniel P. Berrangé wrote:
>>> There has been an explosion of interest in so called "AI" (LLM)
>>> code generators in the past year or so. Thus far though, this is
>>> has not been matched by a broadly accepted legal interpretation
>>> of the licensing implications for code generator outputs. While
>>> the vendors may claim there is no problem and a free choice of
>>> license is possible, they have an inherent conflict of interest
>>> in promoting this interpretation. More broadly there is, as yet,
>>> no broad consensus on the licensing implications of code generators
>>> trained on inputs under a wide variety of licenses.
>>> The DCO requires contributors to assert they have the right to
>>> contribute under the designated project license. Given the lack
>>> of consensus on the licensing of "AI" (LLM) code generator output,
>>> it is not considered credible to assert compliance with the DCO
>>> clause (b) or (c) where a patch includes such generated code.
>>> This patch thus defines a policy that the QEMU project will not
>>> accept contributions where use of "AI" (LLM) code generators is
>>> either known, or suspected.
>>> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>>> ---
>>>  docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++
>>>  1 file changed, 40 insertions(+)
>>> diff --git a/docs/devel/code-provenance.rst
>>> b/docs/devel/code-provenance.rst
>>> index b4591a2dec..a6e42c6b1b 100644
>>> --- a/docs/devel/code-provenance.rst
>>> +++ b/docs/devel/code-provenance.rst
>>> @@ -195,3 +195,43 @@ example::
>>>    Signed-off-by: Some Person <some.person@example.com>
>>>    [Rebased and added support for 'foo']
>>>    Signed-off-by: New Person <new.person@example.com>
>>> +
>>> +Use of "AI" (LLM) code generators
>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> +
>>> +TL;DR:
>>> +
>>> +  **Current QEMU project policy is to DECLINE any contributions
>>> +  which are believed to include or derive from "AI" (LLM)
>>> +  generated code.**
>>> +
>>> +The existence of "AI" (`Large Language Model 
>>> <https://en.wikipedia.org/wiki/Large_language_model>`__
>>> +/ LLM) code generators raises a number of difficult legal questions, a
>>> +number of which impact on Open Source projects. As noted earlier, the
>>> +QEMU community requires that contributors certify their patch submissions
>>> +are made in accordance with the rules of the :ref:`dco` (DCO). When a
>>> +patch contains "AI" generated code this raises difficulties with code
>>> +provenence and thus DCO compliance.
>>> +
<snip>
>>> +
>>> +The QEMU maintainers thus require that contributors refrain from using
>>> +"AI" code generators on patches intended to be submitted to the project,
>>> +and will decline any contribution if use of "AI" is known or suspected.
>>> +
>>> +Examples of tools impacted by this policy includes both GitHub CoPilot,
>>> +and ChatGPT, amongst many others which are less well known.
>>
>>
>>So you called out these two by name, fine, but given "AI" is in scare
>>quotes I don't really know what is or is not allowed and I don't know
>>how will contributors know.  Is the "AI" that one must not use
>>necessarily an LLM?  And how do you define LLM even? Wikipedia says
>>"general-purpose language understanding and generation".
>>
>>
>>All this seems vague to me.
>>
>>
>>However, can't we define a simpler more specific policy?
>>For example, isn't it true that *any* automatically generated code
>>can only be included if the scripts producing said code
>>are also included or otherwise available under GPLv2?
>
> The following definition makes sense to me:
>
> - Automated codegen tool must be idempotent.
> - Automated codegen tool must not use statistical modelling.
>
> I'd remove all AI or LLM references. These are non-specific,
> colloquial and in the case of `AI`, non-technical. This policy should
> apply the same to a Markov chain code generator.

I'm fairly sure my Emacs auto-complete would fail by that definition.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators, (continued)

Prev by Date: Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
Next by Date: [PULL 00/14] random fixes for 8.2 (tests, plugins, docs, semihosting)
Previous by thread: Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
Next by thread: Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
Index(es):
- Date
- Thread