guix-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: ImageMagick from 2020?


From: Timothy Sample
Subject: Re: ImageMagick from 2020?
Date: Sat, 22 Jan 2022 11:48:17 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Hey,

zimoun <zimon.toutoune@gmail.com> writes:

> On Wed, 19 Jan 2022 at 11:36, Ludovic Courtès <ludo@gnu.org> wrote:
>
>> Oh right, so we’ll need to feed them historical ‘sources.json’ files
>> eventually, I think Timothy was planning to do that eventually.
>
> From my side, what I would like to achieve soon:
>
> [...]
>
> Then what is also missing is:
>
>  3- have a collection of sources.json per Guix revision -- at least
> some determined by us;

I’ve attached a script that makes a “sources.json” per commit from the
PoG database.  It only lists regularly downloaded sources (no VCS
sources), since that’s all the SWH loader supports so far.

I also played around with it and came up with

    https://ngyro.com/pog-reports/2022-01-16/missing-sources.json

This is a “sources.json” that only lists the “missing” and “unknown”
sources from the PoG report.  It lists sources across all commits (since
1.0.0).  This might be the easiest thing for SWH to handle, since it
omits nearly 20k sources that they definitely already have.  Since they
don’t have the tarball hashes, they have no way to skip downloading and
processing tarballs that they already have by hash.  Hence, filtering it
with the extra data we have through the PoG projects should be something
that they welcome!

If they want, they could point a loader task at

   https://ngyro.com/pog-reports/latest/missing-sources.json

and I could publish updates when I publish new PoG reports.

There’s one other thing to think about.  Some of our sources are
arguably unsuitable for SWH.  For instance, our bootstrap binaries.  I
bet we have a bunch of other borderline things, too, like game assets.
Of course, if they are indiscriminately ingesting Github, I’m sure
they’ve loaded plenty of garbage.  Mostly, I think about these things
because I believe it’s important to maintain the Guix-SWH relationship.

Here’s the per-commit script.  You can run it like this:

    $ guile sources.scm pog.db output-directory

(use-modules (gcrypt base64)
             (guix base32)
             (guix build download)
             ((guix download) #:select (%mirrors))
             (ice-9 match)
             (json)
             (sqlite3)
             (srfi srfi-1)
             (srfi srfi-9 gnu)
             (srfi srfi-19)
             (web uri))

(define-immutable-record-type <commit>
  (make-commit push-time hash)
  commit?
  (push-time commit-push-time)
  (hash commit-hash))

(define lookup-commits-query "\
SELECT c.push_time,
    c.hash
FROM commits c
ORDER BY c.push_time DESC")

(define (lookup-commits db)
  (define (record->commit rec)
    (match-let ((#(push-time hash) rec))
      (make-commit (and push-time (make-time time-utc 0 push-time))
                   hash)))
  (define (kons rec acc)
    (cons (record->commit rec) acc))
  (let* ((stmt (sqlite-prepare db lookup-commits-query))
         (commits (sqlite-fold kons '() stmt)))
    (sqlite-finalize stmt)
    commits))

(define lookup-sources-query "\
SELECT f.hash,
    fr.reference
FROM commits c
    JOIN fod_commit_links fcl USING (commit_id)
    JOIN fods f USING (fod_id)
    JOIN fod_references fr USING (fod_id)
WHERE c.hash = ?
    AND f.algorithm = 'sha256'
    AND (fr.reference LIKE '\"%'
        OR fr.reference LIKE '(\"%')
    AND NOT fr.is_error")

(define (nix-base32-sha256->subresource-integrity digest)
  "Convert the Nix-style base32-encoded SHA-256 hash DIGEST into a
Subresource Integrity metadata value."
  (define bv (nix-base32-string->bytevector digest))
  (define b64 (base64-encode bv))
  (string-append "sha256-" b64))

(define (web-reference-urls reference)
  (define uris
    (match (call-with-input-string reference read)
      ((urls ...) (map string->uri urls))
      (url (list (string->uri url)))))
  (append-map (lambda (uri)
                (map uri->string
                     (maybe-expand-mirrors uri %mirrors)))
              uris))

(define (lookup-sources db commit)
  (define (record->url-source rec)
    (match-let ((#(digest reference) rec))
      (let ((urls (web-reference-urls reference))
            (integrity (nix-base32-sha256->subresource-integrity digest)))
        `(("type" . "url")
          ("urls" . ,(list->vector urls))
          ("integrity" . ,integrity)))))
  (define (kons rec acc)
    (cons (record->url-source rec) acc))
  (let* ((stmt (sqlite-prepare db lookup-sources-query))
         (_ (sqlite-bind-arguments stmt commit))
         (sources (sqlite-fold kons '() stmt)))
    (sqlite-finalize stmt)
    sources))

(define (commit-sources-name directory commit)
  (string-append directory
                 "/"
                 (date->string (time-utc->date (commit-push-time commit))
                               "~Y-~m-~d")
                 "-"
                 (string-take (commit-hash commit) 7)
                 "-sources.json"))

(match (program-arguments)
  ((_ db-file directory)
   (mkdir directory)
   (let* ((db (sqlite-open db-file))
          (commits (lookup-commits db)))
     (for-each (lambda (commit)
                 (call-with-output-file (commit-sources-name directory commit)
                   (lambda (out)
                     (let* ((hash (commit-hash commit))
                            (sources (lookup-sources db hash)))
                       (scm->json `(("version" . "1")
                                    ("revision" . ,hash)
                                    ("sources" . ,(list->vector sources)))
                                  out)
                       (newline out)))))
               (list (car commits)))
     (sqlite-close db)
     (exit EXIT_SUCCESS)))
  (_ (display "usage: sources.scm DB-FILE\n" (current-error-port))
     (exit EXIT_FAILURE)))

-- Tim

reply via email to

[Prev in Thread] Current Thread [Next in Thread]