From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp12.migadu.com ([2001:41d0:8:6d80::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms5.migadu.com with LMTPS id +JE+GUZL92JsygAAbAwnHQ (envelope-from ) for ; Sat, 13 Aug 2022 08:57:10 +0200 Received: from aspmx1.migadu.com ([2001:41d0:8:6d80::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp12.migadu.com with LMTPS id ODUpGUZL92L2MwEAauVa8A (envelope-from ) for ; Sat, 13 Aug 2022 08:57:10 +0200 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id EC4B12756A for ; Sat, 13 Aug 2022 08:57:09 +0200 (CEST) Received: from localhost ([::1]:44974 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1oMl52-0006I7-FZ for larch@yhetil.org; Sat, 13 Aug 2022 02:57:08 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:45610) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oMl41-0004le-AL for guix-patches@gnu.org; Sat, 13 Aug 2022 02:56:05 -0400 Received: from debbugs.gnu.org ([209.51.188.43]:41555) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1oMl41-0006XS-1t for guix-patches@gnu.org; Sat, 13 Aug 2022 02:56:05 -0400 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1oMl40-0006sG-TV for guix-patches@gnu.org; Sat, 13 Aug 2022 02:56:04 -0400 X-Loop: help-debbugs@gnu.org Subject: [bug#57168] [PATCH 13/14] build: marionette: Add support for Tesseract OCR. Resent-From: Maxim Cournoyer Original-Sender: "Debbugs-submit" Resent-CC: guix-patches@gnu.org Resent-Date: Sat, 13 Aug 2022 06:56:04 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 57168 X-GNU-PR-Package: guix-patches X-GNU-PR-Keywords: patch To: 57168@debbugs.gnu.org Cc: Maxim Cournoyer Received: via spool by 57168-submit@debbugs.gnu.org id=B57168.166037371726314 (code B ref 57168); Sat, 13 Aug 2022 06:56:04 +0000 Received: (at 57168) by debbugs.gnu.org; 13 Aug 2022 06:55:17 +0000 Received: from localhost ([127.0.0.1]:59527 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oMl3E-0006qK-CB for submit@debbugs.gnu.org; Sat, 13 Aug 2022 02:55:16 -0400 Received: from mail-qt1-f169.google.com ([209.85.160.169]:37425) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oMl2u-0006mm-GX for 57168@debbugs.gnu.org; Sat, 13 Aug 2022 02:54:57 -0400 Received: by mail-qt1-f169.google.com with SMTP id l5so2290489qtv.4 for <57168@debbugs.gnu.org>; Fri, 12 Aug 2022 23:54:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc; bh=uH4/lopng/rk2T53vxacOtpaTcRgrf2BmyAmhXDjo2M=; b=URX7JfQP1CFe/yROBtsH8s2Nf1QuRpa+qvJQLiokFwjlUiq+0hgrpmxvjzqPrPpIaG wpXL9UIsa9290nX2kO9rNs5JMWzpfoaMUH+as8sWhDiBXxLH7nUqIvQHmfLUEvbzGiu+ jmR1dTscKEqQLbjcnCJ8Nvvlwifq1Y/le2tLSnzkOZJdQwih4Ma5jI4X8Gzq2UnZ+YCL TbBSXy8tkxQ5/W0NRqyWK5miMErbrvntqa7R3vdI+BlkXT6+8AZDzs51BNSxIO3QvnPF dgIvp5crlIduq03ia7rv8rhyCUqIqyWLDHToCy8JiuoTQOnu+gSyPC29V5xMdEvNG9gL 31og== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc; bh=uH4/lopng/rk2T53vxacOtpaTcRgrf2BmyAmhXDjo2M=; b=gNqOndeqEvPJVLh/Wn4MetFBLjSU3sZLvYbrf6CMBjR2o5MF2Cqf4HTWcd8h2tUVcq KDRSsJzL7S367mPNIJ/eGR/Fb95pC+9NQEcq+UsFzAZaZCmofr+gmyqvk1KKnlKBsnlG 7OFgCoie6tzjrnZggK9d8vvPktqJfUlhkOBGFI1mJfTNrhVvLrTKx6jWejQeNKieTppc cIm2GQ9+Jd/iGLttA5kHih6w0yWW3GOPlKQp+gqWTYSkAFEAlKG2yK9XEHft7z3GEfee brtqmNUr098YVsPb7rct8qeNecIAIT+JV+U5ecHlgzSZ0dmLUZRg/BmMbkhTO/ztTGrc MJNA== X-Gm-Message-State: ACgBeo0risKJd4ory1bDEQWdWFbfOnAdqfcfkOynz9Cdi9OLMNaxk2px JZQlf1r4UJ7yTXOaFjNehQYv+5QQEqg= X-Google-Smtp-Source: AA6agR56h42kj3feEd0Q3sUKPI22jbUV/QVUVzHNyCgBG4UcE8GJStUaJgnAIL4CwVfEY6ipNR3vmQ== X-Received: by 2002:a05:622a:1a05:b0:343:87eb:c686 with SMTP id f5-20020a05622a1a0500b0034387ebc686mr2573306qtb.643.1660373690788; Fri, 12 Aug 2022 23:54:50 -0700 (PDT) Received: from localhost.localdomain (dsl-205-233-125-72.b2b2c.ca. [205.233.125.72]) by smtp.gmail.com with ESMTPSA id s12-20020a05620a29cc00b006b6757a11fcsm3734441qkp.36.2022.08.12.23.54.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 12 Aug 2022 23:54:50 -0700 (PDT) From: Maxim Cournoyer Date: Sat, 13 Aug 2022 02:54:32 -0400 Message-Id: <20220813065433.27319-13-maxim.cournoyer@gmail.com> X-Mailer: git-send-email 2.36.1 In-Reply-To: <20220813065433.27319-1-maxim.cournoyer@gmail.com> References: <20220813065433.27319-1-maxim.cournoyer@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: guix-patches@gnu.org List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guix-patches-bounces+larch=yhetil.org@gnu.org Sender: "Guix-patches" X-Migadu-Flow: FLOW_IN X-Migadu-To: larch@yhetil.org X-Migadu-Country: US ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=yhetil.org; s=key1; t=1660373830; h=from:from:sender:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding:resent-cc: resent-from:resent-sender:resent-message-id:in-reply-to:in-reply-to: references:references:list-id:list-help:list-unsubscribe: list-subscribe:list-post:dkim-signature; bh=uH4/lopng/rk2T53vxacOtpaTcRgrf2BmyAmhXDjo2M=; b=Dg4/isH3rrsdM13QDVF50IHUvQcerRP4cE7BdCUV6oj50rrTb8RtAtDwDDpaFcPeFW2wme xoeGvBPYNmw72KTKDAomFvkJYpNh4tHi/UkQbBNOybMhgcFjYRHTZx0haSDgD9coCmc4Z4 LkUdk49WdzM6fsLvJ0RqzKXZxTjrmCB0nA3MRu70rjCqgPtPofnYaLZtwEB68ILnESr8Kk 3uijYGBiS7XHfODXoDOwl+qPBjHSwg3gUZTr5gk7yep5lGFlWEO0Q0ciewh7EjG7pSbvVc YV6462Xj9E/26DddPDq7RMIf7N2OkjVoCYLhwVW8QNW37CcbxagaakCbecA0Gw== ARC-Seal: i=1; s=key1; d=yhetil.org; t=1660373830; a=rsa-sha256; cv=none; b=XZ3DPC/vjCFBRoH/Yzis/VeQJY3K5/BNchJfDU5cS5Bl1U4lzsKZupfRVzfJGPJZTg8fXi KmZXbTh8ghYDlBqgf27thoWZ7yr/Etu1022eAq5UaHAKFd1qbN+JzVkmPBtHB/YJQqBAZM oen8QCcBEpBJvPHYN+KVBklpl9Cnl/WAyZ7Z+YqXVeelF9iSZw7OMdbm44bS1ElB5FOhDO slzUACZ55DeqiIYoiJfDCjtkufaOw6cFkRWUu1gpxTwGVrZG/wyfBTIKk1c5WWEJntAaxg 2bNQde1n32hQh1k9/JETlbMn9Oi76hv/IgbIAa3fEDLqa4w8OsfhnS0chCpt7A== ARC-Authentication-Results: i=1; aspmx1.migadu.com; dkim=fail ("headers rsa verify failed") header.d=gmail.com header.s=20210112 header.b=URX7JfQP; dmarc=fail reason="SPF not aligned (relaxed)" header.from=gmail.com (policy=none); spf=pass (aspmx1.migadu.com: domain of "guix-patches-bounces+larch=yhetil.org@gnu.org" designates 209.51.188.17 as permitted sender) smtp.mailfrom="guix-patches-bounces+larch=yhetil.org@gnu.org" X-Migadu-Spam-Score: 7.84 Authentication-Results: aspmx1.migadu.com; dkim=fail ("headers rsa verify failed") header.d=gmail.com header.s=20210112 header.b=URX7JfQP; dmarc=fail reason="SPF not aligned (relaxed)" header.from=gmail.com (policy=none); spf=pass (aspmx1.migadu.com: domain of "guix-patches-bounces+larch=yhetil.org@gnu.org" designates 209.51.188.17 as permitted sender) smtp.mailfrom="guix-patches-bounces+larch=yhetil.org@gnu.org" X-Migadu-Queue-Id: EC4B12756A X-Spam-Score: 7.84 X-Migadu-Scanner: scn0.migadu.com X-TUID: XjgtP0yMwpNW * gnu/build/marionette.scm (invoke-ocrad-ocr): New procedure. (invoke-tesseract-ocr): Likewise. (marionette-screen-text): Rename the #:ocrad argument to #:ocr. Dispatch the matching OCR invocation procedure. (wait-for-screen-text): Rename the #:ocrad argument to #:ocr. * gnu/tests/base.scm (run-basic-test): Adjust accordingly. * gnu/tests/install.scm (enter-luks-passphrase): Likewise. (enter-luks-passphrase-for-home): Likewise. --- gnu/build/marionette.scm | 67 +++++++++++++++++++++++----------------- gnu/tests/base.scm | 4 +-- gnu/tests/install.scm | 8 ++--- 3 files changed, 45 insertions(+), 34 deletions(-) diff --git a/gnu/build/marionette.scm b/gnu/build/marionette.scm index 24170bbd30..06b699bd7b 100644 --- a/gnu/build/marionette.scm +++ b/gnu/build/marionette.scm @@ -268,39 +268,50 @@ (define (marionette-control command marionette) ;; The "quit" command terminates QEMU immediately, with no output. (unless (string=? command "quit") (wait-for-monitor-prompt monitor))))) -(define* (marionette-screen-text marionette - #:key - (ocrad "ocrad")) - "Take a screenshot of MARIONETTE, perform optical character -recognition (OCR), and return the text read from the screen as a string. Do -this by invoking OCRAD (file name for GNU Ocrad's command)" - (define (random-file-name) - (string-append "/tmp/marionette-screenshot-" - (number->string (random (expt 2 32)) 16) - ".ppm")) - - (let ((image (random-file-name))) +(define* (invoke-ocrad-ocr image #:key (ocrad "ocrad")) + "Invoke the OCRAD command on image, and return the recognized text." + (let* ((pipe (open-pipe* OPEN_READ ocrad "-i" "-s" "10" image)) + (text (get-string-all pipe))) + (unless (zero? (close-pipe pipe)) + (error "'ocrad' failed" ocrad)) + text)) + +(define* (invoke-tesseract-ocr image #:key (tesseract "tesseract")) + "Invoke the TESSERACT command on IMAGE, and return the recognized text." + (let* ((output-basename (tmpnam)) + (output-basename* (string-append output-basename ".txt"))) (dynamic-wind (const #t) (lambda () - (marionette-control (string-append "screendump " image) - marionette) - - ;; Tell Ocrad to invert the image colors (make it black on white) and - ;; to scale the image up, which significantly improves the quality of - ;; the result. In spite of this, be aware that OCR confuses "y" and - ;; "V" and sometimes erroneously introduces white space. - (let* ((pipe (open-pipe* OPEN_READ ocrad - "-i" "-s" "10" image)) - (text (get-string-all pipe))) - (unless (zero? (close-pipe pipe)) - (error "'ocrad' failed" ocrad)) - text)) + (let ((exit-val (status:exit-val + (system* tesseract image output-basename)))) + (unless (zero? exit-val) + (error "'tesseract' failed" tesseract)) + (call-with-input-file output-basename* get-string-all))) (lambda () - (false-if-exception (delete-file image)))))) + (false-if-exception (delete-file output-basename)) + (false-if-exception (delete-file output-basename*)))))) + +(define* (marionette-screen-text marionette #:key (ocr "ocrad")) + "Take a screenshot of MARIONETTE, perform optical character +recognition (OCR), and return the text read from the screen as a string. Do +this by invoking OCR, which should be the file name of GNU Ocrad's +@command{ocrad} or Tesseract OCR's @command{tesseract} command." + (define image (string-append (tmpnam) ".ppm")) + ;; Use the QEMU Monitor to save an image of the screen to the host. + (marionette-control (string-append "screendump " image) marionette) + ;; Process it via the OCR. + (cond + ((string-contains ocr "ocrad") + (invoke-ocrad-ocr image #:ocrad ocr)) + ((string-contains ocr "tesseract") + (invoke-tesseract-ocr image #:tesseract ocr)) + (else (error "unsupported ocr command")))) (define* (wait-for-screen-text marionette predicate - #:key (timeout 30) (ocrad "ocrad")) + #:key + (ocr "ocrad") + (timeout 30)) "Wait for TIMEOUT seconds or until the screen text on MARIONETTE matches PREDICATE, whichever comes first. Raise an error when TIMEOUT is exceeded." (define start @@ -312,7 +323,7 @@ (define end (let loop ((last-text #f)) (if (> (car (gettimeofday)) end) (error "'wait-for-screen-text' timeout" 'ocr-text: last-text) - (let ((text (marionette-screen-text marionette #:ocrad ocrad))) + (let ((text (marionette-screen-text marionette #:ocr ocr))) (or (predicate text) (begin (sleep 1) diff --git a/gnu/tests/base.scm b/gnu/tests/base.scm index 353d6d415a..636b127fb8 100644 --- a/gnu/tests/base.scm +++ b/gnu/tests/base.scm @@ -341,7 +341,7 @@ (define (user-owned? file) (wait-for-screen-text marionette (lambda (text) (string-contains text "Password")) - #:ocrad + #:ocr #$(file-append ocrad "/bin/ocrad")) (marionette-type (string-append password "\n\n") marionette)) @@ -510,7 +510,7 @@ (define (entry->list entry) (test-assert "screen text" (let ((text (marionette-screen-text marionette - #:ocrad + #:ocr #$(file-append ocrad "/bin/ocrad")))) ;; Check whether the welcome message and shell prompt are diff --git a/gnu/tests/install.scm b/gnu/tests/install.scm index fbb97d451c..4e0e274e66 100644 --- a/gnu/tests/install.scm +++ b/gnu/tests/install.scm @@ -784,7 +784,7 @@ (define (bios-boot-screen? text) ;; At this point we have no choice but to use OCR to determine ;; when the passphrase should be entered. (wait-for-screen-text #$marionette passphrase-prompt? - #:ocrad #$ocrad) + #:ocr #$ocrad) (marionette-type #$(string-append %luks-passphrase "\n") #$marionette) @@ -792,7 +792,7 @@ (define (bios-boot-screen? text) ;; we can then be sure we match the "Enter passphrase" prompt from ;; 'cryptsetup', in the initrd. (wait-for-screen-text #$marionette (negate bios-boot-screen?) - #:ocrad #$ocrad + #:ocr #$ocrad #:timeout 20))) (test-assert "enter LUKS passphrase for the initrd" @@ -800,7 +800,7 @@ (define (bios-boot-screen? text) ;; XXX: Here we use OCR as well but we could instead use QEMU ;; '-serial stdio' and run it in an input pipe, (wait-for-screen-text #$marionette passphrase-prompt? - #:ocrad #$ocrad + #:ocr #$ocrad #:timeout 60) (marionette-type #$(string-append %luks-passphrase "\n") #$marionette) @@ -999,7 +999,7 @@ (define (passphrase-prompt? text) ;; XXX: Here we use OCR as well but we could instead use QEMU ;; '-serial stdio' and run it in an input pipe, (wait-for-screen-text #$marionette passphrase-prompt? - #:ocrad #$ocrad + #:ocr #$ocrad #:timeout 120) (marionette-type #$(string-append %luks-passphrase "\n") #$marionette) -- 2.36.1