unofficial mirror of bug-mumi@gnu.org
* bug#69381: mumi does not correctly display (some?) non-ascii characters
@ 2024-02-25 13:04 Tomas Volf
  2024-05-14 23:12 ` bug#69381: [PATCH] Convert HTML to UTF-8 ourselves. (Closes: #69381) Felix Lechner via Bug-mumi via Bug reports for GNU Guix Mumi.
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Tomas Volf @ 2024-02-25 13:04 UTC (permalink / raw)
  To: 69381

[-- Attachment #1: Type: text/plain, Size: 371 bytes --]

Hi,

When I compare the mumi page[0] with the debbugs page[1], the From field
displays "???" in mumi but "宋文武" in debbugs.

Have a nice day,
Tomas Volf

0: https://issues.guix.gnu.org/57268
1: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=57268

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.



* bug#69381: [PATCH] Convert HTML to UTF-8 ourselves. (Closes: #69381)
  2024-02-25 13:04 bug#69381: mumi does not correctly display (some?) non-ascii characters Tomas Volf
@ 2024-05-14 23:12 ` Felix Lechner via Bug-mumi via Bug reports for GNU Guix Mumi.
  2024-11-02  0:07 ` bug#69381: [PATCH] web: Use string to avoid losing unicode characters noe--- via Bug-mumi via Bug reports for GNU Guix Mumi.
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Felix Lechner via Bug-mumi via Bug reports for GNU Guix Mumi. @ 2024-05-14 23:12 UTC (permalink / raw)
  To: 69381; +Cc: Tomas Volf, Felix Lechner

This fixes a host of encoding issues in Mumi, including the diff
problems that are not mentioned in the bug.  An example is here:

    https://issues.guix.gnu.org/63508#4

The procedure style, in which the response body is a procedure that
writes to a port, does not currently work.  It may one day be more
efficient and, based on comments in the Guile source code, may enable
more advanced response formats.  The author does not know why the
procedure style fails; there may be a complex interaction involving
the response headers.

A preview of this code is live at patchwise.org.

Resolving this bug may depend on the patch in Bug#70907.  This patch
also depends on the patch in Bug#70906, although resolving the bug
may not.
---
 mumi/web/render.scm | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mumi/web/render.scm b/mumi/web/render.scm
index 316ca4c..9b16f8d 100644
--- a/mumi/web/render.scm
+++ b/mumi/web/render.scm
@@ -28,6 +28,7 @@
   #:use-module ((ice-9 textual-ports)
                 #:select (get-string-all put-string))
   #:use-module (ice-9 match)
+  #:use-module (rnrs bytevectors)
   #:use-module (web http)
   #:use-module (web request)
   #:use-module (web response)
@@ -104,13 +105,13 @@
 (define* (render-html sxml #:key (extra-headers '()))
   (values (append extra-headers
                   '((content-type . (text/html (charset . "utf-8")))))
-          (lambda (port)
-            (sxml->html sxml port))))
+          (string->utf8
+           (sxml->html-string sxml))))
 
 (define (render-json json)
   (values '((content-type . (application/json (charset . "utf-8"))))
-          (lambda (port)
-            (scm->json json port))))
+          (string->utf8
+           (scm->json-string json))))
 
 (define (not-found uri)
   (values (build-response #:code 404)
-- 
2.41.0






* bug#69381: [PATCH] web: Use string to avoid losing unicode characters.
  2024-02-25 13:04 bug#69381: mumi does not correctly display (some?) non-ascii characters Tomas Volf
  2024-05-14 23:12 ` bug#69381: [PATCH] Convert HTML to UTF-8 ourselves. (Closes: #69381) Felix Lechner via Bug-mumi via Bug reports for GNU Guix Mumi.
@ 2024-11-02  0:07 ` noe--- via Bug-mumi via Bug reports for GNU Guix Mumi.
  2024-11-02  0:14 ` Noé Lopez via Bug-mumi via Bug reports for GNU Guix Mumi.
  2024-11-02  2:23 ` Noé Lopez via Bug-mumi via Bug reports for GNU Guix Mumi.
  3 siblings, 0 replies; 5+ messages in thread
From: noe--- via Bug-mumi via Bug reports for GNU Guix Mumi. @ 2024-11-02  0:07 UTC (permalink / raw)
  To: 69381; +Cc: Noé Lopez, Tomas Volf, Felix Lechner

From: Noé Lopez <noelopez@free.fr>

I don’t really understand why the Unicode characters were lost in the
first place.  Maybe it is something in sanitize-response of (fibers web
server)?  Specifically, strings and procedures don’t take the same
path there.

* mumi/web/render.scm (render-html): Return string instead of procedure.
---
 mumi/web/render.scm | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mumi/web/render.scm b/mumi/web/render.scm
index 168f3bc..c28a26f 100644
--- a/mumi/web/render.scm
+++ b/mumi/web/render.scm
@@ -105,8 +105,9 @@
 (define* (render-html sxml #:key (extra-headers '()))
   (values (append extra-headers
                   '((content-type . (text/html (charset . "utf-8")))))
-          (lambda (port)
-            (sxml->html sxml port))))
+          (call-with-output-string
+	    (lambda (port)
+              (sxml->html sxml port)))))
 
 (define (render-json json)
   (values '((content-type . (application/json)))
-- 
2.46.0






* bug#69381: [PATCH] web: Use string to avoid losing unicode characters.
  2024-02-25 13:04 bug#69381: mumi does not correctly display (some?) non-ascii characters Tomas Volf
  2024-05-14 23:12 ` bug#69381: [PATCH] Convert HTML to UTF-8 ourselves. (Closes: #69381) Felix Lechner via Bug-mumi via Bug reports for GNU Guix Mumi.
  2024-11-02  0:07 ` bug#69381: [PATCH] web: Use string to avoid losing unicode characters noe--- via Bug-mumi via Bug reports for GNU Guix Mumi.
@ 2024-11-02  0:14 ` Noé Lopez via Bug-mumi via Bug reports for GNU Guix Mumi.
  2024-11-02  2:23 ` Noé Lopez via Bug-mumi via Bug reports for GNU Guix Mumi.
  3 siblings, 0 replies; 5+ messages in thread
From: Noé Lopez via Bug-mumi via Bug reports for GNU Guix Mumi. @ 2024-11-02  0:14 UTC (permalink / raw)
  To: 69381

Hi,

I wanted to send this patch separately, but I had this issue selected in
mumi, so it was sent here.  Oops.

I recognize this solution is not optimal (a hack, even), but it should be
seriously considered, as the issue is rampant among international users.

I suspect the actual issue lies in fibers, as said in the commit message,
and I’ll try to fix it there, but this patch is still important in the
meantime.

Good night,
Noé





* bug#69381: [PATCH] web: Use string to avoid losing unicode characters.
  2024-02-25 13:04 bug#69381: mumi does not correctly display (some?) non-ascii characters Tomas Volf
                   ` (2 preceding siblings ...)
  2024-11-02  0:14 ` Noé Lopez via Bug-mumi via Bug reports for GNU Guix Mumi.
@ 2024-11-02  2:23 ` Noé Lopez via Bug-mumi via Bug reports for GNU Guix Mumi.
  3 siblings, 0 replies; 5+ messages in thread
From: Noé Lopez via Bug-mumi via Bug reports for GNU Guix Mumi. @ 2024-11-02  2:23 UTC (permalink / raw)
  To: 69381

Small update,

I’ve investigated the issue in fibers, and I now blame the Guile web
library for it.  Apparently it sets the port encoding to ISO-8859-1
each time you call read-request, but the docs act like « yeah don’t
worry just use utf-8 for your body ».

That’s fine UNLESS you use chunked transfers (i.e. omitting
content-length in fibers), in which case it just decides to blow up
:///// (it assumes one character = one byte).

In the end I’m pretty sure all of this could have been avoided by just
not replacing every character with question marks.  Had it kept the
invalid bytes intact, they would have translated back with no issue.
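
The effects described above can be reproduced outside a web server.
The following is a minimal sketch, not the code path taken by Guile's
web modules or by fibers.  It assumes a recent Guile with the
(rnrs bytevectors) and (ice-9 iconv) modules, and the names utf8-bytes,
lossy-bytes and round-trip are illustrative only.  It shows that the
sender name from the original report has fewer characters than UTF-8
bytes, that a substituting conversion to ISO-8859-1 produces question
marks, and that ISO-8859-1 carries arbitrary bytes unchanged when
nothing is substituted.

```scheme
(use-modules (rnrs bytevectors)   ; string->utf8, utf8->string
             (ice-9 iconv))       ; string->bytevector, bytevector->string

;; The sender name from the original report.
(define name "宋文武")

;; Encoded as UTF-8, each of these characters occupies three bytes,
;; so "one character = one byte" undercounts the body length.
(define utf8-bytes (string->utf8 name))

;; Encoding into ISO-8859-1 with the 'substitute strategy replaces
;; every character the charset cannot represent with "?".
(define lossy-bytes
  (string->bytevector name "ISO-8859-1" 'substitute))

;; ISO-8859-1 assigns a character to every byte value, so UTF-8 bytes
;; decoded as ISO-8859-1 and encoded back come out unchanged.
(define round-trip
  (utf8->string
   (string->bytevector (bytevector->string utf8-bytes "ISO-8859-1")
                       "ISO-8859-1")))

(format #t "characters: ~a, UTF-8 bytes: ~a~%"
        (string-length name) (bytevector-length utf8-bytes))
(format #t "lossy conversion: ~a~%" (utf8->string lossy-bytes))
(format #t "round trip intact: ~a~%" (string=? name round-trip))
```

Under these assumptions the name is three characters but nine bytes,
and the lossy conversion gives the "???" seen in the original report.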



