* bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string @ 2022-03-27 5:49 Yuan Fu 2022-03-27 6:40 ` Eli Zaretskii 0 siblings, 1 reply; 26+ messages in thread From: Yuan Fu @ 2022-03-27 5:49 UTC (permalink / raw) To: 54591 I’m trying make use of sqlite functionalities added recently and found that the query result of blob type entries are multibyte string rather than unibyte. That caused issues when I tried to create image out of the query result: the image appears to be broken, even though the data is good. Adding string-as-unibyte solves the issue, but I guess the return value of a blob entry is probably best to be unibyte by default? Yuan In GNU Emacs 29.0.50 (build 7, x86_64-apple-darwin21.3.0, NS appkit-2113.30 Version 12.2.1 (Build 21D62)) of 2022-03-12 built on Brown Repository revision: 22dde4e621fb49b9f05d560aee22332ad60bf485 Repository branch: master Windowing system distributor 'Apple', version 10.3.2113 System Description: macOS 12.3 Configured using: 'configure --with-pdumper=yes --with-ns' Configured features: ACL DBUS GLIB GNUTLS JSON LCMS2 LIBXML2 MODULES NOTIFY KQUEUE NS PDUMPER PNG RSVG SQLITE3 THREADS TOOLKIT_SCROLL_BARS WEBP XIM ZLIB Important settings: value of $LANG: en_US.UTF-8 locale-coding-system: utf-8-unix Major mode: Lisp Interaction Minor modes in effect: hexl-follow-ascii: t override-global-mode: t emms-mode-line-mode: t emms-playing-time-display-mode: t emms-playing-time-mode: t global-git-commit-mode: t shell-dirtrack-mode: t keyfreq-mode: t form-feed-mode: t diff-hl-mode: t after-save-mode: t undo-hl-mode: t consult-binded-mode: t pixel-scroll-precision-mode: t global-so-long-mode: t color-outline-mode: t selectrum-prescient-mode: t minibuffer-electric-default-mode: t which-function-mode: t yas-global-mode: t yas-minor-mode: t flymake-mode: t flyspell-mode: t ws-butler-mode: t savehist-mode: t windmove-mode: t global-hl-todo-mode: t hl-todo-mode: t global-highlight-parentheses-mode: t highlight-parentheses-mode: t rainbow-delimiters-mode: t electric-pair-mode: t winner-mode: t aggressive-indent-mode: t recentf-mode: t which-key-mode: t company-mode: t outline-minor-mode: t selectrum-mode: t luna-key-override-mode: t tooltip-mode: t global-eldoc-mode: t eldoc-mode: t electric-quote-mode: t electric-indent-mode: t mouse-wheel-mode: t menu-bar-mode: t file-name-shadow-mode: t global-font-lock-mode: t font-lock-mode: t auto-composition-mode: t auto-encryption-mode: t auto-compression-mode: t line-number-mode: t transient-mark-mode: t hs-minor-mode: t Load-path shadows: /Users/yuan/.emacs.d/ranch/matlab-emacs/matlab-compat hides /Users/yuan/.emacs.d/ranch/matlab/matlab-compat /Users/yuan/.emacs.d/ranch/matlab-emacs/company-matlab-shell hides /Users/yuan/.emacs.d/ranch/matlab/company-matlab-shell /Users/yuan/.emacs.d/ranch/matlab-emacs/matlab-cgen hides /Users/yuan/.emacs.d/ranch/matlab/matlab-cgen /Users/yuan/.emacs.d/ranch/matlab-emacs/semanticdb-matlab hides /Users/yuan/.emacs.d/ranch/matlab/semanticdb-matlab /Users/yuan/.emacs.d/ranch/matlab-emacs/matlab-shell-gud hides /Users/yuan/.emacs.d/ranch/matlab/matlab-shell-gud /Users/yuan/.emacs.d/ranch/matlab-emacs/matlab-topic hides /Users/yuan/.emacs.d/ranch/matlab/matlab-topic /Users/yuan/.emacs.d/ranch/matlab-emacs/matlab-maint hides /Users/yuan/.emacs.d/ranch/matlab/matlab-maint /Users/yuan/.emacs.d/ranch/matlab-emacs/srecode-matlab hides /Users/yuan/.emacs.d/ranch/matlab/srecode-matlab /Users/yuan/.emacs.d/ranch/matlab-emacs/linemark hides /Users/yuan/.emacs.d/ranch/matlab/linemark /Users/yuan/.emacs.d/ranch/matlab-emacs/mlint hides /Users/yuan/.emacs.d/ranch/matlab/mlint /Users/yuan/.emacs.d/ranch/matlab-emacs/matlab-shell hides /Users/yuan/.emacs.d/ranch/matlab/matlab-shell /Users/yuan/.emacs.d/ranch/matlab-emacs/semantic-matlab hides /Users/yuan/.emacs.d/ranch/matlab/semantic-matlab /Users/yuan/.emacs.d/ranch/matlab-emacs/matlab-publish hides /Users/yuan/.emacs.d/ranch/matlab/matlab-publish /Users/yuan/.emacs.d/ranch/matlab-emacs/tlc hides /Users/yuan/.emacs.d/ranch/matlab/tlc /Users/yuan/.emacs.d/ranch/matlab-emacs/matlab hides /Users/yuan/.emacs.d/ranch/matlab/matlab /Users/yuan/.emacs.d/ranch/matlab-emacs/matlab-complete hides /Users/yuan/.emacs.d/ranch/matlab/matlab-complete /Users/yuan/.emacs.d/ranch/matlab-emacs/matlab-netshell hides /Users/yuan/.emacs.d/ranch/matlab/matlab-netshell /Users/yuan/.emacs.d/ranch/matlab-emacs/cedet-matlab hides /Users/yuan/.emacs.d/ranch/matlab/cedet-matlab /Users/yuan/.emacs.d/ranch/matlab-emacs/matlab-mode-pkg hides /Users/yuan/.emacs.d/ranch/matlab/matlab-mode-pkg /Users/yuan/.emacs.d/ranch/iscroll/iscroll hides /Users/yuan/.emacs.d/elpa/iscroll-20210128.1938/iscroll /Users/yuan/.emacs.d/elpa/transient-20211105.100/transient hides /Users/yuan/emacs-head/nextstep/Emacs.app/Contents/Resources/lisp/transient Features: (shadow mail-extr emacsbug sendmail hexl ediff ediff-merg ediff-mult ediff-wind ediff-diff ediff-help ediff-init ediff-util tunes sql vundo-test thai-util thai-word deft vuiet mpv bind-key versuri lastfm elquery memoize esqlite esxml-query anaphora request pcsv org-timer org-clock cal-move valign org-tempo tempo whitespace olivetti org-element avl-tree generator ol-eww eww xdg url-queue mm-url ol-rmail ol-mhe ol-irc ol-info ol-gnus nnselect gnus-art mm-uu mml2015 mm-view mml-smime smime dig gnus-sum shr pixel-fill url-file url-dired gnus-group gnus-undo gnus-start gnus-dbus gnus-cloud nnimap nnmail mail-source utf7 netrc nnoo gnus-spec gnus-int gnus-range gnus-win gnus nnheader range ol-docview doc-view ol-bibtex ol-bbdb ol-w3m ol-doi org-link-doi emms-player-vlc emms-player-mpv emms-player-mplayer emms-setup emms-librefm-stream emms-playlist-limit emms-i18n emms-history emms-score emms-stream-info emms-metaplaylist-mode emms-bookmarks emms-cue emms-mode-line-icon emms-browser emms-player-xine emms-player-mpd emms-lyrics emms-streams emms-show-all emms-tag-editor emms-tag-tracktag emms-mark emms-info-native emms-info-exiftool emms-info-tinytag emms-info-metaflac emms-info-opusinfo emms-info-ogginfo emms-info-mp3info tq emms-mode-line emms-url emms-librefm-scrobbler bindat sort emms-playlist-sort emms-last-played emms-volume emms-volume-sndioctl emms-volume-mixerctl emms-volume-pulse emms-volume-amixer emms-playlist-mode emms-source-playlist emms-source-file locate emms-cache emms-playing-time emms-player-simple emms-info emms-later-do emms emms-compat tar-mode arc-mode archive-mode finder skeleton autoinsert ftable cl-print consult-imenu vundo markdown-mode tramp-archive tramp-gvfs dbus tramp tramp-loaddefs trampver tramp-integration files-x tramp-compat parse-time ls-lisp cc-mode-expansions cc-mode cc-fonts cc-guess cc-menus cc-cmds cc-styles cc-align cc-engine cc-vars cc-defs goggles pulse mm-archive gnutls network-stream url-http url-gw nsm url-cache url-auth bklink xapian-lite flique xeft bug-reference iimg quanjiao magit-extras magit-imenu git-rebase magit-patch-changelog magit-patch magit-bookmark magit-submodule magit-obsolete magit-blame magit-stash magit-reflog magit-bisect magit-push magit-pull magit-fetch magit-clone magit-remote magit-commit magit-sequence magit-notes magit-worktree magit-tag magit-merge magit-branch magit-reset magit-files magit-refs magit-status magit magit-repos magit-apply magit-wip magit-log magit-diff smerge-mode diff git-commit log-edit message yank-media rmc puny rfc822 mml mml-sec epa epg rfc6068 epg-config gnus-util mm-decode mm-bodies mm-encode mail-parse rfc2231 rfc2047 rfc2045 mm-util ietf-drums mail-prsvr mailabbrev mail-utils gmm-utils mailheader magit-core magit-autorevert magit-margin magit-transient magit-process with-editor shell magit-mode transient edmacro magit-git magit-section magit-utils stimmung-themes-light-theme dired-aux autorevert dired+ misearch multi-isearch view tildify table rst rng-valid rng-loc rng-uri rng-parse nxml-parse rng-match rng-dt rng-util rng-pttrn nxml-ns nxml-util nxml-enc xmltok refer refer-to-bibtex refbib printing picture nroff-mode enriched disp-table facemenu ebnf2ps ps-print ps-print-loaddefs ps-def lpr delim-col rect bib-mode cus-theme eieio-custom xwidget image-mode dired dired-loaddefs exif wid-browse autoload checkdoc lisp-mnt char-fold server descr-text cus-start face-remap add-log keyfreq form-feed diff-hl log-view pcvs-util vc-dir hideshow after-save undo-hl company-dabbrev-code company-dabbrev company-files company-capf vc-mtn vc-hg vc-git diff-mode vc-bzr vc-src vc-sccs vc-svn vc-cvs vc-rcs vc vc-dispatcher cus-edit pale-theme pixel-scroll so-long cus-load kinsoku jka-compr cyberpunk-theme light-theme theme-util exec-path-from-shell svg dom xml ghelp ghelp-eglot ghelp-helpful ghelp-builtin derived color-outline pause utility transform no-littering selectrum-prescient prescient consult-selectrum selectrum minibuf-eldef crm consult kmacro bookmark which-func yasnippet eglot array filenotify jsonrpc ert pp ewoc debug xref flymake-proc flymake warnings compile text-property-search project flycheck flyspell ispell expand-region text-mode-expansions the-org-mode-expansions er-basic-expansions thingatpt expand-region-core expand-region-custom ws-butler minions savehist buffer-move windmove hl-todo highlight-parentheses rainbow-delimiters elec-pair winner aggressive-indent swiper cl-extra ivy delsel ivy-faces ivy-overlay colir color recentf-ext recentf tree-widget wid-edit which-key company helpful imenu trace edebug backtrace info-look f elisp-refs s dash org ob ob-tangle ob-ref ob-lob ob-table ob-exp org-macro org-footnote org-src ob-comint org-pcomplete pcomplete comint ansi-color org-list org-faces org-entities noutline outline org-version ob-emacs-lisp ob-core ob-eval org-table oc-basic bibtex iso8601 time-date ol org-keys oc org-compat advice org-macs org-loaddefs format-spec find-func cal-menu calendar cal-loaddefs finder-inf tex-site geiser-impl help-fns radix-tree help-mode geiser-custom geiser-base ring proof-site proof-autoloads rx info lunary luna-key easy-mmode luna-load-package pcase cowboy luna-f package browse-url url url-proxy url-privacy url-expand url-methods url-history url-cookie url-domsuf url-util mailcap url-handlers url-parse auth-source cl-seq eieio eieio-core cl-macs eieio-loaddefs password-cache json map url-vars seq gv subr-x byte-opt bytecomp byte-compile cconv cl-loaddefs cl-lib iso-transl tooltip eldoc paren electric uniquify ediff-hook vc-hooks lisp-float-type elisp-mode mwheel term/ns-win ns-win ucs-normalize mule-util term/common-win tool-bar dnd fontset image regexp-opt fringe tabulated-list replace newcomment text-mode lisp-mode prog-mode register page tab-bar menu-bar rfn-eshadow isearch easymenu timer select scroll-bar mouse jit-lock font-lock syntax font-core term/tty-colors frame minibuffer cl-generic cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european ethiopic indian cyrillic chinese composite emoji-zwj charscript charprop case-table epa-hook jka-cmpr-hook help simple abbrev obarray cl-preloaded nadvice button loaddefs faces cus-face macroexp files window text-properties overlay sha1 md5 base64 format env code-pages mule custom widget keymap hashtable-print-readable backquote threads dbusbind kqueue cocoa ns lcms2 multi-tty make-network-process emacs) Memory information: ((conses 16 1458806 1449475) (symbols 48 50357 132) (strings 32 263811 152581) (string-bytes 1 8256315) (vectors 16 98731) (vector-slots 8 2508183 1231269) (floats 8 937 5080) (intervals 56 105594 39459) (buffers 992 134)) ^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string 2022-03-27 5:49 bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string Yuan Fu @ 2022-03-27 6:40 ` Eli Zaretskii 2022-03-27 12:04 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-03-29 14:38 ` Lars Ingebrigtsen 0 siblings, 2 replies; 26+ messages in thread From: Eli Zaretskii @ 2022-03-27 6:40 UTC (permalink / raw) To: Yuan Fu; +Cc: 54591 > From: Yuan Fu <casouri@gmail.com> > Date: Sat, 26 Mar 2022 22:49:21 -0700 > > > I’m trying make use of sqlite functionalities added recently and found > that the query result of blob type entries are multibyte string rather > than unibyte. That caused issues when I tried to create image out of the > query result: the image appears to be broken, even though the data is > good. Adding string-as-unibyte solves the issue, but I guess the > return value of a blob entry is probably best to be unibyte by default? If we want to support non-text data in a blob, then yes, I think you are right. ^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string 2022-03-27 6:40 ` Eli Zaretskii @ 2022-03-27 12:04 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-03-27 14:56 ` Eli Zaretskii 2022-03-29 14:38 ` Lars Ingebrigtsen 1 sibling, 1 reply; 26+ messages in thread From: Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-03-27 12:04 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 54591, Yuan Fu Eli Zaretskii <eliz@gnu.org> writes: > If we want to support non-text data in a blob, then yes, I think you > are right. BTW, I thought things can go awry quickly if misformed sequences find their way into a multibyte string, so any data which isn't decoded or guaranteed to be valid in Emacs's internal superset of Unicode shouldn't be allowed to get into in a multibyte string. Is that correct? Thanks in advance. ^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string 2022-03-27 12:04 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-03-27 14:56 ` Eli Zaretskii 0 siblings, 0 replies; 26+ messages in thread From: Eli Zaretskii @ 2022-03-27 14:56 UTC (permalink / raw) To: Po Lu; +Cc: 54591, casouri > From: Po Lu <luangruo@yahoo.com> > Cc: Yuan Fu <casouri@gmail.com>, 54591@debbugs.gnu.org > Date: Sun, 27 Mar 2022 20:04:41 +0800 > > Eli Zaretskii <eliz@gnu.org> writes: > > > If we want to support non-text data in a blob, then yes, I think you > > are right. > > BTW, I thought things can go awry quickly if misformed sequences find > their way into a multibyte string, so any data which isn't decoded or > guaranteed to be valid in Emacs's internal superset of Unicode shouldn't > be allowed to get into in a multibyte string. > > Is that correct? No. Emacs converts any invalid byte sequences to raw bytes, which have a special internal representation that cannot harm us. ^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string 2022-03-27 6:40 ` Eli Zaretskii 2022-03-27 12:04 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2022-03-29 14:38 ` Lars Ingebrigtsen 2022-03-29 15:38 ` Eli Zaretskii 1 sibling, 1 reply; 26+ messages in thread From: Lars Ingebrigtsen @ 2022-03-29 14:38 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 54591, Yuan Fu Eli Zaretskii <eliz@gnu.org> writes: >> I’m trying make use of sqlite functionalities added recently and found >> that the query result of blob type entries are multibyte string rather >> than unibyte. That caused issues when I tried to create image out of the >> query result: the image appears to be broken, even though the data is >> good. Adding string-as-unibyte solves the issue, but I guess the >> return value of a blob entry is probably best to be unibyte by default? > > If we want to support non-text data in a blob, then yes, I think you > are right. I'm not sure there's any way to do that reliably. If we change blob selections to not decode, then if you have create table if not exists test7 (col1 text, col2 blob) and then (format "insert into test7 values ('foó', '%s')" binary) will have to encode the entire string in utf-8, meaning you get garbage back when you select. (While today you get encoded binary back.) And if just say "well, don't do that, then; use prepared statements", then "insert into test7 values (?, ?)" ["foo" binary] could work... if we knew the type of the columns here, but I can't find any way to determine that on a statement that has been executed yet (sqlite3_column_type/sqlite3_column_decltype only seems to work on selects; perhaps I'm missing something -- anybody know?) We could then say "well, don't encode unibyte strings", but then we might be inserting that, unencoded, into a text column, and we'd be decoding when we get back, leading to more garbage. Finally, we could introduce a new syntax, like: "insert into test7 values (?, ?)" ["foo" (:binary binary)] or something, but... yuck. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string 2022-03-29 14:38 ` Lars Ingebrigtsen @ 2022-03-29 15:38 ` Eli Zaretskii 2022-03-31 11:54 ` Lars Ingebrigtsen 0 siblings, 1 reply; 26+ messages in thread From: Eli Zaretskii @ 2022-03-29 15:38 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: 54591, casouri > From: Lars Ingebrigtsen <larsi@gnus.org> > Cc: Yuan Fu <casouri@gmail.com>, 54591@debbugs.gnu.org > Date: Tue, 29 Mar 2022 16:38:43 +0200 > > > If we want to support non-text data in a blob, then yes, I think you > > are right. > > I'm not sure there's any way to do that reliably. If we change blob > selections to not decode, then if you have > > create table if not exists test7 (col1 text, col2 blob) > > and then > > (format "insert into test7 values ('foó', '%s')" binary) > > will have to encode the entire string in utf-8, meaning you get garbage > back when you select. (While today you get encoded binary back.) > > And if just say "well, don't do that, then; use prepared statements", > then > > "insert into test7 values (?, ?)" ["foo" binary] > > could work... if we knew the type of the columns here, but I can't find > any way to determine that on a statement that has been executed yet > (sqlite3_column_type/sqlite3_column_decltype only seems to work on > selects; perhaps I'm missing something -- anybody know?) Can we place the burden of knowing the type of the column on the caller? I mean, if they know it's a binary blob, they should tell us, and then we don't decode it? (Of course, if there's a way of knowing the type, we could DTRT automatically for each type.) ^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string 2022-03-29 15:38 ` Eli Zaretskii @ 2022-03-31 11:54 ` Lars Ingebrigtsen 2022-04-01 18:34 ` Johannes Grødem 2022-04-29 4:59 ` Yuan Fu 0 siblings, 2 replies; 26+ messages in thread From: Lars Ingebrigtsen @ 2022-03-31 11:54 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 54591, casouri Eli Zaretskii <eliz@gnu.org> writes: > Can we place the burden of knowing the type of the column on the > caller? I mean, if they know it's a binary blob, they should tell us, > and then we don't decode it? We could -- that was my last example: >> "insert into test7 values (?, ?)" ["foo" (:binary binary)] Or some syntax like that. (And then never decode when we select from a BLOB column.) But... > (Of course, if there's a way of knowing the type, we could DTRT > automatically for each type.) Yes, I'm hoping somebody else could have a look at the sqlite documentation and tell me I'm missing some obvious way to get that information. :-) -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string 2022-03-31 11:54 ` Lars Ingebrigtsen @ 2022-04-01 18:34 ` Johannes Grødem 2022-04-02 5:31 ` Eli Zaretskii ` (2 more replies) 2022-04-29 4:59 ` Yuan Fu 1 sibling, 3 replies; 26+ messages in thread From: Johannes Grødem @ 2022-04-01 18:34 UTC (permalink / raw) To: 54591 Lars Ingebrigtsen <larsi@gnus.org> writes: >> (Of course, if there's a way of knowing the type, we could DTRT >> automatically for each type.) > > Yes, I'm hoping somebody else could have a look at the sqlite > documentation and tell me I'm missing some obvious way to get that > information. :-) I might be misunderstanding the issue, but SQLite column types are more like documentation than actual rules to be enforced, unless STRICT tables are enabled. https://www.sqlite.org/datatype3.html In Python's SQLite3 API, ordinary strings are inserted as SQLite TEXTs and bytestrings are inserted as SQLite BLOBs, and returned as the same when you query. It's using a function similar to row_to_value for this, except that it for some reason doesn't switch on SQLITE_BLOB, it just calls sqlite3_column_blob and tests if it returns NULL or not and then it does a switch on the remaining types. Couldn't the Emacs interface just use vectors of byte values for BLOBs both ways? Or is there another data type that makes more sense? By the way, if you want to insert BLOBs in the query itself you can do it like this, but I guess this doesn't need Emacs support, except maybe a helper function for the conversion: INSERT INTO foo VALUES (X'deadcafe'); ^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string 2022-04-01 18:34 ` Johannes Grødem @ 2022-04-02 5:31 ` Eli Zaretskii 2022-04-02 6:33 ` Johannes Grødem 2022-04-02 12:59 ` Lars Ingebrigtsen 2022-04-02 14:06 ` Lars Ingebrigtsen 2 siblings, 1 reply; 26+ messages in thread From: Eli Zaretskii @ 2022-04-02 5:31 UTC (permalink / raw) To: Johannes Grødem; +Cc: 54591 > From: Johannes Grødem <fjas@grdm.no> > Date: Fri, 01 Apr 2022 20:34:49 +0200 > > In Python's SQLite3 API, ordinary strings are inserted as SQLite TEXTs > and bytestrings are inserted as SQLite BLOBs, and returned as the same > when you query. It's using a function similar to row_to_value for this, > except that it for some reason doesn't switch on SQLITE_BLOB, it just > calls sqlite3_column_blob and tests if it returns NULL or not and then > it does a switch on the remaining types. This would be leaving the responsibility for the issue to the caller, I think. Does SQLite TEXT allow the superset of UTF-8 encoding Emacs uses internally to store characters that are not in Unicode? If it does, we could indeed assume that any BLOB is binary data and not attempt encoding/decoding it. > Couldn't the Emacs interface just use vectors of byte values for BLOBs > both ways? Why? Unibyte strings are easier and more flexible in Emacs. Thanks. ^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string 2022-04-02 5:31 ` Eli Zaretskii @ 2022-04-02 6:33 ` Johannes Grødem 2022-04-02 6:52 ` Eli Zaretskii 0 siblings, 1 reply; 26+ messages in thread From: Johannes Grødem @ 2022-04-02 6:33 UTC (permalink / raw) To: 54591 Eli Zaretskii <eliz@gnu.org> writes: > This would be leaving the responsibility for the issue to the caller, > I think. I think that makes sense. > Does SQLite TEXT allow the superset of UTF-8 encoding Emacs uses > internally to store characters that are not in Unicode? If it does, we > could indeed assume that any BLOB is binary data and not attempt > encoding/decoding it. SQLite documentation says this... TEXT. The value is a text string, stored using the database encoding (UTF-8, UTF-16BE or UTF-16LE). ...but it's still possible to store byte sequences that are not legal Unicode in there. This breaks the mentioned Python SQLite3 API, and possibly others, so maybe not great if someone wants to read tables from something else than Emacs. Python-SQLite3 will attempt to decode it as UTF8 unless cast as a blob: SELECT CAST(bad_text AS blob) FROM foo (It works with the sqlite3 CLI, though.) >> Couldn't the Emacs interface just use vectors of byte values for BLOBs >> both ways? > Why? Unibyte strings are easier and more flexible in Emacs. Sorry, just my lack of knowledge of Emacs types. Unibyte strings seem fine for this. ^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string 2022-04-02 6:33 ` Johannes Grødem @ 2022-04-02 6:52 ` Eli Zaretskii 0 siblings, 0 replies; 26+ messages in thread From: Eli Zaretskii @ 2022-04-02 6:52 UTC (permalink / raw) To: Johannes Grødem; +Cc: 54591 > From: Johannes Grødem <fjas@grdm.no> > Date: Sat, 02 Apr 2022 08:33:55 +0200 > > > Does SQLite TEXT allow the superset of UTF-8 encoding Emacs uses > > internally to store characters that are not in Unicode? If it does, we > > could indeed assume that any BLOB is binary data and not attempt > > encoding/decoding it. > > SQLite documentation says this... > > TEXT. The value is a text string, stored using the database encoding > (UTF-8, UTF-16BE or UTF-16LE). > > ...but it's still possible to store byte sequences that are not legal > Unicode in there. This breaks the mentioned Python SQLite3 API, and > possibly others, so maybe not great if someone wants to read tables from > something else than Emacs. This probably means we should reject text with raw bytes or characters whose codepoints are beyond #x10FFFF, and document that those should be encoded manually and stored as BLOBs. Thanks. ^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string 2022-04-01 18:34 ` Johannes Grødem 2022-04-02 5:31 ` Eli Zaretskii @ 2022-04-02 12:59 ` Lars Ingebrigtsen 2022-04-02 13:51 ` Eli Zaretskii 2022-04-02 14:06 ` Lars Ingebrigtsen 2 siblings, 1 reply; 26+ messages in thread From: Lars Ingebrigtsen @ 2022-04-02 12:59 UTC (permalink / raw) To: Johannes Grødem; +Cc: 54591 Johannes Grødem <fjas@grdm.no> writes: > I might be misunderstanding the issue, but SQLite column types are more > like documentation than actual rules to be enforced, unless STRICT > tables are enabled. Yeah, you can put anything you want into TEXT and BLOB columns. What I'd like to see happening is that the Emacs interface here is predictable and convenient, and that makes my brain hurt a bit here. Let's take a TEXT column first. Currently, if you have the multibyte string "fóo" and insert with "insert into ... (?)", we encode to utf-8 and put the bytes #x66#xc3#xb3#x6f into the database. Selecting from the database, we get the bytes #x66#xc3#xb3#x6f back, decode and return the string "fóo". If you have a unibyte string containing the bytes #x66#xc3#xb3#x6f, we don't do anything with that, but insert the bytes as is. When selecting, we decode and return "fóo", which is not what the user inserted. In this case, it would be nice to signal an error, but we can't, because we don't know that it's a TEXT column in the first place. Conversely, with BLOB columns, we would prefer to signal an error on multibyte strings, but we can't, because we don't know that it's a BLOB column. But we do the right thing with unibyte strings -- if you give it #x66#xc3#xb3#x6f, it'll put those bytes into the BLOB column, and when selecting, we do know that it's a BLOB column, so we could return the unibyte string #x66#xc3#xb3#x6f, and everything's fine. However, if the user wanted to insert the string "fóo", they'll be getting #x66#xc3#xb3#x6f back and will probably be sad. Today, the semantics are at least predictable: We insert everything encoded to utf-8 (no matter whether using bound parameters or inside the string), and if the user wanted something binary in the BLOB they selected, they just have to call `decode-coding-string BLOB-RESULT 'utf-8' to get the binary data. Which I understand is confusing, because it's very confusing indeed. But it's consistent, at least. If we knew what the type of the column we were inserting into, we could be more helpful in the interface, but there doesn't seem to be a way to get at that information? > By the way, if you want to insert BLOBs in the query itself you can do > it like this, but I guess this doesn't need Emacs support, except maybe > a helper function for the conversion: > > INSERT INTO foo VALUES (X'deadcafe'); Yes, but that leaves the issue to the caller, and the issue about what to do when selecting is still unclear. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string 2022-04-02 12:59 ` Lars Ingebrigtsen @ 2022-04-02 13:51 ` Eli Zaretskii 2022-04-02 13:59 ` Lars Ingebrigtsen 0 siblings, 1 reply; 26+ messages in thread From: Eli Zaretskii @ 2022-04-02 13:51 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: 54591, fjas > From: Lars Ingebrigtsen <larsi@gnus.org> > Date: Sat, 02 Apr 2022 14:59:21 +0200 > Cc: 54591@debbugs.gnu.org > > Let's take a TEXT column first. Currently, if you have the multibyte > string "fóo" and insert with "insert into ... (?)", we encode to utf-8 > and put the bytes #x66#xc3#xb3#x6f into the database. Selecting from > the database, we get the bytes #x66#xc3#xb3#x6f back, decode and return > the string "fóo". > > If you have a unibyte string containing the bytes #x66#xc3#xb3#x6f, we > don't do anything with that, but insert the bytes as is. When > selecting, we decode and return "fóo", which is not what the user > inserted. In this case, it would be nice to signal an error, but we > can't, because we don't know that it's a TEXT column in the first place. We could store unibyte strings as BLOBs, couldn't we? > Conversely, with BLOB columns, we would prefer to signal an error on > multibyte strings, but we can't, because we don't know that it's a BLOB > column. But we do the right thing with unibyte strings -- if you give > it #x66#xc3#xb3#x6f, it'll put those bytes into the BLOB column, and > when selecting, we do know that it's a BLOB column, so we could return > the unibyte string #x66#xc3#xb3#x6f, and everything's fine. However, if > the user wanted to insert the string "fóo", they'll be getting > #x66#xc3#xb3#x6f back and will probably be sad. We could refrain from decoding BLOBs, couldn't we? ^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string 2022-04-02 13:51 ` Eli Zaretskii @ 2022-04-02 13:59 ` Lars Ingebrigtsen 2022-04-02 14:22 ` Eli Zaretskii 0 siblings, 1 reply; 26+ messages in thread From: Lars Ingebrigtsen @ 2022-04-02 13:59 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 54591, fjas Eli Zaretskii <eliz@gnu.org> writes: > We could store unibyte strings as BLOBs, couldn't we? As I've said, we don't know whether a column is a TEXT or a BLOB column. And "store as BLOB" has no particular meaning -- internally in sqlite, there is no particular difference: Both are just a series of bytes, and you can put any bytes you want into the columns. > We could refrain from decoding BLOBs, couldn't we? Yes, I think I said so? -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string 2022-04-02 13:59 ` Lars Ingebrigtsen @ 2022-04-02 14:22 ` Eli Zaretskii 2022-04-02 14:38 ` Lars Ingebrigtsen 0 siblings, 1 reply; 26+ messages in thread From: Eli Zaretskii @ 2022-04-02 14:22 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: 54591, fjas > From: Lars Ingebrigtsen <larsi@gnus.org> > Cc: fjas@grdm.no, 54591@debbugs.gnu.org > Date: Sat, 02 Apr 2022 15:59:28 +0200 > > Eli Zaretskii <eliz@gnu.org> writes: > > > We could store unibyte strings as BLOBs, couldn't we? > > As I've said, we don't know whether a column is a TEXT or a BLOB column. When storing? we could ask the caller to specify that, or automatically consider unibyte non-ASCII strings as BLOBs. > And "store as BLOB" has no particular meaning -- internally in sqlite, > there is no particular difference: Both are just a series of bytes, and > you can put any bytes you want into the columns. Of course; the question is how those bytes are interpreted. The Python examples shown up-thread show the way, I think. ^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string 2022-04-02 14:22 ` Eli Zaretskii @ 2022-04-02 14:38 ` Lars Ingebrigtsen 2022-04-02 15:26 ` Eli Zaretskii 0 siblings, 1 reply; 26+ messages in thread From: Lars Ingebrigtsen @ 2022-04-02 14:38 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 54591, fjas Eli Zaretskii <eliz@gnu.org> writes: > When storing? we could ask the caller to specify that, That would be inconvenient in general. > or automatically consider unibyte non-ASCII strings as BLOBs. My exegesis upthread explained why that's problematic. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string 2022-04-02 14:38 ` Lars Ingebrigtsen @ 2022-04-02 15:26 ` Eli Zaretskii 2022-04-02 15:28 ` Lars Ingebrigtsen 0 siblings, 1 reply; 26+ messages in thread From: Eli Zaretskii @ 2022-04-02 15:26 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: 54591, fjas > From: Lars Ingebrigtsen <larsi@gnus.org> > Cc: fjas@grdm.no, 54591@debbugs.gnu.org > Date: Sat, 02 Apr 2022 16:38:19 +0200 > > Eli Zaretskii <eliz@gnu.org> writes: > > > When storing? we could ask the caller to specify that, > > That would be inconvenient in general. > > > or automatically consider unibyte non-ASCII strings as BLOBs. > > My exegesis upthread explained why that's problematic. So you think this problems has no solution, none at all? ^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string 2022-04-02 15:26 ` Eli Zaretskii @ 2022-04-02 15:28 ` Lars Ingebrigtsen 2022-04-02 15:40 ` Lars Ingebrigtsen 0 siblings, 1 reply; 26+ messages in thread From: Lars Ingebrigtsen @ 2022-04-02 15:28 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 54591, fjas Eli Zaretskii <eliz@gnu.org> writes: > So you think this problems has no solution, none at all? No, I posted a later post which has the solution I'm going to implement. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string 2022-04-02 15:28 ` Lars Ingebrigtsen @ 2022-04-02 15:40 ` Lars Ingebrigtsen 2022-04-03 10:42 ` Rudolf Schlatte 0 siblings, 1 reply; 26+ messages in thread From: Lars Ingebrigtsen @ 2022-04-02 15:40 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 54591, fjas Lars Ingebrigtsen <larsi@gnus.org> writes: > No, I posted a later post which has the solution I'm going to implement. (Which is basically a tweak on one of the solutions I proposed in my first message in this thread (with the comment "yuck"). I'm still waiting for somebody to correct me that I'm reading the sqlite documentation wrong and that you can actually find out the column type you're inserting into, because it seems rather absurd that that functionality doesn't exist.) -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string 2022-04-02 15:40 ` Lars Ingebrigtsen @ 2022-04-03 10:42 ` Rudolf Schlatte 2022-04-03 11:43 ` Lars Ingebrigtsen 0 siblings, 1 reply; 26+ messages in thread From: Rudolf Schlatte @ 2022-04-03 10:42 UTC (permalink / raw) To: 54591 Lars Ingebrigtsen <larsi@gnus.org> writes: > Lars Ingebrigtsen <larsi@gnus.org> writes: > >> No, I posted a later post which has the solution I'm going to implement. > > (Which is basically a tweak on one of the solutions I proposed in my > first message in this thread (with the comment "yuck"). I'm still > waiting for somebody to correct me that I'm reading the sqlite > documentation wrong and that you can actually find out the column type > you're inserting into, because it seems rather absurd that that > functionality doesn't exist.) Does the transcript below do what you're looking for? $ sqlite3 SQLite version 3.37.0 2021-12-09 01:34:53 Enter ".help" for usage hints. Connected to a transient in-memory database. Use ".open FILENAME" to reopen on a persistent database. sqlite> create table foo(x text not null, y blob not null); sqlite> select * from sqlite_master; table|foo|foo|2|CREATE TABLE foo(x text not null, y blob not null) sqlite> .schema sqlite_master CREATE TABLE sqlite_master ( type text, name text, tbl_name text, rootpage integer, sql text ); sqlite> ^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string 2022-04-03 10:42 ` Rudolf Schlatte @ 2022-04-03 11:43 ` Lars Ingebrigtsen 0 siblings, 0 replies; 26+ messages in thread From: Lars Ingebrigtsen @ 2022-04-03 11:43 UTC (permalink / raw) To: Rudolf Schlatte; +Cc: 54591 Rudolf Schlatte <rudi@constantly.at> writes: > Does the transcript below do what you're looking for? Nope. I need to know the type of the Xth column in a prepared statement when inserting. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string 2022-04-01 18:34 ` Johannes Grødem 2022-04-02 5:31 ` Eli Zaretskii 2022-04-02 12:59 ` Lars Ingebrigtsen @ 2022-04-02 14:06 ` Lars Ingebrigtsen 2022-04-28 12:58 ` Lars Ingebrigtsen 2 siblings, 1 reply; 26+ messages in thread From: Lars Ingebrigtsen @ 2022-04-02 14:06 UTC (permalink / raw) To: Johannes Grødem; +Cc: 54591 But I forgot to say what I think we should do. :-) I think we should just leave it to the user, and I think the interface that makes the most sense is something like "insert into ... (?)" (... (propertize binary-data 'coding-system 'binary)) and then signal an error if binary-data isn't unibyte when the string has this property. And then, conversely, don't decode BLOB columns on selects. I think that'll be the most interoperable and convenient we'll be able to get. (And, of course, document this mess.) -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string 2022-04-02 14:06 ` Lars Ingebrigtsen @ 2022-04-28 12:58 ` Lars Ingebrigtsen 0 siblings, 0 replies; 26+ messages in thread From: Lars Ingebrigtsen @ 2022-04-28 12:58 UTC (permalink / raw) To: Johannes Grødem; +Cc: 54591 Lars Ingebrigtsen <larsi@gnus.org> writes: > I think we should just leave it to the user, and I think the interface > that makes the most sense is something like > > "insert into ... (?)" (... (propertize binary-data 'coding-system 'binary)) > > and then signal an error if binary-data isn't unibyte when the string > has this property. And then, conversely, don't decode BLOB columns on > selects. I think that'll be the most interoperable and convenient we'll > be able to get. I've now done this on the trunk. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string 2022-03-31 11:54 ` Lars Ingebrigtsen 2022-04-01 18:34 ` Johannes Grødem @ 2022-04-29 4:59 ` Yuan Fu 2022-04-29 10:04 ` Lars Ingebrigtsen 1 sibling, 1 reply; 26+ messages in thread From: Yuan Fu @ 2022-04-29 4:59 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: 54591 > On Mar 31, 2022, at 4:54 AM, Lars Ingebrigtsen <larsi@gnus.org> wrote: > > Eli Zaretskii <eliz@gnu.org> writes: > >> Can we place the burden of knowing the type of the column on the >> caller? I mean, if they know it's a binary blob, they should tell us, >> and then we don't decode it? > > We could -- that was my last example: > >>> "insert into test7 values (?, ?)" ["foo" (:binary binary)] > > Or some syntax like that. (And then never decode when we select from a > BLOB column.) But... > >> (Of course, if there's a way of knowing the type, we could DTRT >> automatically for each type.) > > Yes, I'm hoping somebody else could have a look at the sqlite > documentation and tell me I'm missing some obvious way to get that > information. :-) I had a quick look at the log and didn’t see anything related to sql. So what do we end up doing? I think at least a notice could be added to the manual? Something like “Use (encode-coding-string xxx 'raw-text)) if you want to get out a binary blob”. Yuan ^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string 2022-04-29 4:59 ` Yuan Fu @ 2022-04-29 10:04 ` Lars Ingebrigtsen 2022-04-30 5:27 ` Yuan Fu 0 siblings, 1 reply; 26+ messages in thread From: Lars Ingebrigtsen @ 2022-04-29 10:04 UTC (permalink / raw) To: Yuan Fu; +Cc: 54591 Yuan Fu <casouri@gmail.com> writes: > I had a quick look at the log and didn’t see anything related to > sql. Which log? * | 5d032f2904..: Lars Ingebrigtsen 2022-04-28 Allow inserting and selecting binary blobs from sqlite -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 26+ messages in thread
* bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string 2022-04-29 10:04 ` Lars Ingebrigtsen @ 2022-04-30 5:27 ` Yuan Fu 0 siblings, 0 replies; 26+ messages in thread From: Yuan Fu @ 2022-04-30 5:27 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: 54591 > On Apr 29, 2022, at 3:04 AM, Lars Ingebrigtsen <larsi@gnus.org> wrote: > > Yuan Fu <casouri@gmail.com> writes: > >> I had a quick look at the log and didn’t see anything related to >> sql. > > Which log? > > * | 5d032f2904..: Lars Ingebrigtsen 2022-04-28 Allow inserting and selecting binary blobs from sqlite > > -- > (domestic pets only, the antidote for overdose, milk.) > bloggy blog: http://lars.ingebrigtsen.no Never mind :-) Yuan ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2022-04-30 5:27 UTC | newest] Thread overview: 26+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-03-27 5:49 bug#54591: 29.0.50; sqlite-select returns blob result as multibyte string Yuan Fu 2022-03-27 6:40 ` Eli Zaretskii 2022-03-27 12:04 ` Po Lu via Bug reports for GNU Emacs, the Swiss army knife of text editors 2022-03-27 14:56 ` Eli Zaretskii 2022-03-29 14:38 ` Lars Ingebrigtsen 2022-03-29 15:38 ` Eli Zaretskii 2022-03-31 11:54 ` Lars Ingebrigtsen 2022-04-01 18:34 ` Johannes Grødem 2022-04-02 5:31 ` Eli Zaretskii 2022-04-02 6:33 ` Johannes Grødem 2022-04-02 6:52 ` Eli Zaretskii 2022-04-02 12:59 ` Lars Ingebrigtsen 2022-04-02 13:51 ` Eli Zaretskii 2022-04-02 13:59 ` Lars Ingebrigtsen 2022-04-02 14:22 ` Eli Zaretskii 2022-04-02 14:38 ` Lars Ingebrigtsen 2022-04-02 15:26 ` Eli Zaretskii 2022-04-02 15:28 ` Lars Ingebrigtsen 2022-04-02 15:40 ` Lars Ingebrigtsen 2022-04-03 10:42 ` Rudolf Schlatte 2022-04-03 11:43 ` Lars Ingebrigtsen 2022-04-02 14:06 ` Lars Ingebrigtsen 2022-04-28 12:58 ` Lars Ingebrigtsen 2022-04-29 4:59 ` Yuan Fu 2022-04-29 10:04 ` Lars Ingebrigtsen 2022-04-30 5:27 ` Yuan Fu
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).