From: arthur miller
Newsgroups: gmane.emacs.devel
Subject: Re: Emacs as a word processor (ways to convert Word/RTF proprietary files)
Date: Fri, 25 Dec 2020 19:41:31 +0000
To: Tomas Hlavaty
Cc: "emacs-devel@gnu.org"

> Just because something is a zip file with some xml files inside does not
> make it "not hard", "just dealing with xml".  It is complex to do
> non-trivial stuff.  If you do not see what I am talking about, try to
> implement something non-trivial (for example merge many docx documents
> into one).  You'll understand why it is not a pleasant experience and
> why I do not think anybody will do that in their free time.

We obviously have a different understanding of what is hard. I consider something
to be hard if there is no well-known solution you can take and apply to implement
the functionality, or if there are very many obscure details that have to be taken into
account. I don't consider working with a standardized XML format to fall into that
category. LibreOffice does quite a good job of translating the docx stuff you wrote about.

There is a difference between hard and laborious. I would say it is a lot of work, but not
very hard.

You should be able to open docx files, parse them with elisp and display text or even render
graphics to SVG. I did something similar a long time ago with Java. I think it is a lot of work, but not
very interesting. I wouldn't say it is unpleasant; it is certainly much better to work with a
documented, standardized OOXML than with some undocumented old format. I probably
wouldn't say unpleasant, just very boring.
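
To make the "zip file with XML inside" part concrete, here is a minimal sketch in Python (not elisp, and not the Java code I mentioned) that pulls paragraph text out of word/document.xml. The function names and the in-memory sample archive are made up for illustration, and it ignores tables, text boxes, footnotes and everything else a real reader would have to handle.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used inside word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source):
    """Return the text of every paragraph (w:p) in a docx path or file-like object."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # Paragraph text is split across runs; the w:t elements carry it.
        texts = [t.text or "" for t in p.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(texts))
    return paragraphs


def make_sample_docx(lines):
    """Hypothetical demo helper: build a tiny docx-like archive in memory.

    It contains only word/document.xml, so it is not a complete, valid docx.
    """
    body = "".join(f"<w:p><w:r><w:t>{line}</w:t></w:r></w:p>" for line in lines)
    document = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for text in docx_paragraphs(make_sample_docx(["Hello", "World"])):
        print(text)
```

Plain text extraction like this is the easy end of the scale; it says nothing about how much work faithful rendering would be.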

So if you are going to do it, please do; it will be a great thing if you implement it.

Good luck.

From: Tomas Hlavaty <tom@logand.com>
Sent: 25 December 2020 15:44
To: Arthur Miller <arthur.miller@live.com>
Cc: emacs-devel@gnu.org <emacs-devel@gnu.org>
Subject: Re: Emacs as a word processor (ways to convert Word/RTF proprietary files)
 
On Fri 25 Dec 2020 at 14:19, Arthur Miller <arthur.miller@live.com> wrote:
> The problem with documents in MS office is not text extraction; it is
> just xml nowadays anyway, the problem is countless VBA scripts that
> business and organisations run in Excell/Access/Word that just can't
> be translate to Libre. Libre has VB, but the underlaying objects are
> not there and lots of tools out there that people use can't be just
> automatically translated.
>
> I have worked in big organisation and did lots of automation for MS
> office and databases.

So what?  I do not understand what you are trying to say.

I tried to get the point across that it is not an all-or-nothing problem.
There are use-cases which bring lots of value and are achievable with
reasonable effort.

>> Dealing with office formats is not a pleasant experience so I am
>> skeptical that volunteers will devote so much time to the use-cases
>> with the highest complexity.
>
> What is not so pleasant? New formats (marked with x) at the end are
> all xml, so it is just dealing with xml, sinilar to odt. I see nothing
> hard there and it is not that I defend Microsoft, I just don't see
> what you are talking about. That is part that alternatives you mention
> do.

Just because something is a zip file with some xml files inside does not
make it "not hard", "just dealing with xml".  It is complex to do
non-trivial stuff.  If you do not see what I am talking about, try to
implement something non-trivial (for example merge many docx documents
into one).  You'll understand why it is not a pleasant experience and
why I do not think anybody will do that in their free time.
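
To illustrate the merge example, here is a deliberately naive sketch in Python. It only appends the body elements of one document.xml to another; the function and helper names are invented for this illustration. What it leaves out is the actual work: style IDs in word/styles.xml, numbering definitions in word/numbering.xml, and relationship IDs for images and hyperlinks in word/_rels/document.xml.rels are all local to each package, so they would have to be reconciled too.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)  # keep the usual "w:" prefix when serializing

BODY = f"{{{W_NS}}}body"
SECT_PR = f"{{{W_NS}}}sectPr"


def naive_merge(first_xml, second_xml):
    """Append the body content of second_xml to first_xml.

    Both arguments are document.xml contents as strings. Styles, numbering,
    images and all other package parts are ignored on purpose.
    """
    first = ET.fromstring(first_xml)
    second = ET.fromstring(second_xml)
    body = first.find(BODY)
    sect_pr = body.find(SECT_PR)
    if sect_pr is not None:
        body.remove(sect_pr)  # the final section properties must stay last
    for child in second.find(BODY):
        if child.tag != SECT_PR:
            body.append(child)
    if sect_pr is not None:
        body.append(sect_pr)
    return ET.tostring(first, encoding="unicode")


def sample_document(text):
    """Hypothetical demo helper: a one-paragraph document.xml string."""
    return (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        f"<w:p><w:r><w:t>{text}</w:t></w:r></w:p>"
        "<w:sectPr/></w:body></w:document>"
    )


if __name__ == "__main__":
    print(naive_merge(sample_document("one"), sample_document("two")))
```

On two trivial documents this works; as soon as the second document refers to a style, a list or an image, the copied elements point at IDs that mean something else (or nothing) in the first package.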

>>               there could be.
>
> You are correct about one thing: there could be free alternative.
> All that will probably change in next 20 ~ 30 years, but we are not
> there yet.

It is not clear to me which use-case you are talking about in this
prediction.

1) There are use-cases, for which there are solutions now, as I have already
   shown.

2) There are use-cases, for which solutions could be implemented with
   reasonable effort.

3) There are use-cases, which will very likely never have an
   alternative.

For 1) I did my best.

For 2) we'll see what I will do;-)

For 3) I wish you good luck!