unofficial mirror of guix-devel@gnu.org 
* GeoIP database redistribution?
@ 2017-01-22 16:42 ng0
  2017-01-23 18:17 ` Marius Bakke
  0 siblings, 1 reply; 10+ messages in thread
From: ng0 @ 2017-01-22 16:42 UTC (permalink / raw)
  To: guix-devel

I want to slowly package OONI (https://ooni.torproject.org/).
One of its dependencies, txtorcon, requires python-geoip, which
depends on geoip-c-api. I've got both covered, but for their tests
both of them want MaxMind's legacy database file, which they either
download or expect to be present.

Question 1: Can we distribute the database as a package source? I can't
access the MaxMind homepage for Cloudflare reasons.

Question 2: If we can't distribute it, is it okay that I have disabled
the tests and leave it up to the user to run the `geoipupdate'
application, which I would then need to package?

Thanks!
-- 
♥Ⓐ  ng0 -- https://www.inventati.org/patternsinthechaos/

* Re: GeoIP database redistribution?
  2017-01-22 16:42 GeoIP database redistribution? ng0
@ 2017-01-23 18:17 ` Marius Bakke
  2017-01-24  6:06   ` Pjotr Prins
  0 siblings, 1 reply; 10+ messages in thread
From: Marius Bakke @ 2017-01-23 18:17 UTC (permalink / raw)
  To: ng0, guix-devel

ng0 <contact.ng0@cryptolab.net> writes:

> I want to slowly package OONI (https://ooni.torproject.org/).
> One of its dependencies, txtorcon, requires python-geoip which
> depends on geoip-c-api. I've got both covered, but both of them
> want (either to download or to be present) for tests (a/the)
> legacy database file of maxmind.
>
> Question 1: Can we distribute the database in a source? I can't
> access the homepage of maxmind for cloudflare reasons.

The database is distributed freely under CC BY-SA 4.0:

https://dev.maxmind.com/geoip/legacy/geolite/#License

So packaging it should be fine. :)

* Re: GeoIP database redistribution?
  2017-01-23 18:17 ` Marius Bakke
@ 2017-01-24  6:06   ` Pjotr Prins
  2017-01-24  8:20     ` Efraim Flashner
  2017-01-24 21:25     ` Ludovic Courtès
  0 siblings, 2 replies; 10+ messages in thread
From: Pjotr Prins @ 2017-01-24  6:06 UTC (permalink / raw)
  To: Marius Bakke; +Cc: guix-devel

On Mon, Jan 23, 2017 at 07:17:12PM +0100, Marius Bakke wrote:
> ng0 <contact.ng0@cryptolab.net> writes:
> 
> > I want to slowly package OONI (https://ooni.torproject.org/).
> > One of its dependencies, txtorcon, requires python-geoip which
> > depends on geoip-c-api. I've got both covered, but both of them
> > want (either to download or to be present) for tests (a/the)
> > legacy database file of maxmind.
> >
> > Question 1: Can we distribute the database in a source? I can't
> > access the homepage of maxmind for cloudflare reasons.
> 
> The database is distributed freely under cc-by-sa4.0:
> 
> https://dev.maxmind.com/geoip/legacy/geolite/#License
> 
> So packaging it should be fine. :)

This actually raises the issue of packaging large data files (we are
getting into TBs). Could there be a way for Guix to fetch external
datasets as part of the distribution? I think that if the data is not
executable code and its SHA/pfff values match, it would be safe to
do.

Ideas? That would be a first step towards reproducible analysis.
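
The check described above can be sketched as follows. This is a
minimal illustration in Python, not anything Guix provides: the names
`sha256_of_file` and `verify_dataset` are hypothetical, the hash is
compared as a plain hex string, and the permission bits are only a
crude stand-in for "not executable code". pfff fingerprints are not
covered here.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large datasets never sit in memory whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_dataset(path, expected_sha256):
    """Accept a dataset only if no execute bit is set and the hash matches.

    Assumption: checking the mode bits is a simplification of
    "it is not executable code"; it says nothing about the content.
    """
    if os.stat(path).st_mode & 0o111:
        return False
    return sha256_of_file(path) == expected_sha256.lower()
```

A caller would record the expected hash next to the dataset's URL and
refuse to use the file whenever `verify_dataset` returns False.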

Pj.

* Re: GeoIP database redistribution?
  2017-01-24  6:06   ` Pjotr Prins
@ 2017-01-24  8:20     ` Efraim Flashner
  2017-01-24 11:01       ` ng0
  2017-01-24 21:24       ` Ludovic Courtès
  2017-01-24 21:25     ` Ludovic Courtès
  1 sibling, 2 replies; 10+ messages in thread
From: Efraim Flashner @ 2017-01-24  8:20 UTC (permalink / raw)
  To: Pjotr Prins; +Cc: guix-devel

On Tue, Jan 24, 2017 at 06:06:21AM +0000, Pjotr Prins wrote:
> This actually raises the issue of packaging large data files (we are
> getting into TB's). Could there be a way Guix fetches external
> datasets as part of the distribution? I think that if it is not
> executable code and SHA values/pfff values match it would be safe to
> do.
> 

The other thing is that Guix downloads files into RAM and then
writes them into the store, which can be a problem if the source is
larger than the available memory.
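
For contrast, a download can be copied to disk in fixed-size chunks so
that memory use stays bounded regardless of file size. The sketch
below is in Python and does not describe how Guix or its daemon works;
`stream_to_file` and the 64 KiB chunk size are assumptions made for
illustration. `source` is any object with a `read(n)` method, for
example the response returned by `urllib.request.urlopen`.

```python
import hashlib


def stream_to_file(source, dest_path, chunk_size=64 * 1024):
    """Copy `source` to `dest_path` chunk by chunk, hashing along the way.

    Returns (bytes_written, hex_sha256). At most `chunk_size` bytes of
    payload are held in memory at any time.
    """
    digest = hashlib.sha256()
    total = 0
    with open(dest_path, "wb") as dest:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:  # empty bytes object signals end of stream
                break
            dest.write(chunk)
            digest.update(chunk)
            total += len(chunk)
    return total, digest.hexdigest()
```

Hashing while writing means the file does not have to be read a second
time to compare it against a recorded checksum.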

-- 
Efraim Flashner   <efraim@flashner.co.il>   אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D  14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted

* Re: GeoIP database redistribution?
  2017-01-24  8:20     ` Efraim Flashner
@ 2017-01-24 11:01       ` ng0
  2017-11-28 16:19         ` ng0
  2017-01-24 21:24       ` Ludovic Courtès
  1 sibling, 1 reply; 10+ messages in thread
From: ng0 @ 2017-01-24 11:01 UTC (permalink / raw)
  To: Efraim Flashner; +Cc: guix-devel

Efraim Flashner <efraim@flashner.co.il> writes:

> On Tue, Jan 24, 2017 at 06:06:21AM +0000, Pjotr Prins wrote:
>> This actually raises the issue of packaging large data files (we are
>> getting into TB's). Could there be a way Guix fetches external
>> datasets as part of the distribution? I think that if it is not
>> executable code and SHA values/pfff values match it would be safe to
>> do.
>> 
>
> The other thing is that guix downloads the files into ram and then
> writes them into the store, which can be a problem if the source is
> larger than the available memory.

TB? Oh. Because of Cloudflare I had no clue how big this is. In that
case, since the way we currently distribute packages does not itself
use a distributed network (if just keeping the space on hydra is the
problem?), I would suggest that I package the updater I mentioned and
mention it in the descriptions of other MaxMind software.

This database is also an optional dependency of Tor, not just of
OONI, if I understand correctly, so even if it's not 100%
accurate (their commercial solution claims to be more accurate)
it would be nice to find a way to include it.
But I think size really is an issue: for example, Gentoo dropped
the database from their source distribution network and points
users to the updater I mentioned (that's how I learned about the
updater).
-- 
♥Ⓐ  ng0 -- https://www.inventati.org/patternsinthechaos/

* Re: GeoIP database redistribution?
  2017-01-24  8:20     ` Efraim Flashner
  2017-01-24 11:01       ` ng0
@ 2017-01-24 21:24       ` Ludovic Courtès
  1 sibling, 0 replies; 10+ messages in thread
From: Ludovic Courtès @ 2017-01-24 21:24 UTC (permalink / raw)
  To: Efraim Flashner; +Cc: guix-devel

Efraim Flashner <efraim@flashner.co.il> skribis:

> On Tue, Jan 24, 2017 at 06:06:21AM +0000, Pjotr Prins wrote:
>> This actually raises the issue of packaging large data files (we are
>> getting into TB's). Could there be a way Guix fetches external
>> datasets as part of the distribution? I think that if it is not
>> executable code and SHA values/pfff values match it would be safe to
>> do.
>> 
>
> The other thing is that guix downloads the files into ram and then
> writes them into the store, which can be a problem if the source is
> larger than the available memory.

It’s only if the file is added via the ‘add-to-store’ RPC, and not via
‘import-path’ or a substitute:

  https://bugs.gnu.org/23666

But yes, that can be a problem.

Ludo’.

* Re: GeoIP database redistribution?
  2017-01-24  6:06   ` Pjotr Prins
  2017-01-24  8:20     ` Efraim Flashner
@ 2017-01-24 21:25     ` Ludovic Courtès
  2017-01-24 21:54       ` Marius Bakke
  1 sibling, 1 reply; 10+ messages in thread
From: Ludovic Courtès @ 2017-01-24 21:25 UTC (permalink / raw)
  To: Pjotr Prins; +Cc: guix-devel

Pjotr Prins <pjotr.public12@thebird.nl> skribis:

> This actually raises the issue of packaging large data files (we are
> getting into TB's). Could there be a way Guix fetches external
> datasets as part of the distribution? I think that if it is not
> executable code and SHA values/pfff values match it would be safe to
> do.

If we do add something this big, we’ll have to make sure it’s not
substitutable and does not ever land on the build machines.

Ludo’.

* Re: GeoIP database redistribution?
  2017-01-24 21:25     ` Ludovic Courtès
@ 2017-01-24 21:54       ` Marius Bakke
  0 siblings, 0 replies; 10+ messages in thread
From: Marius Bakke @ 2017-01-24 21:54 UTC (permalink / raw)
  To: Ludovic Courtès, Pjotr Prins; +Cc: guix-devel

Ludovic Courtès <ludo@gnu.org> writes:

> Pjotr Prins <pjotr.public12@thebird.nl> skribis:
>
>> This actually raises the issue of packaging large data files (we are
>> getting into TB's). Could there be a way Guix fetches external
>> datasets as part of the distribution? I think that if it is not
>> executable code and SHA values/pfff values match it would be safe to
>> do.
>
> If we do add something this big, we’ll have to make sure it’s not
> substitutable and does not ever land on the build machines.

Just to note, I don't think Pjotr was referring to this particular
package (the GeoLite database). I just checked, and the City DB is
19.8 MiB compressed and 104.1 MiB uncompressed. The Country DB is
~2 MiB and the IPv6 databases are tiny.

* Re: GeoIP database redistribution?
  2017-01-24 11:01       ` ng0
@ 2017-11-28 16:19         ` ng0
  2017-11-28 17:46           ` Pjotr Prins
  0 siblings, 1 reply; 10+ messages in thread
From: ng0 @ 2017-11-28 16:19 UTC (permalink / raw)
  To: guix-devel

Hi!

Better to reply late than never. My latency on some tasks
is high.
For anyone reading the old thread and wondering WTF happened:
I've just re-read the thread and will give it a shot soon, as I want
to analyze some logs and we already have a Perl module that
wants this DB. I prefer Guix-supplied solutions when I can make
use of them, so, long text, short nonsense: I'll package it.
Thanks for the feedback, everyone.

And Pjotr, the mention of TBs without any reference to an
external project really threw me off. Just specify the "we" next time ;)

-- 
GnuPG: A88C8ADD129828D7EAC02E52E22F9BBFEE348588
GnuPG: https://c.n0.is/ng0_pubkeys/tree/keys
  WWW: https://n0.is

* Re: GeoIP database redistribution?
  2017-11-28 16:19         ` ng0
@ 2017-11-28 17:46           ` Pjotr Prins
  0 siblings, 0 replies; 10+ messages in thread
From: Pjotr Prins @ 2017-11-28 17:46 UTC (permalink / raw)
  To: guix-devel

On Tue, Nov 28, 2017 at 04:19:37PM +0000, ng0 wrote:
> And Pjotr, the mentioning of TB's without any reference to an
> external project really threw me off. Just specify the "We" next
> time ;)

Sorry. I was talking about biological data ;).

Pj.
