all messages for Guix-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: ludo@gnu.org (Ludovic Courtès)
To: guix-devel <guix-devel@gnu.org>
Subject: Re: Web site news to move to Haunt
Date: Fri, 28 Oct 2016 01:33:38 +0200	[thread overview]
Message-ID: <87funh9xkd.fsf@gnu.org> (raw)
In-Reply-To: <87oa27y1iw.fsf@gnu.org> ("Ludovic \=\?utf-8\?Q\?Court\=C3\=A8s\=22'\?\= \=\?utf-8\?Q\?s\?\= message of "Wed, 26 Oct 2016 10:09:43 +0200")

[-- Attachment #1: Type: text/plain, Size: 1654 bytes --]

Hello Guix!

ludo@gnu.org (Ludovic Courtès) skribis:

> The goal is to manage news using Haunt, and to have a page to display
> them on the web site, something nicer and more flexible than this:
>
>   https://savannah.gnu.org/news/?group=guix
>
> And I already enjoy Haunt!  :-)

I’m done!

Please visit the news pages and report anything wrong!  Also, please
test the feed.xml file; currently IceCat reports that it “does not
appear to have any style information associated with it”, and thus
doesn’t display it nicely.  Maybe Haunt omits to emit (ah ha!) the
doctype thingie?

Because Savannah’s feeds only include the 20 newest entries¹, Assaf
Gordon, who works on Savannah, kindly exported the raw Guix news
database for me (54 entries).

The database contains the input markup, not the XML that appears in the
feed.  Of course the markup is nothing like Markdown, CommonMark, etc.

So I considered writing a parser in Scheme, but that turned out to be
trickier than I thought, so I turned back to running Savane’s own parser
(PHP!).  Since I don’t have PHP on Guix, I ran it on another machine
where it’s installed: PHP as a service.  And since I couldn’t run all of
Savane, I just hacked together ‘markup.php’ (attached).  High-quality
work.

And then I reused the atom-parsing code that used to be in our (www)
module to get the damn entries and emit them as SXML for Haunt (file
attached).  Phewww.

I think we should do the same for Guile real soon.

Anyway, big thanks to Assaf!

Ludo’.

¹ https://lists.gnu.org/archive/html/savannah-hackers-public/2016-10/msg00017.html


[-- Attachment #2: markup.php --]
[-- Type: application/octet-stream, Size: 25262 bytes --]

<?php
# <one line to give a brief idea of what this does.>
# 
#  Copyright 2005-2006 (c) Tobias Toedter <t.toedter--gmx.net>
#                          Mathieu Roy <yeupou--gnu.org>
# 
# This file is part of Savane.
# 
# Savane is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
# published by the Free Software Foundation, either version 3 of the
# License, or (at your option) any later version.
# 
# Savane is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU Affero General Public License for more details.
# 
# You should have received a copy of the GNU Affero General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

## Provides functions to allow users to format the text in a secure way:
##    markup_basic() for very light formatting
##    markup_rich() for formatting excepting headers
##    markup_full() for full formatting, including headers

require_once('utils.php');


## Will tell the user what is the level of markup available in a uniformized
# way. 
# Takes as argument the level, being full / rich / basic / none
# To avoid making page looking strange, we will put that only on textarea
# where it is supposed to be the most useful
function markup_info($level, $additionnal_string=false)
{
  if ($level == 'basic')
    {
      $string = _("Basic Markup");
      $text = _("Only basic text tags are available in this input field.");
    }
  elseif ($level == 'rich')
    {
      $string = _("Rich Markup");
      $text = _("Rich and basic text tags are available in this input field.");      
    }
  elseif ($level == 'full') 
    {
      $string = _("Full Markup");
      $text = _("Every tags are available in this input field.");      
    }
  elseif ($level == 'none')
    {
      $string = _("No Markup");
      $text = _("No tags are available in this input field.");    
    }

  if ($level != 'none')
    {
      $text .= " "._("Check the Markup Reminder in Related Recipes for a description of these tags.");
    }

  return '<span class="smaller">('.utils_help('<img src="'.$GLOBALS['sys_home'].'images/'.SV_THEME.'.theme/misc/edit.png" border="0" class="icon" alt="'.$string.'" />'.$string, 
		    $text,
		    true).$additionnal_string.')</span>';
}


##
# Converts special markup characters in the input text to real HTML
#
# The following syntax is supported:
# * *word* -> <strong>word</strong>
# * _word_ -> <em>word</em>
# * [http://gna.org/] -> <a href="http://gna.org/">http://gna.org/</a>
# * [http://gna.org/ text] -> <a href="http://gna.org/">text</a>
# * (bug|task|...) #1234 -> Link to corresponding page
#
function markup_basic($text)
{
  $lines = explode("\n", $text);
  $result = array();

  foreach ($lines as $line)
    {
      $result[] = _markup_inline($line);
    }

  return join("\n", $result);
}



##
# Converts special markup characters in the input text to real HTML
#
# This function does the same markup as utils_basic_markup(), plus
# it supports the following:
# * paragraphs
# * lists (<ul> and <ol>)
# * nested lists
# * horizontal rulers
#
function markup_rich($text)
{
  return markup_full($text, false);
}



##
# Converts special markup characters in the input text to real HTML
#
# This function does the same markup as utils_rich_markup(), plus
# it converts headings to <h3> ... <h6>
#
function markup_full($text, $allow_headings=true)
{
  $lines = explode("\n", $text);
  $result = array();

  # we use a stack (last in, first out) to track the current
  # context (paragraph, lists) so we can correctly close tags
  $context_stack = array();

  $quoted_text = false;
  $verbatim = false;
  # $printer = array(1, 2);
  foreach ($lines as $index => $line)
    {

      # the verbatim tags are not allowed to be nested, because
      # they are translated to HTML <textarea> (<pre> in printer mode),
      # which in turn is also
      # not allowed to be nested.
      # therefore, we dont need a counter of the level, but
      # a simple bool flag
      # We also need to bufferize the verbatim content, as we want to now
      # its exact number of lines
      #
      # yeupou, 2006-10-31: we need a verbatim count, because actually 
      # we may want to put at least one verbatim block into another, for
      # instance in the recipe that explain the verbatim tag
      if (preg_match('/([+]verbatim[+])/', $line) and !$verbatim)
        {
          $verbatim = 1;  
	  $verbatim_buffer = '';
	  $verbatim_buffer_linecount = 0;

	  $line = join("\n", $context_stack);

	  if (empty($printer))
	    { array_unshift($context_stack, '</textarea>'); }
	  else
	    { array_unshift($context_stack, '</pre>'); }
	  
	  # Jump to the next line, assuming that we can ignore the rest of the
	  # line
	  continue;
        }

      # Increment the verbatim count if we find a verbatim closing in a 
      # verbatim environment
      if (preg_match('/([+]verbatim[+])/', $line) and $verbatim)
        { $verbatim++; }

      if (preg_match('/([-]verbatim[-])/', $line) and $verbatim == 1)
        {
          $verbatim = false;

          $line = join("\n", $context_stack);
          array_shift($context_stack);

          #array_pop($result); # no longer useful since we bufferize verbatim
	  if (empty($printer))
	    {
	      # Limit the textarea to 20 lines
	      if ($verbatim_buffer_linecount > 20)
		{ $verbatim_buffer_linecount = 20; }

	      # Use a text input if it is not multiline
	      if ($verbatim_buffer_linecount < 2)
		{
		  $result[] = '<input type="text" class="verbatim" readonly="readonly" size="60" value="'.$verbatim_buffer.'" />';
		}
	      else
		{		  
		  $result[] = '<textarea class="verbatim" readonly="readonly" rows="'.$verbatim_buffer_linecount.'" cols="80">'.$verbatim_buffer.'</textarea>';
		}
	    }
	  else
	    {
	      $result[] = '<pre class="verbatim">'.$verbatim_buffer.'</pre>';
	    }
	  $verbatim_buffer = '';
	  $verbatim_buffer_linecount = 0;
	  
	  # Jump to the next line, assuming that we can ignore the rest of the
	  # line
	  continue;
        }

      # Decrement the verbatim count if we find a verbatim closing in a 
      # verbatim environment
      if (preg_match('/([-]verbatim[-])/', $line) and $verbatim > 1)
	  { $verbatim--; }

      # if we're in the verbatim markup, don't apply the markup
      if ($verbatim)
        {
          # disable the +nomarkup+ tags by inserting a unique string.
          # this has to be done in the original string, because that
          # is the one which will be split upon the +nomarkup+ tags,
          # see below
          $escaped_line = str_replace('nomarkup',
            'no-1a4f67a7-4eae-4aa1-a2ef-eecd8af6a997-markup', $line);
          $lines[$index] = $escaped_line;
          $verbatim_buffer .= $escaped_line . "\r";
	  $verbatim_buffer_linecount++;
        }
      else
        {
	  # Otherwise, normal run, do the markup
          $result[] = _full_markup($line, $allow_headings, $context_stack, $quoted_text);	  
	}

    }

  # make sure that all previously used contexts get their
  # proper closing tag by merging in the last closing tags
  $markup_text = join("\n", array_merge($result, $context_stack));

  # its easiest to markup everything, without supporting the nomarkup
  # tag. afterwards, we replace every nomarkup tag pair with the content
  # between those tags in the original string
  $original = preg_split('/([+-]nomarkup[+-])/', join("\n", $lines), -1,
    PREG_SPLIT_DELIM_CAPTURE);
  $markup = preg_split('/([+-]nomarkup[+-])/', $markup_text, -1,
    PREG_SPLIT_DELIM_CAPTURE);
  # save the HTML tags from the last element in the markup array, see below
  $last_tags = $markup[count($markup)-1];
  $nomarkup_level = 0;

  foreach ($original as $index => $original_text)
    {
      # keep track of nomarkup tags
      if ($original_text == '+nomarkup+') $nomarkup_level++;
      if ($original_text == '-nomarkup-') $nomarkup_level--;

      # if the current match is the nomarkup tag, we don't want it to
      # show up in the markup text -> set it to an empty string
      if (preg_match('/([+-]nomarkup[+-])/', $original_text))
        {
          $markup[$index] = '';
          $original_text = '';
        }
      # while we're in a nomarkup environment, the already marked up text
      # needs to be replaced with the original content. Also, we need
      # to add <br />  tags for newlines.
      if ($nomarkup_level > 0)
        {
          $markup[$index] = nl2br($original_text);
        }
    }

  # normally, $nomarkup_level must be zero at this point. however, if
  # the user submits wrong markup and forgets to close the -nomarkup-
  # tag, we need to take care of that.
  # To do this, we need to look for closing tags which have been deleted.
  if ($nomarkup_level > 0)
    {
      $trailing_markup = array_reverse(split("\n", $last_tags));
      $restored_tags = '';
      foreach ($trailing_markup as $tag)
        {
          if (preg_match('/^\s*<\/[a-z]+>$/', $tag))
            {
              $restored_tags = "\n$tag$restored_tags";
            }
          else
            {
              $markup[] = $restored_tags;
              break;
            }
        }
    }

  # lastly, revert the escaping of +nomarkup+ tags done above
  # for verbatim environments
  return str_replace('no-1a4f67a7-4eae-4aa1-a2ef-eecd8af6a997-markup',
    'nomarkup', join('', $markup));
}


# Convert whatever content that can contain markup to a valid text output
# It wont touch what seems to be valid in text already, or what cannot
# be converted in a very satisfactory way.
# This function should be minimal, just to avoid weird things, not to do
# very fancy things.
function markup_textoutput ($text)
{
  $lines = explode("\n", $text);
  $result = array();

  $protocols = "https?|ftp|sftp|file|afs|nfs";
  $savane_tags = "verbatim|nomarkup";

  foreach ($lines as $line)
    {
      # Handle named hyperlink.
      $line = 
	preg_replace(
              # find the opening brace '['
		     '/\['
              # followed by the protocol, either http:// or https://
		     .'(('.$protocols.'):\/\/'
              # match any character except whitespace or the closing
              # brace ']' for the actual link
		     .'[^\s\]]+)'
              # followed by at least one whitespace
		     .'\s+'
              # followed by any character (non-greedy) and the
              # next closing brace ']'
		     .'(.+?)\]/', '$3 <$1>', $line);
      
      # Remove savane-specific tags
      $line = preg_replace('/\+('.$savane_tags.')\+/', '', $line);
      $line = preg_replace('/\-('.$savane_tags.')\-/', '', $line);
      $result[] = $line;
      
    }

  return join("\n", $result);
}


##
# Internal function for recognizing and formatting special markup
# characters in the input line to real HTML
#
# This function is a helper for utils_full_markup() and should
# not be used otherwise.
#
function _full_markup($line, $allow_headings, &$context_stack, &$quoted_text)
{
  #############################################################
  # context formatting
  #
  # the code below marks up recognized special characters,
  # by starting a new context (e.g. headings and lists)
  #############################################################

  # generally, we want to start a new paragraph. this will be set
  # to false, if a new paragraph is no longer appropriate, like
  # for headings or lists
  $start_paragraph = true;

  # Match the headings, e.g. === heading ===
  if ($allow_headings)
    {
      $line = _markup_headings($line, $context_stack, $start_paragraph);
    }

  # Match list items
  $line = _markup_lists($line, $context_stack, $start_paragraph);

  # replace four '-' sign with a horizontal ruler
  if (preg_match('/^----\s*$/', $line))
    {
      $line = join("\n", $context_stack).'<hr />';
      $context_stack = array();
      $start_paragraph = false;
    }

  #############################################################
  # inline formatting
  #
  # the code below marks up recognized special characters,
  # without starting a new context (e.g. <strong> and <em>)
  #############################################################

  $line = _markup_inline($line);

  #############################################################
  # paragraph formatting
  #
  # the code below is responsible for doing the Right Thing(tm)
  # by either starting a new paragraph and closing any previous
  # context or continuing an existing paragraph
  #############################################################

  # change the quoteing mode when the line start with '>'
  if (substr($line, 0, 4) == '&gt;')
    {
      # if the previous line was not quoted, start a new quote paragraph
      if (!$quoted_text)
        {
          $line = join("\n", $context_stack)."<p class=\"quote\">$line";
          # empty the stack
          $context_stack = array('</p>');
          $start_paragraph = false;
        }
      $quoted_text = true;
    }
  else
    {
      # if the previous line was quoted, end the quote paragraph
      if ($quoted_text and $start_paragraph and $line != '')
        {
          $line = join("\n", $context_stack)."\n<p>$line";
          # empty the stack
          $context_stack = array('</p>');
        }
      $quoted_text = false;
    }

  # don't start a new paragraph again, if we already did that
  if (isset($context_stack[0]) && $context_stack[0] == '</p>')
    {
      $start_paragraph = false;
    }

  # add proper closing tags when we encounter an empty line.
  # note that there might be no closing tags, in this case
  # the line will remain emtpy.
  if (preg_match('/^(|\s*)$/', $line))
    {
      $line = join("\n", $context_stack)."$line";
      # empty the stack
      $context_stack = array();
      $start_paragraph = false;
    }

  # Finally start a new paragraph if appropriate
  if ($start_paragraph)
    {
      # make sure that all previously used contexts get their
      # proper closing tag
      $line = join("\n", $context_stack)."<p>$line";
      # empty the stack
      $context_stack = array('</p>');
    }

  # append a linebreak while in paragraph mode
  if (isset($context_stack[0]) && $context_stack[0] == '</p>')
    {
      $line .= '<br />';
    }

  return $line;
}



##
# Internal function for recognizing and formatting headings
#
# This function is a helper for _full_markup() and should
# not be used otherwise.
#
function _markup_headings($line, &$context_stack, &$start_paragraph)
{
  if (preg_match(
    # find one to four '=' signs at the start of a line
    '/^(={1,4})'
    # followed by exactly one space
    .' '
    # followed by any character
    .'(.+)'
    # followed by exactly one space
    .' '
    # followed by one to four '=' signs at the end of a line (whitespace allowed)
    .'(={1,4})\s*$/', $line, $matches))
    {
      $header_level_start = max(min(strlen($matches[1]), 4), 1);
      $header_level_end = strlen($matches[3]);
      if ($header_level_start == $header_level_end)
        {
          # if the user types '= heading =' (one '=' sign), it will
          # actually be rendered as a level 3 heading <h3>
          $header_level_start += 2;
          $header_level_end += 2;

          $line = "<h$header_level_start>$matches[2]</h$header_level_end>";
          # make sure that all previously used contexts get their
          # proper closing tag
          $line = join("\n", $context_stack).$line;
          # empty the stack
          $context_stack = array();
          $start_paragraph = false;
        }
    }
  return $line;
}



##
# Internal function for recognizing and formatting lists
#
# This function is a helper for _full_markup() and should
# not be used otherwise.
#
function _markup_lists($line, &$context_stack, &$start_paragraph)
{
  if (preg_match('/^\s?([*0]+) (.+)$/', $line, $matches))
    {
      # determine the list level currently in use
      $current_list_level = 0;
      foreach ($context_stack as $context)
        {
          if ($context == '</ul>' or $context == '</ol>')
            {
              $current_list_level++;
            }
        }

      # determine whether the user list levels match the list
      # level we have in our context stack
      #
      # this will catch (potential) errors of the following form:
      # * list start
      # 0 maybe wrong list character
      # * list end
      $markup_position = 0;
      foreach (array_reverse($context_stack) as $context)
        {
          # we only care for the list types
          if ($context != '</ul>' and $context != '</ol>')
            {
              continue;
            }

          $markup_character = substr($matches[1], $markup_position, 1);

          if (($markup_character === '*' and $context != '</ul>')
            or ($markup_character === '0' and $context != '</ol>'))
            {
              # force a new and clean list start
              $current_list_level = 0;
              break;
            }
          else
            {
              $markup_position++;
            }
        }

      # if we are not in a list, close the previous context
      $line = '';
      if ($current_list_level == 0)
        {
          $line = join("\n", $context_stack);
          $context_stack = array();
        }

      # determine the list level the user wanted
      $wanted_list_level = strlen($matches[1]);

      # here we start a new list and make sure that the markup
      # is valid, even if the user did skip one or more list levels
      $list_level_counter = $current_list_level;
      while ($list_level_counter < $wanted_list_level)
        {
          switch (substr($matches[1], $list_level_counter, 1))
            {
              case '*':
                $tag = 'ul';
                break;
              case '0':
                $tag = 'ol';
                break;
            }
          $line .= "<$tag>\n<li>";
          array_unshift($context_stack, "</$tag>");
          array_unshift($context_stack, "</li>");
          $list_level_counter++;
        }

      # here we end a previous list and make sure that the markup
      # is valid, even if the user did skip one or more list levels
      $list_level_counter = $current_list_level;
      while ($list_level_counter > $wanted_list_level)
        {
          $line .= array_shift($context_stack)."\n"
            .array_shift($context_stack)."\n";
          $list_level_counter--;
        }

      # prepare the next item of the same list level
      if ($current_list_level >= $wanted_list_level)
        {
          $line .= "</li>\n<li>";
        }

      # finally, append the list item
      $line .= $matches[2];
      $start_paragraph = false;
    }
  return $line;
}



##
# Internal function for recognizing and formatting inline tags and links
#
# This function is a helper for _full_markup() and should
# not be used otherwise.
#
function _markup_inline($line)
{
  # Group_id may be necessary for recipe #nnn links
  global $group_id;

  $comingfrom = '';
  if ($group_id)
    {
      $comingfrom = "&amp;comingfrom=$group_id";
    }

  if (strlen($line) == 0)
    {
      return;
    }

  # Regexp of protocols supported in hyperlinks (should be protocols that
  # we can expect web browsers to support)
  $protocols = "https?|ftp|sftp|file|afs|nfs";

  # Prepare usual links: prefix "www." with "http://"
  # if it is preceded by [ or whitespace or at the beginning of line.
  # (don't want to prefix in cases like "//www.." or "ngwww...")
  $line = preg_replace('/(^|\s|\[)(www\.)/i', '$1http://$2', $line); 

  # replace the @ sign with an HTML entity, if it is used within
  # an url (e.g. for pointers to mailing lists). This way, the
  # @ sign doesn't get mangled in the e-mail markup code
  # below. See bug #2689 on http://gna.org/ for reference.
  $line = eregi_replace("([a-z]+://[^<>[:space:]]+)@", "\\1&#64;", $line);

  # Prepare the markup for normal links, e.g. http://test.org, by
  # surrounding them with braces []
  # (& = begin of html entities, it means a end of string unless
  # it is &amp; which itself is the entity for &)
  $line = preg_replace('/(^|\s|[^\[])(('.$protocols.'):\/\/(&amp;|[^\s&]+[a-z0-9\/^])+)/i',
    '$1[$2]', $line);

  # do a markup for mail links, e.g. info@support.org
  # (do not use utils_emails, this does extensive database
  # search on the string
  # and replace addresses in several fashion. Here we just want to make
  # a link). Make sure that 'cvs -d:pserver:anonymous@cvs.sv.gnu.org:/...'
  # is NOT replaced.
  $line = preg_replace("/(^|\s)([a-z0-9_+-.]+@([a-z0-9_+-]+\.)+[a-z]+)(\s|$)/i",
		       '\1' . utils_email_basic('\2') . '\4', $line);

  # Links between items
  # FIXME: it should be i18n, but in a clever way, meaning that everytime
  # a form is submitted with such string, the string get converted in
  # english so we always get the links found without having a regexp
  # including every possible language.
  $trackers = array (
      "bugs?" => "bugs/?",
      "support|sr" => "support/?",
      "tasks?" => "task/?",
      "recipes?|rcp" => "cookbook/?func=detailitem$comingfrom&amp;item_id=",
      "patch" => "patch/?",
      # In this case, we make the link pointing to support, it wont matter,
      # the download page is in every tracker and does not check if the tracker
      # is actually used
      "files?" => "support/download.php?file_id=",
  );
  foreach ($trackers as $regexp => $link)
    {
      # Allows only two white space between the string and the numeric id
      # to avoid having too time consuming regexp. People just have to pay
      # attention.
      $line = preg_replace("/(^|\s|\W)($regexp)\s{0,2}#([0-9]+)/i",
        '$1<em><a href="'.$GLOBALS['sys_home']
        .$link.'$3">$2&nbsp;#$3</a></em>', $line);
    }

  # add an internal link for comments
  $line = preg_replace('/(comments?)\s{0,2}#([0-9]+)/i',
    '<em><a href="#comment$2">$1&nbsp;#$2</a></em>', $line);

  # Add support for named hyperlinks, e.g.
  # [http://gna.org/ Text] -> <a href="http://gna.org/">Text</a>
  $line = preg_replace(
    # find the opening brace '['
    '/\['
    # followed by the protocol, either http:// or https://
    .'(('.$protocols.'):\/\/'
    # match any character except whitespace or the closing
    # brace ']' for the actual link
    .'[^\s\]]+)'
    # followed by at least one whitespace
    .'\s+'
    # followed by any character (non-greedy) and the
    # next closing brace ']'
    .'(.+?)\]/', '<a href="$1">$3</a>', $line);
 
  # Add support for unnamed hyperlinks, e.g.
  # [http://gna.org/] -> <a href="http://gna.org/">http://gna.org/</a> 
  # We make sure the lenght of the string is not too long, otherwise we cut
  # it.
  # (Supposedly, preg_replace_callback is faster than preg_replace //e but
  # it seems less reliable)
  $line = preg_replace(
    # find the opening brace '['
    '/\['
    # followed by the protocol, either http:// or https://
    # (FIXME: which protocol does it makes sense to support, which one
    # should we ignore?)
    .'(('.$protocols.'):\/\/'
    # match any character except whitespace (non-greedy) for
    # the actual link, followed by the closing brace ']'
    .'[^\s]+?)\]/e', "utils_cutlink('$1')", $line);

  # *word* -> <strong>word</strong>
  $line = preg_replace(
    # find an asterisk
    '/\*'
    # then one character (except a space or asterisk)
    .'([^* ]'
    # then (optionally) any character except asterisk
    .'[^*]*?)'
    # then an asterisk
    .'\*/', '<strong>$1</strong>', $line);

  # _word_ -> <em>word</em>
  $line = preg_replace(
    # allow for the pattern to start at the beginning of a line.
    # if it doesn't start there, the character before the slash
    # must be either whitespace or the closing brace '>', to
    # allow for nested html tags (e.g. <p>_markup_</p>).
    # Additionally, the opening brace may appear.
    # See bug #10571 on http://gna.org/ for reference.
    '/(^|\s+|>|\()'
    # match the underscore
    .'_'
    # match any character (non-greedy)
    .'(.+?)'
    # match the ending underscore and either end of line or
    # a non-word character
    .'_(\W|$)/', '$1<em>$2</em>$3', $line);

  return $line;
}

print '<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <id>'.$id.'</id>
  <link rel="self" href="'.$myself.'"/>
  <title>'.$title.'</title>
  <updated>'.$last_updated.'</updated>

';

$stdin = fopen("php://stdin","r"); 
while ($row = fgetcsv($stdin, 10000000, "\t"))
{
  $id = "http://$sys_default_domain{$sys_home}forum/forum.php?forum_id={$row[0]}";
  $title = $row[4];
  $updated = date('c', $row[2]);
  $author = $row[1];
  $content = markup_full(trim(str_replace(array("\\n","\\t"), array("\n","\t"), $row[5])));

  print "
  <entry>
    <id>$id</id>
    <link rel='alternate' href='$id'/>
    <title>$title</title>
    <updated>$updated</updated>
    <author>
      <name>$author</name>
    </author>
    <content type='xhtml' xml:base='$id'>
      <div xmlns='http://www.w3.org/1999/xhtml'>$content</div>
    </content>
  </entry>
";
}

// Feed footer
print "</feed>";

[-- Attachment #3: oldnews.scm --]
[-- Type: application/octet-stream, Size: 4620 bytes --]

(define-module (oldnews)
  #:use-module (ice-9 match)
  #:use-module (ice-9 pretty-print)
  #:use-module (srfi srfi-1)
  #:use-module (srfi srfi-9)
  #:use-module (srfi srfi-11)
  #:use-module (srfi srfi-19)
  #:use-module (srfi srfi-26)
  #:use-module (web client)
  #:use-module (sxml match)
  #:use-module (sxml simple))

(define %atom-url
  ;; The web site's news feed.
  "http://savannah.gnu.org/news/atom.php?group=guix")

(define (atom->sxml port)
  (xml->sxml port
             #:namespaces '((atom . "http://www.w3.org/2005/Atom")
                            (xhtml . "http://www.w3.org/1999/xhtml"))
             #:trim-whitespace? #t))

(define (fetch-news)
  "Return the SXML tree of the Atom news feed."
  (call-with-values
      (lambda ()
        (http-get %atom-url))
    (lambda (response contents)
      (call-with-input-string contents atom->sxml))))

(define-record-type <news-entry>
  (news-entry url title date author content)
  news-entry?
  (url      news-entry-url)                       ;string
  (title    news-entry-title)                     ;string
  (date     news-entry-date)                      ;SRFI-19 date
  (author   news-entry-author)                    ;sxml
  (content  news-entry-content))                  ;sxml

(define* (news-items #:optional (atom (fetch-news)))
  "Return the list of <news-entry> taken from the web site's RSS feed."
  (sxml-match atom
    ((*TOP* (*PI* ,pi ...)
            (atom:feed
             (atom:id ,feed-id)
             (atom:link)
             (atom:title ,feed-title)
             (atom:updated ,feed-updated)
             (atom:entry
              (atom:id ,id)
              (atom:link (@ (href ,link)))
              (atom:title ,title)
              (atom:updated ,updated)
              (atom:author (atom:name ,author))
              (atom:content ,content)
              ,rest ...)
             ...
             ))
     (pk 'feed feed-id feed-title feed-updated)
     (map news-entry
          link title
          (map (cut string->date <> "~Y-~m-~d") updated)
          author content))))

(define (strip-xhtml-namespace tree)
  (define (xhtml-prefixed-symbol? obj)
    (and (symbol? obj)
         (string-prefix? "xhtml:" (symbol->string obj))))

  (match tree
    (((or 'xhtml:input 'xhtml:textarea) ('@ attributes ...) body ...)
     ;; XXX: Savannah's news thing used 'textarea' and 'input' tags for
     ;; verbatim text; convert it.  Sometimes the content is in the 'value'
     ;; attribute rather than in the body.
     (match (assoc 'value attributes)
       (('value value)
        `(div (@ (class "example")) (pre ,value)))
       (_
        `(div (@ (class "example")) (pre ,@body)))))

    (((? xhtml-prefixed-symbol? tag) rest ...)
     (let ((tag (string->symbol (string-drop (symbol->string tag)
                                             (string-length "xhtml:")))))
       `(,tag ,@(map strip-xhtml-namespace rest))))
    (x
     x)))

(define (news-entry->post-file-name entry)
  (define whitespace?
    (cut char-set-contains? char-set:whitespace <>))

  (define (map-character chr result)
    (match chr
      ((? whitespace?)
       (cons #\- result))
      ((or #\& #\! #\: #\,)
       result)
      ((or #\* #\?)
       (cons #\- result))
      (_
       (cons chr result))))

  (string-downcase
   (list->string (string-fold-right map-character
                                    '()
                                    (news-entry-title entry)))))

(define (news-entry->post-sexp entry)
  (let ((date (news-entry-date entry)))
    (define sexp
      `((title . ,(news-entry-title entry))
        (author . ,(news-entry-author entry))
        (date . ,(list 'unquote
                       `(make-date ,(date-nanosecond date)
                                   ,(date-second date)
                                   ,(date-minute date)
                                   ,(date-hour date)
                                   ,(date-day date)
                                   ,(date-month date)
                                   ,(date-year date)
                                   ,(date-zone-offset date))))
        (content . ,(strip-xhtml-namespace (news-entry-content entry)))))

    `(begin
       (use-modules (srfi srfi-19))
       ,(list 'quasiquote sexp))))

(define (write-news-entry-as-post entry)
  (let ((file (news-entry->post-file-name entry)))
    (call-with-output-file (string-append file ".sxml")
      (lambda (port)
        (pretty-print (news-entry->post-sexp entry)
                      port)))))


;; Local Variables:
;; eval: (put 'sxml-match 'scheme-indent-function 1)
;; End:

  parent reply	other threads:[~2016-10-27 23:33 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-26  8:09 Web site news to move to Haunt Ludovic Courtès
2016-10-26 17:54 ` Christopher Allan Webber
2016-10-27 23:33 ` Ludovic Courtès [this message]
2016-10-28  7:28   ` ng0
2016-10-28 11:46   ` Andy Wingo
2016-11-04  3:29   ` Eric Bavier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87funh9xkd.fsf@gnu.org \
    --to=ludo@gnu.org \
    --cc=guix-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/guix.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.