From: boud
Subject: import_feeds.rb-0.3 - nonexistent usefulinc.com/rss/manifest/ + xhtml validation
Date: Tue, 6 Feb 2007 23:32:30 +0100 (CET)
hi samizdat-devel,
Here are some minor updates to the RDF import patch; this is version 0.3.
There's no change to index.rb relative to the previous version; only
import_feeds.rb has changed. The patch is relative to the 070120 snapshot.
(1) Some feeds such as:
http://argentina.indymedia.org/syn/features_long.rdf
http://jakarta.indymedia.org/newsfeed.php?type=feature&language=id
declare in the rdf header an xml namespace
xmlns:mn="http://usefulinc.com/rss/manifest/"
whose URL responds to requests with
File not found
Change this error message for pages not found in public/404.html
This results in the <rdf:Description> box at the bottom of the file
causing a parse error, presumably (i'm guessing) because it contains a
<mn:channels>
sub-box and the rss parser is unable to handle undefined tags. In any
case, removing the whole <rdf:Description> box avoids the error.
The standard ruby rss library (as far as i understand it) does not have
an obvious way of handling this, so i gave up and wrote a hardwired
hack:
+ # Remove tag section not needed and known to be buggy for
+ # invalid "mn" type URI http://usefulinc.com/rss/manifest/
+ if response =~ %r{http://usefulinc.com/rss/manifest/}
+ response.sub!(/<rdf:Description(.*\n)*?.*mn:channels.*(.*\n)*?.*<\/rdf:Description>/,"")
+ end
It works for at least the above two sites. argentina is running
sf-active (i think) and jakarta probably an old version of oscailt:
http://argentina.indymedia.org/syn/features_long.rdf
http://jakarta.indymedia.org/newsfeed.php?type=feature&language=id
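For comparison, here is a minimal standalone sketch of the same idea. It is
not the code in the patch. It assumes that the block to drop is the single
<rdf:Description> element that contains <mn:channels>; the sample feed, the
constant names and strip_manifest_block are made up for illustration. The
pattern refuses to cross a closing </rdf:Description> before it reaches
<mn:channels>, so an earlier, unrelated <rdf:Description> box should be left
alone.

```ruby
# Hypothetical, shortened sample feed; real feeds are much longer.
SAMPLE_FEED = <<~XML
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:mn="http://usefulinc.com/rss/manifest/"
           xmlns="http://purl.org/rss/1.0/">
    <channel rdf:about="http://example.org/">
      <title>Example</title>
    </channel>
    <rdf:Description rdf:about="http://example.org/manifest">
      <mn:channels>
        <rdf:Seq/>
      </mn:channels>
    </rdf:Description>
  </rdf:RDF>
XML

# Only touch feeds that declare the manifest namespace.
MANIFEST_NS = %r{http://usefulinc\.com/rss/manifest/}

# Match one <rdf:Description> element that contains <mn:channels>.
# The negative lookahead stops the first part from running past a
# closing </rdf:Description>, so unrelated boxes are not swallowed.
MANIFEST_BLOCK =
  %r{<rdf:Description\b(?:(?!</rdf:Description>).)*?<mn:channels\b.*?</rdf:Description>}m

# Return a copy of xml without the manifest block (xml itself is unchanged
# when the namespace is not declared).
def strip_manifest_block(xml)
  return xml unless xml =~ MANIFEST_NS
  xml.sub(MANIFEST_BLOCK, "")
end

cleaned = strip_manifest_block(SAMPLE_FEED)
puts cleaned
```

Unlike sub!, which returns nil when nothing matches, sub always returns a
string, so the result can be passed straight to the parser.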
(2) The second correction replaces <it> with <em> and removes the <br />
after </li>, so that the w3 xhtml validator no longer complains.
cheers
boud
--- /tmp/tmp_snapshot/samizdat/cgi-bin/index.rb 2007-01-08 03:09:52.000000000 +0100
+++ /usr/share/samizdat/cgi-bin/index.rb 2007-02-01 00:13:29.000000000 +0100
@@ -12,6 +12,9 @@
require 'samizdat/engine'
+require 'import_feeds.rb' # TODO - should this be load or require?
+#require 'message_graph' # TODO: file hierarchy probably wrong
+
# messages that are related to any focus (and are not comments or old
# versions), ordered chronologically by date of relation to a focus (so that
# when message is edited, it doesn't flow up)
@@ -161,6 +164,13 @@
features = features.join + %{<div class="foot">} + t.nav_rss(rss_features) +
t.nav(features.size < config['limit']['features'],
skip_feature + 1, 'index.rb?', 'skip_feature') + "</div>\n"
+
+ # This is to include a graph using message_graph.rb
+#31.01.07 - off
+# if( config['graph'] )
+# node_pairs = collect_features_graph(0, false, limit_page)
+# features += message_graph_method(node_pairs)
+# end
end
if render_updates
@@ -172,6 +182,14 @@
t.nav_rss(rss_updates) + t.nav(updates.size, skip + 1))
end
+ imported_feeds = "" # default is zero-length string
+ if( config['import_feeds'] )
+ imported_feeds = %{<tr><td class="links-head">}+ _('RDF Feeds')+
+ '</td></tr>
+ <tr><td class="links">' + import_feeds_method + '</td></tr>'
+ end
+
+
page =
if full_front_page
%{<table>
@@ -180,10 +198,10 @@
</thead>
<tr>
<td class="focuses">#{focuses}</td>
- <td class="features" rowspan="3">#{features}</td>
- <td class="updates" rowspan="3">#{updates}</td>
- </tr>
- <tr><td class="links-head">}+_('Links')+'</td></tr>
+ <td class="features" rowspan="6">#{features}</td>
+ <td class="updates" rowspan="6">#{updates}</td>
+ </tr>} + imported_feeds +
+ %{<tr><td class="links-head">}+_('Links')+'</td></tr>
<tr><td class="links">
<div class="focus"><a href="query.rb?run&query='+CGI.escape('SELECT ?resource WHERE (dc::date ?resource ?date) (s::inReplyTo ?resource ?parent) LITERAL ?parent IS NOT NULL ORDER BY ?date DESC')+'">'+_('All Replies')+'</a></div>
<div class="focus"><a href="foci.rb">'+_('All Focuses (verbose)')+'</a></div>
--- /dev/null 2005-09-15 04:53:34.000000000 +0200
+++ /usr/share/samizdat/cgi-bin/import_feeds.rb 2007-02-06 23:00:34.971304448 +0100
@@ -0,0 +1,176 @@
+#!/usr/bin/env ruby
+#
+# Samizdat RDF/RSS feed import
+#
+# Copyright (c) 2002-2006 Dmitry Borodaenko <address@hidden>,
+# Boud (Indymedia) <address@hidden>
+#
+# This program is free software.
+# You can distribute/modify this program under the terms of
+# the GNU General Public License version 2 or later.
+#
+# vim: et sw=2 sts=2 ts=8 tw=0
+
+# VERSION import_feeds 0.3
+
+require 'samizdat/engine'
+
+require 'open-uri'
+require 'rss/1.0'
+require 'rss/dublincore'
+require 'rss/2.0'
+
+# TODO: The format_date method is from template.rb. In principle,
+# imported feeds should (could) be treated as resources - somewhat
+# similar to messages, but with some properties distinct from ordinary
+# messages. In that case, there would be no need to have redundancy
+# for the format_date method.
+def format_date(date)
+ date = date.to_time if date.respond_to? :to_time # duck
+ date = date.strftime '%Y-%m-%d %H:%M' if date.kind_of? Time
+ date
+end
+
+
+def import_feeds_method()
+
+ import_feeds_body = "<ul>"
+
+ interval = config['timeout']['import_feeds'] # time interval for importing
+ interval = 3600 if (interval == nil) # failsafe default
+ timenow = Time.now # object of Time class
+
+ # The expected caching time is the last "round number" time interval,
+ # based on total time in seconds defined in the Time class.
+ expected_caching_time = timenow.to_i.divmod(interval)[0] * interval
+ import_feeds_cache_key = 'imported_feeds/' + expected_caching_time.to_s
+
+ import_feeds_list_array = cache[import_feeds_cache_key]
+
+ if(import_feeds_list_array == nil)
+
+ import_feeds_list = Hash.new
+
+ config['import_feeds'].each do | feed_key, feed_value |
+ rss_source = feed_key
+
+ # At some point in the future, people might want to have e.g. https
+ # feeds, but there is no need to force people to write http:// when
+ # this is a very widely used default value. So protocol is optional
+ # here.
+
+ protocol = feed_value['protocol']
+ protocol = "http://" if( protocol == nil)
+
+ host = feed_value['host']
+ host = _(' Hostname missing.') if (host == nil)
+ filename = feed_value['filename']
+ filename = _(' Filename missing.') if (filename == nil)
+ anURI = protocol + host + filename
+ # anURI = protocol + feed_value['host'] + feed_value['filename']
+
+ # TODO: security - check before untainting?
+ # TODO: store and prepare rdf feeds in all available languages
+ # and give the user the one s/he wants?
+ response= ""
+ valid_URI=0
+ begin
+ open(anURI.untaint,
+ "Accept-Language" => config['locale']['languages'][0]) do |file|
+ response += file.read
+ valid_URI=1
+ end
+ rescue SocketError
+ valid_URI=0
+ import_feeds_body += _('<li><em>Error opening ') + %{<a href="} +
+ anURI + %{">} + _('this feed') + "</a></em></li>\n"
+ rescue URI::InvalidURIError
+ valid_URI=0
+ import_feeds_body += _('<li><em>Error opening ') + %{<a href="} +
+ anURI + %{">} + _('this feed') + "</a></em></li>\n"
+ rescue
+ valid_URI=0
+ import_feeds_body += _('<li><em>Error opening ') + %{<a href="} +
+ anURI + %{">} + _('this feed') + "</a></em></li>\n"
+ end
+
+ if(valid_URI==1)
+
+ # Remove tag section not needed and known to be buggy for
+ # invalid "mn" type URI http://usefulinc.com/rss/manifest/
+ if response =~ %r{http://usefulinc.com/rss/manifest/}
+ response.sub!(/<rdf:Description(.*\n)*?.*mn:channels.*(.*\n)*?.*<\/rdf:Description>/,"")
+ end
+
+ # The parsing of the feed initially allows non-RSS-1.0 compliant
+ # feeds, but the do_validate method is used on individual items
+ # later on to check their validity.
+ begin
+ rss = RSS::Parser.parse(response) # for RSS 1.0 compliant feeds
+ rescue RSS::InvalidRSSError
+ rss = RSS::Parser.parse(response, false) # allow non RSS 1.0 compliant
+ end
+
+ if(rss)
+ # rss.channel in RSS 2.0 seems to contain info in "rss" for RSS 1.0
+ # So rss_channel is used here as a common name for either.
+ rss_channel = rss
+ if rss.rss_version == "2.0"
+ rss_channel = rss.channel
+ end
+
+ # if there is a 'max_entries' parameter, then use at most that
+ # number of items for that feed
+ n_items=rss_channel.items.length
+ if(feed_value['max_entries'])
+ if(n_items > feed_value['max_entries'])
+ n_items = feed_value['max_entries']
+ end
+ end
+
+ for item_number in 0...n_items
+ if rss_channel.item(item_number).do_validate
+ rss_link = rss_channel.item(item_number).link.strip
+ title = rss_channel.item(item_number).title.strip
+ date = format_date(rss_channel.item(item_number).date).to_s # "" if the item has no date
+
+ # add this feed to the list of valid feeds
+ import_feeds_list[rss_link] = { "rss_source" => rss_source,
+ "title" => title, "date" => date }
+
+ end
+ end # for item_number in 0...n_items
+ end # if(rss)
+ end # if(valid_URI==1)
+ end # config['import_feeds'].each do
+
+
+
+
+ # Sort the import feeds list by date. The result is an array of
+ # pairs. The first element of each pair is the link (in principle,
+ # this should be unique). The second element of each pair is
+ # a hash, containing the other useful pieces of feed
+ # information (such as source, title, date)
+ import_feeds_list_array = import_feeds_list.sort {
+ |a,b| b[1]['date'] <=> a[1]['date'] }
+
+ # update the cache
+ cache[import_feeds_cache_key] = import_feeds_list_array
+
+ end # if(import_feeds_list_array == nil)
+
+ import_feeds_list_array.each do | feed |
+ import_feeds_body +=
+ "<li> <em>" + feed[1]['rss_source'] +
+ '</em> <a href="' + feed[0] + '">' +
+ feed[1]['title'] + "</a> " +
+ feed[1]['date'] + "</li>\n"
+ end
+
+ import_feeds_body += "</ul>"
+
+ import_feeds_body
+
+end # def import_feeds_method
+
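
A note on the cache key used in import_feeds_method: rounding the current
time down to a multiple of the interval means that every request within the
same interval computes the same key, and the first request of the next
interval misses and fetches the feeds again. Below is a minimal sketch of
just that arithmetic. The helper name cache_bucket_key and the timestamps are
made up for illustration, and whether stale entries get evicted depends on
the cache implementation, which this sketch does not model.

```ruby
# Round a time down to the start of its caching interval (in seconds)
# and build a cache key from it.
def cache_bucket_key(now, interval = 3600, prefix = 'imported_feeds/')
  bucket_start = (now.to_i / interval) * interval  # integer division rounds down
  prefix + bucket_start.to_s
end

# Two requests in the same hour share a key; the next hour gets a new one.
puts cache_bucket_key(Time.at(7250))  # => imported_feeds/7200
puts cache_bucket_key(Time.at(7300))  # => imported_feeds/7200
puts cache_bucket_key(Time.at(10800)) # => imported_feeds/10800
```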