samizdat-devel
antispam-0.1 patch


From: boud
Subject: antispam-0.1 patch
Date: Mon, 11 Jun 2007 01:10:28 +0200 (CEST)

hi samizdat-devel,

Here's a basic antispam feature.

It touches five files, but most of the work is done in
antispam_helper, to keep interference with other parts of samizdat
to a minimum.

* engine/exceptions.rb
* config.yaml  (antispam section: enabled: true, delay: <seconds>)
* engine/controller.rb
* controllers/message_controller.rb
* helpers/antispam_helper.rb

The inspiration is from the twiki BlackListPlugin:
http://twiki.org/cgi-bin/view/Plugins/BlackListPlugin

Since we don't want to track IP numbers, the main part of the twiki
strategy that remains is checking the content of an article
(@content.body in a Message class object) against a meta-list of
wikispam URLs collected by the wiki community [excluding a few
entries which seem to be more about internet evolution (criticism
of google and yahoo) than about real spamming...].
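
As a rough, standalone illustration of the content check described
above (not the patch itself): the helper names build_spam_regexps and
spam?, and the sample list entries, are made up for this sketch. It
keeps only the rules used in the patch: strip comments, ignore entries
of 6 characters or fewer, and drop entries matching an exclude pattern.
Unlike the patch, it skips any pattern that Ruby cannot compile instead
of testing for "(?<".

```ruby
# Sketch only: helper names and sample data are hypothetical.

# Turn the text of a spam meta-list into an array of Regexp objects.
def build_spam_regexps(list_text, exclude_patterns = [])
  excludes = exclude_patterns.map { |p| Regexp.new(p) }
  list_text.split(/[\r\n]+/).each_with_object([]) do |line, regexps|
    pattern = line.split('#', 2).first.to_s.strip  # drop comments and padding
    next if pattern.length <= 6                    # too short, would match too much
    next if excludes.any? { |ex| pattern =~ ex }   # entry is on the exclude list
    begin
      regexps << Regexp.new(pattern)
    rescue RegexpError
      # ignore entries written in a regexp dialect Ruby does not accept
    end
  end
end

# True if any spam pattern matches the message body.
def spam?(body, regexps)
  regexps.any? { |re| body =~ re }
end

SAMPLE_LIST = <<~'LIST'
  cheap-pills-example\.invalid   # made-up entry
  # a comment-only line
  google-watch\.org
  tiny
LIST

regexps = build_spam_regexps(SAMPLE_LIST, ['google-watch'])
puts spam?('visit http://cheap-pills-example.invalid/ now', regexps)  # => true
puts spam?('see http://google-watch.org/', regexps)                   # => false
```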

Suspected spam is refused with a SpamError exception through
check_content whenever e.g. someone tries to publish a message or a
reply. A random delay (by default up to 120 seconds) is used to
discourage rapid repeated tries (e.g. to find a hole in the filter).
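
The refuse-then-delay flow can be sketched outside samizdat roughly as
follows. Everything here is a stand-in: UserError, check_content and
handle_post are simplified placeholders rather than samizdat's real
classes and methods, and the sleeper argument exists only so that the
delay can be replaced when trying the code out.

```ruby
# Sketch only: simplified stand-ins for samizdat's classes and methods.
class UserError < StandardError; end
class SpamError < UserError; end

DEFAULT_MAX_DELAY = 120  # seconds, the default mentioned above

# Random whole number of seconds in 0...max_delay (default if unset).
def spam_delay(max_delay = nil, rng = Random.new)
  max = max_delay.to_i > 0 ? max_delay.to_i : DEFAULT_MAX_DELAY
  rng.rand(max)
end

# Raise SpamError if any pattern matches the body.
def check_content(body, spam_regexps)
  if spam_regexps.any? { |re| body =~ re }
    raise SpamError, 'Your message looks like spam'
  end
  true
end

# Publish, or wait a random time before reporting the refusal.
def handle_post(body, spam_regexps, max_delay: nil, sleeper: ->(s) { sleep(s) })
  check_content(body, spam_regexps)
  'published'
rescue SpamError => e
  sleeper.call(spam_delay(max_delay))
  "rejected: #{e.message}"
end

# Usage, with a no-op sleeper so the example returns immediately:
puts handle_post('hello world', [/spamword/], sleeper: ->(_s) {})
puts handle_post('buy spamword', [/spamword/], sleeper: ->(_s) {})
```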

cheers
boud


----------------------------------------------------------------------

POSSIBLE TODOs:

TODO: If the spammer fails to turn off cookies, then suspected
spammers' cookies could be updated to reflect their status.

TODO: Add elements to track repeated, frequent attempts at spamming.
Problem: distinguishing this from an ordinary user's behaviour is
difficult, since a lot of wikispamming is done by dedicated humans
presumably paid to do it.

TODO: Another possible todo would be to add a rel="nofollow"
attribute to *all* <a href> links in content which has been modified
within the last e.g. 24 hours, meaning that there has not been enough
time for another user (or moderator) to check the URL. The idea is
that people do wikispam with the aim of getting high google rank.
Google in principle honours nofollow (AFAIK it introduced it...?).
So if links to a particular website or family of websites or websites
running a particular software package systematically fail to
contribute to google rank even when they are wikispammed, then the
spammers will prioritise other, less well protected sites.

Counterarguments: Not all samizdat messages are open for editing;
this adds an extra complication to rendering messages including
a dependence on their last modification time, not just their
content; many search engines ignore nofollow tags; it weakens
the webbed nature of the web.

Note: mediawiki (or at least the wikipedia instance of it)
does something like this.
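
A minimal sketch of the nofollow idea, assuming the rendered message
is available as an HTML string together with its last modification
time. The names add_nofollow and render_links are hypothetical, and
rewriting HTML with a regexp is a simplification which a real renderer
would probably replace with a proper parser.

```ruby
# Sketch only: hypothetical helpers, not part of samizdat.
NOFOLLOW_WINDOW = 24 * 60 * 60  # seconds; the "e.g. 24 hours" above

# Add rel="nofollow" to every <a ...> tag that has no rel attribute yet.
def add_nofollow(html)
  html.gsub(/<a\b([^>]*)>/i) do
    attrs = Regexp.last_match(1)
    attrs =~ /\brel\s*=/i ? "<a#{attrs}>" : "<a#{attrs} rel=\"nofollow\">"
  end
end

# Only recently modified content gets nofollow links.
def render_links(html, modified_at, now = Time.now)
  (now - modified_at) < NOFOLLOW_WINDOW ? add_nofollow(html) : html
end

link = '<a href="http://example.invalid/">x</a>'
puts render_links(link, Time.now - 3600)           # modified an hour ago
puts render_links(link, Time.now - 3 * 24 * 3600)  # modified three days ago
```

This also shows the first counterargument in practice: the output now
depends on the modification time as well as on the content.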


QUESTIONS:

Do we want to warn the suspected spammer or not? In this version
(0.1), the user is warned, but after a delay. BlackListPlugin warns
the user in some cases, and in other cases gives no indication of
what type of error occurred. Giving a generic user error message
would give less info to spammers and avoid helping them "improve"
their sabotage of the internet, but it would also make it harder for
genuine users who accidentally get classified as spammers to warn
the sysadmin and samizdat developers.
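
One possible middle way, purely as a sketch: show either the specific
or a generic message depending on a setting, and in both cases record
the matched pattern somewhere the sysadmin can read it. The method name
spam_response, the verbose flag, the generic wording and the log
destination are all assumptions, not something the patch implements.

```ruby
require 'stringio'

# Sketch only: hypothetical helper, not part of the patch.
# log can be any object that responds to puts (a file, $stderr, ...).
def spam_response(matched_pattern, verbose:, log:)
  log.puts("spam filter matched #{matched_pattern.inspect}")
  if verbose
    'Your message looks like spam'         # wording used in version 0.1
  else
    'Your message could not be published'  # generic wording, made up here
  end
end

log = StringIO.new
puts spam_response(/spamword/, verbose: false, log: log)
puts log.string
```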

MORE INFO:
http://moinmoin.wikiwikiweb.de/AntiSpamGlobalSolution
http://en.wikipedia.org/wiki/Spamdexing


----------------------------------------------------------------------


--- /tmp/tmp_snapshot/samizdat/lib/samizdat/controllers/message_controller.rb   2007-04-24 22:07:48.000000000 +0200
+++ /usr/lib/ruby/1.8/samizdat/controllers/message_controller.rb        2007-06-11 00:09:48.558173112 +0200
@@ -9,6 +9,7 @@
 # vim: et sw=2 sts=2 ts=8 tw=0

 require 'samizdat/helpers/message_helper'
+require 'samizdat/helpers/antispam_helper'

 class MessageController < Controller
   include MessageHelper
@@ -321,6 +324,11 @@
       _('Message title is required')
     @upload or @message.content.body.kind_of? String or raise UserError,
       _('Message body is required')
+    if config['antispam']
+      @message.is_not_spam? or raise SpamError,
+        _('Your message looks like spam')
+    end
+
   end

   def edit_form(*options)


--- /tmp/tmp_snapshot/samizdat/lib/samizdat/engine/exceptions.rb        2006-11-28 17:57:47.000000000 +0100
+++ /usr/lib/ruby/1.8/samizdat/engine/exceptions.rb     2007-06-09 23:22:01.000000000 +0200
@@ -24,3 +24,6 @@

 # raised when account is blocked for email confirmation
 class AccountBlockedError < UserError; end
+
+# raised on suspicion of spamming
+class SpamError < UserError; end


--- /tmp/tmp_snapshot/samizdat/lib/samizdat/engine/controller.rb        2007-05-05 15:15:00.000000000 +0200
+++ /usr/lib/ruby/1.8/samizdat/engine/controller.rb     2007-06-10 01:23:30.000000000 +0200
@@ -81,6 +81,14 @@
       @content_for_layout =
 '<p>'+_('Your account is blocked until the email address you have specified is confirmed. Confirmation message with instructions was sent to that address.')+'</p>'

+    when SpamError
+      sleep_time = config['antispam']['delay']  # sleep time in seconds
+      sleep_time = 120 if !sleep_time   # default in case value is nil
+      sleep(rand(sleep_time))
+      @title = _('User Error')
+      @content_for_layout =
+%{<p>#{error}.</p><p>}+_("Press 'Back' button of your browser to return.")+'</p>'
+
     when UserError
       @title = _('User Error')
       @content_for_layout =


--- /dev/null   2005-09-15 04:53:34.000000000 +0200
+++ /usr/lib/ruby/1.8/samizdat/helpers/antispam_helper.rb       2007-06-11 00:07:31.510007600 +0200
@@ -0,0 +1,80 @@
+# Samizdat HTML helpers
+#
+#   Copyright (c) 2002-2007  Dmitry Borodaenko <address@hidden>
+#   Copyright (c) 2007  boud indymedia <address@hidden>
+#
+#   This program is free software.
+#   You can distribute/modify this program under the terms of
+#   the GNU General Public License version 2 or later.
+#
+# vim: et sw=2 sts=2 ts=8 tw=0
+
+require 'samizdat/engine'
+require 'samizdat/models/message'
+require 'open-uri'
+
+class Message  # additional methods
+
+  def is_not_spam?
+    spam_regexp_list = cache['spam_regexp_list']
+
+    if spam_regexp_list == nil # try to update the spam list
+      anURI = config['antispam']['spam_regexp_list_URI']
+      # TODO: Is it OK to hardwire a default value here?
+      anURI = 'http://arch.thinkmo.de/cgi-bin/spam-merge' if anURI == nil
+      if anURI != nil
+        spam_regexp_list = "".to_s # initially treat this as a string
+        begin
+          Kernel.open(anURI.untaint) { |f| spam_regexp_list += f.read }
+        rescue
+          # TODO: should be some warning if the read fails
+          # 68-char "GTUBE" spamstring http://spamassassin.apache.org/gtube/
+          spam_regexp_list =
+            'XJS\*C4JDBQADN1\.NSBN3\*2IDNEN\*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL\*C\.34X'
+        end
+      end
+      if spam_regexp_list
+        s = []
+        spam_regexp_list.split(/[\r\n]{1,}/).each do |line|
+          s_new, = line.split("#")  # ignore comments
+          s_new = s_new.to_s.rstrip  # nil-safe; strip all trailing whitespace
+          if  s_new != nil and s_new.length > 6 and
+              not ( s_new =~ /\(\?\</ ) # exclude this non-ruby regexp extension
+            exclude = false
+            # exclude some URI's from being classified as spam
+            if config['antispam']['exclude_list'] != nil
+              config['antispam']['exclude_list'].each do |string|
+                exclude = true if  s_new =~ Regexp.new(string)
+              end
+            end
+            s += [Regexp.new(s_new)]  if not exclude
+          end
+        end
+      end
+
+      spam_regexp_list = s
+      cache['spam_regexp_list'] = spam_regexp_list # update cache
+    end
+
+    white_list = nil # TODO: local white_list
+    if white_list and @content.body =~ white_list
+      true
+    else
+      black_list = nil
+      # TODO: local black_list
+      if black_list and @content.body =~ black_list
+        false
+      else
+        if spam_regexp_list
+          @is_spam = false
+          spam_regexp_list.each do |re|
+            @is_spam = true if @content.body =~ re
+          end
+          not @is_spam # toggle
+        else
+          true # default: assume is_not_spam if it got here
+        end
+      end
+    end
+  end
+end


--- /dev/null   2005-09-15 04:53:34.000000000 +0200
+++ /etc/samizdat/sites/config.yaml     2007-06-11 00:22:53.025915888 +0200
@@ -0,0 +1,26 @@
+# Antispam mechanism - refuses to publish articles/replies which
+# look like spam, and delays informing the user by a delay which
+# should be reasonable for a human publishing a non-spam article,
+# but unreasonably long for a spammer wishing to publish repeatedly.
+#
+# Comment this section out entirely to disable it.
+#
+# To enable it, you must turn on at least one entry, e.g. enabled: true
+
+antispam:
+   enabled: true
+#  maximum delay in seconds - value is random from 0 to this value
+   delay: 120
+#
+# URL of meta-list of regular expressions of wiki spam
+   spam_regexp_list_URI: http://arch.thinkmo.de/cgi-bin/spam-merge
+# Use a local copy instead if you don't wish to risk automatic updates,
+# e.g.
+#   spam_regexp_list_URI: http://localhost/spam-merge
+#
+# Exclude certain URIs from the big wiki-spam URI list.
+# Be careful about escaping special characters.
+   exclude_list: [ google-watch\\.org, yahoo-watch\\.org,
+                   wikipedia-watch\\.org, ln-s\\.net,
+                   namebase\\.org ]
+


----------------------------------------------------------------------
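
For anyone puzzled by the doubled backslashes in exclude_list: as far
as I can tell, a plain (unquoted) YAML scalar keeps backslashes
literally, so each entry becomes a regexp that matches a literal
backslash in the *source text* of a spam-list line (e.g.
google-watch\.org), not a hostname in a message. A small sketch of
reading such a section; antispam_settings is a hypothetical helper,
and as in the patch the mere presence of the antispam section is
treated as "enabled".

```ruby
require 'yaml'

# Sketch only: hypothetical helper, not part of the patch.
SAMPLE_CONFIG = <<~'YAML'
  antispam:
     enabled: true
     delay: 120
     exclude_list: [ google-watch\\.org, ln-s\\.net ]
YAML

# Return the antispam settings with defaults, or nil if the section is absent.
def antispam_settings(config)
  section = config['antispam']
  return nil unless section.is_a?(Hash)
  {
    delay: section['delay'] || 120,  # maximum delay in seconds
    list_uri: section['spam_regexp_list_URI'],
    excludes: (section['exclude_list'] || []).map { |s| Regexp.new(s) }
  }
end

settings = antispam_settings(YAML.safe_load(SAMPLE_CONFIG))
list_line = 'google-watch\.org'  # a line as it appears in the spam list
puts settings[:excludes].any? { |re| list_line =~ re }           # => true
puts settings[:excludes].any? { |re| 'google-watch.org' =~ re }  # => false
```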



