antispam-0.1 patch
From: boud
Subject: antispam-0.1 patch
Date: Mon, 11 Jun 2007 01:10:28 +0200 (CEST)
hi samizdat-devel,
Here's a basic antispam feature.
It touches five files, though most of the work is done in
antispam_helper so as to interfere as little as possible with
other parts of samizdat:
* engine/exceptions.rb
* config.yaml (antispam section: enabled, delay, spam_regexp_list_URI, exclude_list)
* engine/controller.rb
* controllers/message_controller.rb
* helpers/antispam_helper.rb
The inspiration is from the twiki BlackListPlugin:
http://twiki.org/cgi-bin/view/Plugins/BlackListPlugin
Since we don't want to track IP numbers, the main part of the
twiki strategy that remains is checking the content of an
article (@content.body in a Message class object) against a
meta-list of wikispam URLs collected by the wiki community
[excluding a few entries which seem to be more about
internet evolution (criticism of google and yahoo) than about
real spamming...].
Suspected spam is refused with a SpamError exception raised in
check_content whenever someone tries, for example, to publish a
message or a reply. A random delay (by default up to 120 seconds)
is used to discourage rapid repeated attempts (e.g. probing for
holes in the filter).
cheers
boud
----------------------------------------------------------------------
POSSIBLE TODOs:
TODO: If a spammer has not turned off cookies, suspected
spammers' cookies could be updated to reflect their status.
TODO: Elements to track repeated frequent attempts at spamming.
Problem: distinguish this from an ordinary user's behaviour -
difficult, since a lot of wikispamming is done by dedicated
humans presumably paid to do it.
TODO: Another possibility would be to add a
rel="nofollow" attribute to *all* <a href> links in messages
modified within the last e.g. 24 hours, on the grounds that there
has not been enough time for another user (or a moderator) to check
the URL. The idea is that people wikispam with the aim of
getting high google rank. Google in principle honours the
nofollow attribute (AFAIK it introduced it...?). So if links to
a particular website, family of websites, or websites running
a particular software package systematically fail to contribute
to google rank even when they are wikispammed, then the spammers
will prioritise other, less well protected sites.
Counterarguments:
* Not all samizdat messages are open for editing.
* It adds an extra complication to rendering messages, including
  a dependence on their last modification time, not just their
  content.
* Many search engines ignore nofollow.
* It weakens the webbed nature of the web.
Note: mediawiki (or at least the wikipedia instance of it)
does something like this.
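The nofollow idea above could be sketched roughly as follows (a gsub-based illustration only; samizdat's actual rendering path is not shown here, and the function name and 24-hour window are assumptions):

```ruby
# Sketch: add rel="nofollow" to every <a href=...> link in a message
# rendered less than 24 hours after its last modification, so fresh
# (unreviewed) links do not contribute to search-engine rank.
def nofollow_recent_links(html, last_modified, window = 24 * 3600)
  return html if Time.now - last_modified > window
  html.gsub(/<a\s+href=/i, '<a rel="nofollow" href=')
end
```

A real implementation would need to respect existing rel attributes and only touch message bodies that are actually editable.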
QUESTIONS:
Do we want to warn the suspected spammer or not? In this
version (0.1), the user is warned, but after a delay.
BlackListPlugin warns the user in some cases, and in other
cases gives no indication what type of error occurred.
Giving a generic user error message would give less info
to spammers and avoid helping them "improve" their sabotage
of the internet, but it would also make it harder for
genuine users who accidentally get classified as spammers
to warn the sysadmin and samizdat developers.
MORE INFO:
http://moinmoin.wikiwikiweb.de/AntiSpamGlobalSolution
http://en.wikipedia.org/wiki/Spamdexing
----------------------------------------------------------------------
--- /tmp/tmp_snapshot/samizdat/lib/samizdat/controllers/message_controller.rb	2007-04-24 22:07:48.000000000 +0200
+++ /usr/lib/ruby/1.8/samizdat/controllers/message_controller.rb	2007-06-11 00:09:48.558173112 +0200
@@ -9,6 +9,7 @@
# vim: et sw=2 sts=2 ts=8 tw=0
require 'samizdat/helpers/message_helper'
+require 'samizdat/helpers/antispam_helper'
class MessageController < Controller
include MessageHelper
@@ -321,6 +324,11 @@
_('Message title is required')
@upload or @message.content.body.kind_of? String or raise UserError,
_('Message body is required')
+ if config['antispam']
+ @message.is_not_spam? or raise SpamError,
+ _('Your message looks like spam')
+ end
+
end
def edit_form(*options)
--- /tmp/tmp_snapshot/samizdat/lib/samizdat/engine/exceptions.rb	2006-11-28 17:57:47.000000000 +0100
+++ /usr/lib/ruby/1.8/samizdat/engine/exceptions.rb	2007-06-09 23:22:01.000000000 +0200
@@ -24,3 +24,6 @@
# raised when account is blocked for email confirmation
class AccountBlockedError < UserError; end
+
+# raised on suspicion of spamming
+class SpamError < UserError; end
--- /tmp/tmp_snapshot/samizdat/lib/samizdat/engine/controller.rb	2007-05-05 15:15:00.000000000 +0200
+++ /usr/lib/ruby/1.8/samizdat/engine/controller.rb	2007-06-10 01:23:30.000000000 +0200
@@ -81,6 +81,14 @@
@content_for_layout =
'<p>'+_('Your account is blocked until the email address you have specified is confirmed. Confirmation message with instructions was sent to that address.')+'</p>'
+ when SpamError
+ sleep_time = config['antispam']['delay'] # sleep time in seconds
+ sleep_time = 120 if !sleep_time # default in case value is nil
+ sleep(rand(sleep_time))
+ @title = _('User Error')
+ @content_for_layout =
+%{<p>#{error}.</p><p>}+_("Press 'Back' button of your browser to return.")+'</p>'
+
when UserError
@title = _('User Error')
@content_for_layout =
--- /dev/null	2005-09-15 04:53:34.000000000 +0200
+++ /usr/lib/ruby/1.8/samizdat/helpers/antispam_helper.rb	2007-06-11 00:07:31.510007600 +0200
@@ -0,0 +1,80 @@
+# Samizdat HTML helpers
+#
+# Copyright (c) 2002-2007 Dmitry Borodaenko <address@hidden>
+# Copyright (c) 2007 boud indymedia <address@hidden>
+#
+# This program is free software.
+# You can distribute/modify this program under the terms of
+# the GNU General Public License version 2 or later.
+#
+# vim: et sw=2 sts=2 ts=8 tw=0
+
+require 'samizdat/engine'
+require 'samizdat/models/message'
+require 'open-uri'
+
+class Message # additional methods
+
+ def is_not_spam?
+ spam_regexp_list = cache['spam_regexp_list']
+
+ if spam_regexp_list == nil # try to update the spam list
+ anURI = config['antispam']['spam_regexp_list_URI']
+ # TODO: Is it OK to hardwire a default value here?
+ anURI = 'http://arch.thinkmo.de/cgi-bin/spam-merge' if anURI == nil
+ if anURI != nil
+ spam_regexp_list = "".to_s # initially treat this as a string
+ begin
+ Kernel.open(anURI.untaint) { |f| spam_regexp_list += f.read }
+ rescue
+ # TODO: should be some warning if the read fails
+ # 68-char "GTUBE" spamstring http://spamassassin.apache.org/gtube/
+ spam_regexp_list =
+ 'XJS\*C4JDBQADN1\.NSBN3\*2IDNEN\*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL\*C\.34X'
+ end
+ end
+ if spam_regexp_list
+ s = []
+ spam_regexp_list.split(/[\r\n]{1,}/).each do |line|
+ s_new, = line.split("#") # ignore comments
+ s_new.chomp!(" ")
+ if s_new != nil and s_new.length > 6 and
+          not ( s_new =~ /\(\?\</ ) # exclude this non-ruby regexp extension
+ exclude = false
+ # exclude some URI's from being classified as spam
+ if config['antispam']['exclude_list'] != nil
+ config['antispam']['exclude_list'].each do |string|
+ exclude = true if s_new =~ Regexp.new(string)
+ end
+ end
+ s += [Regexp.new(s_new)] if not exclude
+ end
+ end
+ end
+
+ spam_regexp_list = s
+ cache['spam_regexp_list'] = spam_regexp_list # update cache
+ end
+
+ white_list = nil # TODO: local white_list
+ if white_list and @content.body =~ white_list
+ true
+ else
+      black_list = nil
+ # TODO: local black_list
+ if black_list and @content.body =~ black_list
+ false
+ else
+ if spam_regexp_list
+ @is_spam = false
+ spam_regexp_list.each do |re|
+ @is_spam = true if @content.body =~ re
+ end
+          not @is_spam # negate: true means "not spam"
+ else
+ true # default: assume is_not_spam if it got here
+ end
+ end
+ end
+ end
+end
--- /dev/null 2005-09-15 04:53:34.000000000 +0200
+++ /etc/samizdat/sites/config.yaml 2007-06-11 00:22:53.025915888 +0200
@@ -0,0 +1,26 @@
+# Antispam mechanism - refuses to publish articles/replies which
+# look like spam, and delays informing the user by a delay which
+# should be reasonable for a human publishing a non-spam article,
+# but unreasonably long for a spammer wishing to publish repeatedly.
+#
+# Comment this section out entirely to disable it.
+#
+# To enable it, you must turn on at least one entry, e.g. enabled: true
+
+antispam:
+ enabled: true
+# maximum delay in seconds - value is random from 0 to this value
+ delay: 120
+#
+# URL of meta-list of regular expressions of wiki spam
+ spam_regexp_list_URI: http://arch.thinkmo.de/cgi-bin/spam-merge
+# Use a local copy instead if you don't wish to risk automatic updates,
+# e.g.
+# spam_regexp_list_URI: http://localhost/spam-merge
+#
+# Exclude certain URIs from the big wiki-spam URI list.
+# Be careful about escaping special characters.
+ exclude_list: [ google-watch\\.org, yahoo-watch\\.org,
+ wikipedia-watch\\.org, ln-s\\.net,
+ namebase\\.org ]
+
----------------------------------------------------------------------