ANNOUNCE: ParBASH 0.1 release - parallel processing in BASH

bug-bash

From:	Milenko Petrovic
Subject:	ANNOUNCE: ParBASH 0.1 release - parallel processing in BASH
Date:	Mon, 20 Jul 2009 07:45:08 -0700 (PDT)
User-agent:	G2/1.0

Hello,

I'd like to announce the release of version 0.1 of ParBASH. With
ParBASH, you can write bash scripts that are automatically
parallelized on SMP, multicore, and distributed systems using
Apache Hadoop.

Here is an example script that finds the top 10 references for Barack
Obama pages on Wikipedia using Amazon EC2:

wiki.sh:

cat hdfs:/wikipedia-out/* | grep Obama | \
perl -lne 'while (/<link type="external" href="([^"]+)">/g) { print $1; }' | \
perl -lne 'if (/http:\/\/([^\/]+)(\/|$)/) { print $1; }' | \
perl -lne '
  if (/([^\.]\.)+([^\.]+\.[a-zA-Z]{2,3}\.[^\.]+)$/) { print $2; }
  elsif (/([^\.]+\.[a-zA-Z]{2,3}\.[^\.]+)$/) { print $1; }
  elsif (/([^\.]\.)*([^\.]+\.[^\.]+)$/) { print $2; }' | \
sort | uniq -c > hdfs:/out
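
The counting stage at the end of wiki.sh can be tried locally in plain
bash, without ParBASH or Hadoop. The following is only a minimal sketch
under stated assumptions: the function name top_hosts and the sample
hostnames are hypothetical, and the extra sort and head steps that keep
only the most frequent entries are not part of wiki.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of ParBASH: rank hostnames read from stdin.
# Usage: top_hosts N   (prints the N most frequent lines as "count name")
top_hosts() {
  local n="${1:-10}"
  # sort groups identical lines and uniq -c counts each group;
  # the second sort orders by count (descending), breaking ties by name;
  # awk drops the leading padding that uniq -c adds.
  sort | uniq -c | sort -k1,1nr -k2 | head -n "$n" | awk '{ print $1, $2 }'
}

# Sample input standing in for the output of the perl stages.
printf '%s\n' nytimes.com bbc.co.uk nytimes.com whitehouse.gov \
  nytimes.com bbc.co.uk | top_hosts 2
```

With the sample input above this prints "3 nytimes.com" followed by
"2 bbc.co.uk". Calling top_hosts with no argument keeps the top 10.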

The how and why of wiki.sh and ParBASH are explained at:
http://cloud-dev.blogspot.com/2009/06/introduction-to-parbash.html

Source code and more examples:
http://code.google.com/p/parbash

If you want to try compiling the code and playing around with it,
please contact me and I can help you get started.

Thanks,
Milenko
