[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Dotgnu-pnet-commits] CVS: pnetlib/System/Text/RegularExpressions arch.
From: |
Gopal.V <address@hidden> |
Subject: |
[Dotgnu-pnet-commits] CVS: pnetlib/System/Text/RegularExpressions arch.cs,NONE,1.1 cache.cs,NONE,1.1 category.cs,NONE,1.1 collections.cs,NONE,1.1 compiler.cs,NONE,1.1 debug.cs,NONE,1.1 interpreter.cs,NONE,1.1 interval.cs,NONE,1.1 match.cs,NONE,1.1 parser.cs,NONE,1.1 quicksearch.cs,NONE,1.1 regex.cs,NONE,1.1 RegexRunner.cs,NONE,1.1 RegexRunnerFactory.cs,NONE,1.1 replace.cs,NONE,1.1 syntax.cs,NONE,1.1 ChangeLog,NONE,1.1 notes.txt,NONE,1.1README,NONE,1.1 |
Date: |
Tue, 17 Dec 2002 10:50:14 -0500 |
Update of /cvsroot/dotgnu-pnet/pnetlib/System/Text/RegularExpressions
In directory subversions:/tmp/cvs-serv21851/System/Text/RegularExpressions
Added Files:
arch.cs cache.cs category.cs collections.cs compiler.cs
debug.cs interpreter.cs interval.cs match.cs parser.cs
quicksearch.cs regex.cs RegexRunner.cs RegexRunnerFactory.cs
replace.cs syntax.cs ChangeLog notes.txt README
Log Message:
Import the System.Text.RegularExpressions namespace from Mono libraries
into pnetlib and fulfil general dependencies.
--- NEW FILE ---
//
// assembly: System
// namespace: System.Text.RegularExpressions
// file: arch.cs
//
// author: Dan Lewis (address@hidden)
// (c) 2002
using System;
using System.Collections;
namespace System.Text.RegularExpressions {
enum OpCode : ushort {
False = 0, // always fails
True, // always succeeds
// matching
Position, // zero-width position assertion
String, // match string literal
Reference, // back reference
// character matching
Character, // match character exactly
Category, // match character from category
Range, // match character from range
Set, // match character from set
In, // match character from group of tests
// capturing
Open, // open group
Close, // close group
Balance, // balance groups
// control flow
IfDefined, // conditional on capture
Sub, // non-backtracking subexpression
Test, // non-backtracking lookahead/behind
Branch, // alternative expression
Jump, // unconditional goto
Repeat, // new repeat context
Until, // repeat subexpression within context
FastRepeat, // repeat simple subexpression
Anchor, // anchoring expression
// miscellaneous
Info // pattern information
}
[Flags]
enum OpFlags : ushort {
None = 0x000,
Negate = 0x100, // succeed on mismatch
IgnoreCase = 0x200, // case insensitive matching
RightToLeft = 0x400, // right-to-left matching
Lazy = 0x800 // minimizing repeat
}
enum Position : ushort {
Any, // anywhere
Start, // start of string
\A
StartOfString, // start of string
\A
StartOfLine, // start of line
^
StartOfScan, // start of scan
\G
End, // end or before newline at end
\Z
EndOfString, // end of string
\z
EndOfLine, // end of line
$
Boundary, // word boundary
\b
NonBoundary // not word boundary
\B
};
// see category.cs for Category enum
interface IMachine {
Match Scan (Regex regex, string text, int start, int end);
}
interface IMachineFactory {
IMachine NewInstance ();
IDictionary Mapping { get; set; }
int GroupCount { get; }
}
// Anchor SKIP OFFSET
//
// Flags: [RightToLeft] ??
// SKIP: relative address of tail expression
// OFFSET: offset of anchor from start of pattern
//
// Usage:
//
// Anchor :1 OFFSET
// <expr>
// True
// 1: <tail>
//
// Notes:
//
// In practice, the anchoring expression is only going to be
// Position (StartOfString, StartOfLine, StartOfScan) or String.
// This is because the optimizer looks for position anchors at the
// start of the expression, and if that fails it looks for the
// longest substring. If an expression has neither a position
// anchor or a longest substring anchor, then the anchoring expression
// is left empty. Since an empty expression will anchor at any
// position in any string, the entire input string will be scanned.
// String LEN STR...
//
// Flags: [RightToLeft, IgnoreCase]
// LEN: length of string
// STR: string characters
// Branch SKIP
//
// SKIP: relative address of next branch
//
// Branch :1
// <alt expr 1>
// Jump :4
// 1: Branch :2
// <alt expr 2>
// Jump :4
// 2: Branch :3
// <alt expr 3>
// Jump :4
// 3: False
// 4: <tail>
// Repeat SKIP MIN MAX
//
// Flags: [Lazy]
// SKIP: relative address of Until instruction
// MIN: minimum iterations
// MAX: maximum iterations (0xffff is infinity)
//
// Repeat :1 MIN MAX
// <expr>
// Until
// 1: <tail>
// FastRepeat SKIP MIN MAX
//
// Flags: [Lazy]
// SKIP: relative address of tail expression
// MIN: minimum iterations
// MAX: maximum iterations (0xffff is infinity)
//
// FastRepeat :1 MIN MAX
// <expr>
// True
// 1: <tail>
//
// Notes:
//
// The subexpression of a FastRepeat construct must not contain any
// complex operators. These include: Open, Close, Balance, Repeat,
// FastRepeat, Sub, Test. In addition, the subexpression must have
// been determined to have a fixed width.
// Sub SKIP
//
// SKIP: relative address of tail expression
//
// Sub :1
// <expr>
// 1: <tail>
//
// Notes:
//
// The Sub operator invokes an independent subexpression. This means
// that the subexpression will match only once and so will not
// participate in any backtracking.
// Test TSKIP FSKIP
//
// TSKIP: relative address of true expression
// FSKIP: relative address of false expression
//
// Usage: (?(?=test)true|false)
//
// Test :1 :2
// <test expr>
// 1: <true expr>
// Jump
// 2: <false epxr>
// <tail>
//
// Usage: (?(?=test)true)
//
// Test :1 :2
// <test expr>
// 1: <true expr>
// 2: <tail>
//
// Usage: (?=test)
//
// Test :1 :2
// <test expr>
// 1: <true expr>
// Jump 3:
// 2: False
// 3: <tail>
//
// Notes:
//
// For negative lookaheads, just swap the values of TSKIP and
// FSKIP. For lookbehinds, the test expression must be compiled
// in reverse. The test expression is always executed as an
// independent subexpression, so its behaviour is non-backtracking
// (like a Sub clause.)
// IfDefined SKIP GID
//
// SKIP: relative address of else expression
// GID: number of group to check
//
// Usage: (?(gid)true)
//
// IfDefined :1
// <true expr>
// 1: <tail>
//
// Usage: (?(gid)true|false)
//
// IfDefined :1
// <true expr>
// Jump :2
// 1: <false expr>
// 2: <tail>
// Jump SKIP
//
// SKIP: relative address of target expression
//
// Jump :1
// ...
// :1 <target expr>
// Character CHAR
//
// Flags: [Negate, IgnoreCase, RightToLeft]
// CHAR: exact character to match
// Category CAT
//
// Flags: [Negate, RightToLeft]
// CAT: category to match (see Category enum)
// Range LO HI
//
// Flags: [Negate, IgnoreCase, RightToLeft]
// LO: lowest character in range
// HI: higest character in range
// Set LO LEN SET...
//
// Flags: [Negate, IgnoreCase, RightToLeft]
// LO: lowest character in set
// LEN: number of words in set
// SET: bit array representing characters in set
//
// Notes:
//
// Each word in the set represents 16 characters, so the first word
// defines membership for characters LO to LO + 15, the second for
// LO + 16 to LO + 31, and so on up to LO + (LEN * 16 - 1). It is
// up to the compiler to provide a compact representation for sparse
// unicode sets. The simple way is to use Set 0 4096. Other methods
// involve paritioning the set and placing the components into an
// In block.
// In SKIP
//
// SKIP: relative address of tail expression
//
// Usage: [expr]
//
// In :1
// <expr>
// True
// :1 <tail>
//
// Usage: [^expr]
//
// In :1
// <expr>
// False
// :1 <tail>
//
// Notes:
//
// The In instruction consumes a single character, using the flags
// of the first instruction in the subexpression to determine its
// IgnoreCase and RightToLeft properties. The subexpression is then
// applied to the single character as a disjunction. If any instruction
// in the subexpression succeeds, the entire In construct succeeds
// and matching continues with the tail.
// Position POS
//
// POS: position to match (see Position enum)
// Open GID
//
// GID: number of group to open
// Close GID
//
// GID: number of group to close
// Balance GID BAL
//
// GID: number of capturing group (0 if none)
// BAL: number of group to undefine
// Info GROUPS MIN MAX
//
// GROUPS: number of capturing groups
// MIN: minimum width of pattern
// MAX: maximum width of pattern (0xffff means undefined)
// False
// True
// Reference GID
//
// Flags: [IgnoreCase, RightToLeft]
// GID: number of group to reference
}
--- NEW FILE ---
//
// assembly: System
// namespace: System.Text.RegularExpressions
// file: cache.cs
//
// author: Dan Lewis (address@hidden)
// (c) 2002
using System;
using System.Collections;
namespace System.Text.RegularExpressions {
class FactoryCache {
public FactoryCache (int capacity) {
this.capacity = capacity;
this.factories = new Hashtable (capacity);
this.mru_list = new MRUList ();
}
public void Add (string pattern, RegexOptions options,
IMachineFactory factory) {
lock (this) {
Key k = new Key (pattern, options);
while (factories.Count >= capacity) {
object victim = mru_list.Evict ();
if (victim != null)
factories.Remove ((Key)victim);
}
factories[k] = factory;
mru_list.Use (k);
}
}
public IMachineFactory Lookup (string pattern, RegexOptions
options) {
lock (this) {
Key k = new Key (pattern, options);
if (factories.Contains (k)) {
mru_list.Use (k);
return (IMachineFactory)factories[k];
}
}
return null;
}
private int capacity;
private Hashtable factories;
private MRUList mru_list;
class Key {
public string pattern;
public RegexOptions options;
public Key (string pattern, RegexOptions options) {
this.pattern = pattern;
this.options = options;
}
public override int GetHashCode () {
return pattern.GetHashCode () ^ (int)options;
}
public override bool Equals (object o) {
if (o == null || !(o is Key))
return false;
Key k = (Key)o;
return options == k.options && pattern.Equals
(k.pattern);
}
public override string ToString () {
return "('" + pattern + "', [" + options + "])";
}
}
}
class MRUList {
public MRUList () {
head = tail = null;
}
public void Use (object o) {
Node node;
if (head == null) {
node = new Node (o);
head = tail = node;
return;
}
node = head;
while (node != null && !o.Equals (node.value))
node = node.previous;
if (node == null)
node = new Node (o);
else {
if (node == head)
return;
if (node == tail)
tail = node.next;
else
node.previous.next = node.next;
node.next.previous = node.previous;
}
head.next = node;
node.previous = head;
node.next = null;
head = node;
}
public object Evict () {
if (tail == null)
return null;
object o = tail.value;
tail = tail.next;
if (tail == null)
head = null;
else
tail.previous = null;
return o;
}
private Node head, tail;
private class Node {
public object value;
public Node previous, next;
public Node (object value) {
this.value = value;
}
}
}
}
--- NEW FILE ---
//
// assembly: System
// namespace: System.Text.RegularExpressions
// file: category.cs
//
// author: Dan Lewis (address@hidden)
// (c) 2002
using System;
using System.Globalization;
namespace System.Text.RegularExpressions {
enum Category : ushort {
None,
// canonical classes
Any, // any character except newline
.
AnySingleline, // any character
. (s option)
Word, // any word character
\w
Digit, // any digit character
\d
WhiteSpace, // any whitespace character
\s
// ECMAScript classes
EcmaAny,
EcmaAnySingleline,
EcmaWord, // [a-zA-Z_0-9]
EcmaDigit, // [0-9]
EcmaWhiteSpace, // [ \f\n\r\t\v]
// unicode categories
UnicodeL, // Letter
UnicodeM, // Mark
UnicodeN, // Number
UnicodeZ, // Separator
UnicodeP, // Punctuation
UnicodeS, // Symbol
UnicodeC, // Other
UnicodeLu, // UppercaseLetter
UnicodeLl, // LowercaseLetter
UnicodeLt, // TitlecaseLetter
UnicodeLm, // ModifierLetter
UnicodeLo, // OtherLetter
UnicodeMn, // NonspacingMark
UnicodeMe, // EnclosingMark
UnicodeMc, // SpacingMark
UnicodeNd, // DecimalNumber
UnicodeNl, // LetterNumber
UnicodeNo, // OtherNumber
UnicodeZs, // SpaceSeparator
UnicodeZl, // LineSeparator
UnicodeZp, // ParagraphSeparator
UnicodePd, // DashPunctuation
UnicodePs, // OpenPunctuation
UnicodePi, // InitialPunctuation
UnicodePe, // ClosePunctuation
UnicodePf, // FinalPunctuation
UnicodePc, // ConnectorPunctuation
UnicodePo, // OtherPunctuation
UnicodeSm, // MathSymbol
UnicodeSc, // CurrencySymbol
UnicodeSk, // ModifierSymbol
UnicodeSo, // OtherSymbol
UnicodeCc, // Control
UnicodeCf, // Format
UnicodeCo, // PrivateUse
UnicodeCs, // Surrogate
UnicodeCn, // Unassigned
// unicode block ranges
// notes: the categories marked with a star are valid unicode
block ranges,
// but don't seem to be accepted by the MS parser using the
/p{...} format.
// any ideas?
UnicodeBasicLatin,
UnicodeLatin1Supplement, // *
UnicodeLatinExtendedA, // *
UnicodeLatinExtendedB, // *
UnicodeIPAExtensions,
UnicodeSpacingModifierLetters,
UnicodeCombiningDiacriticalMarks,
UnicodeGreek,
UnicodeCyrillic,
UnicodeArmenian,
UnicodeHebrew,
UnicodeArabic,
UnicodeSyriac,
UnicodeThaana,
UnicodeDevanagari,
UnicodeBengali,
UnicodeGurmukhi,
UnicodeGujarati,
UnicodeOriya,
UnicodeTamil,
UnicodeTelugu,
UnicodeKannada,
UnicodeMalayalam,
UnicodeSinhala,
UnicodeThai,
UnicodeLao,
UnicodeTibetan,
UnicodeMyanmar,
UnicodeGeorgian,
UnicodeHangulJamo,
UnicodeEthiopic,
UnicodeCherokee,
UnicodeUnifiedCanadianAboriginalSyllabics,
UnicodeOgham,
UnicodeRunic,
UnicodeKhmer,
UnicodeMongolian,
UnicodeLatinExtendedAdditional,
UnicodeGreekExtended,
UnicodeGeneralPunctuation,
UnicodeSuperscriptsandSubscripts,
UnicodeCurrencySymbols,
UnicodeCombiningMarksforSymbols,
UnicodeLetterlikeSymbols,
UnicodeNumberForms,
UnicodeArrows,
UnicodeMathematicalOperators,
UnicodeMiscellaneousTechnical,
UnicodeControlPictures,
UnicodeOpticalCharacterRecognition,
UnicodeEnclosedAlphanumerics,
UnicodeBoxDrawing,
UnicodeBlockElements,
UnicodeGeometricShapes,
UnicodeMiscellaneousSymbols,
UnicodeDingbats,
UnicodeBraillePatterns,
UnicodeCJKRadicalsSupplement,
UnicodeKangxiRadicals,
UnicodeIdeographicDescriptionCharacters,
UnicodeCJKSymbolsandPunctuation,
UnicodeHiragana,
UnicodeKatakana,
UnicodeBopomofo,
UnicodeHangulCompatibilityJamo,
UnicodeKanbun,
UnicodeBopomofoExtended,
UnicodeEnclosedCJKLettersandMonths,
UnicodeCJKCompatibility,
UnicodeCJKUnifiedIdeographsExtensionA,
UnicodeCJKUnifiedIdeographs,
UnicodeYiSyllables,
UnicodeYiRadicals,
UnicodeHangulSyllables,
UnicodeHighSurrogates,
UnicodeHighPrivateUseSurrogates,
UnicodeLowSurrogates,
UnicodePrivateUse,
UnicodeCJKCompatibilityIdeographs,
UnicodeAlphabeticPresentationForms,
UnicodeArabicPresentationFormsA, // *
UnicodeCombiningHalfMarks,
UnicodeCJKCompatibilityForms,
UnicodeSmallFormVariants,
UnicodeArabicPresentationFormsB, // *
UnicodeSpecials,
UnicodeHalfwidthandFullwidthForms,
UnicodeOldItalic,
UnicodeGothic,
UnicodeDeseret,
UnicodeByzantineMusicalSymbols,
UnicodeMusicalSymbols,
UnicodeMathematicalAlphanumericSymbols,
UnicodeCJKUnifiedIdeographsExtensionB,
UnicodeCJKCompatibilityIdeographsSupplement,
UnicodeTags
}
class CategoryUtils {
public static Category CategoryFromName (string name) {
try {
if (name.Substring (0, 2).Equals ("Is"))
// remove prefix from block range
name = name.Substring (2);
return (Category)Enum.Parse (typeof (Category),
"Unicode" + name);
}
catch (ArgumentException) {
return Category.None;
}
}
public static bool IsCategory (Category cat, char c) {
switch (cat) {
case Category.None:
return false;
case Category.Any:
return c != '\n';
case Category.AnySingleline:
return true;
case Category.Word:
return
Char.IsLetterOrDigit (c) ||
IsCategory
(UnicodeCategory.ConnectorPunctuation, c);
case Category.Digit:
return Char.IsDigit (c);
case Category.WhiteSpace:
return Char.IsWhiteSpace (c);
// ECMA categories
case Category.EcmaAny:
return c != '\n';
case Category.EcmaAnySingleline:
return true;
case Category.EcmaWord:
return
'a' <= c && c <= 'z' ||
'A' <= c && c <= 'Z' ||
'0' <= c && c <= '9' ||
'_' == c;
case Category.EcmaDigit:
return
'0' <= c && c <= 9;
case Category.EcmaWhiteSpace:
return
c == ' ' ||
c == '\f' ||
c == '\n' ||
c == '\r' ||
c == '\t' ||
c == '\v';
// Unicode categories...
// letter
case Category.UnicodeLu: return IsCategory
(UnicodeCategory.UppercaseLetter, c);
case Category.UnicodeLl: return IsCategory
(UnicodeCategory.LowercaseLetter, c);
case Category.UnicodeLt: return IsCategory
(UnicodeCategory.TitlecaseLetter, c);
case Category.UnicodeLm: return IsCategory
(UnicodeCategory.ModifierLetter, c);
case Category.UnicodeLo: return IsCategory
(UnicodeCategory.OtherLetter, c);
// mark
case Category.UnicodeMn: return IsCategory
(UnicodeCategory.NonSpacingMark, c);
case Category.UnicodeMe: return IsCategory
(UnicodeCategory.EnclosingMark, c);
case Category.UnicodeMc: return IsCategory
(UnicodeCategory.SpacingCombiningMark, c);
case Category.UnicodeNd: return IsCategory
(UnicodeCategory.DecimalDigitNumber, c);
// number
case Category.UnicodeNl: return IsCategory
(UnicodeCategory.LetterNumber, c);
case Category.UnicodeNo: return IsCategory
(UnicodeCategory.OtherNumber, c);
// separator
case Category.UnicodeZs: return IsCategory
(UnicodeCategory.SpaceSeparator, c);
case Category.UnicodeZl: return IsCategory
(UnicodeCategory.LineSeparator, c);
case Category.UnicodeZp: return IsCategory
(UnicodeCategory.ParagraphSeparator, c);
// punctuation
case Category.UnicodePd: return IsCategory
(UnicodeCategory.DashPunctuation, c);
case Category.UnicodePs: return IsCategory
(UnicodeCategory.OpenPunctuation, c);
case Category.UnicodePi: return IsCategory
(UnicodeCategory.InitialQuotePunctuation, c);
case Category.UnicodePe: return IsCategory
(UnicodeCategory.ClosePunctuation, c);
case Category.UnicodePf: return IsCategory
(UnicodeCategory.FinalQuotePunctuation, c);
case Category.UnicodePc: return IsCategory
(UnicodeCategory.ConnectorPunctuation, c);
case Category.UnicodePo: return IsCategory
(UnicodeCategory.OtherPunctuation, c);
// symbol
case Category.UnicodeSm: return IsCategory
(UnicodeCategory.MathSymbol, c);
case Category.UnicodeSc: return IsCategory
(UnicodeCategory.CurrencySymbol, c);
case Category.UnicodeSk: return IsCategory
(UnicodeCategory.ModifierSymbol, c);
case Category.UnicodeSo: return IsCategory
(UnicodeCategory.OtherSymbol, c);
// other
case Category.UnicodeCc: return IsCategory
(UnicodeCategory.Control, c);
case Category.UnicodeCf: return IsCategory
(UnicodeCategory.Format, c);
case Category.UnicodeCo: return IsCategory
(UnicodeCategory.PrivateUse, c);
case Category.UnicodeCs: return IsCategory
(UnicodeCategory.Surrogate, c);
case Category.UnicodeCn: return IsCategory
(UnicodeCategory.OtherNotAssigned, c);
case Category.UnicodeL: // letter
return
IsCategory
(UnicodeCategory.UppercaseLetter, c) ||
IsCategory
(UnicodeCategory.LowercaseLetter, c) ||
IsCategory
(UnicodeCategory.TitlecaseLetter, c) ||
IsCategory
(UnicodeCategory.ModifierLetter, c) ||
IsCategory
(UnicodeCategory.OtherLetter, c);
case Category.UnicodeM: // mark
return
IsCategory
(UnicodeCategory.NonSpacingMark, c) ||
IsCategory
(UnicodeCategory.EnclosingMark, c) ||
IsCategory
(UnicodeCategory.SpacingCombiningMark, c);
case Category.UnicodeN: // number
return
IsCategory
(UnicodeCategory.DecimalDigitNumber, c) ||
IsCategory
(UnicodeCategory.LetterNumber, c) ||
IsCategory
(UnicodeCategory.OtherNumber, c);
case Category.UnicodeZ: // separator
return
IsCategory
(UnicodeCategory.SpaceSeparator, c) ||
IsCategory
(UnicodeCategory.LineSeparator, c) ||
IsCategory
(UnicodeCategory.ParagraphSeparator, c);
case Category.UnicodeP: // punctuation
return
IsCategory
(UnicodeCategory.DashPunctuation, c) ||
IsCategory
(UnicodeCategory.OpenPunctuation, c) ||
IsCategory
(UnicodeCategory.InitialQuotePunctuation, c) ||
IsCategory
(UnicodeCategory.ClosePunctuation, c) ||
IsCategory
(UnicodeCategory.FinalQuotePunctuation, c) ||
IsCategory
(UnicodeCategory.ConnectorPunctuation, c) ||
IsCategory
(UnicodeCategory.OtherPunctuation, c);
case Category.UnicodeS: // symbol
return
IsCategory (UnicodeCategory.MathSymbol,
c) ||
IsCategory
(UnicodeCategory.CurrencySymbol, c) ||
IsCategory
(UnicodeCategory.ModifierSymbol, c) ||
IsCategory
(UnicodeCategory.OtherSymbol, c);
case Category.UnicodeC: // other
return
IsCategory (UnicodeCategory.Control, c)
||
IsCategory (UnicodeCategory.Format, c)
||
IsCategory (UnicodeCategory.PrivateUse,
c) ||
IsCategory (UnicodeCategory.Surrogate,
c) ||
IsCategory
(UnicodeCategory.OtherNotAssigned, c);
// Unicode block ranges...
case Category.UnicodeBasicLatin:
return '\u0000' <= c && c <= '\u007F';
case Category.UnicodeLatin1Supplement:
return '\u0080' <= c && c <= '\u00FF';
case Category.UnicodeLatinExtendedA:
return '\u0100' <= c && c <= '\u017F';
case Category.UnicodeLatinExtendedB:
return '\u0180' <= c && c <= '\u024F';
case Category.UnicodeIPAExtensions:
return '\u0250' <= c && c <= '\u02AF';
case Category.UnicodeSpacingModifierLetters:
return '\u02B0' <= c && c <= '\u02FF';
case Category.UnicodeCombiningDiacriticalMarks:
return '\u0300' <= c && c <= '\u036F';
case Category.UnicodeGreek:
return '\u0370' <= c && c <= '\u03FF';
case Category.UnicodeCyrillic:
return '\u0400' <= c && c <= '\u04FF';
case Category.UnicodeArmenian:
return '\u0530' <= c && c <= '\u058F';
case Category.UnicodeHebrew:
return '\u0590' <= c && c <= '\u05FF';
case Category.UnicodeArabic:
return '\u0600' <= c && c <= '\u06FF';
case Category.UnicodeSyriac:
return '\u0700' <= c && c <= '\u074F';
case Category.UnicodeThaana:
return '\u0780' <= c && c <= '\u07BF';
case Category.UnicodeDevanagari:
return '\u0900' <= c && c <= '\u097F';
case Category.UnicodeBengali:
return '\u0980' <= c && c <= '\u09FF';
case Category.UnicodeGurmukhi:
return '\u0A00' <= c && c <= '\u0A7F';
case Category.UnicodeGujarati:
return '\u0A80' <= c && c <= '\u0AFF';
case Category.UnicodeOriya:
return '\u0B00' <= c && c <= '\u0B7F';
case Category.UnicodeTamil:
return '\u0B80' <= c && c <= '\u0BFF';
case Category.UnicodeTelugu:
return '\u0C00' <= c && c <= '\u0C7F';
case Category.UnicodeKannada:
return '\u0C80' <= c && c <= '\u0CFF';
case Category.UnicodeMalayalam:
return '\u0D00' <= c && c <= '\u0D7F';
case Category.UnicodeSinhala:
return '\u0D80' <= c && c <= '\u0DFF';
case Category.UnicodeThai:
return '\u0E00' <= c && c <= '\u0E7F';
case Category.UnicodeLao:
return '\u0E80' <= c && c <= '\u0EFF';
case Category.UnicodeTibetan:
return '\u0F00' <= c && c <= '\u0FFF';
case Category.UnicodeMyanmar:
return '\u1000' <= c && c <= '\u109F';
case Category.UnicodeGeorgian:
return '\u10A0' <= c && c <= '\u10FF';
case Category.UnicodeHangulJamo:
return '\u1100' <= c && c <= '\u11FF';
case Category.UnicodeEthiopic:
return '\u1200' <= c && c <= '\u137F';
case Category.UnicodeCherokee:
return '\u13A0' <= c && c <= '\u13FF';
case Category.UnicodeUnifiedCanadianAboriginalSyllabics:
return '\u1400' <= c && c <= '\u167F';
case Category.UnicodeOgham:
return '\u1680' <= c && c <= '\u169F';
case Category.UnicodeRunic:
return '\u16A0' <= c && c <= '\u16FF';
case Category.UnicodeKhmer:
return '\u1780' <= c && c <= '\u17FF';
case Category.UnicodeMongolian:
return '\u1800' <= c && c <= '\u18AF';
case Category.UnicodeLatinExtendedAdditional:
return '\u1E00' <= c && c <= '\u1EFF';
case Category.UnicodeGreekExtended:
return '\u1F00' <= c && c <= '\u1FFF';
case Category.UnicodeGeneralPunctuation:
return '\u2000' <= c && c <= '\u206F';
case Category.UnicodeSuperscriptsandSubscripts:
return '\u2070' <= c && c <= '\u209F';
case Category.UnicodeCurrencySymbols:
return '\u20A0' <= c && c <= '\u20CF';
case Category.UnicodeCombiningMarksforSymbols:
return '\u20D0' <= c && c <= '\u20FF';
case Category.UnicodeLetterlikeSymbols:
return '\u2100' <= c && c <= '\u214F';
case Category.UnicodeNumberForms:
return '\u2150' <= c && c <= '\u218F';
case Category.UnicodeArrows:
return '\u2190' <= c && c <= '\u21FF';
case Category.UnicodeMathematicalOperators:
return '\u2200' <= c && c <= '\u22FF';
case Category.UnicodeMiscellaneousTechnical:
return '\u2300' <= c && c <= '\u23FF';
case Category.UnicodeControlPictures:
return '\u2400' <= c && c <= '\u243F';
case Category.UnicodeOpticalCharacterRecognition:
return '\u2440' <= c && c <= '\u245F';
case Category.UnicodeEnclosedAlphanumerics:
return '\u2460' <= c && c <= '\u24FF';
case Category.UnicodeBoxDrawing:
return '\u2500' <= c && c <= '\u257F';
case Category.UnicodeBlockElements:
return '\u2580' <= c && c <= '\u259F';
case Category.UnicodeGeometricShapes:
return '\u25A0' <= c && c <= '\u25FF';
case Category.UnicodeMiscellaneousSymbols:
return '\u2600' <= c && c <= '\u26FF';
case Category.UnicodeDingbats:
return '\u2700' <= c && c <= '\u27BF';
case Category.UnicodeBraillePatterns:
return '\u2800' <= c && c <= '\u28FF';
case Category.UnicodeCJKRadicalsSupplement:
return '\u2E80' <= c && c <= '\u2EFF';
case Category.UnicodeKangxiRadicals:
return '\u2F00' <= c && c <= '\u2FDF';
case Category.UnicodeIdeographicDescriptionCharacters:
return '\u2FF0' <= c && c <= '\u2FFF';
case Category.UnicodeCJKSymbolsandPunctuation:
return '\u3000' <= c && c <= '\u303F';
case Category.UnicodeHiragana:
return '\u3040' <= c && c <= '\u309F';
case Category.UnicodeKatakana:
return '\u30A0' <= c && c <= '\u30FF';
case Category.UnicodeBopomofo:
return '\u3100' <= c && c <= '\u312F';
case Category.UnicodeHangulCompatibilityJamo:
return '\u3130' <= c && c <= '\u318F';
case Category.UnicodeKanbun:
return '\u3190' <= c && c <= '\u319F';
case Category.UnicodeBopomofoExtended:
return '\u31A0' <= c && c <= '\u31BF';
case Category.UnicodeEnclosedCJKLettersandMonths:
return '\u3200' <= c && c <= '\u32FF';
case Category.UnicodeCJKCompatibility:
return '\u3300' <= c && c <= '\u33FF';
case Category.UnicodeCJKUnifiedIdeographsExtensionA:
return '\u3400' <= c && c <= '\u4DB5';
case Category.UnicodeCJKUnifiedIdeographs:
return '\u4E00' <= c && c <= '\u9FFF';
case Category.UnicodeYiSyllables:
return '\uA000' <= c && c <= '\uA48F';
case Category.UnicodeYiRadicals:
return '\uA490' <= c && c <= '\uA4CF';
case Category.UnicodeHangulSyllables:
return '\uAC00' <= c && c <= '\uD7A3';
case Category.UnicodeHighSurrogates:
return '\uD800' <= c && c <= '\uDB7F';
case Category.UnicodeHighPrivateUseSurrogates:
return '\uDB80' <= c && c <= '\uDBFF';
case Category.UnicodeLowSurrogates:
return '\uDC00' <= c && c <= '\uDFFF';
case Category.UnicodePrivateUse:
return '\uE000' <= c && c <= '\uF8FF';
case Category.UnicodeCJKCompatibilityIdeographs:
return '\uF900' <= c && c <= '\uFAFF';
case Category.UnicodeAlphabeticPresentationForms:
return '\uFB00' <= c && c <= '\uFB4F';
case Category.UnicodeArabicPresentationFormsA:
return '\uFB50' <= c && c <= '\uFDFF';
case Category.UnicodeCombiningHalfMarks:
return '\uFE20' <= c && c <= '\uFE2F';
case Category.UnicodeCJKCompatibilityForms:
return '\uFE30' <= c && c <= '\uFE4F';
case Category.UnicodeSmallFormVariants:
return '\uFE50' <= c && c <= '\uFE6F';
case Category.UnicodeArabicPresentationFormsB:
return '\uFE70' <= c && c <= '\uFEFE';
case Category.UnicodeHalfwidthandFullwidthForms:
return '\uFF00' <= c && c <= '\uFFEF';
case Category.UnicodeSpecials:
return
'\uFEFF' <= c && c <= '\uFEFF' ||
'\uFFF0' <= c && c <= '\uFFFD';
// these block ranges begin above 0x10000
case Category.UnicodeOldItalic:
case Category.UnicodeGothic:
case Category.UnicodeDeseret:
case Category.UnicodeByzantineMusicalSymbols:
case Category.UnicodeMusicalSymbols:
case Category.UnicodeMathematicalAlphanumericSymbols:
case Category.UnicodeCJKUnifiedIdeographsExtensionB:
case
Category.UnicodeCJKCompatibilityIdeographsSupplement:
case Category.UnicodeTags:
return false;
default:
return false;
}
}
private static bool IsCategory (UnicodeCategory uc, char c) {
if (Char.GetUnicodeCategory (c) == uc)
return true;
return false;
}
}
}
--- NEW FILE ---
//
// assembly: System
// namespace: System.Text.RegularExpressions
// file: collections.cs
//
// author: Dan Lewis (address@hidden)
// (c) 2002
using System;
using System.Collections;
namespace System.Text.RegularExpressions {
public abstract class RegexCollectionBase : ICollection, IEnumerable {
public int Count {
get { return list.Count; }
}
public bool IsReadOnly {
get { return true; } // FIXME
}
public bool IsSynchronized {
get { return false; } // FIXME
}
public object SyncRoot {
get { return list; } // FIXME
}
public void CopyTo (Array array, int index) {
foreach (Object o in list) {
if (index > array.Length)
break;
array.SetValue (o, index ++);
}
}
public IEnumerator GetEnumerator () {
return new Enumerator (list);
}
// internal methods
internal RegexCollectionBase () {
list = new ArrayList ();
}
internal void Add (Object o) {
list.Add (o);
}
internal void Reverse () {
list.Reverse ();
}
// IEnumerator implementation
private class Enumerator : IEnumerator {
public Enumerator (IList list) {
this.list = list;
Reset ();
}
public object Current {
get {
if (ptr >= list.Count)
throw new
InvalidOperationException ();
return list[ptr];
}
}
public bool MoveNext () {
if (ptr > list.Count)
throw new InvalidOperationException ();
return ++ ptr < list.Count;
}
public void Reset () {
ptr = -1;
}
private IList list;
private int ptr;
}
// protected fields
protected ArrayList list;
}
[Serializable]
public class CaptureCollection : RegexCollectionBase, ICollection,
IEnumerable {
public Capture this[int i] {
get { return (Capture)list[i]; }
}
internal CaptureCollection () {
}
}
[Serializable]
public class GroupCollection : RegexCollectionBase, ICollection,
IEnumerable {
public Group this[int i] {
get { return (Group)list[i]; }
}
internal GroupCollection () {
}
}
[Serializable]
public class MatchCollection : RegexCollectionBase, ICollection,
IEnumerable {
public virtual Match this[int i] {
get { return (Match)list[i]; }
}
internal MatchCollection () {
}
}
}
--- NEW FILE ---
//
// assembly: System
// namespace: System.Text.RegularExpressions
// file: compiler.cs
//
// author: Dan Lewis (address@hidden)
// (c) 2002
using System;
using System.Collections;
namespace System.Text.RegularExpressions {
abstract class LinkRef {
// empty
}
interface ICompiler {
void Reset ();
IMachineFactory GetMachineFactory ();
// instruction emission
void EmitFalse ();
void EmitTrue ();
// character matching
void EmitCharacter (char c, bool negate, bool ignore, bool
reverse);
void EmitCategory (Category cat, bool negate, bool reverse);
void EmitRange (char lo, char hi, bool negate, bool ignore,
bool reverse);
void EmitSet (char lo, BitArray set, bool negate, bool ignore,
bool reverse);
// other operators
void EmitString (string str, bool ignore, bool reverse);
void EmitPosition (Position pos);
void EmitOpen (int gid);
void EmitClose (int gid);
void EmitBalance (int gid, int balance);
void EmitReference (int gid, bool ignore, bool reverse);
// constructs
void EmitIfDefined (int gid, LinkRef tail);
void EmitSub (LinkRef tail);
void EmitTest (LinkRef yes, LinkRef tail);
void EmitBranch (LinkRef next);
void EmitJump (LinkRef target);
void EmitRepeat (int min, int max, bool lazy, LinkRef until);
void EmitUntil (LinkRef repeat);
void EmitIn (LinkRef tail);
void EmitInfo (int count, int min, int max);
void EmitFastRepeat (int min, int max, bool lazy, LinkRef tail);
void EmitAnchor (int offset, LinkRef tail);
LinkRef NewLink ();
void ResolveLink (LinkRef link);
}
class InterpreterFactory : IMachineFactory {
public InterpreterFactory (ushort[] pattern) {
this.pattern = pattern;
}
public IMachine NewInstance () {
return new Interpreter (pattern);
}
public int GroupCount {
get { return pattern[0]; }
}
public IDictionary Mapping {
get { return mapping; }
set { mapping = value; }
}
private IDictionary mapping;
private ushort[] pattern;
}
class PatternCompiler : ICompiler {
public static ushort EncodeOp (OpCode op, OpFlags flags) {
return (ushort)((int)op | ((int)flags & 0xff00));
}
public static void DecodeOp (ushort word, out OpCode op, out
OpFlags flags) {
op = (OpCode)(word & 0x00ff);
flags = (OpFlags)(word & 0xff00);
}
public PatternCompiler () {
pgm = new ArrayList ();
}
// ICompiler implementation
public void Reset () {
pgm.Clear ();
}
public IMachineFactory GetMachineFactory () {
ushort[] image = new ushort[pgm.Count];
pgm.CopyTo (image);
return new InterpreterFactory (image);
}
public void EmitFalse () {
Emit (OpCode.False);
}
public void EmitTrue () {
Emit (OpCode.True);
}
public void EmitCharacter (char c, bool negate, bool ignore,
bool reverse) {
Emit (OpCode.Character, MakeFlags (negate, ignore,
reverse, false));
if (ignore)
c = Char.ToLower (c);
Emit ((ushort)c);
}
public void EmitCategory (Category cat, bool negate, bool
reverse) {
Emit (OpCode.Category, MakeFlags (negate, false,
reverse, false));
Emit ((ushort)cat);
}
public void EmitRange (char lo, char hi, bool negate, bool
ignore, bool reverse) {
Emit (OpCode.Range, MakeFlags (negate, ignore, reverse,
false));
Emit ((ushort)lo);
Emit ((ushort)hi);
}
public void EmitSet (char lo, BitArray set, bool negate, bool
ignore, bool reverse) {
Emit (OpCode.Set, MakeFlags (negate, ignore, reverse,
false));
Emit ((ushort)lo);
int len = (set.Length + 0xf) >> 4;
Emit ((ushort)len);
int b = 0;
while (len -- != 0) {
ushort word = 0;
for (int i = 0; i < 16; ++ i) {
if (b >= set.Length)
break;
if (set[b ++])
word |= (ushort)(1 << i);
}
Emit (word);
}
}
public void EmitString (string str, bool ignore, bool reverse) {
Emit (OpCode.String, MakeFlags (false, ignore, reverse,
false));
int len = str.Length;
Emit ((ushort)len);
if (ignore)
str = str.ToLower ();
for (int i = 0; i < len; ++ i)
Emit ((ushort)str[i]);
}
public void EmitPosition (Position pos) {
Emit (OpCode.Position, 0);
Emit ((ushort)pos);
}
public void EmitOpen (int gid) {
Emit (OpCode.Open);
Emit ((ushort)gid);
}
public void EmitClose (int gid) {
Emit (OpCode.Close);
Emit ((ushort)gid);
}
public void EmitBalance (int gid, int balance) {
Emit (OpCode.Balance);
Emit ((ushort)gid);
Emit ((ushort)balance);
}
public void EmitReference (int gid, bool ignore, bool reverse) {
Emit (OpCode.Reference, MakeFlags (false, ignore,
reverse, false));
Emit ((ushort)gid);
}
public void EmitIfDefined (int gid, LinkRef tail) {
BeginLink (tail);
Emit (OpCode.IfDefined);
EmitLink (tail);
Emit ((ushort)gid);
}
public void EmitSub (LinkRef tail) {
BeginLink (tail);
Emit (OpCode.Sub);
EmitLink (tail);
}
public void EmitTest (LinkRef yes, LinkRef tail) {
BeginLink (yes);
BeginLink (tail);
Emit (OpCode.Test);
EmitLink (yes);
EmitLink (tail);
}
public void EmitBranch (LinkRef next) {
BeginLink (next);
Emit (OpCode.Branch, 0);
EmitLink (next);
}
public void EmitJump (LinkRef target) {
BeginLink (target);
Emit (OpCode.Jump, 0);
EmitLink (target);
}
public void EmitRepeat (int min, int max, bool lazy, LinkRef
until) {
BeginLink (until);
Emit (OpCode.Repeat, MakeFlags (false, false, false,
lazy));
EmitLink (until);
Emit ((ushort)min);
Emit ((ushort)max);
}
public void EmitUntil (LinkRef repeat) {
ResolveLink (repeat);
Emit (OpCode.Until);
}
public void EmitFastRepeat (int min, int max, bool lazy,
LinkRef tail) {
BeginLink (tail);
Emit (OpCode.FastRepeat, MakeFlags (false, false,
false, lazy));
EmitLink (tail);
Emit ((ushort)min);
Emit ((ushort)max);
}
public void EmitIn (LinkRef tail) {
BeginLink (tail);
Emit (OpCode.In);
EmitLink (tail);
}
public void EmitAnchor (int offset, LinkRef tail) {
BeginLink (tail);
Emit (OpCode.Anchor);
EmitLink (tail);
Emit ((ushort)offset);
}
public void EmitInfo (int count, int min, int max) {
Emit (OpCode.Info);
Emit ((ushort)count);
Emit ((ushort)min);
Emit ((ushort)max);
}
public LinkRef NewLink () {
return new PatternLinkStack ();
}
public void ResolveLink (LinkRef lref) {
PatternLinkStack stack = (PatternLinkStack)lref;
while (stack.Pop ())
pgm[stack.OffsetAddress] =
(ushort)stack.GetOffset (CurrentAddress);
}
// private members
private static OpFlags MakeFlags (bool negate, bool ignore,
bool reverse, bool lazy) {
OpFlags flags = 0;
if (negate) flags |= OpFlags.Negate;
if (ignore) flags |= OpFlags.IgnoreCase;
if (reverse) flags |= OpFlags.RightToLeft;
if (lazy) flags |= OpFlags.Lazy;
return flags;
}
private void Emit (OpCode op) {
Emit (op, (OpFlags)0);
}
private void Emit (OpCode op, OpFlags flags) {
Emit (EncodeOp (op, flags));
}
private void Emit (ushort word) {
pgm.Add (word);
}
private int CurrentAddress {
get { return pgm.Count; }
}
private void BeginLink (LinkRef lref) {
PatternLinkStack stack = (PatternLinkStack)lref;
stack.BaseAddress = CurrentAddress;
}
private void EmitLink (LinkRef lref) {
PatternLinkStack stack = (PatternLinkStack)lref;
stack.OffsetAddress = CurrentAddress;
Emit ((ushort)0); // placeholder
stack.Push ();
}
private class PatternLinkStack : LinkStack {
public PatternLinkStack () {
}
public int BaseAddress {
set { link.base_addr = value; }
}
public int OffsetAddress {
get { return link.offset_addr; }
set { link.offset_addr = value; }
}
public int GetOffset (int target_addr) {
return target_addr - link.base_addr;
}
// LinkStack implementation
protected override object GetCurrent () { return link; }
protected override void SetCurrent (object l) { link =
(Link)l; }
private struct Link {
public int base_addr;
public int offset_addr;
}
Link link;
}
private ArrayList pgm;
}
abstract class LinkStack : LinkRef {
public LinkStack () {
stack = new Stack ();
}
public void Push () {
stack.Push (GetCurrent ());
}
public bool Pop () {
if (stack.Count > 0) {
SetCurrent (stack.Pop ());
return true;
}
return false;
}
protected abstract object GetCurrent ();
protected abstract void SetCurrent (object l);
private Stack stack;
}
}
--- NEW FILE ---
//
// assembly: System
// namespace: System.Text.RegularExpressions
// file: debug.cs
//
// author: Dan Lewis (address@hidden)
// (c) 2002
using System;
using System.Collections;
namespace System.Text.RegularExpressions {
class Disassembler {
public static void DisassemblePattern (ushort[] image) {
DisassembleBlock (image, 0, 0);
}
public static void DisassembleBlock (ushort[] image, int pc,
int depth) {
OpCode op;
OpFlags flags;
for (;;) {
if (pc >= image.Length)
return;
PatternCompiler.DecodeOp (image[pc], out op,
out flags);
Console.Write (FormatAddress (pc) + ": ");
// address
Console.Write (new string (' ', depth * 2));
// indent
Console.Write (DisassembleOp (image, pc));
// instruction
Console.WriteLine ();
int skip;
switch (op) {
case OpCode.False: case OpCode.True: case
OpCode.Until:
skip = 1;
break;
case OpCode.Character: case OpCode.Category:
case OpCode.Position:
case OpCode.Open: case OpCode.Close: case
OpCode.Reference:
case OpCode.Sub: case OpCode.Branch: case
OpCode.Jump: case OpCode.In:
skip = 2;
break;
case OpCode.Balance: case OpCode.IfDefined:
case OpCode.Range:
case OpCode.Test: case OpCode.Anchor:
skip = 3;
break;
case OpCode.Repeat: case OpCode.FastRepeat:
case OpCode.Info:
skip = 4;
break;
case OpCode.String: skip = image[pc + 1] + 2;
break;
case OpCode.Set: skip = image[pc + 2] + 3;
break;
default:
skip = 1;
break;
}
pc += skip;
}
}
public static string DisassembleOp (ushort[] image, int pc) {
OpCode op;
OpFlags flags;
PatternCompiler.DecodeOp (image[pc], out op, out flags);
string str = op.ToString ();
if (flags != 0)
str += "[" + flags.ToString ("f") + "]";
switch (op) {
case OpCode.False: case OpCode.True: case OpCode.Until:
default:
break;
case OpCode.Info:
str += " " + image[pc + 1];
str += " (" + image[pc + 2] + ", " + image[pc +
3] + ")";
break;
case OpCode.Character:
str += " '" + FormatChar ((char)image[pc + 1])
+ "'";
break;
case OpCode.Category:
str += " /" + (Category)image[pc + 1];
break;
case OpCode.Range:
str += " '" + FormatChar ((char)image[pc + 1])
+ "', ";
str += " '" + FormatChar ((char)image[pc + 2])
+ "'";
break;
case OpCode.Set:
str += " " + FormatSet (image, pc + 1);
break;
case OpCode.String:
str += " '" + ReadString (image, pc + 1) + "'";
break;
case OpCode.Position:
str += " /" + (Position)image[pc + 1];
break;
case OpCode.Open: case OpCode.Close: case
OpCode.Reference:
str += " " + image[pc + 1];
break;
case OpCode.Balance:
str += " " + image[pc + 1] + " " + image[pc +
2];
break;
case OpCode.IfDefined: case OpCode.Anchor:
str += " :" + FormatAddress (pc + image[pc +
1]);
str += " " + image[pc + 2];
break;
case OpCode.Sub: case OpCode.Branch: case OpCode.Jump:
case OpCode.In:
str += " :" + FormatAddress (pc + image[pc +
1]);
break;
case OpCode.Test:
str += " :" + FormatAddress (pc + image[pc +
1]);
str += ", :" + FormatAddress (pc + image[pc +
2]);
break;
case OpCode.Repeat: case OpCode.FastRepeat:
str += " :" + FormatAddress (pc + image[pc +
1]);
str += " (" + image[pc + 2] + ", ";
if (image[pc + 3] == 0xffff)
str += "Inf";
else
str += image[pc + 3];
str += ")";
break;
}
return str;
}
// private static members
private static string ReadString (ushort[] image, int pc) {
int len = image[pc];
char[] chars = new char[len];
for (int i = 0; i < len; ++ i)
chars[i] = (char)image[pc + i + 1];
return new string (chars);
}
private static string FormatAddress (int pc) {
return pc.ToString ("x4");
}
private static string FormatSet (ushort[] image, int pc) {
int lo = image[pc ++];
int hi = (image[pc ++] << 4) - 1;
string str = "[";
bool hot = false;
char a = (char)0, b;
for (int i = 0; i <= hi; ++ i) {
bool m = (image[pc + (i >> 4)] & (1 << (i &
0xf))) != 0;
if (m & !hot) { //
start of range
a = (char)(lo + i);
hot = true;
}
else if (hot & (!m || i == hi)) { // end
of range
b = (char)(lo + i - 1);
str += FormatChar (a);
if (b != a)
str += "-" + FormatChar (b);
hot = false;
}
}
str += "]";
return str;
}
private static string FormatChar (char c) {
if (c == '-' || c == ']')
return "\\" + c;
if (Char.IsLetterOrDigit (c) || Char.IsSymbol (c))
return c.ToString ();
if (Char.IsControl (c)) {
return "^" + (char)('@' + c);
}
return "\\u" + ((int)c).ToString ("x4");
}
}
}
--- NEW FILE ---
//
// assembly: System
// namespace: System.Text.RegularExpressions
// file: interpreter.cs
//
// author: Dan Lewis (address@hidden)
// (c) 2002
using System;
using System.Collections;
using System.Diagnostics;
using System.Globalization;
namespace System.Text.RegularExpressions {
class Interpreter : IMachine {
public Interpreter (ushort[] program) {
this.program = program;
this.qs = null;
// process info block
Debug.Assert ((OpCode)program[0] == OpCode.Info,
"Regex", "Cant' find info block");
this.group_count = program[1] + 1;
this.match_min = program[2];
this.match_max = program[3];
// setup
this.program_start = 4;
this.groups = new int [group_count];
}
// IMachine implementation
public Match Scan (Regex regex, string text, int start, int
end) {
this.text = text;
this.text_end = end;
this.scan_ptr = start;
if (Eval (Mode.Match, ref scan_ptr, program_start))
return GenerateMatch (regex);
return Match.Empty;
}
// private methods
private void Reset () {
ResetGroups ();
fast = repeat = null;
}
private bool Eval (Mode mode, ref int ref_ptr, int pc) {
int ptr = ref_ptr;
Begin:
for (;;) {
ushort word = program[pc];
OpCode op = (OpCode)(word & 0x00ff);
OpFlags flags = (OpFlags)(word & 0xff00);
switch (op) {
case OpCode.Anchor: {
int skip = program[pc + 1];
int anch_offset = program[pc + 2];
int anch_ptr = ptr + anch_offset;
int anch_end = text_end - match_min +
anch_offset; // maximum anchor position
// the general case for an anchoring
expression is at the bottom, however we
// do some checks for the common cases
before to save processing time. the current
// optimizer only outputs three types
of anchoring expressions: fixed position,
// fixed substring, and no anchor.
OpCode anch_op = (OpCode)(program[pc +
3] & 0x00ff);
if (anch_op == OpCode.Position && skip
== 6) { // position anchor
// Anchor
// Position
// True
switch ((Position)program[pc +
4]) {
case Position.StartOfString:
if (anch_ptr == 0) {
ptr = 0;
if (TryMatch
(ref ptr, pc + skip))
goto
Pass;
}
break;
case Position.StartOfLine:
if (anch_ptr == 0) {
ptr = 0;
if (TryMatch
(ref ptr, pc + skip))
goto
Pass;
++ anch_ptr;
}
while (anch_ptr <=
anch_end) {
if
(text[anch_ptr - 1] == '\n') {
ptr =
anch_ptr - anch_offset;
if
(TryMatch (ref ptr, pc + skip))
goto Pass;
}
++ anch_ptr;
}
break;
case Position.StartOfScan:
if (anch_ptr ==
scan_ptr) {
ptr = scan_ptr
- anch_offset;
if (TryMatch
(ref ptr, pc + skip))
goto
Pass;
}
break;
default:
// FIXME
break;
}
}
else if (qs != null ||
(anch_op == OpCode.String &&
skip == 6 + program[pc + 4])) { // substring anchor
// Anchor
// String
// True
if (qs == null) {
bool ignore =
((OpFlags)program[pc + 3] & OpFlags.IgnoreCase) != 0;
string substring =
GetString (pc + 3);
qs = new QuickSearch
(substring, ignore);
}
while (anch_ptr <= anch_end) {
anch_ptr = qs.Search
(text, anch_ptr, anch_end);
if (anch_ptr < 0)
break;
ptr = anch_ptr -
anch_offset;
if (TryMatch (ref ptr,
pc + skip))
goto Pass;
++ anch_ptr;
}
}
else if (anch_op == OpCode.True) {
// no anchor
// Anchor
// True
while (anch_ptr <= anch_end) {
ptr = anch_ptr;
if (TryMatch (ref ptr,
pc + skip))
goto Pass;
++ anch_ptr;
}
}
else {
// general case
// Anchor
// <expr>
// True
while (anch_ptr <= anch_end) {
ptr = anch_ptr;
if (Eval (Mode.Match,
ref ptr, pc + 3)) {
// anchor
expression passed: try real expression at the correct offset
ptr = anch_ptr
- anch_offset;
if (TryMatch
(ref ptr, pc + skip))
goto
Pass;
}
++ anch_ptr;
}
}
goto Fail;
}
case OpCode.False: {
goto Fail;
}
case OpCode.True: {
goto Pass;
}
case OpCode.Position: {
if (!IsPosition ((Position)program[pc +
1], ptr))
goto Fail;
pc += 2;
break;
}
case OpCode.String: {
bool reverse = (flags &
OpFlags.RightToLeft) != 0;
bool ignore = (flags &
OpFlags.IgnoreCase) != 0;
int len = program[pc + 1];
if (reverse) {
ptr -= len;
if (ptr < 0)
goto Fail;
}
else if (ptr + len > text_end)
goto Fail;
pc += 2;
for (int i = 0; i < len; ++ i) {
char c = text[ptr + i];
if (ignore)
c = Char.ToLower (c);
if (c != (char)program[pc ++])
goto Fail;
}
if (!reverse)
ptr += len;
break;
}
case OpCode.Reference: {
bool reverse = (flags &
OpFlags.RightToLeft) != 0;
bool ignore = (flags &
OpFlags.IgnoreCase) != 0;
int m = GetLastDefined (program [pc +
1]);
if (m < 0)
goto Fail;
int str = marks [m].Index;
int len = marks [m].Length;
if (reverse) {
ptr -= len;
if (ptr < 0)
goto Fail;
}
else if (ptr + len > text_end)
goto Fail;
pc += 2;
for (int i = 0; i < len; ++ i) {
if (ignore) {
if (Char.ToLower
(text[ptr + i]) != Char.ToLower (text[str + i]))
goto Fail;
}
else {
if (text[ptr + i] !=
text[str + i])
goto Fail;
}
}
if (!reverse)
ptr += len;
break;
}
case OpCode.Character: case OpCode.Category:
case OpCode.Range: case OpCode.Set: {
if (!EvalChar (mode, ref ptr, ref pc,
false))
goto Fail;
break;
}
case OpCode.In: {
int target = pc + program[pc + 1];
pc += 2;
if (!EvalChar (mode, ref ptr, ref pc,
true))
goto Fail;
pc = target;
break;
}
case OpCode.Open: {
Open (program[pc + 1], ptr);
pc += 2;
break;
}
case OpCode.Close: {
Close (program[pc + 1], ptr);
pc += 2;
break;
}
case OpCode.Balance: {
Balance (program[pc + 1], program[pc +
2], ptr);
break;
}
case OpCode.IfDefined: {
int m = GetLastDefined (program [pc +
2]);
if (m < 0)
pc += program[pc + 1];
else
pc += 3;
break;
}
case OpCode.Sub: {
if (!Eval (Mode.Match, ref ptr, pc + 2))
goto Fail;
pc += program[pc + 1];
break;
}
case OpCode.Test: {
int cp = Checkpoint ();
int test_ptr = ptr;
if (Eval (Mode.Match, ref test_ptr, pc
+ 3))
pc += program[pc + 1];
else {
Backtrack (cp);
pc += program[pc + 2];
}
break;
}
case OpCode.Branch: {
OpCode branch_op;
do {
int cp = Checkpoint ();
if (Eval (Mode.Match, ref ptr,
pc + 2))
goto Pass;
Backtrack (cp);
pc += program[pc + 1];
branch_op =
(OpCode)(program[pc] & 0xff);
} while (branch_op != OpCode.False);
goto Fail;
}
case OpCode.Jump: {
pc += program[pc + 1];
break;
}
case OpCode.Repeat: {
this.repeat = new RepeatContext (
this.repeat,
// previous context
program[pc + 2],
// minimum
program[pc + 3],
// maximum
(flags & OpFlags.Lazy) != 0,
// lazy
pc + 4
// subexpression
);
if (Eval (Mode.Match, ref ptr, pc +
program[pc + 1]))
goto Pass;
else {
this.repeat =
this.repeat.Previous;
goto Fail;
}
}
case OpCode.Until: {
RepeatContext current = this.repeat;
int start = current.Start;
if (!current.IsMinimum) {
++ current.Count;
current.Start = ptr;
if (Eval (Mode.Match, ref ptr,
repeat.Expression))
goto Pass;
current.Start = start;
-- current.Count;
goto Fail;
}
if (ptr == current.Start) {
// degenerate match ... match
tail or fail
this.repeat = current.Previous;
if (Eval (Mode.Match, ref ptr,
pc + 1))
goto Pass;
goto Fail;
}
if (current.IsLazy) {
// match tail first ...
this.repeat = current.Previous;
int cp = Checkpoint ();
if (Eval (Mode.Match, ref ptr,
pc + 1))
goto Pass;
Backtrack (cp);
// ... then match more
this.repeat = current;
if (!current.IsMaximum) {
++ current.Count;
current.Start = ptr;
if (Eval (Mode.Match,
ref ptr, current.Expression))
goto Pass;
current.Start = start;
-- current.Count;
goto Fail;
}
return false;
}
else {
// match more first ...
if (!current.IsMaximum) {
int cp = Checkpoint ();
++ current.Count;
current.Start = ptr;
if (Eval (Mode.Match,
ref ptr, current.Expression))
goto Pass;
current.Start = start;
-- current.Count;
Backtrack (cp);
}
// ... then match tail
this.repeat = current.Previous;
if (Eval (Mode.Match, ref ptr,
pc + 1))
goto Pass;
this.repeat = current;
goto Fail;
}
}
case OpCode.FastRepeat: {
this.fast = new RepeatContext (
fast,
program[pc + 2],
// minimum
program[pc + 3],
// maximum
(flags & OpFlags.Lazy) != 0,
// lazy
pc + 4
// subexpression
);
fast.Start = ptr;
int cp = Checkpoint ();
pc += program[pc + 1]; // tail
expression
ushort tail_word = program[pc];
int c1, c2; //
first character of tail operator
int coff; // 0 or
-1 depending on direction
OpCode tail_op = (OpCode)(tail_word &
0xff);
if (tail_op == OpCode.Character ||
tail_op == OpCode.String) {
OpFlags tail_flags =
(OpFlags)(tail_word & 0xff00);
if (tail_op == OpCode.String)
c1 = program[pc + 2];
// first char of string
else
c1 = program[pc + 1];
// character
if ((tail_flags &
OpFlags.IgnoreCase) != 0)
c2 = Char.ToUpper
((char)c1); // ignore case
else
c2 = c1;
if ((tail_flags &
OpFlags.RightToLeft) != 0)
coff = -1;
// reverse
else
coff = 0;
}
else {
c1 = c2 = -1;
coff = 0;
}
if (fast.IsLazy) {
if (!fast.IsMinimum && !Eval
(Mode.Count, ref ptr, fast.Expression)) {
//Console.WriteLine
("lazy fast: failed mininum.");
fast = fast.Previous;
goto Fail;
}
while (true) {
int p = ptr + coff;
if ((c1 < 0 || (p >= 0
&& p < text_end && (c1 == text[p] || c2 == text[p]))) &&
Eval (Mode.Match,
ref ptr, pc))
break;
if (fast.IsMaximum) {
//Console.WriteLine ("lazy fast: failed with maximum.");
fast =
fast.Previous;
goto Fail;
}
Backtrack (cp);
if (!Eval (Mode.Count,
ref ptr, fast.Expression)) {
//Console.WriteLine ("lazy fast: no more.");
fast =
fast.Previous;
goto Fail;
}
}
fast = fast.Previous;
goto Pass;
}
else {
if (!Eval (Mode.Count, ref ptr,
fast.Expression)) {
fast = fast.Previous;
goto Fail;
}
int width;
if (fast.Count > 0)
width = (ptr -
fast.Start) / fast.Count;
else
width = 0;
while (true) {
int p = ptr + coff;
if ((c1 < 0 || (p >= 0
&& p < text_end && (c1 == text[p] || c2 == text[p]))) &&
Eval (Mode.Match,
ref ptr, pc))
break;
-- fast.Count;
if (!fast.IsMinimum) {
fast =
fast.Previous;
goto Fail;
}
ptr -= width;
Backtrack (cp);
}
fast = fast.Previous;
goto Pass;
}
}
case OpCode.Info: {
Debug.Assert (false, "Regex", "Info
block found in pattern");
goto Fail;
}
}
}
Pass:
ref_ptr = ptr;
switch (mode) {
case Mode.Match:
return true;
case Mode.Count: {
++ fast.Count;
if (fast.IsMaximum || (fast.IsLazy &&
fast.IsMinimum))
return true;
pc = fast.Expression;
goto Begin;
}
}
Fail:
switch (mode) {
case Mode.Match:
return false;
case Mode.Count: {
if (!fast.IsLazy && fast.IsMinimum)
return true;
ref_ptr = fast.Start;
return false;
}
}
return false;
}
private bool EvalChar (Mode mode, ref int ptr, ref int pc, bool
multi) {
bool consumed = false;
char c = '\0';
bool negate;
bool ignore;
do {
ushort word = program[pc];
OpCode op = (OpCode)(word & 0x00ff);
OpFlags flags = (OpFlags)(word & 0xff00);
++ pc;
ignore = (flags & OpFlags.IgnoreCase) != 0;
// consume character: the direction of an In
construct is
// determined by the direction of its first op
if (!consumed) {
if ((flags & OpFlags.RightToLeft) != 0)
{
if (ptr <= 0)
return false;
c = text[-- ptr];
}
else {
if (ptr >= text_end)
return false;
c = text[ptr ++];
}
if (ignore)
c = Char.ToLower (c);
consumed = true;
}
// negate flag
negate = (flags & OpFlags.Negate) != 0;
// execute op
switch (op) {
case OpCode.True:
return true;
case OpCode.False:
return false;
case OpCode.Character: {
if (c == (char)program[pc ++])
return !negate;
break;
}
case OpCode.Category: {
if (CategoryUtils.IsCategory
((Category)program[pc ++], c))
return !negate;
break;
}
case OpCode.Range: {
int lo = (char)program[pc ++];
int hi = (char)program[pc ++];
if (lo <= c && c <= hi)
return !negate;
break;
}
case OpCode.Set: {
int lo = (char)program[pc ++];
int len = (char)program[pc ++];
int bits = pc;
pc += len;
int i = (int)c - lo;
if (i < 0 || i >= len << 4)
break;
if ((program[bits + (i >> 4)] & (1 <<
(i & 0xf))) != 0)
return !negate;
break;
}
}
} while (multi);
return negate;
}
private bool TryMatch (ref int ref_ptr, int pc) {
Reset ();
int ptr = ref_ptr;
marks [groups [0]].Start = ptr;
if (Eval (Mode.Match, ref ptr, pc)) {
marks [groups [0]].End = ptr;
ref_ptr = ptr;
return true;
}
return false;
}
private bool IsPosition (Position pos, int ptr) {
switch (pos) {
case Position.Start: case Position.StartOfString:
return ptr == 0;
case Position.StartOfLine:
return ptr == 0 || text[ptr - 1] == '\n';
case Position.StartOfScan:
return ptr == scan_ptr;
case Position.End:
return ptr == text_end ||
(ptr == text_end - 1 && text[ptr] ==
'\n');
case Position.EndOfLine:
return ptr == text_end || text[ptr] == '\n';
case Position.EndOfString:
return ptr == text_end;
case Position.Boundary:
if (text_end == 0)
return false;
if (ptr == 0)
return IsWordChar (text[ptr]);
else if (ptr == text_end)
return IsWordChar (text[ptr - 1]);
else
return IsWordChar (text[ptr]) !=
IsWordChar (text[ptr - 1]);
case Position.NonBoundary:
if (text_end == 0)
return false;
if (ptr == 0)
return !IsWordChar (text[ptr]);
else if (ptr == text_end)
return !IsWordChar (text[ptr - 1]);
else
return IsWordChar (text[ptr]) ==
IsWordChar (text[ptr - 1]);
default:
return false;
}
}
private bool IsWordChar (char c) {
return CategoryUtils.IsCategory (Category.Word, c);
}
private string GetString (int pc) {
int len = program[pc + 1];
int str = pc + 2;
char[] cs = new char[len];
for (int i = 0; i < len; ++ i)
cs[i] = (char)program[str ++];
return new string (cs);
}
// capture management
private void Open (int gid, int ptr) {
int m = groups [gid];
if (m < mark_start || marks [m].IsDefined) {
m = CreateMark (m);
groups [gid] = m;
}
marks [m].Start = ptr;
}
private void Close (int gid, int ptr) {
marks [groups [gid]].End = ptr;
}
private void Balance (int gid, int balance_gid, int ptr) {
int b = groups [balance_gid];
Debug.Assert (marks [b].IsDefined, "Regex", "Balancing
group not closed");
if (gid > 0) {
Open (gid, marks [b].Index + marks [b].Length);
Close (gid, ptr);
}
groups [balance_gid] = marks [b].Previous;
}
private int Checkpoint () {
mark_start = mark_end;
return mark_start;
}
private void Backtrack (int cp) {
Debug.Assert (cp > mark_start, "Regex", "Attempt to
backtrack forwards");
for (int i = 0; i < groups.Length; ++ i) {
int m = groups [i];
while (cp <= m)
m = marks [m].Previous;
groups [i] = m;
}
}
private void ResetGroups () {
int n = groups.Length;
if (marks == null)
marks = new Mark [n * 10];
for (int i = 0; i < n; ++ i) {
groups [i] = i;
marks [i].Start = -1;
marks [i].End = -1;
marks [i].Previous = -1;
}
mark_start = 0;
mark_end = n;
}
private int GetLastDefined (int gid) {
int m = groups [gid];
while (m >= 0 && !marks [m].IsDefined)
m = marks [m].Previous;
return m;
}
private int CreateMark (int previous) {
if (mark_end == marks.Length) {
Mark [] dest = new Mark [marks.Length * 2];
marks.CopyTo (dest, 0);
marks = dest;
}
int m = mark_end ++;
marks [m].Start = marks [m].End = -1;
marks [m].Previous = previous;
return m;
}
private Match GenerateMatch (Regex regex) {
int[][] grps = new int[groups.Length][];
ArrayList caps = new ArrayList ();
for (int gid = 0; gid < groups.Length; ++ gid) {
caps.Clear ();
for (int m = groups[gid]; m >= 0; m =
marks[m].Previous) {
if (!marks[m].IsDefined)
continue;
caps.Add (marks[m].Index);
caps.Add (marks[m].Length);
}
grps[gid] = (int[])caps.ToArray (typeof (int));
}
return new Match (regex, this, text, text_end, grps);
}
// interpreter attributes
private ushort[] program; // regex program
private int program_start; // first instruction
after info block
private string text; // input text
private int text_end; // end of input text
(last character + 1)
private int group_count; // number of capturing
groups
private int match_min, match_max; // match width
information
private QuickSearch qs; // fast substring
matcher
// match state
private int scan_ptr; // start of scan
private RepeatContext repeat; // current repeat
context
private RepeatContext fast; // fast repeat context
private Mark[] marks = null; // mark stack
private int mark_start; // start of current
checkpoint
private int mark_end; // end of
checkpoint/next free mark
private int[] groups; // current group
definitions
// private classes
private struct Mark {
public int Start, End;
public int Previous;
public bool IsDefined {
get { return Start >= 0 && End >= 0; }
}
public int Index {
get { return Start < End ? Start : End; }
}
public int Length {
get { return Start < End ? End - Start : Start
- End; }
}
}
private class RepeatContext {
public RepeatContext (RepeatContext previous, int min,
int max, bool lazy, int expr_pc) {
this.previous = previous;
this.min = min;
this.max = max;
this.lazy = lazy;
this.expr_pc = expr_pc;
this.start = -1;
this.count = 0;
}
public int Count {
get { return count; }
set { count = value; }
}
public int Start {
get { return start; }
set { start = value; }
}
public bool IsMinimum {
get { return min <= count; }
}
public bool IsMaximum {
get { return max <= count; }
}
public bool IsLazy {
get { return lazy; }
}
public int Expression {
get { return expr_pc; }
}
public RepeatContext Previous {
get { return previous; }
}
private int start;
private int min, max;
private bool lazy;
private int expr_pc;
private RepeatContext previous;
private int count;
}
private enum Mode {
Search,
Match,
Count
}
}
}
--- NEW FILE ---
//
// assembly: System
// namespace: System.Text.RegularExpressions
// file: interval.cs
//
// author: Dan Lewis (address@hidden)
// (c) 2002
using System;
using System.Collections;
namespace System.Text.RegularExpressions {
struct Interval : IComparable {
public int low;
public int high;
public bool contiguous;
public static Interval Empty {
get {
Interval i;
i.low = 0;
i.high = i.low - 1;
i.contiguous = true;
return i;
}
}
public static Interval Entire {
get { return new Interval (Int32.MinValue,
Int32.MaxValue); }
}
public Interval (int low, int high) {
if (low > high) {
int t = low;
low = high;
high = t;
}
this.low = low;
this.high = high;
this.contiguous = true;
}
public bool IsDiscontiguous {
get { return !contiguous; }
}
public bool IsSingleton {
get { return contiguous && low == high; }
}
public bool IsRange {
get { return !IsSingleton && !IsEmpty; }
}
public bool IsEmpty {
get { return low > high; }
}
public int Size {
get {
if (IsEmpty)
return 0;
return high - low + 1;
}
}
public bool IsDisjoint (Interval i) {
if (IsEmpty || i.IsEmpty)
return true;
return !(low <= i.high && i.low <= high);
}
public bool IsAdjacent (Interval i) {
if (IsEmpty || i.IsEmpty)
return false;
return low == i.high + 1 || high == i.low - 1;
}
public bool Contains (Interval i) {
if (!IsEmpty && i.IsEmpty)
return true;
if (IsEmpty)
return false;
return low <= i.low && i.high <= high;
}
public bool Contains (int i) {
return low <= i && i <= high;
}
public void Merge (Interval i) {
if (i.IsEmpty)
return;
if (IsEmpty) {
this.low = i.low;
this.high = i.high;
}
if (i.low < low)
low = i.low;
if (i.high > high)
high = i.high;
}
public void Intersect (Interval i) {
if (IsDisjoint (i)) {
low = 0;
high = low - 1;
return;
}
if (i.low > low)
low = i.low;
if (i.high > high)
high = i.high;
}
public int CompareTo (object o) {
return low - ((Interval)o).low;
}
public override string ToString () {
if (IsEmpty)
return "(EMPTY)";
else if (!contiguous)
return "{" + low + ", " + high + "}";
else if (IsSingleton)
return "(" + low + ")";
else
return "(" + low + ", " + high + ")";
}
}
class IntervalCollection : ICollection, IEnumerable {
public IntervalCollection () {
intervals = new ArrayList ();
}
public Interval this[int i] {
get { return (Interval)intervals[i]; }
set { intervals[i] = value; }
}
public void Add (Interval i) {
intervals.Add (i);
}
public void Clear () {
intervals.Clear ();
}
public void Sort () {
intervals.Sort ();
}
public void Normalize () {
intervals.Sort ();
int j = 0;
while (j < intervals.Count - 1) {
Interval a = (Interval)intervals[j];
Interval b = (Interval)intervals[j + 1];
if (!a.IsDisjoint (b) || a.IsAdjacent (b)) {
a.Merge (b);
intervals[j] = a;
intervals.RemoveAt (j + 1);
}
else
++ j;
}
}
public delegate double CostDelegate (Interval i);
public IntervalCollection GetMetaCollection (CostDelegate
cost_del) {
IntervalCollection meta = new IntervalCollection ();
Normalize ();
Optimize (0, Count - 1, meta, cost_del);
meta.intervals.Sort ();
return meta;
}
private void Optimize (int begin, int end, IntervalCollection
meta, CostDelegate cost_del) {
Interval set;
set.contiguous = false;
int best_set_begin = -1;
int best_set_end = -1;
double best_set_cost = 0;
for (int i = begin; i <= end; ++ i) {
set.low = this[i].low;
double cost = 0.0;
for (int j = i; j <= end; ++ j) {
set.high = this[j].high;
cost += cost_del (this[j]);
double set_cost = cost_del (set);
if (set_cost < cost && cost >
best_set_cost) {
best_set_begin = i;
best_set_end = j;
best_set_cost = cost;
}
}
}
if (best_set_begin < 0) {
// didn't find an optimal set: add original
members
for (int i = begin; i <= end; ++ i)
meta.Add (this[i]);
}
else {
// found set: add it ...
set.low = this[best_set_begin].low;
set.high = this[best_set_end].high;
meta.Add (set);
// ... and optimize to the left and right
if (best_set_begin > begin)
Optimize (begin, best_set_begin - 1,
meta, cost_del);
if (best_set_end < end)
Optimize (best_set_end + 1, end, meta,
cost_del);
}
}
// ICollection implementation
public int Count {
get { return intervals.Count; }
}
public bool IsSynchronized {
get { return false; }
}
public object SyncRoot {
get { return intervals; }
}
public void CopyTo (Array array, int index) {
foreach (Interval i in intervals) {
if (index > array.Length)
break;
array.SetValue (i, index ++);
}
}
// IEnumerator implementation
public IEnumerator GetEnumerator () {
return new Enumerator (intervals);
}
private class Enumerator : IEnumerator {
public Enumerator (IList list) {
this.list = list;
Reset ();
}
public object Current {
get {
if (ptr >= list.Count)
throw new
InvalidOperationException ();
return list[ptr];
}
}
public bool MoveNext () {
if (ptr > list.Count)
throw new InvalidOperationException ();
return ++ ptr < list.Count;
}
public void Reset () {
ptr = -1;
}
private IList list;
private int ptr;
}
// private fields
private ArrayList intervals;
}
}
--- NEW FILE ---
//
// assembly: System
// namespace: System.Text.RegularExpressions
// file: match.cs
//
// author: Dan Lewis (address@hidden)
// (c) 2002
using System;
namespace System.Text.RegularExpressions {
[Serializable]
public class Capture {
public int Index {
get { return index; }
}
public int Length {
get { return length; }
}
public string Value {
get { return text.Substring (index, length); }
}
public override string ToString () {
return Value;
}
// internal members
internal Capture (string text) : this (text, 0, 0) { }
internal Capture (string text, int index, int length) {
this.text = text;
this.index = index;
this.length = length;
}
internal string Text {
get { return text; }
}
// private
internal int index, length;
internal string text;
}
[Serializable]
public class Group : Capture {
public static Group Synchronized (Group inner) {
return inner; // is this enough?
}
public CaptureCollection Captures {
get { return captures; }
}
public bool Success {
get { return success; }
}
// internal
internal Group (string text, int[] caps) : base (text) {
this.captures = new CaptureCollection ();
if (caps == null || caps.Length == 0) {
this.success = false;
return;
}
this.success = true;
this.index = caps[0];
this.length = caps[1];
captures.Add (this);
for (int i = 2; i < caps.Length; i += 2)
captures.Add (new Capture (text, caps[i],
caps[i + 1]));
captures.Reverse ();
}
private bool success;
private CaptureCollection captures;
}
[Serializable]
public class Match : Group {
public static Match Empty {
get { return empty; }
}
public static Match Synchronized (Match inner) {
return inner; // FIXME need to sync on machine access
}
public virtual GroupCollection Groups {
get { return groups; }
}
public Match NextMatch () {
if (this == Empty)
return Empty;
int scan_ptr = regex.RightToLeft ? Index : Index +
Length;
// next match after an empty match: make sure scan ptr
makes progress
if (Length == 0)
scan_ptr += regex.RightToLeft ? -1 : +1;
return machine.Scan (regex, Text, scan_ptr,
text_length);
}
public virtual string Result (string replacement) {
return ReplacementEvaluator.Evaluate (replacement,
this);
}
// internal
internal Match () : base (null, null) {
this.regex = null;
this.machine = null;
this.text_length = 0;
this.groups = new GroupCollection ();
groups.Add (this);
}
internal Match (Regex regex, IMachine machine, string text, int
text_length, int[][] grps) :
base (text, grps[0])
{
this.regex = regex;
this.machine = machine;
this.text_length = text_length;
this.groups = new GroupCollection ();
groups.Add (this);
for (int i = 1; i < grps.Length; ++ i)
groups.Add (new Group (text, grps[i]));
}
internal Regex Regex {
get { return regex; }
}
// private
private Regex regex;
private IMachine machine;
private int text_length;
private GroupCollection groups;
private static Match empty = new Match ();
}
}
--- NEW FILE ---
//
// assembly: System
// namespace: System.Text.RegularExpressions
// file: parser.cs
//
// author: Dan Lewis (address@hidden)
// (c) 2002
using System;
using System.Collections;
using System.Globalization;
namespace System.Text.RegularExpressions.Syntax {
class Parser {
public static int ParseDecimal (string str, ref int ptr) {
return ParseNumber (str, ref ptr, 10, 1,
Int32.MaxValue);
}
[...1081 lines suppressed...]
private static bool IsECMAScript (RegexOptions options) {
return (options & RegexOptions.ECMAScript) != 0;
}
// exception creation
private ArgumentException NewParseException (string msg) {
msg = "parsing \"" + pattern + "\" - " + msg;
return new ArgumentException (msg, pattern);
}
private string pattern;
private int ptr;
private ArrayList caps;
private Hashtable refs;
private int num_groups;
}
}
--- NEW FILE ---
//
// assembly: System
// namespace: System.Text.RegularExpressions
// file: quicksearch.cs
//
// author: Dan Lewis (address@hidden)
// (c) 2002
using System;
namespace System.Text.RegularExpressions {
// TODO use simple test for single character strings
class QuickSearch {
// simplified boyer-moore for fast substring matching
public QuickSearch (string str, bool ignore) {
this.str = str;
this.len = str.Length;
this.ignore = ignore;
Setup ();
}
public string String {
get { return str; }
}
public int Length {
get { return len; }
}
public bool IgnoreCase {
get { return ignore; }
}
public int Search (string text, int start, int end) {
if (end > text.Length - len)
end = text.Length - len;
int ptr = start;
if (!ignore) {
while (ptr <= end) {
int i = len - 1;
while (str[i] == text[ptr + i]) {
if (-- i < 0)
return ptr;
}
if (ptr < end)
ptr += shift[text[ptr + len]];
else
break;
}
}
else {
// ignore case: same as above, but we convert
text
// to lower case before doing the string compare
while (ptr <= end) {
int i = len - 1;
while (str[i] == Char.ToLower (text[ptr
+ i])) {
if (-- i < 0)
return ptr;
}
if (ptr < end)
ptr += shift[text[ptr + len]];
else
break;
}
}
return -1;
}
// private
private void Setup () {
if (ignore)
str = str.ToLower ();
// this is a 64k entry shift table. that's 128kb per
pattern!
// is it worth compressing this by only storing shifts
within
// a (lo, hi) character range? for most substrings this
would
// be around 50 bytes...
shift = new int[0x1000];
for (int i = 0; i < 0x1000; ++ i)
shift[i] = len + 1;
for (int i = 0; i < len; ++ i) {
char c = str[i];
shift[c] = len - i;
if (ignore)
shift[Char.ToUpper (c)] = len - i;
}
}
private string str;
private int len;
private bool ignore;
private int[] shift;
}
}
--- NEW FILE ---
//
// assembly: System
// namespace: System.Text.RegularExpressions
// file: regex.cs
//
// author: Dan Lewis (address@hidden)
// (c) 2002
using System;
using System.Text;
using System.Collections;
using System.Reflection;
using System.Reflection.Emit;
using System.Runtime.Serialization;
using RegularExpression =
System.Text.RegularExpressions.Syntax.RegularExpression;
using Parser = System.Text.RegularExpressions.Syntax.Parser;
namespace System.Text.RegularExpressions {
public delegate string MatchEvaluator (Match match);
[Flags]
public enum RegexOptions {
None = 0x000,
IgnoreCase = 0x001,
Multiline = 0x002,
ExplicitCapture = 0x004,
Compiled = 0x008,
Singleline = 0x010,
IgnorePatternWhitespace = 0x020,
RightToLeft = 0x040,
ECMAScript = 0x100
}
[Serializable]
public class Regex : ISerializable {
public static void CompileToAssembly
(RegexCompilationInfo[] regexes, AssemblyName aname)
{
throw new Exception ("Not implemented.");
}
public static void CompileToAssembly
(RegexCompilationInfo[] regexes, AssemblyName aname,
CustomAttributeBuilder[] attribs)
{
throw new Exception ("Not implemented.");
}
public static void CompileToAssembly
(RegexCompilationInfo[] regexes, AssemblyName aname,
CustomAttributeBuilder[] attribs, string resourceFile)
{
throw new Exception ("Not implemented.");
}
public static string Escape (string str) {
return Parser.Escape (str);
}
public static string Unescape (string str) {
return Parser.Unescape (str);
}
public static bool IsMatch (string input, string pattern) {
return IsMatch (input, pattern, RegexOptions.None);
}
public static bool IsMatch (string input, string pattern,
RegexOptions options) {
Regex re = new Regex (pattern, options);
return re.IsMatch (input);
}
public static Match Match (string input, string pattern) {
return Regex.Match (input, pattern, RegexOptions.None);
}
public static Match Match (string input, string pattern,
RegexOptions options) {
Regex re = new Regex (pattern, options);
return re.Match (input);
}
public static MatchCollection Matches (string input, string
pattern) {
return Matches (input, pattern, RegexOptions.None);
}
public static MatchCollection Matches (string input, string
pattern, RegexOptions options) {
Regex re = new Regex (pattern, options);
return re.Matches (input);
}
public static string Replace
(string input, string pattern, MatchEvaluator evaluator)
{
return Regex.Replace (input, pattern, evaluator,
RegexOptions.None);
}
public static string Replace
(string input, string pattern, MatchEvaluator evaluator,
RegexOptions options)
{
Regex re = new Regex (pattern, options);
return re.Replace (input, evaluator);
}
public static string Replace
(string input, string pattern, string replacement)
{
return Regex.Replace (input, pattern, replacement,
RegexOptions.None);
}
public static string Replace
(string input, string pattern, string replacement,
RegexOptions options)
{
Regex re = new Regex (pattern, options);
return re.Replace (input, replacement);
}
public static string[] Split (string input, string pattern) {
return Regex.Split (input, pattern, RegexOptions.None);
}
public static string[] Split (string input, string pattern,
RegexOptions options) {
Regex re = new Regex (input, options);
return re.Split (input);
}
// private
private static FactoryCache cache = new FactoryCache (200);
// TODO put some meaningful number here
// constructors
protected Regex () {
// XXX what's this constructor for?
}
public Regex (string pattern) : this (pattern,
RegexOptions.None) {
}
public Regex (string pattern, RegexOptions options) {
this.pattern = pattern;
this.options = options;
this.factory = cache.Lookup (pattern, options);
if (this.factory == null) {
// parse and install group mapping
Parser psr = new Parser ();
RegularExpression re =
psr.ParseRegularExpression (pattern, options);
this.group_count = re.GroupCount;
this.mapping = psr.GetMapping ();
// compile
ICompiler cmp;
//if ((options & RegexOptions.Compiled) != 0)
// throw new Exception ("Not
implemented.");
//cmp = new CILCompiler ();
//else
cmp = new PatternCompiler ();
re.Compile (cmp, RightToLeft);
// install machine factory and add to pattern
cache
this.factory = cmp.GetMachineFactory ();
this.factory.Mapping = mapping;
cache.Add (pattern, options, this.factory);
} else {
this.group_count = this.factory.GroupCount;
this.mapping = this.factory.Mapping;
}
}
protected Regex (SerializationInfo info, StreamingContext
context) :
this (info.GetString ("pattern"),
(RegexOptions) info.GetValue ("options", typeof
(RegexOptions))) {
}
// public instance properties
public RegexOptions Options {
get { return options; }
}
public bool RightToLeft {
get { return (options & RegexOptions.RightToLeft) != 0;
}
}
// public instance methods
public string[] GetGroupNames () {
string[] names = new string[mapping.Count];
mapping.Keys.CopyTo (names, 0);
return names;
}
public int[] GetGroupNumbers () {
int[] numbers = new int[mapping.Count];
mapping.Values.CopyTo (numbers, 0);
return numbers;
}
public string GroupNameFromNumber (int i) {
if (i >= group_count)
return "";
foreach (string name in mapping.Keys) {
if ((int)mapping[name] == i)
return name;
}
return "";
}
public int GroupNumberFromName (string name) {
if (mapping.Contains (name))
return (int)mapping[name];
return -1;
}
// match methods
public bool IsMatch (string input) {
return IsMatch (input, 0);
}
public bool IsMatch (string input, int startat) {
return Match (input, startat).Success;
}
public Match Match (string input) {
return Match (input, 0);
}
public Match Match (string input, int startat) {
return CreateMachine ().Scan (this, input, startat,
input.Length);
}
public Match Match (string input, int startat, int length) {
return CreateMachine ().Scan (this, input, startat,
startat + length);
}
public MatchCollection Matches (string input) {
return Matches (input, 0);
}
public MatchCollection Matches (string input, int startat) {
MatchCollection ms = new MatchCollection ();
Match m = Match (input, startat);
while (m.Success) {
ms.Add (m);
m = m.NextMatch ();
}
return ms;
}
// replace methods
public string Replace (string input, MatchEvaluator evaluator) {
return Replace (input, evaluator, Int32.MaxValue, 0);
}
public string Replace (string input, MatchEvaluator evaluator,
int count) {
return Replace (input, evaluator, count, 0);
}
public string Replace (string input, MatchEvaluator evaluator,
int count, int startat)
{
StringBuilder result = new StringBuilder ();
int ptr = startat;
Match m = Match (input, startat);
while (m.Success && count -- > 0) {
result.Append (input.Substring (ptr, m.Index -
ptr));
result.Append (evaluator (m));
ptr = m.Index + m.Length;
m = m.NextMatch ();
}
result.Append (input.Substring (ptr));
return result.ToString ();
}
public string Replace (string input, string replacement) {
return Replace (input, replacement, Int32.MaxValue, 0);
}
public string Replace (string input, string replacement, int
count) {
return Replace (input, replacement, count, 0);
}
public string Replace (string input, string replacement, int
count, int startat) {
ReplacementEvaluator ev = new ReplacementEvaluator
(this, replacement);
return Replace (input, new MatchEvaluator
(ev.Evaluate), count, startat);
}
// split methods
public string[] Split (string input) {
return Split (input, Int32.MaxValue, 0);
}
public string[] Split (string input, int count) {
return Split (input, count, 0);
}
public string[] Split (string input, int count, int startat) {
ArrayList splits = new ArrayList ();
if (count == 0)
count = Int32.MaxValue;
int ptr = startat;
while (count -- > 0) {
Match m = Match (input, ptr);
if (!m.Success)
break;
splits.Add (input.Substring (ptr, m.Index -
ptr));
ptr = m.Index + m.Length;
}
if (count > 0)
splits.Add (input.Substring (ptr));
string[] result = new string[splits.Count];
splits.CopyTo (result);
return result;
}
// object methods
public override string ToString () {
return pattern;
}
// ISerializable interface
public virtual void GetObjectData (SerializationInfo info,
StreamingContext context) {
info.AddValue ("pattern", this.ToString (), typeof
(string));
info.AddValue ("options", this.Options, typeof
(RegexOptions));
}
// internal
internal int GroupCount {
get { return group_count; }
}
// private
private IMachine CreateMachine () {
return factory.NewInstance ();
}
protected internal string pattern;
private RegexOptions options;
private IMachineFactory factory;
private IDictionary mapping;
private int group_count;
}
[Serializable]
public class RegexCompilationInfo {
public RegexCompilationInfo (string pattern, RegexOptions
options, string name, string full_namespace, bool is_public) {
this.pattern = pattern;
this.options = options;
this.name = name;
this.full_namespace = full_namespace;
this.is_public = is_public;
}
public bool IsPublic {
get { return is_public; }
set { is_public = value; }
}
public string Name {
get { return name; }
set { name = value; }
}
public string Namespace {
get { return full_namespace; }
set { full_namespace = value; }
}
public RegexOptions Options {
get { return options; }
set { options = value; }
}
public string Pattern {
get { return pattern; }
set { pattern = value; }
}
// private
private string pattern, name, full_namespace;
private RegexOptions options;
private bool is_public;
}
}
--- NEW FILE ---
//
// assembly: System
// namespace: System.Text.RegularExpressions
// file: RegexRunner.cs
//
// author: Dan Lewis (address@hidden)
// (c) 2002
using System;
namespace System.Text.RegularExpressions {
public abstract class RegexRunner {
// constructor
protected internal RegexRunner () {
throw new NotImplementedException ("RegexRunner is not
supported by this library");
}
// protected abstract
protected abstract bool FindFirstChar ();
protected abstract void Go ();
protected abstract void InitTrackCount ();
// protected methods
protected void Capture (int capnum, int start, int end) {
}
protected static bool CharInSet (char ch, string set, string
category) {
return false;
}
protected void Crawl (int i) {
}
protected int CrawlPos () {
return 0;
}
protected void DoubleCrawl () {
}
protected void DoubleStack () {
}
protected void DoubleTrack () {
}
protected void EnsureStorage () {
}
protected bool IsBoundary (int index, int startpos, int endpos)
{
return false;
}
protected bool IsECMABoundary (int index, int startpos, int
endpos) {
return false;
}
protected bool IsMatched (int cap) {
return false;
}
protected int MatchIndex (int cap) {
return 0;
}
protected int MatchLength (int cap) {
return 0;
}
protected int PopCrawl () {
return 0;
}
protected void TransferCapture (int capnum, int uncapnum, int
start, int end) {
}
protected void Uncapture () {
}
// internal
protected internal Match Scan (Regex regex, string text, int
textbeg, int textend, int textstart, int prevlen, bool quick) {
return null;
}
}
}
--- NEW FILE ---
//
// assembly: System
// namespace: System.Text.RegularExpressions
// file: RegexRunnerFactory.cs
//
// author: Dan Lewis (address@hidden)
// (c) 2002
using System;
namespace System.Text.RegularExpressions {
public abstract class RegexRunnerFactory {
protected RegexRunnerFactory () {
throw new NotImplementedException ("RegexRunnerFactory
is not supported by this library.");
}
protected internal abstract RegexRunner CreateInstance ();
}
}
--- NEW FILE ---
//
// assembly: System
// namespace: System.Text.RegularExpressions
// file: replace.cs
//
// author: Dan Lewis (address@hidden)
// (c) 2002
using System;
using System.Text;
using System.Collections;
using Parser = System.Text.RegularExpressions.Syntax.Parser;
namespace System.Text.RegularExpressions {
class ReplacementEvaluator {
public static string Evaluate (string replacement, Match match)
{
ReplacementEvaluator ev = new ReplacementEvaluator
(match.Regex, replacement);
return ev.Evaluate (match);
}
public ReplacementEvaluator (Regex regex, string replacement) {
this.regex = regex;
terms = new ArrayList ();
Compile (replacement);
}
public string Evaluate (Match match) {
StringBuilder result = new StringBuilder ();
foreach (Term term in terms)
result.Append (term.GetResult (match));
return result.ToString ();
}
// private
private void Compile (string replacement) {
replacement = Parser.Unescape (replacement);
string literal = "";
int ptr = 0;
char c;
Term term = null;
while (ptr < replacement.Length) {
c = replacement[ptr ++];
if (c == '$') {
if (replacement[ptr] == '$') {
++ ptr;
break;
}
term = CompileTerm (replacement, ref
ptr);
}
if (term != null) {
term.Literal = literal;
terms.Add (term);
term = null;
literal = "";
}
else
literal += c;
}
if (term == null && literal.Length > 0) {
terms.Add (new Term (literal));
}
}
private Term CompileTerm (string str, ref int ptr) {
char c = str[ptr];
if (Char.IsDigit (c)) { // numbered group
int n = Parser.ParseDecimal (str, ref ptr);
if (n < 0 || n > regex.GroupCount)
throw new ArgumentException ("Bad group
number.");
return new Term (TermOp.Match, n);
}
++ ptr;
switch (c) {
case '{': { // named group
string name = Parser.ParseName (str, ref ptr);
if (str[ptr ++] != '}' || name == null)
throw new ArgumentException ("Bad group
name.");
int n = regex.GroupNumberFromName (name);
if (n < 0)
throw new ArgumentException ("Bad group
name.");
return new Term (TermOp.Match, n);
}
case '&': // entire match
return new Term (TermOp.Match, 0);
case '`': // text before match
return new Term (TermOp.PreMatch, 0);
case '\'': // text after match
return new Term (TermOp.PostMatch, 0);
case '+': // last group
return new Term (TermOp.Match, regex.GroupCount
- 1);
case '_': // entire text
return new Term (TermOp.All, 0);
default:
throw new ArgumentException ("Bad replacement
pattern.");
}
}
private Regex regex;
private ArrayList terms;
private enum TermOp {
None, // no action
Match, // input within group
PreMatch, // input before group
PostMatch, // input after group
All // entire input
}
private class Term {
public Term (TermOp op, int arg) {
this.op = op;
this.arg = arg;
this.literal = "";
}
public Term (string literal) {
this.op = TermOp.None;
this.arg = 0;
this.literal = literal;
}
public string Literal {
set { literal = value; }
}
public string GetResult (Match match) {
Group group = match.Groups[arg];
switch (op) {
case TermOp.None:
return literal;
case TermOp.Match:
return literal + group.Value;
case TermOp.PreMatch:
return literal + group.Text.Substring
(0, group.Index);
case TermOp.PostMatch:
return literal + group.Text.Substring
(group.Index + group.Length);
case TermOp.All:
return literal + group.Text;
}
return "";
}
public TermOp op; // term type
public int arg; // group argument
public string literal; // literal to prepend
public override string ToString () {
return op.ToString () + "(" + arg + ") " +
literal;
}
}
}
}
--- NEW FILE ---
//
// assembly: System
// namespace: System.Text.RegularExpressions
// file: syntax.cs
//
// author: Dan Lewis (address@hidden)
// (c) 2002
using System;
using System.Collections;
namespace System.Text.RegularExpressions.Syntax {
// collection classes
class ExpressionCollection : CollectionBase {
public void Add (Expression e) {
List.Add (e);
}
public Expression this[int i] {
get { return (Expression)List[i]; }
set { List[i] = value; }
}
protected override void OnValidate (object o) {
// allow null elements
}
}
// abstract classes
abstract class Expression {
public abstract void Compile (ICompiler cmp, bool reverse);
public abstract void GetWidth (out int min, out int max);
public int GetFixedWidth () {
int min, max;
GetWidth (out min, out max);
if (min == max)
return min;
return -1;
}
public virtual AnchorInfo GetAnchorInfo () {
return new AnchorInfo (this, GetFixedWidth ());
}
public virtual bool IsComplex () {
return true;
}
}
// composite expressions
abstract class CompositeExpression : Expression {
public CompositeExpression () {
expressions = new ExpressionCollection ();
}
protected ExpressionCollection Expressions {
get { return expressions; }
}
protected void GetWidth (out int min, out int max, int count) {
min = Int32.MaxValue;
max = 0;
bool empty = true;
for (int i = 0; i < count; ++ i) {
Expression e = Expressions[i];
if (e == null)
continue;
empty = false;
int a, b;
e.GetWidth (out a, out b);
if (a < min) min = a;
if (b > max) max = b;
}
if (empty)
min = max = 0;
}
private ExpressionCollection expressions;
}
// groups
class Group : CompositeExpression {
public Group () {
}
public Expression Expression {
get { return Expressions[0]; }
set { Expressions[0] = value; }
}
public void AppendExpression (Expression e) {
Expressions.Add (e);
}
public override void Compile (ICompiler cmp, bool reverse) {
int count = Expressions.Count;
for (int i = 0; i < count; ++ i) {
Expression e;
if (reverse)
e = Expressions[count - i - 1];
else
e = Expressions[i];
e.Compile (cmp, reverse);
}
}
public override void GetWidth (out int min, out int max) {
min = 0;
max = 0;
foreach (Expression e in Expressions) {
int a, b;
e.GetWidth (out a, out b);
min += a;
if (max == Int32.MaxValue || b ==
Int32.MaxValue)
max = Int32.MaxValue;
else
max += b;
}
}
public override AnchorInfo GetAnchorInfo () {
int ptr;
int width = GetFixedWidth ();
ArrayList infos = new ArrayList ();
IntervalCollection segments = new IntervalCollection ();
// accumulate segments
ptr = 0;
foreach (Expression e in Expressions) {
AnchorInfo info = e.GetAnchorInfo ();
infos.Add (info);
if (info.IsPosition)
return new AnchorInfo (this, ptr +
info.Offset, width, info.Position);
if (info.IsSubstring)
segments.Add (info.GetInterval (ptr));
if (info.IsUnknownWidth)
break;
ptr += info.Width;
}
// normalize and find the longest segment
segments.Normalize ();
Interval longest = Interval.Empty;
foreach (Interval segment in segments) {
if (segment.Size > longest.Size)
longest = segment;
}
// now chain the substrings that made this segment
together
if (!longest.IsEmpty) {
string str = "";
bool ignore = false;
ptr = 0;
foreach (AnchorInfo info in infos) {
if (info.IsSubstring &&
longest.Contains (info.GetInterval (ptr))) {
str += info.Substring; // TODO
mark subexpressions
ignore |= info.IgnoreCase;
}
if (info.IsUnknownWidth)
break;
ptr += info.Width;
}
return new AnchorInfo (this, longest.low,
width, str, ignore);
}
return new AnchorInfo (this, width);
}
public override bool IsComplex () {
bool comp = false;
foreach (Expression e in Expressions) {
comp |= e.IsComplex ();
}
return comp | GetFixedWidth () <= 0;
}
}
class RegularExpression : Group {
public RegularExpression () {
group_count = 0;
}
public int GroupCount {
get { return group_count; }
set { group_count = value; }
}
public override void Compile (ICompiler cmp, bool reverse) {
// info block
int min, max;
GetWidth (out min, out max);
cmp.EmitInfo (group_count, min, max);
// anchoring expression
AnchorInfo info = GetAnchorInfo ();
if (reverse)
info = new AnchorInfo (this, GetFixedWidth ());
// FIXME
LinkRef pattern = cmp.NewLink ();
cmp.EmitAnchor (info.Offset, pattern);
if (info.IsPosition)
cmp.EmitPosition (info.Position);
else if (info.IsSubstring)
cmp.EmitString (info.Substring,
info.IgnoreCase, reverse);
cmp.EmitTrue ();
// pattern
cmp.ResolveLink (pattern);
base.Compile (cmp, reverse);
cmp.EmitTrue ();
}
private int group_count;
}
class CapturingGroup : Group {
public CapturingGroup () {
this.gid = 0;
this.name = null;
}
public int Number {
get { return gid; }
set { gid = value; }
}
public string Name {
get { return name; }
set { name = value; }
}
public bool IsNamed {
get { return name != null; }
}
public override void Compile (ICompiler cmp, bool reverse) {
cmp.EmitOpen (gid);
base.Compile (cmp, reverse);
cmp.EmitClose (gid);
}
public override bool IsComplex () {
return true;
}
private int gid;
private string name;
}
class BalancingGroup : CapturingGroup {
public BalancingGroup () {
this.balance = null;
}
public CapturingGroup Balance {
get { return balance; }
set { balance = value; }
}
public override void Compile (ICompiler cmp, bool reverse) {
// can't invoke Group.Compile from here :(
// so I'll just repeat the code
int count = Expressions.Count;
for (int i = 0; i < count; ++ i) {
Expression e;
if (reverse)
e = Expressions[count - i - 1];
else
e = Expressions[i];
e.Compile (cmp, reverse);
}
cmp.EmitBalance (this.Number, balance.Number);
}
private CapturingGroup balance;
}
class NonBacktrackingGroup : Group {
public NonBacktrackingGroup () {
}
public override void Compile (ICompiler cmp, bool reverse) {
LinkRef tail = cmp.NewLink ();
cmp.EmitSub (tail);
base.Compile (cmp, reverse);
cmp.EmitTrue ();
cmp.ResolveLink (tail);
}
public override bool IsComplex () {
return true;
}
}
// repetition
class Repetition : CompositeExpression {
public Repetition (int min, int max, bool lazy) {
Expressions.Add (null);
this.min = min;
this.max = max;
this.lazy = lazy;
}
public Expression Expression {
get { return Expressions[0]; }
set { Expressions[0] = value; }
}
public int Minimum {
get { return min; }
set { min = value; }
}
public int Maximum {
get { return max; }
set { max = value; }
}
public bool Lazy {
get { return lazy; }
set { lazy = value; }
}
public override void Compile (ICompiler cmp, bool reverse) {
if (Expression.IsComplex ()) {
LinkRef until = cmp.NewLink ();
cmp.EmitRepeat (min, max, lazy, until);
Expression.Compile (cmp, reverse);
cmp.EmitUntil (until);
}
else {
LinkRef tail = cmp.NewLink ();
cmp.EmitFastRepeat (min, max, lazy, tail);
Expression.Compile (cmp, reverse);
cmp.EmitTrue ();
cmp.ResolveLink (tail);
}
}
public override void GetWidth (out int min, out int max) {
Expression.GetWidth (out min, out max);
min = min * this.min;
if (max == Int32.MaxValue || this.max == 0xffff)
max = Int32.MaxValue;
else
max = max * this.max;
}
public override AnchorInfo GetAnchorInfo () {
int width = GetFixedWidth ();
if (Minimum == 0)
return new AnchorInfo (this, width);
AnchorInfo info = Expression.GetAnchorInfo ();
if (info.IsPosition)
return new AnchorInfo (this, info.Offset,
width, info.Position);
if (info.IsSubstring) {
if (info.IsComplete) {
string str = "";
for (int i = 0; i < Minimum; ++ i)
str += info.Substring;
return new AnchorInfo (this, 0, width,
str, info.IgnoreCase);
}
return new AnchorInfo (this, info.Offset,
width, info.Substring, info.IgnoreCase);
}
return new AnchorInfo (this, width);
}
private int min, max;
private bool lazy;
}
// assertions
abstract class Assertion : CompositeExpression {
public Assertion () {
Expressions.Add (null); // true expression
Expressions.Add (null); // false expression
}
public Expression TrueExpression {
get { return Expressions[0]; }
set { Expressions[0] = value; }
}
public Expression FalseExpression {
get { return Expressions[1]; }
set { Expressions[1] = value; }
}
public override void GetWidth (out int min, out int max) {
GetWidth (out min, out max, 2);
if (TrueExpression == null || FalseExpression == null)
min = 0;
}
}
class CaptureAssertion : Assertion {
public CaptureAssertion () {
}
public CapturingGroup CapturingGroup {
get { return group; }
set { group = value; }
}
public override void Compile (ICompiler cmp, bool reverse) {
int gid = group.Number;
LinkRef tail = cmp.NewLink ();
if (FalseExpression == null) {
// IfDefined :1
// <yes_exp>
// 1: <tail>
cmp.EmitIfDefined (gid, tail);
TrueExpression.Compile (cmp, reverse);
}
else {
// IfDefined :1
// <yes_expr>
// Jump :2
// 1: <no_expr>
// 2: <tail>
LinkRef false_expr = cmp.NewLink ();
cmp.EmitIfDefined (gid, false_expr);
TrueExpression.Compile (cmp, reverse);
cmp.EmitJump (tail);
cmp.ResolveLink (false_expr);
FalseExpression.Compile (cmp, reverse);
}
cmp.ResolveLink (tail);
}
public override bool IsComplex () {
bool comp = false;
if (TrueExpression != null)
comp |= TrueExpression.IsComplex ();
if (FalseExpression != null)
comp |= FalseExpression.IsComplex ();
return comp | GetFixedWidth () <= 0;
}
private CapturingGroup group;
}
class ExpressionAssertion : Assertion {
public ExpressionAssertion () {
Expressions.Add (null); // test expression
}
public bool Reverse {
get { return reverse; }
set { reverse = value; }
}
public bool Negate {
get { return negate; }
set { negate = value; }
}
public Expression TestExpression {
get { return Expressions[2]; }
set { Expressions[2] = value; }
}
public override void Compile (ICompiler cmp, bool reverse) {
LinkRef true_expr = cmp.NewLink ();
LinkRef false_expr = cmp.NewLink ();
// test op: positive / negative
if (!negate)
cmp.EmitTest (true_expr, false_expr);
else
cmp.EmitTest (false_expr, true_expr);
// test expression: lookahead / lookbehind
TestExpression.Compile (cmp, reverse ^ this.reverse);
cmp.EmitTrue ();
// target expressions
if (TrueExpression == null) { // (?=
...)
// Test :1, :2
// <test_expr>
// :2 False
// :1 <tail>
cmp.ResolveLink (false_expr);
cmp.EmitFalse ();
cmp.ResolveLink (true_expr);
}
else {
cmp.ResolveLink (true_expr);
TrueExpression.Compile (cmp, reverse);
if (FalseExpression == null) { //
(?(...) ...)
// Test :1, :2
// <test_expr>
// :1 <yes_expr>
// :2 <tail>
cmp.ResolveLink (false_expr);
}
else { //
(?(...) ... | ...)
// Test :1, :2
// <test_expr>
// :1 <yes_expr>
// Jump :3
// :2 <no_expr>
// :3 <tail>
LinkRef tail = cmp.NewLink ();
cmp.EmitJump (tail);
cmp.ResolveLink (false_expr);
FalseExpression.Compile (cmp, reverse);
cmp.ResolveLink (tail);
}
}
}
private bool reverse, negate;
}
// alternation
class Alternation : CompositeExpression {
public Alternation () {
}
public ExpressionCollection Alternatives {
get { return Expressions; }
}
public void AddAlternative (Expression e) {
Alternatives.Add (e);
}
public override void Compile (ICompiler cmp, bool reverse) {
LinkRef next = cmp.NewLink ();
LinkRef tail = cmp.NewLink ();
foreach (Expression e in Alternatives) {
cmp.EmitBranch (next);
e.Compile (cmp, reverse);
cmp.EmitJump (tail);
cmp.ResolveLink (next);
}
cmp.EmitFalse ();
cmp.ResolveLink (tail);
}
public override void GetWidth (out int min, out int max) {
GetWidth (out min, out max, Alternatives.Count);
}
public override bool IsComplex () {
bool comp = false;
foreach (Expression e in Alternatives) {
comp |= e.IsComplex ();
}
return comp | GetFixedWidth () <= 0;
}
}
// terminal expressions
class Literal : Expression {
public Literal (string str, bool ignore) {
this.str = str;
this.ignore = ignore;
}
public string String {
get { return str; }
set { str = value; }
}
public bool IgnoreCase {
get { return ignore; }
set { ignore = value; }
}
public override void Compile (ICompiler cmp, bool reverse) {
if (str.Length == 0)
return;
if (str.Length == 1)
cmp.EmitCharacter (str[0], false, ignore,
reverse);
else
cmp.EmitString (str, ignore, reverse);
}
public override void GetWidth (out int min, out int max) {
min = max = str.Length;
}
public override AnchorInfo GetAnchorInfo () {
return new AnchorInfo (this, 0, str.Length, str,
ignore);
}
public override bool IsComplex () {
return false;
}
private string str;
private bool ignore;
}
class PositionAssertion : Expression {
public PositionAssertion (Position pos) {
this.pos = pos;
}
public Position Position {
get { return pos; }
set { pos = value; }
}
public override void Compile (ICompiler cmp, bool reverse) {
cmp.EmitPosition (pos);
}
public override void GetWidth (out int min, out int max) {
min = max = 0;
}
public override bool IsComplex () {
return false;
}
public override AnchorInfo GetAnchorInfo () {
switch (pos) {
case Position.StartOfString: case Position.StartOfLine:
case Position.StartOfScan:
return new AnchorInfo (this, 0, 0, pos);
default:
return new AnchorInfo (this, 0);
}
}
private Position pos;
}
class Reference : Expression {
public Reference (bool ignore) {
this.ignore = ignore;
}
public CapturingGroup CapturingGroup {
get { return group; }
set { group = value; }
}
public bool IgnoreCase {
get { return ignore; }
set { ignore = value; }
}
public override void Compile (ICompiler cmp, bool reverse) {
cmp.EmitReference (group.Number, ignore, reverse);
}
public override void GetWidth (out int min, out int max) {
//group.GetWidth (out min, out max);
// TODO set width to referenced group for non-cyclical
references
min = 0;
max = Int32.MaxValue;
}
public override bool IsComplex () {
return true; // FIXME incorporate cyclic check
}
private CapturingGroup group;
private bool ignore;
}
class CharacterClass : Expression {
public CharacterClass (bool negate, bool ignore) {
this.negate = negate;
this.ignore = ignore;
intervals = new IntervalCollection ();
// initialize pos/neg category arrays
Array cat_values = Enum.GetValues (typeof (Category));
int cat_size = (int)(Category)cat_values.GetValue
(cat_values.Length - 1) + 1;
pos_cats = new bool[cat_size];
neg_cats = new bool[cat_size];
for (int i = 0; i < cat_size; ++ i) {
pos_cats[i] = false;
neg_cats[i] = false;
}
}
public CharacterClass (Category cat, bool negate) : this
(false, false) {
this.AddCategory (cat, negate);
}
public bool Negate {
get { return negate; }
set { negate = value; }
}
public bool IgnoreCase {
get { return ignore; }
set { ignore = value; }
}
public void AddCategory (Category cat, bool negate) {
int n = (int)cat;
if (negate) {
if (pos_cats[n])
pos_cats[n] = false;
neg_cats[n] = true;
}
else {
if (neg_cats[n])
neg_cats[n] = false;
pos_cats[n] = true;
}
}
public void AddCharacter (char c) {
intervals.Add (new Interval (c, c));
}
public void AddRange (char lo, char hi) {
intervals.Add (new Interval (lo, hi));
}
public override void Compile (ICompiler cmp, bool reverse) {
// create the meta-collection
IntervalCollection meta =
intervals.GetMetaCollection (new
IntervalCollection.CostDelegate (GetIntervalCost));
// count ops
int count = meta.Count;
for (int i = 0; i < pos_cats.Length; ++ i) {
if (pos_cats[i]) ++ count;
if (neg_cats[i]) ++ count;
}
if (count == 0)
return;
// emit in op for |meta| > 1
LinkRef tail = cmp.NewLink ();
if (count > 1)
cmp.EmitIn (tail);
// emit categories
for (int i = 0; i < pos_cats.Length; ++ i) {
if (pos_cats[i])
cmp.EmitCategory ((Category)i, negate,
reverse);
else if (neg_cats[i])
cmp.EmitCategory ((Category)i, !negate,
reverse);
}
// emit character/range/sets from meta-collection
foreach (Interval a in meta) {
if (a.IsDiscontiguous) {
// Set
BitArray bits = new BitArray (a.Size);
foreach (Interval b in intervals) {
if (a.Contains (b)) {
for (int i = b.low; i
<= b.high; ++ i)
bits[i - a.low]
= true;
}
}
cmp.EmitSet ((char)a.low, bits, negate,
ignore, reverse);
}
else if (a.IsSingleton)
// Character
cmp.EmitCharacter ((char)a.low, negate,
ignore, reverse);
else
// Range
cmp.EmitRange ((char)a.low,
(char)a.high, negate, ignore, reverse);
}
// finish up
if (count > 1) {
if (negate)
cmp.EmitTrue ();
else
cmp.EmitFalse ();
cmp.ResolveLink (tail);
}
}
public override void GetWidth (out int min, out int max) {
min = max = 1;
}
public override bool IsComplex () {
return false;
}
// private
private static double GetIntervalCost (Interval i) {
// use op length as cost metric (=> optimize for space)
if (i.IsDiscontiguous)
return 3 + ((i.Size + 0xf) >> 4);
// Set
else if (i.IsSingleton)
return 2;
// Character
else
return 3;
// Range
}
private bool negate, ignore;
private bool[] pos_cats, neg_cats;
private IntervalCollection intervals;
}
class AnchorInfo {
private Expression expr;
private Position pos;
private int offset;
private string str;
private int width;
private bool ignore;
public AnchorInfo (Expression expr, int width) {
this.expr = expr;
this.offset = 0;
this.width = width;
this.str = null;
this.ignore = false;
this.pos = Position.Any;
}
public AnchorInfo (Expression expr, int offset, int width,
string str, bool ignore) {
this.expr = expr;
this.offset = offset;
this.width = width;
this.str = ignore ? str.ToLower () : str;
this.ignore = ignore;
this.pos = Position.Any;
}
public AnchorInfo (Expression expr, int offset, int width,
Position pos) {
this.expr = expr;
this.offset = offset;
this.width = width;
this.pos = pos;
this.str = null;
this.ignore = false;
}
public Expression Expression {
get { return expr; }
}
public int Offset {
get { return offset; }
}
public int Width {
get { return width; }
}
public int Length {
get { return (str != null) ? str.Length : 0; }
}
public bool IsUnknownWidth {
get { return width < 0; }
}
public bool IsComplete {
get { return Length == Width; }
}
public string Substring {
get { return str; }
}
public bool IgnoreCase {
get { return ignore; }
}
public Position Position {
get { return pos; }
}
public bool IsSubstring {
get { return str != null; }
}
public bool IsPosition {
get { return pos != Position.Any; }
}
public Interval GetInterval () {
return GetInterval (0);
}
public Interval GetInterval (int start) {
if (!IsSubstring)
return Interval.Empty;
return new Interval (start + Offset, start + Offset +
Length - 1);
}
}
}
--- NEW FILE ---
2002-11-12 Jackson Harper <address@hidden>
* arch.cs compiler.cs regex.cs: Added mapping attribute to
MachineFactories
2002-11-06 Gonzalo Paniagua Javier <address@hidden>
* parser.cs: detect illegal \ at end of pattern. Fixes 31334.
2002-10-25 Gonzalo Paniagua Javier <address@hidden>
* parser.cs: applied fix from Tim Haynes (address@hidden) to
solve bug #32807. Also modified GetMapping to return the same as MS.
2002-08-28 Juli Mallett <address@hidden>
* arch.cs, compiler.cs: Give the interpreter machine a property
for the retrieval of the group count.
* regex.cs: Use the new GroupCount property of the factory to
initialise the current group count, and restructure code to compile
the pattern only the first time it is needed (essentially backing
out the previous revision of regex.cs, to use the new code.)
2002-08-14 Cesar Octavio Lopez Nataren <address@hidden>
* regex.cs: Added the ctr for ISerializable implementation and
implemented the GetObjectData function.
2002-07-30 Juli Mallett <address@hidden>
* regex.cs: Fixed bug where the expression would not be
re-evaluated for grouping purposes when factory caches were
used, resulting in no groups being recognised after one call
with a given pattern and no change in options.
2002-05-13 Dan Lewis <address@hidden>
* regex.cs: Fixed bug in split.
2002-05-08 Dan Lewis <address@hidden>
* interpreter.cs: Moved to an array-based stack representation
for faster captures.
* match.cs, collections.cs: Decoupled capture representation from
interpreter internals.
* cache.cs: Changed Key type from struct to class for speed.
2002-04-06 Dan Lewis <address@hidden>
* cache.cs: Object methods should be overridden with "override".
2002-04-04 Dan Lewis <address@hidden>
* RegexRunner.cs, RegexRunnerFactory.cs: MS support classes. Stubs
added for completeness.
* regex.cs, match.cs, collections.cs: Serializable attribute.
2002-04-04 Dan Lewis <address@hidden>
* regex.cs: Added static Matches and IsMatch methods.
2002-04-03 Dan Lewis <address@hidden>
* ChangeLog: Added changelog.
* cache.cs: Fixed bug in MRUList.Evict.
--- NEW FILE ---
TODO:
* Need to go through everything and square it with RightToLeft matching.
The support for this was built into an early version, and lots of things built
afterwards are not savvy about bi-directional matching. Things that spring to
mind: Regex match methods should start at 0 or text.Length depending on
direction. Do split and replace need changes? Match should be aware of its
direction (already applied some of this to NextMatch logic). The interpreter
needs to check left and right bounds. Anchoring and substring discovery need
to be reworked. RTL matches are going to have anchors on the right - ie $, \Z
and \z. This should be added to the anchor logic. QuickSearch needs to work in
reverse. There may be other stuff.... work through the code.
* Add ECMAScript support to the parser. For example, [.\w\s\d] map to ECMA
categories instead of canonical ones [DONE]. There's different behaviour on
backreference/octal disambiguation. Find out what the runtime behavioural
difference is for cyclic backreferences eg (?(1)abc\1) - this is only briefly
mentioned in the spec. I couldn't find much on this in the ECMAScript
specification either.
* Octal/backreference parsing needs a big fix. The rules are ridiculously
complex.
* Add a check in QuickSearch for single character substrings. This is likely to
be a common case. There's no need to go through a shift table. Also, have a
look at just computing a relevant subset of the shift table and using an
(offset, size) pair to help test inclusion. Characters not in the table get
the default len + 1 shift.
* Improve the perl test suite. Run under MS runtime to generate checksums for
each trial. Checksums should incorporate: all captures (index, length) for all
groups; names of explicit capturing groups, and the numbers they map to. Any
other state? RegexTrial.Execute() will then compare result and checksum.
* The pattern (?(1?)a|b). It should fail: Perl fails, the MS implementation
fails, but I pass. The documentation says that the construct (?(X)...) can be
processed in two ways. If X is a valid group number, or a valid group name,
then the expression becomes a capture conditional - the (...) part is
executed only if X has been captured. If X is not a group number or name, then
it is treated as a positive lookahead., and (...) is only executed if the
lookahead succeeds. My code does the latter, but on further investigation it
appears that both Perl and MS fail to recognize an expression assertion if the
first character of the assertion is a number - which instead suggests a
capture conditional. The exception raised is something like "invalid group
number". I get the feeling the my behaviour seems more correct, but it's not
consistent with the other implementations, so it should probably be changed.
--- NEW FILE ---
This code was originally sourced from the Mono C# library. It is
available under the X11 license
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- [Dotgnu-pnet-commits] CVS: pnetlib/System/Text/RegularExpressions arch.cs,NONE,1.1 cache.cs,NONE,1.1 category.cs,NONE,1.1 collections.cs,NONE,1.1 compiler.cs,NONE,1.1 debug.cs,NONE,1.1 interpreter.cs,NONE,1.1 interval.cs,NONE,1.1 match.cs,NONE,1.1 parser.cs,NONE,1.1 quicksearch.cs,NONE,1.1 regex.cs,NONE,1.1 RegexRunner.cs,NONE,1.1 RegexRunnerFactory.cs,NONE,1.1 replace.cs,NONE,1.1 syntax.cs,NONE,1.1 ChangeLog,NONE,1.1 notes.txt,NONE,1.1README,NONE,1.1,
Gopal.V <address@hidden> <=