Re: [Cp-tools-discuss] Re: Gjdoc now runs on full classpath tree


From: Mark Wielaard
Subject: Re: [Cp-tools-discuss] Re: Gjdoc now runs on full classpath tree
Date: 11 May 2002 18:52:16 +0200

Hi,

On Fri, 2002-05-10 at 05:03, Julian Scheid wrote:
> Mark Wielaard wrote:
>  >>(BTW Javadoc 1.4 has some new clever approach that doesn't
>  >>use BreakIterator but some hand-crafted logic to deal with
>  >>nifty things like "J. R. Tolkien" -- but BreakIterator
>  >>should work quite well for most cases.)
>  >
>  > Maybe it will be a good thing to implement this since there are clearly
>  > some buggy BreakIterators out there :{
> 
> Right, even the Sun 1.3.1 BreakIterator doesn't work - sometimes it
> includes the period at the end of the sentence, sometimes not, though
> I was able to work around that.
> 
> I've added the following stub to gnu.classpath.tools.gjdoc.DocImpl.java
> If anyone feels like it, he can implement it. I currently don't ;)

I looked at this
http://java.sun.com/j2se/1.4/docs/tooldocs/javadoc/whatsnew-1.4.html#-breakiterator
but it seems that the old way of doing it was a simple heuristic
(is there a '.' followed by whitespace or a br/p tag) and the new way
is the one that actually uses the BreakIterator. That new way is not
turned on by default yet, and not everybody seems to be happy with the
new behaviour (which is not very suitable for HTML markup). See for example
http://developer.java.sun.com/developer/bugParade/bugs/4165985.html
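
For comparison, here is a minimal sketch of taking the first sentence with
java.text.BreakIterator. The class and method names are made up for
illustration and are not gjdoc code. The result for HTML input depends on
the BreakIterator implementation in use, so no output is claimed here.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical demo class, not part of gjdoc.
final class BreakIteratorSketch {

    /** Returns the text up to the first sentence boundary reported by BreakIterator. */
    static String firstSentence(String text) {
        BreakIterator boundary = BreakIterator.getSentenceInstance(Locale.US);
        boundary.setText(text);
        int start = boundary.first();
        int end = boundary.next();
        if (end == BreakIterator.DONE) {
            return text; // no boundary after the start, e.g. empty text
        }
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        // The iterator knows nothing about HTML, so ".<p>" is just characters to it.
        String doc = "This is the first sentence.<p> With some lengthy explanation.";
        System.out.println("[" + firstSentence(doc) + "]");
    }
}
```

Note that the boundary, where one is found, usually sits after the whitespace
following the period, so the caller may still want to trim the result.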

So I have implemented the old heuristic (attached). You can
see why this is more appropriate than the BreakIterator by looking, for
example, at the class description of java.beans.Customizer on the
java.beans package page, or at the method summary for
java.io.FilePermission.hashCode(). Both have a description that
goes something like: "This is the first sentence.<p> With some lengthy
explanation of the actual algorithm used." The BreakIterator doesn't
treat the .<p> as the end of the sentence, since it isn't made to
parse HTML marked-up text.
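
To make the heuristic easy to try outside gjdoc, here is a standalone sketch
of the same idea on a String. The class and method names are hypothetical.
Unlike the attached patch, which returns the index of the period and lets the
caller step past it, this sketch returns the index just after the period.

```java
// Hypothetical standalone sketch of the "old" heuristic; not the gjdoc API.
final class FirstSentenceSketch {

    /** Index just past the first sentence-ending period, or text.length() if none. */
    static int endOfFirstSentence(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) != '.') {
                continue;
            }
            int next = i + 1;
            // A period ends the sentence at end of text, before whitespace,
            // or directly before a <p> or <br> tag.
            if (next == text.length()
                    || Character.isWhitespace(text.charAt(next))
                    || startsWithBreakTag(text, next)) {
                return next;
            }
        }
        return text.length();
    }

    /** True if text at pos starts with "<p" or "<br" followed by '>' or whitespace. */
    static boolean startsWithBreakTag(String text, int pos) {
        return matchesTag(text, pos, "p") || matchesTag(text, pos, "br");
    }

    private static boolean matchesTag(String text, int pos, String name) {
        int after = pos + 1 + name.length(); // index of the char following the tag name
        if (after >= text.length() || text.charAt(pos) != '<') {
            return false;
        }
        // Case-insensitive comparison of the tag name.
        if (!text.regionMatches(true, pos + 1, name, 0, name.length())) {
            return false;
        }
        char c = text.charAt(after);
        return c == '>' || Character.isWhitespace(c);
    }

    static String firstSentence(String text) {
        return text.substring(0, endOfFirstSentence(text));
    }

    public static void main(String[] args) {
        String doc = "This is the first sentence.<p> With some lengthy explanation.";
        System.out.println(firstSentence(doc)); // prints "This is the first sentence."
    }
}
```

The sketch keeps the known weakness of the heuristic: for "J. R. Tolkien
wrote it." it stops after "J.", because any period followed by whitespace
counts as the end of the sentence.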

What do you think? Should I commit this?

Cheers,

Mark

P.S. I have also fixed the bug in Classpath/libgcj BreakIterator, but
that won't make it into gcj 3.1 (maybe 3.1.1).
Index: src/gnu/classpath/tools/gjdoc/DocImpl.java
===================================================================
RCS file: /cvsroot/cp-tools/gjdoc/src/gnu/classpath/tools/gjdoc/DocImpl.java,v
retrieving revision 1.7
diff -u -r1.7 DocImpl.java
--- src/gnu/classpath/tools/gjdoc/DocImpl.java  10 May 2002 04:06:13 -0000      1.7
+++ src/gnu/classpath/tools/gjdoc/DocImpl.java  11 May 2002 16:43:53 -0000
@@ -180,13 +180,49 @@
     *  @param startIndex  index in <code>text</code> at which to start
     *  @param endIndex  index in <code>text</code> at which to stop
     *
-    *  @returns the index of the character following the end-of-sentence 
+    *  @return the index of the character following the end-of-sentence 
     *    marker, <code>endIndex</code> if no end-of-sentence
     *    marker could be found, or -1 if not implemented.
     */
-   private static int findEndOfSentence(char[] text, int startIndex, int endIndex) {
+   private static int findEndOfSentence(char[] text, int startIndex,
+                                       int endIndex)
+   {
+      while (startIndex < endIndex)
+       {
+         if (text[startIndex] == '.'
+           && (startIndex+1 == endIndex
+               || Character.isWhitespace(text[startIndex+1])
+               || isHTMLBreakTag(text, startIndex+1, endIndex)))
+           return startIndex;
+
+           startIndex++;
+       }
+      return endIndex-1;
+   }
 
-      return -1;
+   /**
+    * Returns true if the text from start to end begins with a 'p' or 'br' tag.
+    */
+   private static boolean isHTMLBreakTag(char[] text, int start, int end)
+   {
+     return
+       (text[start] == '<'
+        &&
+         (
+          (
+           start+2 < end
+           && (text[start+1] == 'p' || text[start+1] == 'P')
+           && (text[start+2] == '>' || Character.isWhitespace(text[start+2]))
+          )
+         ||
+          (
+           start+3 < end
+           && (text[start+1] == 'b' || text[start+1] == 'B')
+           && (text[start+2] == 'r' || text[start+2] == 'R')
+           && (text[start+3] == '>' || Character.isWhitespace(text[start+3]))
+          )
+         )
+       );
    }
 
    public static Map parseCommentTags(char[] comment, int startIndex, int endIndex, 
@@ -214,16 +250,13 @@
            boundary.next();
            firstSentenceEnd = boundary.current();
 
-           // FIXME
-           if (firstSentenceEnd < comment.length && '.' == comment[firstSentenceEnd]) {
-              ++ firstSentenceEnd;
-           }
         }
 
-        String fs = new String(comment, rawDocStart, firstSentenceEnd-rawDocStart);
-        if (fs.indexOf("Timer")>=0)
-           System.err.println("firstSentence='"+fs+"'");
-
+        // Always include period at end of sentence if there is one.
+        if (firstSentenceEnd < comment.length
+                        && '.' == comment[firstSentenceEnd]) {
+           ++ firstSentenceEnd;
+        }
       }
 
 
