Aha. It *is*, theoretically, a simple matter of programming.
A while ago, I described an API-validation problem. One of the tools I have available is the generated Javadoc for the candidate API. And Javadoc, as it turns out, does something reasonable if a method signature references a package it can't find: it simply spells out the name of the package, rather than providing its usual link. So, for example, if Foo is in the API and Bar is not, you'd get something like this:
public void doSomething(Foo, com.mayaviz.something.Bar)
where the underlining (under Foo) is really a link to the Javadoc for Foo.
Ok, fine and dandy; I have the beginnings of a grep-based solution. I can grep all the Javadoc looking for "com.mayaviz" -- but it's not that simple, because the legitimate links (like Foo) will have that in their URLs. So I can either post-process the grep output to throw out links, or avoid generating that output in the first place.
So what I *really* need is a very specialized parser: eat stuff until you find a <pre>, then until </pre> do this: eat everything in an <a>...</a>, and other than that if you find "com.mayaviz" spit out a report. Iterate until done.
Now I need to learn enough Perl to actually write the sucker... (I am not a Perl hacker. Not by a long shot.)
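For what it's worth, the parser described above can be sketched in a few lines. This is in Python rather than Perl, purely as an illustration. The function name `find_unlinked`, the sample href, and the assumption that signatures sit in `<pre>` blocks with plain `<a ...>...</a>` links are all mine, not anything Javadoc guarantees.

```python
import re

# Assumptions (not confirmed by the Javadoc output itself): method signatures
# live inside <pre>...</pre>, and links are ordinary <a ...>...</a> elements.
PRE_BLOCK = re.compile(r"<pre\b[^>]*>(.*?)</pre>", re.IGNORECASE | re.DOTALL)
ANCHOR = re.compile(r"<a\b[^>]*>.*?</a>", re.IGNORECASE | re.DOTALL)
TAG = re.compile(r"<[^>]+>")


def find_unlinked(html, package="com.mayaviz"):
    """Return names under `package` that appear unlinked inside <pre> blocks."""
    name = re.compile(re.escape(package) + r"(?:\.\w+)*")
    hits = []
    for block in PRE_BLOCK.finditer(html):
        # Eat everything in <a>...</a>, including any href that mentions the package.
        text = ANCHOR.sub("", block.group(1))
        # Drop any remaining tags so only the visible signature text is searched.
        text = TAG.sub("", text)
        hits.extend(name.findall(text))
    return hits


# Hypothetical sample in the shape of the signature above.
sample = (
    '<pre>public void doSomething('
    '<a href="com.mayaviz.api.Foo.html">Foo</a>, '
    'com.mayaviz.something.Bar)</pre>'
)
print(find_unlinked(sample))  # ['com.mayaviz.something.Bar']
```

Anything outside a `<pre>` block is ignored, so a mention of the package in ordinary descriptive text would not be reported by this sketch.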
Hm...
search:
<.*>(.*)</.*>
replace:
\1
If I've done this correctly (it is 5:30am), it would strip out all HTML tags. No, wait, this won't handle nested tags. OK, a bit less brute-force-ish:
search:
<a.*>(.*)</a.*>
which would strip out your links. Then you can do your grep on the modified files. BBEdit lets you save searches, so you don't have to remember what exactly you got working every time the APIs change...
I'm sure you could write something in Perl to do what you want, but if you're not a Perl hacker, the above might be faster. I'm having delusions that at one point I could've done what you wanted with just a combination of awk and grep, but like I said it is 5:30am...
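One thing to watch with the link-stripping pattern is greediness. Here is a small sketch in Python (an illustration only; BBEdit's regex engine may differ in details, and the sample line and href values are made up). It compares the pattern as written with a non-greedy variant on a line that has two links and one unlinked name.

```python
import re

# Hypothetical line with two links and one unlinked package reference.
line = (
    '<a href="com.mayaviz.Foo.html">Foo</a>, '
    'com.mayaviz.something.Bar, '
    '<a href="com.mayaviz.Baz.html">Baz</a>'
)

# The pattern as written: ".*" is greedy, so one match can span several links.
greedy = re.compile(r"<a.*>(.*)</a.*>")

# A non-greedy variant: stop at the first ">" and at the first "</a>".
lazy = re.compile(r"<a\b[^>]*>(.*?)</a>")

greedy_result = greedy.sub(r"\1", line)
lazy_result = lazy.sub(r"\1", line)

print(greedy_result)  # Baz
print(lazy_result)    # Foo, com.mayaviz.something.Bar, Baz
```

With the greedy pattern the unlinked name is swallowed along with the links, so a later grep would miss it. The non-greedy variant leaves it in place.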
Re: Hm...
And stripping out the links as a pre-processing step is a very good idea -- thanks.
Re: Hm...
I'm sure Emacs can do that. I mean, heck, there's a version of Tetris for Emacs. I was, as I said, thinking of something much simpler, though: use a text editor's built-in search.
For example, I have JEdit in front of me. It's free, uses Java (so since you're on a platform that can deal with Javadocs, it can deal with JEdit, I'm guessing), and has a search and replace with options for using regular expressions, for searching a directory and its subdirectories, and for using file filters so it only looks at the .html files. So point it at the top, and let it churn its way through.
BBEdit Lite, a free, Mac-only text editor, has a very similar feature. I think BBEdit has a slight edge, because JEdit will probably try to have each changed file open as a buffer... this could be a problem depending upon how big your Javadoc generation is. (BBEdit, on the other hand, will save the files for you as you go along...)
Since this is just the output of Javadoc you're working with, the potential for mass destruction is minimal... but do make sure you're pointing at the right directory, with the right regular expression, before you let all hell break loose.
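If you'd rather not let anything rewrite files at all, the same directory sweep can be done read-only. A minimal sketch in Python, assuming the Javadoc output is a tree of .html files; the function name `scan_tree` and the default package string are just for illustration.

```python
import re
from pathlib import Path

# Assumption: links are ordinary <a ...>...</a> elements, possibly spanning lines.
LINK = re.compile(r"<a\b[^>]*>.*?</a>", re.IGNORECASE | re.DOTALL)


def scan_tree(root, package="com.mayaviz"):
    """Return (path, line number, line) for each unlinked mention under root.

    Files are only read, never modified.
    """
    hits = []
    for path in sorted(Path(root).rglob("*.html")):  # file filter: .html only
        text = path.read_text(encoding="utf-8", errors="replace")
        # Remove links in memory, keeping their newlines so line numbers still match.
        stripped = LINK.sub(lambda m: "\n" * m.group(0).count("\n"), text)
        for number, line in enumerate(stripped.splitlines(), start=1):
            if package in line:
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling `scan_tree("path/to/javadoc")` (a placeholder path) returns a list you can print or eyeball, and the generated files stay exactly as Javadoc wrote them.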