[personal profile] cellio
Aha. It *is*, theoretically, a simple matter of programming.

A while ago, I described an API-validation problem. One of the tools I have available is the generated Javadoc for the candidate API. And Javadoc, as it turns out, does something reasonable if a method signature references a package it can't find: it simply spells out the name of the package, rather than providing its usual link. So, for example, if Foo is in the API and Bar is not, you'd get something like this:

public void doSomething(Foo, com.mayaviz.something.Bar)


where Foo is underlined because it is really a link to the Javadoc for Foo.

Ok, fine and dandy; I have the beginnings of a grep-based solution. I can grep all the Javadoc looking for "com.mayaviz" -- but it's not that simple, because the legitimate links (like Foo) will have that in their URLs. So I can either post-process the grep output to throw out links, or avoid generating that output in the first place.

So what I *really* need is a very specialized parser: eat stuff until you find a <pre>, then until </pre> do this: eat everything in an <a>...</a>, and other than that if you find "com.mayaviz" spit out a report. Iterate until done.
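A minimal sketch of that parser, in Python rather than Perl, might look like the following. It assumes the signatures sit inside <pre>...</pre> blocks and that links are ordinary <a ...>...</a> elements; the function name find_unlinked and the sample HTML are made up for illustration, not taken from real Javadoc output.

```python
import re

# Assumption: method signatures live inside <pre>...</pre> blocks.
PRE_BLOCK = re.compile(r"<pre\b[^>]*>(.*?)</pre>", re.IGNORECASE | re.DOTALL)
# A whole link element: opening tag (with its URL), link text, closing tag.
ANCHOR = re.compile(r"<a\b[^>]*>.*?</a>", re.IGNORECASE | re.DOTALL)
# Any other tag.
TAG = re.compile(r"<[^>]+>")


def find_unlinked(html, prefix="com.mayaviz"):
    """Return names starting with `prefix` that appear inside <pre> blocks
    but outside any <a>...</a> element."""
    name = re.compile(re.escape(prefix) + r"(?:\.\w+)*")
    hits = []
    for block in PRE_BLOCK.finditer(html):
        # Eat everything in <a>...</a>, URL and link text included.
        text = ANCHOR.sub("", block.group(1))
        # Drop remaining tags so their attributes cannot cause false hits.
        text = TAG.sub("", text)
        hits.extend(name.findall(text))
    return hits


# Hypothetical page fragment: Foo is linked, Bar is spelled out in full.
sample = (
    '<a href="package-summary.html" title="package com.mayaviz.api">com.mayaviz.api</a>\n'
    "<pre>\n"
    'public void doSomething(<a href="Foo.html" title="class in com.mayaviz.api">Foo</a>,\n'
    "                        com.mayaviz.something.Bar)\n"
    "</pre>\n"
)

print(find_unlinked(sample))  # prints ['com.mayaviz.something.Bar']
```

Regular expressions are not a real HTML parser, so this would only hold up as long as the generated pages stay as regular as described above.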

Now I need to learn enough Perl to actually write the sucker... (I am not a Perl hacker. Not by a long shot.)

Hm...

Date: 2001-12-13 02:33 am (UTC)
From: [personal profile] goljerp
What if you did something a bit more brute-force? Take your favorite text editor with regular expression search & replace (I'd use BBEdit Lite, but I think JEdit would also work) and do a recursive search & replace on the generated javadocs which does something like:
search: <.*>(.*)</.*>
replace: \1
If I've done this correctly (it is 5:30am), it would strip out all HTML tags. No, wait, this won't handle nested tags. OK, a bit less brute-forceish:
search: <a.*>(.*)</a.*>
replace: \1
which would strip out your links, leaving just the link text. Then you can do your grep on the modified files. BBEdit lets you save searches, so you don't have to remember what exactly you got working every time the APIs change...

I'm sure you could write something in perl to do what you want, but if you're not a perl hacker, the above might be faster. I'm having delusions that at one point I could've done what you wanted with just a combination of awk and grep, but like I said it is 5:30am...
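For comparison, the strip-the-links-then-grep idea can be sketched in a few lines of Python. This is an illustration under assumptions, not a tested recipe for any particular editor: the pattern swaps in [^>]* and a non-greedy .*? for the plain .* suggested above, because a greedy .* would run from the first <a on a line to the last </a>. The names strip_links and grep_lines and the sample page are hypothetical.

```python
import re

# Matches one link at a time and captures only its visible text.
LINK = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)


def strip_links(html):
    # Same effect as a search & replace with \1: keep the link text,
    # discard the tag and the URL inside it.
    return LINK.sub(r"\1", html)


def grep_lines(text, needle="com.mayaviz"):
    """Return (line number, line) pairs for lines containing `needle`."""
    return [(n, line) for n, line in enumerate(text.splitlines(), 1)
            if needle in line]


# Hypothetical page fragment with two links on one line.
page = (
    '<a href="Foo.html" title="class in com.mayaviz.api">Foo</a> and '
    '<a href="Baz.html" title="class in com.mayaviz.api">Baz</a>\n'
    'public void doSomething(<a href="Foo.html" title="class in com.mayaviz.api">Foo</a>, '
    "com.mayaviz.something.Bar)\n"
)

for n, line in grep_lines(strip_links(page)):
    print(n, line)  # prints: 2 public void doSomething(Foo, com.mayaviz.something.Bar)
```

Because the link text survives, a link whose visible text is itself a full package name would still be reported, so the output may need a quick look by eye.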
