KON & BAL's Puzzle Page: PB&J (Project Builder & Java)
by Avi Drissman

See if you can solve this programming puzzle, presented in the form of a dialog between Avi Drissman (KON) and himself (BAL). The dialog gives clues to help you. Keep guessing until you're done; your score is the number to the left of the clue that gave you the correct answer. Even if you never run into the particular problems being solved here, you'll learn some valuable debugging techniques that will help you solve your own programming conundrums. And you'll also learn interesting Macintosh trivia.

KON Hey, BAL! It's been a while since I've run into you. What's up?

BAL Not much. Just doing some consulting work to keep busy. You still employed?

KON Yeah. I've been busy carbonizing our company's server app. I'm just having some problems with Java.

BAL Java? I thought you were carbonizing it. You were using Java on the classic Mac OS?

KON No. It's a long story, but the upshot of it is that the AIAT (V-Twin) libraries stopped functioning properly under CodeWarrior Pro 7, which we need for porting to OS X. Apple refused to help, so we had to switch to a different full-text indexing engine.

BAL Nasty.

KON Hey--save that for the end.

BAL Sorry.

KON I looked at several indexers, and picked one called Lucene (http://jakarta.apache.org/lucene/). It's written in Java, but it's reasonably fast, it has a good license, and it's open source so we can't get screwed again.

BAL Sounds pretty good. How's it working for you?

(100) KON I got the indexing working, and I have the searching working. I'm trying to customize the pre-processing of the words. I'm rather new at Java, so I had a little confusion about how the pieces fit together, in what packages to put my classes, and stuff like that. But I finally put the Java side together. And it's crashing mighty weird.

BAL Time to run it through the debugger.

KON Not so fast. Remember that this is running within a Carbonized app in a JNI Invocation environment. I don't know how to make the Project Builder debugger attach.

BAL You're using Project Builder? Why?

KON I was trying to get started quickly, and I couldn't figure out how to make CodeWarrior compile a non-GUI class.

BAL You should have looked harder.

(95) KON Probably. But we're getting off the point. I'm crashing like this:

-----8<-----
java.lang.NullPointerException
	at org.apache.lucene.analysis.TokenFilter.close(Unknown Source)
	at org.apache.lucene.analysis.TokenFilter.close(Unknown Source)
	at org.apache.lucene.analysis.TokenFilter.close(Unknown Source)
	at org.apache.lucene.analysis.TokenFilter.close(Unknown Source)
	at org.apache.lucene.analysis.TokenFilter.close(Unknown Source)
	at org.apache.lucene.analysis.TokenFilter.close(Unknown Source)
	at org.apache.lucene.index.DocumentWriter.invertDocument(Unknown Source)
	at org.apache.lucene.index.DocumentWriter.addDocument(Unknown Source)
	at org.apache.lucene.index.IndexWriter.addDocument(Unknown Source)
	at com.baseview.iqueserver.luceneindexer.IndexWriterThread.addFileToIndex(IndexWriterThread.java:347)
	at com.baseview.iqueserver.luceneindexer.IndexWriterThread.run(IndexWriterThread.java:73)
-----8<-----

addFileToIndex is the last method in the trace that's mine. Line 347 is:

-----8<-----

IndexWriter writer = new IndexWriter(indexer.indexPath, indexer.analyzer, false);
...
writer.addDocument(document); // <- line 347

-----8<-----
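The six identical TokenFilter.close frames are themselves a clue. The sketch below uses simplified, hypothetical stand-in classes (Stream, Filter, closeBrokenChain and closeFrames are illustrative names, not Lucene's API). It assumes that, as in the Lucene releases of that era, a filter's close() simply forwards to the stream it wraps. Under that assumption, if some filter in the chain never had its input set, closing the outermost filter recurses down to that filter and throws a NullPointerException. The number of close frames then tells you how far down the chain the broken filter sits.

```java
import java.io.IOException;

// Simplified stand-ins for a chain of token filters; NOT Lucene's classes.
public class CloseChainDemo {

    // Innermost source: closing it does nothing in this sketch.
    static class Stream {
        void close() throws IOException { }
    }

    // Assumption: close() just forwards to the wrapped stream.
    static class Filter extends Stream {
        protected Stream input;

        Filter(Stream in) { input = in; }

        @Override
        void close() throws IOException {
            input.close(); // throws NullPointerException if input was never set
        }
    }

    // Builds a chain `depth` filters deep whose innermost filter has no input,
    // closes the outermost filter, and returns whatever was thrown (or null).
    static Throwable closeBrokenChain(int depth) {
        Stream s = new Filter(null); // the broken, innermost filter
        for (int i = 1; i < depth; i++) {
            s = new Filter(s);
        }
        try {
            s.close();
            return null;
        } catch (Throwable t) {
            return t;
        }
    }

    // Counts the stack frames that belong to Filter.close().
    static int closeFrames(Throwable t) {
        int n = 0;
        for (StackTraceElement e : t.getStackTrace()) {
            if (e.getMethodName().equals("close") && e.getClassName().endsWith("Filter")) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        Throwable t = closeBrokenChain(6);
        t.printStackTrace(); // one Filter.close frame per filter in the chain
        System.out.println(closeFrames(t));
    }
}
```

KON's tokenStream, shown later, can wrap up to six filters around the tokenizer, so six frames would fit a null input at the very bottom of a full chain. That is a reading of the trace, not proof.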

BAL Sounds like a bad argument to me. You were saying that this started when you were tweaking the pre-processing of the tokens?

(90) KON That's what I thought too. But my token filters are structurally identical, both derived from the sample Lucene filters. And I know they work.

BAL Where do you build that analyzer thing?

(85) KON Here:

-----8<-----
public final TokenStream tokenStream(String fieldName, Reader reader)
{
	TokenStream result = new StandardTokenizer(reader);
	if (minWordLength > 0)
		result = new ShortWordFilter(result, minWordLength);
	result = new LowerCaseFilter(result);

	if (stopList != null)
		result = new StopFilter(result, stopList);
	if (substitutionList != null)
		result = new SubstitutionFilter(result, substitutionList);
	result = new StandardFilter(result);
	if (stem)
		result = new PorterStemFilter(result);

	return result;
}

-----8<-----

It's just chaining token filters. The StandardTokenizer breaks the text into tokens. Then we run each token through the ShortWordFilter, then the LowerCaseFilter, then the StopFilter...
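The chaining is the decorator pattern: each filter wraps the stream before it and pulls tokens from it on demand. Below is a minimal sketch built the same way tokenStream builds its chain. It uses hypothetical stand-in types (TokenSource, WhitespaceTokenizer, MinLengthStage and LowerCaseStage are illustrative names, not Lucene's API), with MinLengthStage playing roughly the role of KON's ShortWordFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified token pipeline; NOT Lucene's API.
public class FilterChainDemo {

    // Hands out one token at a time; null means "no more tokens".
    interface TokenSource {
        String next();
    }

    // Stand-in tokenizer: splits the text on whitespace.
    static class WhitespaceTokenizer implements TokenSource {
        private final Iterator<String> words;

        WhitespaceTokenizer(String text) {
            words = Arrays.asList(text.trim().split("\\s+")).iterator();
        }

        public String next() {
            return words.hasNext() ? words.next() : null;
        }
    }

    // Drops tokens shorter than minLength (rough analogue of ShortWordFilter).
    static class MinLengthStage implements TokenSource {
        private final TokenSource input;
        private final int minLength;

        MinLengthStage(TokenSource input, int minLength) {
            this.input = input;
            this.minLength = minLength;
        }

        public String next() {
            for (String t = input.next(); t != null; t = input.next()) {
                if (t.length() >= minLength) return t;
            }
            return null;
        }
    }

    // Lower-cases every token that passes through.
    static class LowerCaseStage implements TokenSource {
        private final TokenSource input;

        LowerCaseStage(TokenSource input) { this.input = input; }

        public String next() {
            String t = input.next();
            return t == null ? null : t.toLowerCase(Locale.ROOT);
        }
    }

    // Each optional stage wraps the previous result, as in tokenStream.
    static List<String> analyze(String text, int minLength) {
        TokenSource result = new WhitespaceTokenizer(text);
        if (minLength > 0)
            result = new MinLengthStage(result, minLength);
        result = new LowerCaseStage(result);

        List<String> tokens = new ArrayList<>();
        for (String t = result.next(); t != null; t = result.next()) {
            tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("A Quick Brown Fox", 3)); // [quick, brown, fox]
    }
}
```

Because every stage only knows the stage it wraps, a stage that is mis-built (for example, one that never stores its input) does not fail when the chain is assembled. It fails later, when someone pulls tokens from or closes the outermost stage.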

BAL OK, I get the point. And when you added your filters it crashed.

(80) KON Yep. ShortWordFilter and SubstitutionFilter are mine.

BAL Well, since we can't run it through the debugger, let's just comment lines out to see where the problem goes away.

(70) KON After three compile-and-run cycles, we find that if we include a ShortWordFilter in the chain, it dies. Including the SubstitutionFilter makes no difference. But those two filters are essentially identical, with only minor differences in processing; nothing big.
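Commenting lines out works, but every experiment costs a rebuild. One alternative is to gate each optional stage on a system property read with Boolean.getBoolean, so that one build can be bisected from the command line. This assumes the stages can be switched at run time. In an embedded VM like this one, the -D options would have to come from the options the host app passes when it creates the JVM. The property names (skip.shortWords, skip.lowerCase) and the list-based stages are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Sketch: toggle pipeline stages with system properties instead of
// commenting them out and recompiling. Property names are hypothetical.
public class StageToggleDemo {

    static List<UnaryOperator<List<String>>> buildStages() {
        List<UnaryOperator<List<String>>> stages = new ArrayList<>();

        // Boolean.getBoolean returns true only if the property exists and equals "true".
        if (!Boolean.getBoolean("skip.shortWords")) {
            stages.add(tokens -> {
                List<String> kept = new ArrayList<>();
                for (String t : tokens) if (t.length() >= 3) kept.add(t);
                return kept;
            });
        }
        if (!Boolean.getBoolean("skip.lowerCase")) {
            stages.add(tokens -> {
                List<String> lowered = new ArrayList<>();
                for (String t : tokens) lowered.add(t.toLowerCase(Locale.ROOT));
                return lowered;
            });
        }
        return stages;
    }

    // Runs the tokens through whichever stages are currently enabled.
    static List<String> run(List<String> tokens) {
        List<String> result = tokens;
        for (UnaryOperator<List<String>> stage : buildStages()) {
            result = stage.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        // e.g.  java -Dskip.shortWords=true StageToggleDemo
        System.out.println(run(Arrays.asList("A", "Quick", "Fox")));
    }
}
```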

BAL Time to get out the DebugStrs.

KON You mean System.err.println.

BAL Um, yeah. Put one at the beginning and the end of the tokenStream function. Does it get executed intact?

(60) KON Yes, both print. But remember, we're not dying here; we die later, when we try to use the TokenStream object that we build here.

BAL Uh huh. Well, throw a DebugStr in the constructors of your two filters.

KON System.err.println.

BAL You know what I mean.

(50) KON Whatever. What we find is that tokenStream starts, the SubstitutionFilter's constructor is called, and then tokenStream ends. The ShortWordFilter's constructor isn't called at all.

BAL That's weird. And minWordLength is bigger than zero?

(40) KON In all of these cases.

BAL Why wouldn't the constructor be called? The analyzer and the ShortWordFilter are in different packages?

(30) KON Yes, they are. The analyzer is in com.baseview.iqueserver.luceneindexer, and the filter is in org.apache.lucene.analysis.
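When two classes might share a simple name, it helps to print which one actually got instantiated. The helper below, describe, is hypothetical. It reports an object's fully qualified class name and, when the JVM provides one, the location its class was loaded from.

```java
import java.security.CodeSource;

// Debugging helper sketch: which class is this object, really,
// and where was that class loaded from?
public class WhichClass {

    static String describe(Object o) {
        Class<?> c = o.getClass();
        String where = "(location unknown)";
        // Core classes and some class loaders report no code source.
        CodeSource src = c.getProtectionDomain().getCodeSource();
        if (src != null && src.getLocation() != null) {
            where = src.getLocation().toString();
        }
        return c.getName() + " from " + where;
    }

    public static void main(String[] args) {
        System.err.println(describe("hello"));
        System.err.println(describe(new WhichClass()));
    }
}
```

A System.err.println(describe(result)); after each wrapping step in tokenStream would show the package of every filter that actually ended up in the chain.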

BAL Just a hunch, but comment out the "import org.apache.lucene.analysis.*" line at the top of the indexer file.

(20) KON Now it can't find a whole bunch of class definitions. Pretty much all the filters... wait... It's not giving me an error about ShortWordFilter not being defined.

BAL Thought so. Close your project, go trash your "build" folder, reopen your project, and build everything again.

(10) KON OK, I'm uncommenting the import statement, and... it works.

BAL I remembered that you had problems picking the package to put some of your classes in. Did you ever put the ShortWordFilter in a different package?

KON Yes. That was the first filter I wrote, and it was originally in the com.baseview.iqueserver.luceneindexer package. I moved it to the lucene.analysis package when it needed to play with the tokens more intricately.

BAL Exactly. And the SubstitutionFilter always lived in the lucene.analysis package?

KON Yes, that was the second filter I wrote. So for some strange reason, Project Builder cached the package that the ShortWordFilter was in, and didn't update it when I changed my code. And that caused all hell to subtly break loose at runtime.
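One plausible mechanism assumes the old ShortWordFilter.class, compiled into the com.baseview.iqueserver.luceneindexer package, was still sitting in the build folder. The analyzer lives in that same package. Java resolves a simple type name to a class in the current package before it consults on-demand imports such as import org.apache.lucene.analysis.*; because an on-demand import never shadows another declaration. The stale class therefore won silently. That would also explain why commenting out the import produced no error for ShortWordFilter. The single-file sketch below reproduces the rule using java.util.* and a hypothetical same-package class named Date.

```java
import java.util.*;

// A top-level class in the same package as the code that uses it. Its simple
// name collides with java.util.Date, which the on-demand import would supply.
class Date {
    String origin() { return "same-package Date"; }
}

public class ShadowDemo {

    // "Date" resolves to the same-package class: a type-import-on-demand
    // (import java.util.*) never shadows it. If it resolved to
    // java.util.Date, the call to origin() would not even compile.
    static String whichDate() {
        return new Date().origin();
    }

    public static void main(String[] args) {
        System.out.println(whichDate()); // same-package Date

        // Other java.util names still come from the on-demand import.
        List<String> results = new ArrayList<>();
        results.add(whichDate());
        System.out.println(results);
    }
}
```

Naming the class with a single-type import (import org.apache.lucene.analysis.ShortWordFilter;) or with its fully qualified name binds the intended class explicitly. Unlike an on-demand import, a single-type import does shadow same-package classes declared in other compilation units.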

BAL Sounds about right to me. File a bug?

KON Will do. It's RadarWeb #2838261. Sounds like Project Builder's a little too smart for its own good.

BAL Nasty.

KON Yeah.


SCORING
90-100: Positively a Java psychic.
70-85: You'd do well as a Java guru.
40-60: A Java newbie, aren't you?
10-30: Just stick to brewing Java.
