Tuesday, February 10, 2009

I've got time on my hands and really don't want to discuss elections. There'll be more than enough of that tomorrow.

In the last few years, the explosion of text available via the Internet has generated tons of research that can be applied to Torah. So far, very little of it has been exploited. Here is a short list of applications that are worth pursuing:

1. Using the Responsa Project as a base, for any given halachic topic, automatically construct a structured table consisting of the main sources relevant to the topic. Simple searches (as now done in the Responsa Project and similar searchable corpora) don't separate the main sources from those that mention a keyword in passing, don't find texts that are about the topic but don't use the query term, and don't structure the results according to the interactions between sources. All this is doable provided that cross-references can be identified and properly exploited.

2. Given any anonymous Hebrew-Aramaic passage, use stylistic analysis either to identify the author (whether or not the passage is found in whatever corpus you're using) or to profile the author: what period he lived in, what region, who his teachers were.

3. Given multiple manuscripts of the same text, reconstruct the original text. This involves determining dependencies among the manuscripts and also using clever methods to determine the reliability of each manuscript even without knowing any ground truth against which to compare it.

4. Find a precise formal definition of kal ve-chomer that satisfactorily explains when it applies and when it does not.

5. Explicate the rabbinic theory of causality and indirect action. It should be able to explain when to apply the principle of grama benezikin patur as opposed to da-in dina de-garmi. Determine if the same theory is applicable to the laws of Shabbos.

6. Consider the set of all passages (loosely defined) in the Torah and determine the optimal clustering of the passages according to stylistic criteria. Measure the quality of the derived clusters to determine if they are sufficiently robust to qualify as organic units. (For the record, I don't see any theological issue here.)
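
For problem 6, one way to measure the quality of a candidate clustering is a silhouette score over stylistic feature vectors. What follows is only a minimal sketch under stated assumptions, not a definitive method: it assumes passages are already tokenized by whitespace, it uses a hypothetical function-word list (`FUNCTION_WORDS`, filled with placeholder tokens) as the stylistic features, and it uses cosine distance as the measure of stylistic difference. All function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical placeholder tokens; a real study would pick actual function words.
FUNCTION_WORDS = ["fw1", "fw2"]


def style_vector(passage, function_words=FUNCTION_WORDS):
    """Relative frequency of each function word in a whitespace-tokenized passage."""
    tokens = passage.split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on an empty passage
    return [counts[word] / total for word in function_words]


def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def silhouette(vectors, labels):
    """Mean silhouette score of a clustering (labels[i] is the cluster of vectors[i])."""
    scores = []
    for i, (vec, label) in enumerate(zip(vectors, labels)):
        # Collect distances from passage i to every other passage, grouped by cluster.
        by_cluster = {}
        for j, (other, other_label) in enumerate(zip(vectors, labels)):
            if j != i:
                by_cluster.setdefault(other_label, []).append(cosine_distance(vec, other))
        own = by_cluster.pop(label, [])
        if not own or not by_cluster:
            # Singleton cluster, or no other cluster to compare against.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                                  # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())   # separation
        scores.append((b - a) / max(a, b) if max(a, b) else 0.0)
    return sum(scores) / len(scores)


# Toy passages made of placeholder tokens, purely for illustration.
passages = ["fw1 fw1 fw1 x", "fw1 fw1 y fw1", "fw2 fw2 z fw2", "fw2 w fw2 fw2"]
vectors = [style_vector(p) for p in passages]
print(silhouette(vectors, [0, 0, 1, 1]))  # groups stylistically similar passages
print(silhouette(vectors, [0, 1, 0, 1]))  # mixes them
```

A clustering that groups stylistically similar passages scores higher than one that mixes them. Scores close to 1 across many choices of features would be one (assumed) indication that the clusters are robust.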

If anybody is interested in any of these research problems, give me a shout and I'll go into greater detail.
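
To give a flavor of problem 1: once cross-references have been extracted, the main sources on a topic could be separated from passing mentions by ranking sources in the cross-reference graph. The following is a minimal sketch, assuming the cross-references are already available as a mapping from each source to the sources it cites (extracting them is the hard part and is not shown). It uses PageRank as one plausible ranking, not necessarily the right one, and the source labels are hypothetical placeholders.

```python
def rank_sources(citations, damping=0.85, iterations=50):
    """Rank sources by PageRank over a cross-reference graph.

    citations maps each source to the list of sources it cites.
    Returns (source, score) pairs sorted from highest to lowest score.
    """
    nodes = set(citations)
    for targets in citations.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        new_score = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = citations.get(node, [])
            if targets:
                # Pass this source's weight on to the sources it cites.
                share = damping * score[node] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # A source that cites nothing spreads its weight evenly.
                share = damping * score[node] / n
                for target in nodes:
                    new_score[target] += share
        score = new_score

    return sorted(score.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy graph: three later sources all cite source_A.
toy_citations = {
    "source_B": ["source_A"],
    "source_C": ["source_A"],
    "source_D": ["source_A", "source_B"],
}
for source, value in rank_sources(toy_citations):
    print(source, round(value, 3))
```

In this toy graph the source that everyone cites comes out on top. Structuring the results according to how the sources interact would need more than a single ranking score.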

3 comments:

  1. Anonymous, 5:17 PM

    Am I right that #4 and #5 would require natural language processing, which has not been developed to the state of usefulness?

  2. All but #4 and #5 require some sort of NLP. For those, existing tools are adequate. For #4 and #5, I don't see any use for NLP.

  3. Anonymous, 5:20 PM

    On the contrary, I think 1,2,3,6 could be achieved to some extent with Bayesian methods. Beyond simply finding the relevant sources without analyzing them - trivially done with current technology - I don't see what could be accomplished re 4,5 without NLP.
