A technique, believed new, is described in Part II for assessing whether such duplicated strings are due to chance arrangements of words during ordinary editing or to a deliberate desire to make the duplication evident. In Part III the latter is found to be the case for the Matthew-Luke "Q" verses as well as for the verbal agreement of Matthew-Mark parallels. The technique is first tested on several actual test cases as well as on a synthetic one.
A modified Augustinian hypothesis, which expands upon studies by Zahn and Vosté, can explain the observations while remaining compatible with the external evidence and with other internal evidence. It is summarized here and in Part IV, where it is tested against critics' arguments against Matthean priority. It requires that theological commitment be abandoned and that the Gospel writers be treated as having human motivations regarding the question of whether Jews or gentiles were more worthy to become disciples.
I. A Summary of the Gospel Priority Problem as Influenced by 19th-Century Theological Commitment
II. Use of Frequency Distributions of Duplicate Strings of Words in Parallel Passages: Theory and Tests of Verbal Agreement
III. Results from Frequency Distributions of Duplicate Word Strings in the Gospel Parallels
IV. A Solution Embracing Realistic Editorial Behavior
The same can be said for evidence indicating that Luke came after Matthew and is dependent upon Matthew, not just upon Mark.4 Evidence for this dependence is bolstered by scholars favoring the Augustinian hypothesis (Matthew-Mark-Luke), who have set forth many plausible, unrefuted arguments.5 Traditional Augustinian scholars tended to rely on the now largely outdated assumption that the Gospels were written by the men whose names are attached to them, or that Peter was the source for Mark, or both. It must be emphasized, however, that these assumptions do not necessarily invalidate the rest of their argumentation. They will not be used here, and the writer of a Gospel will be referred to as such (e.g., "the writer of Mark") rather than by the name attached to the Gospel.
One tool available to the text critic has been to compare the Greek text of certain parallel Gospel passages and note how long a string of successive identical words may occur. The longest extends to 33 words. Such verbal harmony has helped lead to the agreed-upon belief that Matthew and Mark are not independent, and perhaps also Matthew and Luke.6 However, this tool has not heretofore been quantified in terms of frequency distributions that can be compared against a statistically expected distribution for independently edited works. This will be done here, in Part II. The results from Part III will suggest that the two-document hypothesis, and any refutations applicable to a modernized Augustinian hypothesis, need reexamination from within a framework devoid of theological commitment. Hence a brief review of the Gospel priority problem, with this result in mind, is presented first.
In 1783 Johann Jakob Griesbach challenged this tradition, proposing on the basis of a particularized textual criticism that the order had been Matthew-Luke-Mark, with each evangelist having made use of his predecessors' works. By this assumption he was able to explain in reasonably acceptable terms the peculiarity that Luke follows Mark's order and content where Mark deviates from Matthew's order, and frequently deviates from both where Mark follows Matthew. He could explain that the writer of Mark "almost never diverged from Matthew in order and seldom in content unless he was following the order and content of Luke."9
The Tübingen school that grew in general support of this hypothesis helped spark both opposition and creativity from less radical scholars. In 1820 both the Augustinian and Griesbach hypotheses were challenged by Johann Gottfried Herder, who, as explained in a near contemporary report by H. U. Meijboom, believed that "among writings of such close affinity as the gospels, the briefest must be considered the earliest document."10 As noted by W. R. Farmer, this idea built upon the still earlier idea that the first gospel had been an oral gospel, which would be expected to give rise to a shorter written gospel rather than to a longer one.11
In 1838 Christian Hermann Weisse applied a more scholarly approach to the same basic idea. He reconstructed an Ur-gospel based upon passages found in all three synoptics, though this guaranteed that it would most closely conform in extent and content to the shortest of the three, namely Mark.12 However, many others have since pointed out the weakness of this argument, since Mark could as easily be an altered abbreviation of Matthew, as was considered to be the case by Augustine. Yet, Weisse took a large step in framing the modern two-document hypothesis by postulating that in addition to the priority of Mark, or of an Ur-Marcus, the Logia referred to by Papias was a second, important source of sayings. With the latter assumption, he was following F.E.D. Schleiermacher.13
In the 1840-1860 period Eduard Reuss's studies largely convinced fellow French scholars of the priority of Mark; he also believed that "when two gospels are mutually dependent the earliest date must be attributed to the shorter one." He seems to have been among the first to attach a theological motivation to this belief, in wondering aloud why the writer of Mark, if he were not the first evangelist, would have omitted many of the discourses he found in Matthew.14 However, as noted by E. P. Sanders, theological commitment ought not play any role in seeking historical truth: "I have been engaged for some time in the effort to free history and exegesis from the control of theology; that is, from being obligated to come to certain conclusions which are predetermined by theological commitment."15 Such commitment on the part of a scholar prevents him from seriously considering more plausible alternative solutions, because his theology causes him to regard as implausible what would otherwise be considered plausible.
In the above instance, an alternative we shall come back to is that the writer of Mark was located in Rome.16 He would then have been writing his gospel for gentiles, and therefore omitted Judaisms he felt his future readers would not be interested in.17 Another is that this writer did not share Matthew's emphasis upon humility and pacifism, and did not wish to include admonitions and concepts that he did not agree with or understand. Although these may seem like obvious alternatives to consider, theological commitment has deterred their open discussion, culminating in Canon Streeter's remark in 1924 that the writer of Mark would have had to be a "lunatic" to have copied from Matthew while leaving out the Sermon on the Mount and most of the parables.18 Such a pejorative, coming from so influential a scholar, no doubt discouraged later scholars who may have wished to publish discussions of these alternatives, lest they be branded lunatics.
It should be mentioned that the well-known New Testament scholar David Friedrich Strauss, whose influential books came out in 1835-1840, is believed to be indirectly responsible for aiding the promotion of the Marcan-priority hypothesis over the Griesbach hypothesis.19 He had favored the Griesbach hypothesis, which places Matthew first, but considered much of the gospel material to be myth, in particular the post-crucifixion appearances of the risen Jesus. The backlash this induced from theologically committed scholars was very strong, thereby helping to promote the opposing hypothesis of Marcan priority.
Around 1850 the influential Heinrich Ewald similarly supported Mark as being primary, and furthermore contended that Luke and Matthew were written independently.20 This latter belief would seem to have arisen out of the theological commitment that the Gospel writers were "divine men," as described by Eusebius,21 and so surely should have had no need to copy from one another's work. Yet the assumption of independence between Luke and Matthew has persisted within the two-document hypothesis to this day. Ewald did not attempt to hide his theological commitment, and felt that Biblical criticism, though like a storm, was necessary in clearing away the haze and darkness caused by the misinterpretations of other, perfidious and satanic, scholars.22 The latter he identified with the Tübingen school of biblical scholarship.
Weisse's form of the two-document hypothesis was endorsed by Heinrich Holtzmann, who rescued it from the likely defeat it would otherwise have suffered from the Tübingen school. Although Holtzmann assumed that an Ur-gospel underlay all four Gospels, he also had an Ur-Marcus in mind here, again in essence because Mark is the shortest of the Gospels.23 Later, Ur-Marcus came to be identified with Mark itself.
For the second document of the two-document hypothesis, Holtzmann drew upon the Logia of Papias, but assumed it contained material common to Luke and Matthew. This gave birth to the modern form of the two-document hypothesis. (Later, this common material was called "Q" and divorced from the testimony of Papias.) Advocates of the two-document hypothesis then learned to compromise a little with theological commitment, to the extent that Matthew and Luke were allowed to be dependent upon Mark, after Ur-Marcus became identified with canonical Mark. This was less radical than the Tübingen school's departure from theological commitment, in that the Griesbachians treated Mark as being dependent upon both Luke and Matthew, and Luke also as being dependent upon Matthew.
At this point another reason why the Augustinian school of thought was gradually abandoned needs to be mentioned. By the 19th century, if not before, it had been noticed that when parallel passages of Matthew and Mark dealing with the disciples are carefully compared, the Twelve consistently come out looking relatively dumb, fearful and disrespectful to their Lord in Mark as compared to Matthew. Even the Jewish people receive similar treatment. These are called "Mark's harder readings." This comparison is amply documented by Pierson Parker.24 If Matthew were considered primary to Mark, this would mean that the writer of Mark had intentionally made slight alterations in meaning to the Matthean text he copied/translated in order to cast the Jewish disciples and people in a bad light. Such a thought must have been intolerable to theologically committed scholars, and to others who wished to remain in good standing with their Christian colleagues and editors. It is still intolerable to some today, as evidenced by what one scholar said at a Society of Biblical Literature annual meeting, as reported by Daniel B. Wallace.24.1 This scholar confessed, "I cannot hold to Matthean priority because of Mark’s decidedly harder readings." The readings are "harder" only because it is hard for theologically committed scholars to believe that the writer of a gospel could have been pro-gentile and anti-Jewish.
The problem could most easily be removed by assuming Mark held priority over Matthew; then it could be argued that the writer of Matthew had improved upon Mark's rough language, and had added reverential touches. I believe the problem was of such grave theological concern that scholars rarely dared to mention it in writing; hence we cannot be certain just when this repressed consideration became paramount.25 Due to its powerful emotional influence, however, it very likely contributed strongly to the growth in favor of the two-document hypothesis, and once that hypothesis dominated, the problem did not exist for its advocates. The most they then ever needed to say is: "Who can deny that Matthew enhances the disciples' and Jesus' images, both by upgrading their status and also by removing the unflattering warts which Mark for whatever reason retained?"26 Though the problem still existed for scholars who supported the Augustinian and Griesbach hypotheses, theological commitment or standards of "good taste" would prevent their opponents from mentioning it, and so no defense was needed.
If such commitment were set aside, however, students of the synoptic problem could then discuss reasons why the writer of Mark would have made such alterations to the Matthean text he incorporated into his gospel. Chief among these is the fact that Matthew denigrates gentiles in at least eight places, which vituperation the writer of Mark quite naturally either refused to include in his gospel or greatly alleviated.27 The problem is enhanced by the real possibility that the key counteracting passages of Mt 12:17-22 and 28:18-20, the latter of which contains a Trinitarian-like formula, were later additions to Matthew.28
If theological commitment and political correctness are both set aside, one could also ask why the writer of Matthew would have been anti-gentile in his outlook. Given the Jewish background commonly attributed to this writer, who had probably once been a Pharisee or scribe and possibly even a rabbi, the answer is straightforward. As a strict follower of the Torah, he was to treat his own people with kindness (Lv 19:18), while treating gentiles as dire enemies (Dt 7:1-8; Ex 23:22-24) or as slaves (Lv 25:43-46). The God of Israel would be looking out for the welfare of his own people, who dwell in Zion, and not of outsiders (Is 10:24-27). Thus it is not surprising that the writer of Matthew held these anti-gentile beliefs. Jn 4:9 indicates that such attitudes persisted at least until the time that gospel was written. That this writer could seem anti-Jewish, too (e.g., Mt 27:25), could easily reflect his great disappointment that, nearly a century later, the Jewish people had not come to recognize Jesus as Messiah.
Thus the suspicion is strong, from a fresh Augustinian viewpoint, that the writer of Mark was avenging Matthew's denigration of gentiles by in turn denigrating the Jewish disciples.29 Needless to say, this possibility is studiously avoided in our own era, as it attaches an anti-Semitic attitude to the writer of Mark, and invites the same charge against any scholar who suggests it. At best, one can only say that by this hypothesis the writer of Mark cast the Jewish disciples in a bad light so as to make gentiles look more capable by comparison and thus promote their discipleship, of which the writer of Matthew evidently did not approve. It would only have been human nature for the writer of Mark to have struck back at the writer of Matthew in some such manner.
Neither is the Griesbach school of thought free from its own theological commitment. It is because the editorial behavior of the writer of Luke seemed so inexcusable, if Luke came third and depended upon Matthew, that Griesbach postulated Mark to have come third; then a consequent editorial strategy underlying Mark could seemingly make sense and restore the editorial reputation of the writer of Luke. As later expressed by Streeter, the writer of Luke would seem to have been a "crank" if he had taken Matthean material not in Mark out of order and placed it into inappropriate or out-of-context places within his own gospel.30 This viewpoint again must stem from a theological conviction that a gospel writer would never exhibit unsavory though realistic psychological behavior, and it tends to close off all debate of alternative explanations. Subsequent scholars did not wish to be labeled as "a person who supports a crank" by authorities within their own professional field.
The key alternative here is that the writer of Luke, like the writer of Mark, must have been appalled at Matthew's denigration of gentiles and at statements to the effect that discipleship was reserved for the children of Israel. He would then have much preferred Mark over Matthew, though he apparently felt obliged to write a more universal gospel that required inclusion of much from Matthew that the writer of Mark had omitted (these inclusions would later become known as Q). The writer of Luke could then subtly express his feelings against Matthew by following Mark's order and content where Mark deviates from Matthew's order, and inserting his own special material and Matthean inclusions elsewhere, the latter in improper order and context.31 As a result, there would be relatively few agreements in order between Matthew and Luke, as observed. Although this editorial behavior, for someone with an "axe to grind," may seem quite plausible in hindsight, theological commitment or concerns of "good taste" prevented its discussion in the available literature on New Testament studies until 1992.32
Thus the basis for the Griesbach school stems from an attempt to avoid a very serious breach of theological commitment, causing this school to accept what they must have considered to be less serious breaches of that commitment in departing from the Augustinian tradition. Not surprisingly, the school of thought that suffers the least disruption of theological commitment is the one that came to dominate, namely that of the two-document (or two-source) hypothesis. Its scholars could avoid the most serious breaches of faith already discussed by placing Mark first and maintaining that Luke and Matthew were written independently.33 This latter assumption is primarily what will be called into question by the remainder of this paper.
This brief review need not be amplified here or continued further towards the present time, as the key assumptions leading to the present two-document hypothesis were laid down over a century ago. Although modern scholars are less beholden to theological commitment than their counterparts back then, the danger that exists now is the strong desire for the present consensus to signify successful scholastic achievement and progress within the profession. This desire tends to preclude any serious reexamination of the subject, its tenuous assumptions and the alternatives, lest such be recognized as an admission that the consensus could well have been wrong in its essential conclusions over the past century.
II. Use of Frequency Distributions of Duplicate Strings of Words in Parallel Passages: Theory and Tests of Verbal Agreement

There are only a finite number of ways in which a particular thought or sentence is usually expressed in a given language. Thus if two different translators or editors are working independently from the same extensive text, either translating or editing it or both, there will be numerous occasions on which the same two, three or more successive words are, by chance or of necessity, used by both parties within parallel portions of the two texts. Evidently, long strings of duplicated words will occur much less frequently than short strings.

Suppose that at a certain point the two translators or editors of the same basic text, working independently, have by chance or necessity used the same three words in a row in expressing a thought within a particular section of the text. The odds that they would each choose the same fourth word are some fraction less than unity. The odds that their choice of the fifth word would also coincide are then expected to be diminished again by some fraction less than unity, and so on until the passage is completed and the next one within the parallel texts commences. Upon continuing this process through the entire text with all its parallel passages, one finds that there is some average value for this fraction of diminishing odds, which we shall call f. The frequency distribution that expresses this reasoning is simply the geometric distribution, or its continuous counterpart, the exponential distribution, as will be explained.

Thus if I is the number of successive duplicated words in a string, and Y(I) is the number of strings of that particular length occurring in one editor/translator's work when it is compared against the parallel passages of the other's, we might postulate:

Y(I) = A*exp(-b*I)     (1)

where the asterisk denotes multiplication, "exp" denotes e (≈ 2.7183) raised in this instance to the power -b*I, b is the exponential decay coefficient, and A is a coefficient of proportionality.

That is, if, for example,

Y(4)/Y(3) = Y(5)/Y(4) = ... = Y(I+1)/Y(I) = f,

then upon using Eq. (1) to form the ratio Y(I+1)/Y(I) we have

f = exp[-b*(I+1)]/exp(-b*I) = exp(-b),

which relates f to b and explains why an exponential distribution might be expected to prevail.

Even if a second editor/translator is not working independently of an earlier translation, an exponentially shaped distribution is still expected to result from ordinary editing in any advanced language, though significant deviations may occur for I as small as 2 or possibly 3 when there is an excess of particular short phrases. Greek, however, is an ideal language to which to apply the method, since word order in Greek is relatively free and many different choices of verb tense, voice, mood, etc., as well as of word order, lie at the disposal of the editor/translator. In the case of editing rather than translation, the particular value of b within the exponential formula will depend primarily upon the particular style of the editor; if he feels the text is in poor shape and needs much editing, b will be relatively large. In the case of translation by two independent translators, b will be relatively large if the translators tend to use rather different vocabularies, or if one emphasizes meaning while the other emphasizes literalness. The feature to be emphasized here, therefore, is how well the distribution fits an exponential-like curve, especially as I becomes increasingly large, rather than the precise value of b.

The coefficient A is very roughly proportional to the number of duplicated words, S, within the word strings, where S = Summation[I*Y(I)]. A is of lesser importance, except that the larger S is the better, so as to minimize sampling error (scatter in the derived data). The ratio S/N, where N is the total number of words in the text analyzed, will also be examined. However, N depends upon the manner in which the parallel passages to be analyzed are selected: whether simply by pericopes, by excluding irrelevant verses within pericopes, or by excluding irrelevant halves of verses. Hence no particular importance will be placed upon A or S/N here.

Duplicate "strings" of only one word each, and sometimes of two words each, are excluded from the present practice of the method because they are unrepresentative, and S will include the summation from I = 2 or 3 upward.

For normal editing/translating, the frequency distribution of strings of verbal agreement is expected to follow a curve of exponential type for all three of the following cases, each of which involves two texts, (a) and (b), that can be compared:

1) Original (a) ---> Edited version (b)
2) Original ---> 1st translator (a)
   Original ---> 2nd translator (b)
3) Same as 2), except that the 2nd translator uses (a) as well as the original

This expectation follows from the fact that the editor/translator would be making changes of many different kinds at very many different places within his manuscript, in his attempt to produce a more accurate version or translation, to use better word choices or better grammar, to produce a final work that is more understandable or that better follows a certain philosophy or theology, or to write for a different intended audience. The number of reasons for editing, or for arriving at a differently worded translation, is then sufficiently large that randomness can dominate the overall process. Since in case 3) the 2nd translator is also an editor, making some use of the 1st translator's work, we shall simply refer in all three cases to the later writer as the editor/translator.

There is one vitally important exception, however. If this editor/translator purposely refrained in places from making editorial alterations over lengthy strings of text, purposely copying his source text word for word in a significant number of these places, the resulting frequency distribution would deviate from the exponential, as this would produce an excess of duplicated longer strings above and beyond the exponential distribution exhibited by the shorter strings. This case of purposeful copying of longer strings without editing them will turn out to be of special interest here, in comparing parallel Greek texts of the synoptic gospels, where copying of one sort or another is already suspected by most New Testament scholars to have taken place. However, the possibility of non-purposeful copying of sections of text, through the editor/translator's inadvertent relaxation of his critical attitude at times during the editing of a lengthy text, will also be examined.
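As a minimal illustration of Eq. (1), the Python sketch below tabulates the expected counts A*exp(-b*I), shows that the ratio of successive counts is the constant f = exp(-b), and computes S from a table of observed counts. The function names are hypothetical, chosen only for this sketch.

```python
import math

def expected_counts(a, b, lengths):
    """Expected number Y(I) of duplicate strings of each length I, per Eq. (1)."""
    return {i: a * math.exp(-b * i) for i in lengths}

def decay_factor(b):
    """Average fraction f by which the odds shrink per added word: f = exp(-b)."""
    return math.exp(-b)

def duplicated_words(y, i_min=2):
    """S = Summation[I*Y(I)], counting only strings of at least i_min words.

    y maps each string length I to its observed count Y(I).
    """
    return sum(i * count for i, count in y.items() if i >= i_min)
```

For example, with the values A = 392 and b = 0.49 used for Table 1, expected_counts(392, 0.49, [14]) gives about 0.41 for I = 14, and decay_factor(0.49) gives f ≈ 0.613; every ratio Y(I+1)/Y(I) of the model equals that same f, which is the property the argument above relies on.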
Testing the Procedure on Greek Text Translated from Hebrew
I am unaware that the method has been used previously, and so some tests are in order. To determine whether the distribution actually lies close to the exponential for ordinarily edited/translated works, and to ascertain a rough value for b, the method is here first applied to the Greek text of 2 Chr 35-36, all of Ezra, and Neh 7:73-8:12, which overlap with the apocryphal text of 1 Esdras (with the exception of 1 Esdr 2:16-5:7, which has no parallels).34 These two texts are believed to go back to different translations from Hebrew text, and whether or not the later translator/editor borrowed words or phrases from the earlier one's Greek translation does not matter, provided this was done in an ordinary manner.
Words are not considered duplicated here unless they are spelled exactly the same, and word strings common to parallel passages do not qualify unless the words are consecutive in both parallel strings and occur in exactly the same order. Punctuation is neglected (since it was not present in the original ancient texts), and also capitalization, if it was used only to signify the start of a sentence. 1 Esdr 5:29-34 and 8:13 were excluded from analysis, as they consist of long repetitive phrases like "the sons of X, the sons of Y...," all strung together, which have no counterpart within the Gospel parallels to be analyzed, and which might bias the result to some small degree. The results for this test case are shown in Table 1:
Table 1. Number of occurrences, Y, of duplicate strings of I words in the Septuagint's text of 2 Chr 35-36, Ezra and Neh 7:73-8:12, relative to the parallel verses in 1 Esdras.
| I | Y | Exp. Curve |
|---|---|---|
| 2 | 232 | 147.1 |
| 3 | 114 | 90.1 |
| 4 | 44 | 55.2 |
| 5 | 36 | 33.8 |
| 6 | 24 | 20.7 |
| 7 | 12 | 12.7 |
| 8 | 6 | 7.8 |
| 9 | 1 | 4.8 |
| 10 | 1 | 2.9 |
| 11 | 1 | 1.8 |
| 12 | 1 | 1.10 |
| 13 | 1 | 0.67 |
| 14 | 0 | 0.41 |
| 15 | 0 | 0.25 |
| 16 | 0 | 0.15 |
| 17 | 0 | 0.094 |
| etc. | | |
Y(I) = 392 exp(-0.49*I) (2)
which means that for each word added to a string's length, the probability of occurrence of that string is reduced by the factor f = exp(-0.49) = 0.613, on average. Thus these two extensive sets of parallel passages appear to support the method, with b in the vicinity of 0.5 (0.49 here). For I = 2 the data show a significant excess of occurrences over the fitted exponential curve, due to so many nouns, both proper and common, being preceded by the article. Hence the I = 2 datum is essentially ignored here.
The observed value for Y(3) lies somewhat above the exponential curve; this might be due to 3-word phrases, such as the preceding 2-word examples followed by "kai," as well as 3-word prepositional phrases, tending to survive editorial rearrangements. On the other hand, this did not occur in two of the six cases to be examined, and thus may not be statistically significant. And since the data for small I are more numerous and contain less sampling error, relative to the mean, than those for large I, I have retained the I = 3 data in this study. Hence I have fit the exponential curve within the region I > 2, up to but not including values of I for which Y becomes as small as 2 or less. (It is an accident of sampling error here that five consecutive values of Y = 1 occurred in this latter region, without being interspersed with any zeros or twos.)
The above exponential was derived as a least-squares linear fit to the data after transforming the Y values to their natural logarithms. Although the values in Table 1 derived from the "exp curve" are not accurate in any absolute sense to the number of digits indicated, the decimal digits are retained so as to avoid any confusion between the fitted-curve values and the Y integers of the raw data.
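The fitting procedure just described can be sketched in Python as an ordinary least-squares line through the points (I, ln Y), using the range rule stated above: start at I = 3 and stop before the first I whose count falls to 2 or less. This is an illustrative reimplementation under those stated assumptions, not the program actually used to obtain Eq. (2); the function name fit_exponential is hypothetical.

```python
import math

def fit_exponential(y, i_min=3, y_floor=2):
    """Least-squares fit of ln Y(I) = ln A - b*I to observed counts.

    y maps each string length I to its observed count Y(I). The fit starts at
    I = i_min and stops before the first I whose count is y_floor or less
    (or is missing). Returns the pair (A, b).
    """
    xs, logs = [], []
    i = i_min
    while y.get(i, 0) > y_floor:
        xs.append(i)
        logs.append(math.log(y[i]))
        i += 1
    if len(xs) < 2:
        raise ValueError("need at least two lengths with counts above y_floor")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs))
    slope = sxl / sxx  # the fitted slope equals -b
    a = math.exp(mean_log - slope * mean_x)
    return a, -slope
```

Because the regression is done on logarithms, each point carries equal weight regardless of its count; a weighted fit would lean more heavily on the well-populated small-I data.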
Since the two texts involved here were no doubt written well separated in time, we may consider the behavior of only the later translator/editor, as in case 3): did he translate and edit in an ordinary manner that did not involve purposely copying lengthy strings of text, if he had used the earlier translation while undertaking his own? The results indicate that his translation or editing was indeed ordinary in the sense that there are no lengthy strings of text for I > 13, where the expected number of strings rapidly diminishes to less than one, i.e., to 0.4 and less according to the exponential curve. That is, if there were ten other nearly identical test cases available for analysis, an expectation of 0.4 for a duplicate string length of 14 words means that in about 4 out of 10 of the cases one such string of words would occur, but none in the other 6 cases.
One value of this kind of analysis is that it indicates that one or two strings of verbal agreement as long as 12 or 13 words is not unexpected in this case, since the expected number of occurrences there has fallen only to 1.10 and 0.67, respectively. However, if two or three duplicate strings of 17 or more words had occurred, this would have been more indicative of copying along with purposeful refraining from editing, as the estimated frequency of occurrence in those instances is only 0.094 or less.
For reference purposes, the value of S here was found to be 1494, while that for N (from parallels within the 1 Esdras text) was approximately 6660, giving a ratio of S/N = 0.22.
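The string-matching rules stated before Table 1 (identical spelling, consecutive words in the same order in both passages, punctuation ignored) can be approximated with Python's standard difflib module, as sketched below. Two simplifications are assumptions of this sketch rather than the procedure used here: all words are lowercased, whereas the text ignores capitals only where they mark a sentence start, and the pairing of strings is left to difflib.SequenceMatcher, whose ordered, non-crossing alignment may not always coincide with a hand pairing of parallel strings. Counts from each pair of parallel passages would be summed before computing S and S/N.

```python
import re
import unicodedata
from collections import Counter
from difflib import SequenceMatcher

def tokenize(text):
    """Words in order, with punctuation dropped and case folded.

    NFC normalization keeps precomposed Greek letters (with accents and
    breathings) inside a single word for the word-character pattern.
    """
    text = unicodedata.normalize("NFC", text).lower()
    return re.findall(r"\w+", text)

def duplicate_strings(passage_a, passage_b, i_min=2):
    """Counter mapping I -> number of maximal shared strings of I consecutive words."""
    a, b = tokenize(passage_a), tokenize(passage_b)
    # autojunk=False stops difflib from treating frequent words (such as the
    # article) as junk in long passages.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return Counter(m.size for m in matcher.get_matching_blocks() if m.size >= i_min)
```

For instance, "a b x c" against "a y b x c" yields one shared string of three words; the single shared word "a" is dropped unless i_min is lowered to 1.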
The computer equivalent to this thought experiment was carried out, until the number of extracted black balls that occurred in strings of two or longer first exceeded S = 1495, approximately as in the 1 Esdras test case above. The results are given in Table 2:
Table 2. Number of occurrences, Y, of strings of length I of consecutively drawn black balls from a mixed bag of 577 black balls and 423 red ones, after 3113 drawings (after S=1497), from a typical realization. Also shown: the associated exponential curve, and estimated frequencies of occurrence of extrema.
| I | Y | Exp. Curve | Top 1% | Top 5% | Bottom 5% | Bottom 1% |
|---|---|---|---|---|---|---|
| 0 | 590 | 559 | | | | |
| 1 | 311 | 322.5 | | | | |
| 2 | 174 | 186.1 | 219 | 210 | 164 | 152 |
| 3 | 122 | 107.4 | 135 | 123 | 90 | 79 |
| 4 | 64 | 61.9 | 84 | 76 | 49 | 44 |
| 5 | 33 | 35.7 | 54 | 45 | 27 | 22 |
| 6 | 17 | 20.6 | 36 | 28 | 14 | 10 |
| 7 | 14 | 11.9 | 23 | 17 | 6.7 | 3.6 |
| 8 | 3 | 6.9 | 16 | 11 | 3.1 | 1.0 |
| 9 | 3 | 3.96 | 11 | 7.1 | 1.0 | 0.07 |
| 10 | 6 | 2.28 | 8.1 | 5.2 | 0.12 | 0 |
| 11 | 0 | 1.32 | 6.4 | 3.7 | 0 | 0 |
| 12 | 3 | 0.76 | 4.8 | 2.9 | 0 | 0 |
| 13 | 0 | 0.44 | 3.9 | 2.1 | 0 | 0 |
| 14 | 0 | 0.25 | 3.1 | 1.6 | 0 | 0 |
| 15 | 1 | 0.15 | 2.5 | 1.3 | 0 | 0 |
| 16 | 0 | 0.084 | 2.0 | 1.1 | 0 | 0 |
| 17 | 0 | 0.049 | 1.7 | 0.9 | 0 | 0 |
| 18 | 0 | 0.028 | | | | |
| 19 | 0 | 0.016 | | | | |
| 20 | 0 | 0.009 | | | | |
| 21 | etc. | 0.005 | | | | |
Y(I) = 559 exp(-0.55*I). (3)
Table 2 shows the same general appearance or degree of scattering of values as Table 1, especially if both are presented in graphical form. In particular, it shows that a long string (I = 15) can easily occur through chance where the expected exponential frequency has dropped considerably below unity (to 0.15 in this case). The computer experiment was run another 99 times in order to obtain a firm estimate of the mean and estimates of the more extreme Y(I) values, those that occur 5% or 1% of the time or less. These are also presented in Table 2. Thus a string of 15 is expected somewhat more often than 5 times in 100; a string of 20 is expected to occur about once in 100 trials, and cannot be totally ruled out. However, if such a string occurred in addition to two or three other similarly long strings within the same real data set, this would be cause for rejecting the assumption that the balls had been drawn randomly from the bag (or, in the case of word strings of verbal agreement, for rejecting the assumption that ordinary editing of text had occurred without any deliberate copying).
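A minimal Python sketch of this kind of computer experiment follows. It assumes that each ball is drawn with replacement, so that every draw is black with fixed probability 0.577 (the bag itself is not modelled); it counts Y(I) as the number of runs of exactly I black balls ended by a red ball, so that Y(0) counts two reds in a row (an interpretation of Table 2, not stated in the text); and it stops once the black balls in completed runs of two or more first exceed S = 1495. The percentile convention of Python's statistics.quantiles is likewise an assumption of this sketch and may differ from the way the extreme columns of Table 2 were estimated. The function names are hypothetical.

```python
import random
import statistics
from collections import Counter

def one_realization(rng, target_s=1495, p_black=0.577):
    """Simulate one series of draws; return a Counter mapping I -> Y(I).

    Draws are independent (with replacement) and black with probability
    p_black. Y[I] counts runs of exactly I black balls closed by a red ball.
    """
    y = Counter()
    run = 0  # length of the current run of black balls
    s = 0    # black balls in completed runs of length >= 2
    while s <= target_s:
        if rng.random() < p_black:
            run += 1
        else:
            y[run] += 1
            if run >= 2:
                s += run
            run = 0
    return y

def summarize(n_runs=100, max_i=20, seed=0, target_s=1495, p_black=0.577):
    """Mean and 1st/5th/95th/99th percentiles of Y(I) over n_runs realizations."""
    rng = random.Random(seed)
    runs = [one_realization(rng, target_s, p_black) for _ in range(n_runs)]
    rows = {}
    for i in range(max_i + 1):
        vals = [r[i] for r in runs]  # a Counter returns 0 for absent lengths
        pct = statistics.quantiles(vals, n=100)  # 99 cut points
        rows[i] = (statistics.mean(vals), pct[0], pct[4], pct[94], pct[98])
    return rows
```

Calling summarize() returns, for each I from 0 to 20, a tuple (mean, 1st, 5th, 95th, 99th percentile) that can be set beside the Exp. Curve, Bottom and Top columns of Table 2.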
In further computer runs, the odds (that is, the probability) of drawing a black ball were themselves allowed to vary randomly from draw to draw, uniformly between 0.254 and 0.90. In the long run, then, the overall probability was the average of 0.254 and 0.90, which equals the previous value, 0.577.
The results were statistically the same as before, being well fit by the same exponential distribution with the same values of A and b. (This program and its simpler predecessor were kindly programmed for me, in C++, by Frank Griswold of Corvallis, Oregon.) Thus, as long as randomness is involved, the exponential or geometric distribution is to be expected from ordinary editorial alterations, barring word strings that are extremely short. It does not matter whether the particular odds for each decision are steady or vary from word to word depending upon their context; when randomness dominates, a single overall value of f, and thus of b, is obtained. Hence the exponential frequency distribution is a basic one against which to compare actual frequency distributions of word strings of verbal agreement in the case of ordinary editing and/or translation, when a sufficiently large data sample exists.
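Why the varying odds leave the result unchanged can be seen from a short calculation, assuming that successive draws are independent (as with drawing with replacement). Averaging the uniformly distributed probability over its range gives the chance that any single draw is black:

$$P(\text{black}) = \frac{1}{0.90 - 0.254}\int_{0.254}^{0.90} p\,dp = \frac{0.254 + 0.90}{2} = 0.577$$

Because the draws remain independent, the chance that a red ball is preceded by a run of exactly I black balls is then

$$P(I) = (1 - 0.577)\,(0.577)^{I}, \qquad f = 0.577, \qquad b = -\ln 0.577 \approx 0.55,$$

in agreement with Eq. (3). If Y(I) is taken to count such runs (with Y(0) counting two reds in a row), then with 3113 draws, about 3113 × 0.423 ≈ 1317 of them red, one expects A ≈ 1317 × 0.423 ≈ 557, close to the fitted value of 559.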
Table 3. Number of occurrences, Y, of duplicate strings containing I words each in two separate English translations of the Septuagint's text of 2 Chr 35-36, Ezra and Neh 7:73-8:12, relative to parallel verses in 1 Esdras.
| I | Y | Exp. Curve |
|---|---|---|
| 2 | 335 | 319.9 |
| 3 | 202 | 196.0 |
| 4 | 122 | 120.1 |
| 5 | 63 | 73.6 |
| 6 | 47 | 45.1 |
| 7 | 34 | 27.6 |
| 8 | 13 | 16.9 |
| 9 | 14 | 10.4 |
| 10 | 3 | 6.4 |