Sum: Statistical methods

Several weeks ago, I posted a request for statistical methods to
analyze matrices derived from corpora of written English produced by native
speakers and by Chinese EFL learners. So far, I have received only two
responses that offered help.

1. :
There is an improved program coming out: STATISTICA. It has the
functions you specified: multiple regression, general non-linear analysis,
stepwise discriminant analysis, log-linear, ANOVA, and so on. If you can get
on the World Wide Web, their address is: Or you can
give me your mailing address so that I can copy the brief intro to the
product, and you can have a look.
The company's name and phone/fax numbers:

2325 E. 13th St.
Tulsa, OK 74104
Phone: 918-583-4149
Fax: 918-583-4376

It has offices in London, Paris, ... and Taiwan. The phone number for the
Taiwan representatives:
Intelligent Integration Corp: 2-759-1791, fax: 2-759-1790

I think that you can contact the company directly. 

2. Prof. Gui Shichun from my own university (email: ) suggested that we use
entropy-based redundancy scores. The formula for entropy can be found in
DATA, MODELS AND STATISTICAL ANALYSIS by R.A. Cooper & A.J. Weekes, p. 351,
and the redundancy score R is obtained by the formula


where H is the entropy, N is the number of cases.
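
For readers without the book at hand, here is a minimal sketch of one common
way to compute such a score. It assumes Shannon entropy in bits,
H = -sum(p_i * log2(p_i)), and the widely used definition
R = 1 - H / log2(N), taking N as the number of cases (categories) over which
the frequencies are distributed. The formula in Cooper & Weekes may differ in
detail; the function names and the sample counts are hypothetical.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a list of frequency counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    # Zero counts contribute nothing to the entropy, so they are skipped.
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def redundancy(counts):
    """Redundancy R = 1 - H / log2(N), with N = number of cases (assumed)."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two cases to define redundancy")
    return 1.0 - entropy(counts) / math.log2(n)


if __name__ == "__main__":
    # Hypothetical frequencies of one tag-pair across four text samples.
    sample_counts = [30, 5, 3, 2]
    print("H =", entropy(sample_counts))
    print("R =", redundancy(sample_counts))
```

Under this definition R is 0 when the frequencies are spread evenly over all
cases and approaches 1 as they concentrate in a single case.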

We found some encouraging preliminary results: a) Both cluster analysis and
redundancy scores show that the use of certain tag-pairs distinguishes the
Chinese EFL learners from the native English speakers. The frequency of
these tag-pairs often reflects the differences between Chinese and English.
b) Linear regression shows a strong linear relationship between the non-zero
frequency scores and the sample sizes.
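
A relationship like the one in (b) could be checked along the following
lines. This is only a sketch using ordinary least squares and Pearson's r;
the function name is hypothetical, and the sample sizes and scores are
made-up numbers for illustration, not data from this study.

```python
import math


def linear_fit(xs, ys):
    """Least-squares slope and intercept of y on x, plus Pearson's r."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


if __name__ == "__main__":
    # Illustrative values only: sample sizes (in words) and non-zero
    # frequency scores for five hypothetical text samples.
    sizes = [1000, 2000, 3000, 4000, 5000]
    scores = [118, 205, 297, 410, 489]
    slope, intercept, r = linear_fit(sizes, scores)
    print("slope =", slope, "intercept =", intercept, "r =", r)
```

A value of r close to 1 indicates a strong positive linear relationship.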
Xu Luomai
English Department
Guangdong University of Foreign Studies
Guangzhou 510420
P.R. China
Tel. (020)86656476
