Computer-Assisted Anthropology News

Edited by James Dow

Vol. 2, No. 3, May 1987


STATISTICAL PROGRAMS FOR THE MACINTOSH

Timothy A. Kohler

Department of Anthropology, Washington State University, Pullman, WA 99164-4910

Introduction

Although more than 175 commercial statistical programs are available for microcomputers (Feldesman 1986), users of Apple's Macintosh need agonize over fewer than a dozen general statistical programs. In this article I compare nine such programs on several objective criteria and (more subjectively) on their usefulness in various settings. My purpose is to survey the Macintosh statistical landscape rather than to examine any program in depth. An effort was made to evaluate all the general-purpose statistical packages available as of October 1986, but narrowly focused statistical or graphing programs have not been included. Moreover, the manufacturer of one package that might have been reviewed (NWA StatPack) did not respond to preliminary inquiries. Despite the relatively small number of offerings, the available packages offer great diversity in their sophistication and power. As was not the case only one year ago, most Mac users now will be able to find a package that meets their needs from among those discussed below. The versions of the programs compared along with their hardware requirements and Fall 1986 prices are listed in Table 1.

These programs have little in common beyond their ability to produce basic descriptive statistics and their uniform lack of copy protection. One offering in particular (TrueSTAT) is so different that it is not included in most of the tables. Excluding TrueSTAT, the other packages lie along a continuum between WormStat, a small, inexpensive, easy-to-use bundle of descriptive and simple inferential statistical tools, and large, relatively expensive, and powerful packages such as STAT80 and SYSTAT that offer multivariate analyses and a range of options nearly equal to those found on mainframe statistical software. Along another dimension, some packages, such as StatView 512+, do not attempt to teach good statistical practice in their manuals, but present only the basic information needed to use the software. Others, especially Data Desk, STAT80, and the Lionheart packages, provide educational materials as part of their documentation. Some differences among these packages in size, sophistication, and degree of orientation towards instruction are illustrated in Table 2.

My evaluations were done on a 512K machine that has been upgraded to a Macintosh Plus in most respects; I still use the old numeric keypad and a 400K external drive. I noticed no compatibility problems for any of the programs with the new 128K ROM or Finder 5.1. Since the particular configuration I used, and other vagaries, affect the speed of the programs reported in Table 6, these timings should be considered only gross indications of the speed of the programs for one particular (but common) routine.

The remainder of this article consists of a series of tables comparing features across packages, and narrative discussions of each package arranged by ascending retail price. With a few exceptions that will be pointed out, you get what you pay for, although not everyone will need the most expensive programs, or want the additional complexity that goes with them. To keep the review as short as possible I have put much information in the tables. These tables are more comprehensive in their coverage of features in the less-sophisticated programs, and should not be interpreted as displaying all the features of any program.

WormStat

The "statistical package for the rest of us" was developed by George Wolford of Dartmouth College. It offers 16 statistical tools (Tables 3 - 6) in a format that looks like MacPaint. Although the package is marketed as "easy to use statistics for the laboratory, industry, and classroom" its pedagogical virtues are the most apparent. When the standard deviation is requested, for example, the formula as well as some of the intermediate calculations are shown. Calculating a t-test (with flexibility for application to matched pairs, and independent samples with equal or unequal variances) has two steps: in the first, the formula and result for the calculation of the t statistic are shown; in the second, the probability associated with that t value is calculated, and the area corresponding to the rejection region of the test statistic is shaded on the appropriate probability distribution. A graphic approach to scatterplots is also used effectively. The analyst can estimate a regression line for a series of displayed points by "drawing on the screen" (with a mouse), and then can generate the least-squares line for comparison on the same data. Since nearly everyone underestimates the influence of outliers in this procedure, this is a valuable tool for teaching about the vagaries of regression. This is the only package reviewed with this feature. Data can be entered from the keyboard or imported/exported via the clipboard. Imported data must appear in rows and columns; columns must be separated by tabs and each row must end with a carriage return. Several data transforms and generations are possible. The 128K version of the program is limited to 10 columns and 200 rows; the 512K version can handle a data matrix as large as 10 columns by 600 rows. (Six hundred normal deviates can be generated in 36 seconds.) Character variables are not allowed. The documentation is minimal but adequate; no on-screen help is available (or needed). Analysis is entirely menu-driven; there are no provisions for missing data.
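The remark above about outliers and the least-squares line can be illustrated with a short sketch. This is not WormStat's code; the function name and the data points are invented for illustration, and the fit is the ordinary textbook least-squares line.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: five points lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]
clean_slope, clean_intercept = least_squares_line(xs, ys)

# The same points plus a single hypothetical outlier at (10, 0).
outlier_slope, outlier_intercept = least_squares_line(xs + [10.0], ys + [0.0])

print("slope without outlier:", clean_slope)
print("slope with outlier:   ", outlier_slope)
```

With these made-up numbers a single distant point is enough to reverse the sign of the fitted slope, which is the kind of effect a line drawn by eye tends to understate.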

At its educational price of $39.00 this program has no competition except for the Student Version of Data Desk, which is far more capable, but is available only to students registered in a course that has adopted the package. WormStat's simplicity and its effective use of graphics for a limited range of common statistical problems will make it most useful for the occasional analyst or the beginning student.

TrueSTAT

Authored by Thomas E. Kurtz, TrueSTAT is part of the extensive Kemeny/Kurtz Math Series available for IBM PC-compatible machines, the Amiga, and the Macintosh. It is intended to be an aid in teaching about the characteristics of various distributions, and emphasizes their simulation (or generation) and their graphic display. Users of BASIC will find it easy to run from commands or from the built-in programming language; others can use the extensive menu system to get the same results. Analysis of datasets is possible, but generally requires more footwork than in the other programs reviewed here. Although I have not included TrueSTAT in Tables 3 - 6, it can easily produce all the standard descriptive statistics; identify medians, hinges, inner and outer fences, and extreme values; compute t-tests and generate confidence intervals based on the t distribution; and fit a least-squares line. It offers none of the nonparametric inferential statistics available in WormStat, however, although some of these could, perhaps, be programmed.

There are also flexible routines for plotting and scaling scattergrams, histograms, boxplots, and lines depicting confidence intervals. In some cases these can be superimposed on each other, or stacked vertically, making it possible to look at several depictions of the same data, or to compare boxplots (for example) of several variables simultaneously. I found myself enjoying creating histograms of real data and superimposing appropriately-scaled theoretical distributions on them. (Simulation functions make it easy to generate normal, binomial, Poisson, and uniform distributions, as well as random y-values with specifiable standard deviation around a line with given slope and intercept.) There are no provisions for labeling these graphics, but they can be saved to a MacPaint-compatible file for beautification, or printed directly.

One feature of TrueSTAT that deserves mention is its self-contained programming language, in which BASIC-like programs can be created and modified with a built-in line editor (which does not conform to usual Macintosh conventions and seems awkward). The language offers FOR-NEXT, DO WHILE or UNTIL, and IF-THEN-ELSE (and ELSEIF) structures; however, logical expressions using AND, OR, or NOT are not allowed, and in general the language is much simpler than True BASIC. Statistical and simulation functions include those mentioned above, plus critical values for the t and the normal distribution. The usual arithmetic functions are also available. Unlike WormStat and most of the other programs, there is no access to the desk accessories or to the Apple File and Edit menus. Menu choices appear as oblong buttons at the bottom of the screen instead of the standard pull-down variety.

The manual is terse and business-like. In many instances more detail would have been desirable. What, for example, are the size limits for data sets? I was able to generate 600 normal deviates in only 10 seconds (versus 36 for WormStat), but the program failed, without any error message, to generate 10,000 numbers. I found that I could import and export data via ASCII files, but got no help here from the documentation. What assumptions are made about equality of variances in computing two-sample t-tests? How can a long listing to the printer be interrupted? And so forth. There is an on-line Help facility that is useful as a quick reference, and helps make up for the lack of an index in the manual.

In all, I found this to be an enjoyable program with a rather narrow niche. At first I thought that with its rich set of graphics and simulation functions it would be useful as a general-purpose simulation language. However, the small subset of BASIC available limits utility in that direction. The program is not, and does not claim to be, a workhorse for general-purpose statistical analysis. However, it would be useful for teaching some aspects of statistics, especially in conjunction with a course otherwise directed towards programming in BASIC.

StatWorks

This is the least expensive, reasonably competent, general purpose statistics program for the Mac. (A competitor for this title is the original smaller version of StatView, not reviewed here.) Advantages of StatWorks over WormStat include the ability to handle character and missing data; more flexible data handling, including a recode facility and more data transformations; and more inferential statistics, including polynomial and multiple regression, and 2-way ANOVA. Also, as might be expected from the makers of Cricket Graph, the plotting routines are better and more varied than those in WormStat, or, for that matter, those in many of the more expensive packages discussed below. Surprisingly, WormStat has more number generation routines; StatWorks offers only random-number generation.

All numbers are internally stored in an 80-bit binary format that retains about 19 digits of precision. As benchmarked on the Longley data (Longley 1967; Siegel 1985:15-17; Table 5), a set of econometric data in which the independent variables are collinear enough to try the computational accuracy of multiple linear regression algorithms, the results appeared accurate to more than 6 decimal places. (Up to 10 digits right of the decimal may be displayed, but fewer than that may actually be legible, since printing and screen display do not take into account the width of each column, thus allowing adjacent columns to overlap.) The font size in the input data window may be either 9 or 10 point, but all printing is in the 9-point font. Each active window is printed as a separate page even when "no page breaks" is selected on the Page Set-up menu, making this a real paper-waster.
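The figure of about 19 digits follows from the width of the binary significand. The sketch below assumes that the 80-bit format is the extended format with a 64-bit significand, as in the Standard Apple Numeric Environment; the function name is illustrative only, and the single and double rows are included only for comparison.

```python
import math

def decimal_digits(significand_bits):
    """Approximate number of decimal digits carried by a binary significand."""
    return significand_bits * math.log10(2)

# Significand widths (in bits) of three common binary formats.
for name, bits in [("single", 24), ("double", 53), ("extended (80-bit)", 64)]:
    print(f"{name}: {bits} bits, about {decimal_digits(bits):.1f} decimal digits")
```

Each bit of significand contributes log10(2), or roughly 0.3, of a decimal digit, so 64 bits carry a little over 19 digits.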

Data entry is via keyboard into a spreadsheet-like window. ASCII files may also be cut or copied from or to the clipboard, and SYLK files can be read. Much attention has been paid to producing attractive output and incorporating it into other software: a "Grab Picture" option allows export of graphics to MacPaint or to a word-processor; color settings are available for the Imagewriter II; and the program is fully compatible with the LaserWriter. No guidelines are given for how large datasets or analyses may be, and in fact the manual states that arbitrary limits are avoided through dynamic memory management. There is no on-screen Help facility, no phone number for technical support, and no stated policy on upgrades. Analysis is completely menu-driven.

StatWorks is quick and accurate for the analyses it supports. The manual, though small, is attractive, well-written, and includes all formulae. The powerful graphics are an important aid to analysis. I consider this to be a good buy when more complex routines are not needed, especially for users of 128K machines, and for those who will appreciate its "Mac-like" intuitive qualities.

Business Statistics

This and the following are two of 14 packages available from Lionheart in PC/MS-DOS, Amiga, AtariST, and other formats, as well as for the Mac. Other titles include Experimental Statistics, Exploratory Data Analysis, Decision Analysis Techniques, and Linear and Non-Linear Programming. The Lionheart offerings differ from any of the other packages reviewed here in that the emphasis of the documentation is on the statistical procedures themselves, with mention of the programs accompanying the texts made only in passing. "How to" considerations for the programs on the disks accompanying the text are relegated to an addendum. Appendices in the manual for Business Statistics discuss statistical distributions, normalizing transformations, finding the sample size, and factorials as screening experiments. There is also an extensive glossary of terms and an index. The package is written in Microsoft BASIC runtime version 2.11 and it is relatively slow (Table 5). Analysis takes place in response to a series of prompts. Upgrades and replacements for damaged disks are available to registered users at $15/disk plus postage. There is a phone number (not toll-free) for technical support.

The 42 separate programs in Business Statistics perform a wide variety of functions. Most are of general interest, although some, which involve quality control, reliability, and acceptance sampling, may have few applications in anthropology and are not discussed further here. Many programs can accept data either through keyboard input or from an ASCII file. Errors in either mode frequently catapult the user all the way back to the desktop.

In general the package is rich in capabilities having to do with calculating probabilities for a wide variety of distributions, performing analysis of variance for many different designs, and for time-series analysis and forecasting. It is the only program reviewed that calculates sample sizes for a variety of sampling situations, and point and interval estimates from stratified, cluster, and sequential samples. (Unfortunately, I couldn't get the cluster sample routine to run; responding to prompts exactly as told resulted in a fatal syntax error.) It is very poor in its graphics capabilities (Table 3) and design and in routines for nominal-level data (Table 4) and regression analysis (Table 6). It is unintegrated in the sense that the Mac's operating system provides the only shell common to all the programs.

My experience running the Longley benchmark on this package is worth reviewing. To gain experience with the program, I chose to enter the data from the keyboard. There are four different and semi-incompatible data entry routines. I used MVDATA, which I found to be the most primitive data entry facility in any of the packages reviewed. Data were prompted for and accepted one point at a time, and I was expected to keep track of the correct i and j indices; I could never see the whole matrix. After input, one can ask to see the data, but they are displayed one variable at a time, rather than spreadsheet-style as in most of the other packages. The data matrix may be printed, but instead of being formatted like a matrix, each element in sequence is printed down the far left-hand side of the page, leaving 90% of each page blank. If a new or revised data set is saved using the same name as an existing data set, no warning is given.

Once the data were collected I used MVREG to calculate the coefficients. This is one of two multiple linear regression routines available, and as warned in the documentation, it is not very accurate, with round-off errors appearing after 4 digits (not 4 decimal places). The other program, MREG, turned out to be faster but no more accurate. A printout of the correlation matrix can be requested in MVREG, but it comes out on the Imagewriter in an almost indecipherable format.
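The distinction drawn above between agreement to 4 digits and agreement to 4 decimal places can be made concrete. The sketch below is one common way of scoring a computed value against a reference value; the function names and the two numbers are invented for illustration and are not Longley coefficients or output from any of the packages.

```python
import math

def matching_significant_digits(computed, reference):
    """Approximate count of leading significant digits on which two values agree."""
    if computed == reference:
        return math.inf
    relative_error = abs(computed - reference) / abs(reference)
    return -math.log10(relative_error)

def matching_decimal_places(computed, reference):
    """Approximate count of digits right of the decimal point on which two values agree."""
    if computed == reference:
        return math.inf
    return -math.log10(abs(computed - reference))

# Hypothetical reference value and a hypothetical, less accurate result.
reference = 1234.56789
computed = 1234.61

sig = matching_significant_digits(computed, reference)
dec = matching_decimal_places(computed, reference)
print(f"significant digits in agreement: about {sig:.1f}")
print(f"decimal places in agreement:     about {dec:.1f}")
```

For a coefficient in the thousands, agreement to about four significant digits leaves only the first decimal place trustworthy, which is why the two measures should not be confused when comparing packages.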

Other gripes: the manual extols inspecting x-y plots to spot curvilinearity before performing linear regression, but a simple scatterplot is impossible to produce (plots for time-series data can be made). There is no way to read a data file and build a contingency table from the contents; the limited contingency-table analyses present require the data to be entered as observed counts per cell. Likewise for z- and t-tests: you enter the means, sample sizes, and variances. The right-most characters on screen instructions and prompts were frequently missing. In some cases if you answer a Y/N prompt with a lowercase y by mistake you get bounced back to the desktop. And so forth. Unless you need some of the special capabilities of this package for ANOVA I would not recommend it.
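For readers unfamiliar with t-tests computed from summary statistics alone (means, sample sizes, and variances, as described above), the following sketch shows the two standard two-sample forms. It is not taken from any of the packages; the function names and the example numbers are invented, and the variances are assumed to be sample variances computed with an n - 1 denominator.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t statistic and degrees of freedom, assuming equal variances."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t statistic and approximate df, not assuming equal variances."""
    a, b = var1 / n1, var2 / n2
    se = math.sqrt(a + b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return (mean1 - mean2) / se, df

# Hypothetical summary statistics: unequal variances and unequal sample sizes.
t_pooled, df_pooled = pooled_t(10.0, 4.0, 10, 8.0, 16.0, 5)
t_welch, df_welch = welch_t(10.0, 4.0, 10, 8.0, 16.0, 5)
print(f"pooled: t = {t_pooled:.3f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.3f}, df = {df_welch:.2f}")
```

When the variances and sample sizes differ, the two forms can give noticeably different t values and degrees of freedom, so it matters which assumption a package makes.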

Multivariate Analysis

This is another package from Lionheart, and has a structure similar to Business Statistics. In fact, some of the same routines for data entry and multiple regression are present, but the thrust of the package is towards matrix manipulation, and additional data entry, editing, and algebraic routines for matrices are included. The manual is more closely tied to the software than in Business Statistics. Major chapters in the manual cover data entry and management, multilinear regression, multiple correlation, component analysis, factor analysis and rotation, and multivariate analysis of variance; appendices discuss probit analysis, simultaneous equations, cluster analysis, and matrix operations.

Because this is the least expensive program with many of these multivariate routines, its usefulness is more apparent than is the case for Business Statistics. It may be especially attractive for the 128K user who wishes to perform multivariate analyses, since the other packages with these capabilities require larger machines. However, my subjective comments on the awkwardness of Business Statistics, and on its failure to adopt Mac conventions, apply to this package as well.

Data Desk (Professional Version)

This program, by contrast, was built from the ground up for the Mac by Paul and Agelia Velleman of Cornell University as part of the Apple University Consortium effort. It features basic descriptive statistics and inferential statistics through multiple linear regression in a student-oriented, graphics-intensive format. Not reviewed here is the inexpensive subset that will run on a 128K machine with one or preferably two drives. This version is available only to students in a course which has adopted Data Desk as a text.

The documentation consists of a manual with chapters (in sequence) on basic concepts, entering and editing data, displaying data, simple summaries, transformations, manipulating variables, random numbers and simulation, simple inference, comparing two samples, comparing means of several groups, two-way ANOVA, contingency tables, simple regression, correlation, and multiple regression. I give this long list to show that it is well laid-out to serve as an introduction to statistics and statistical computing. It is also an attractive, clearly written, and authoritative publication.

With the manual is shipped a supplement that covers the new features in Version 1.0, a "Quickstart" tutorial, and a detailed discussion of importing and exporting data, including examples of how the procedure works with Excel, MacSpin, MacWrite, Microsoft Word, and SYSTAT. Flexible commands for copying and pasting variables to and from the clipboard may be bypassed for larger files using "save in text file" and "read text file" commands. Cases may also be merged into existing variables using the "paste cases" command. Data entry is flexible and convenient and makes good use of windows. The basic unit of analysis is the variable, but a variable may be associated with others in "bundles" or data sets, so that a row of data across variables can refer to the same case. This association can be preserved during sorts (or other manipulations) on any variable.

The Longley benchmark took about 10 seconds to compute and was completely accurate to the 7 digits displayed. One disadvantage of this program (in relation to StatWorks, for example) is that the number of decimal places to be displayed cannot be controlled, and scientific notation is not an option. However, Data Desk (as is also the case for StatWorks and StatView 512+) uses the Standard Apple Numeric Environment and performs calculations in 80-bit arithmetic, preserving some 19 decimal digits of accuracy. Output is fully compatible with either the Imagewriter or LaserWriter, and the bold characters and special symbols that are used effectively on the screen to highlight output may be preserved on printouts by copying in picture form, or may alternatively be copied in text form for later manipulation by word-processing software. Graphics can be instantly rescaled and reshaped on screen; clusters of points on a scatterplot may be selected with a rectangle for more detailed plotting; and pseudo-three-dimensional projections of points can be created and rotated about any axis.

It is much more fun to use Data Desk than it is to write about it. I rate it as the best package around for instruction, and in a close contest with the discounted StatView 512+ as the best general-purpose research statistics program under $200 for the Mac (unless you need StatView's factor analysis capabilities). Perhaps an "advanced professional" version filling present lacunae in nonparametric and multivariate statistics may someday be available without unduly sacrificing ease of use or greatly increasing the price?

StatView 512+

This package and the next two are full-featured statistical programs offering most of the facilities that, until recently, could be found only on large mainframe packages such as SPSS or SAS. This is the least expensive and the smallest of the three, and the only one written expressly for the Mac. As is the case with other such packages (WormStat, StatWorks and Data Desk) graphics are flexible and well-integrated into the analysis sequence. In StatView, the various graphic displays are not viewed as something inherently different from other kinds of output, but just as one of several available "views" for the results of analyses. As is also the case with the other packages originally written with the Mac in mind, numerous menus and dialogue boxes make it clear how to perform various analyses. If one must refer to the manual for a fine point, it is found to be clear and well-written, with an appendix for formulae and statistical details.

Data entry from the keyboard is made into a spreadsheet-like screen; it is fast and has several convenient features. For example, the user may define formats for categorical variables with up to 64 possible textual values. These formats, called "category sets," may be associated with one or more variables, so long as these variables are typed as "categorical." In data entry, keying either the ordinal number for a value, or enough alphanumeric characters so that the value label is unambiguously identified, causes the text value to appear instantly in the spreadsheet cell.

Quite large data sets can be manipulated. According to guidelines set forth in an appendix, a Mac Plus should be able to handle data sets with as many as 1,300 columns (variables) each with 50 cases (rows) of real data. A Mac Plus using no RAM cache, desk accessories, or Switcher can squeeze in 49 real variables and 1 integer variable each with 1,550 cases. Active data are held in RAM.

Data can be cut or copied from one StatView dataset to another, or from a Mac spreadsheet to StatView, using the clipboard. Alternatively, text files in a variety of formats can be brought in using the "import" function. An example text import file is included on one of the disks. Both the Imagewriter and LaserWriter are supported, although an annoying and unnecessary page eject precedes all output on the Imagewriter.

The regression/correlation routines harbor both nice features and disappointments. Unlike those in any other program reviewed, StatView's scatterplots can display confidence limits for the slope of the regression line or confidence bands for the true mean of Y, at either the 90% or 95% level. The Longley benchmark ran relatively quickly and the results were accurate to the 9 decimal places I could display. Problems created by overlap of data points in scatterplots of large data sets can be handled in several pleasing ways. However, partial correlation coefficients cannot be calculated directly, and various valuable regression diagnostics such as DFFITS and DFBETAS that one might hope for in a large package are missing. (Only SYSTAT offers DFFITS, and none of the programs reviewed offers DFBETAS.)

Likewise, the factor analyses available include principal components, Kaiser image, Harris image, and iterated principal axis methods with equimax, varimax, or quartimax rotation and considerable control over the number of factors to extract, calculated from either raw data or from a correlation matrix. Despite this sophistication, there are no accompanying programs for other frequently needed multivariate routines such as discriminant or cluster analysis. However, several "accessory" programs are expected from Abacus Concepts, Inc., in Spring 1987 with such advanced features.

StatView 512+ has recently been given a glowing review by Ward (1986). Nevertheless, some researchers may want to sacrifice its ease of use and graphics for the additional statistical routines available in the last two packages.

STAT80 (Professional Version)

This package was originally written for and still runs in various mainframe environments, but has recently been ported to the Macintosh where it runs in Microsoft FORTRAN. An IBM AT version will be released soon. Except for a 44-page introduction and a chapter on ANOVA that were written for the Mac version, the very large manual that comes with the 5 disks is general to all versions. A quarterly newsletter provides information about new releases, programming tips, and schedules for classes in the use of STAT80 that are offered in either Salt Lake City or on-site. There is an on-line help facility, and this is the only package to provide a toll-free technical support number for registered users.

The mainframe ancestry of this package quickly reveals itself in several ways. Output from procedures scrolls down the screen rapidly, and must be temporarily halted and restarted with control-s and control-q keystrokes. Nor can one scroll backwards through output, except by accessing an external LOG file. Files must first be opened and then read (two separate operations). Output from statistical procedures does not go directly to the printer, but can be selectively routed to either or both the screen and a text (LOG) file on disk; the text file is later submitted for printing if desired. In this procedure the Imagewriter is supported, but output destined for the LaserWriter needs additional editing. There is an internal data entry facility, but it appears to be less flexible than Apple's own EDIT program (from the Macintosh 68000 Development System) which is distributed with STAT80. The only catch is that the user must quit STAT80 and go back to the desktop to get into EDIT. I had no difficulty importing or exporting text (ASCII) files.

Some aspects of the Mac user interface have been added on top of the existing facilities in STAT80. Most activities, therefore, can be initiated either from a pull-down menu system or directly from a command line. There are also short-cut key strokes to get into the menu chain. A nice feature of the Mac version is a window (which by default is small and located in the upper right-hand corner of the screen, but which can be re-sized, relocated, or closed) displaying the variables in the workspace. Missing in the Mac version, but present in the mainframe version of this package is the "proc" language in which arbitrarily long sequences of activities can be coded in a BASIC-like language.

The Longley benchmark ran very fast. The first time I ran it, two variables were not included in the solution because their inclusion would have lowered the "tolerance" level below the default threshold. (The tolerance of a variable is defined as 1 minus the multiple R² obtained when that variable is regressed on the independent variables already in the model.) This is the only package that gave such a dramatic warning of the high collinearity among the 6 independent variables in the model, and only SYSTAT also displays the tolerance as well as (optionally) other collinearity diagnostics. I ran the Longley model again, specifying a very low tolerance level. The results were achieved just as rapidly, but the coefficients were accurate to only about 4 digits, no better than the performance of the Lionheart packages.
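The tolerance check described above can be sketched in a few lines. This is a generic illustration and not STAT80's algorithm: the function names and the small data columns are invented, and the regression is solved through the normal equations, which is adequate for a toy example but is itself a numerically weak method on data as collinear as Longley's.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def tolerance(candidate, others):
    """1 - R^2 from regressing `candidate` on the columns in `others` plus a constant."""
    n = len(candidate)
    cols = [[1.0] * n] + [[float(v) for v in col] for col in others]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * y for u, y in zip(ci, candidate)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * col[i] for b, col in zip(beta, cols)) for i in range(n)]
    mean = sum(candidate) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(candidate, fitted))
    ss_tot = sum((y - mean) ** 2 for y in candidate)
    return ss_res / ss_tot  # equals 1 - R^2

# Hypothetical columns: x2 is nearly a multiple of x1; x3 is uncorrelated with x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.0, 8.1, 9.9]
x3 = [1.0, -2.0, 2.0, -2.0, 1.0]

print("tolerance of x2 given x1:", tolerance(x2, [x1]))
print("tolerance of x3 given x1:", tolerance(x3, [x1]))
```

A tolerance near zero means the candidate variable is almost a linear combination of the variables already entered, which is the situation in which a package may reasonably refuse to add it; a tolerance near one means it carries largely independent information.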

Sophisticated users familiar with general linear models and matrix algebra will be able to profit from a program entitled GLMODEL that allows complete specification of contrast matrices and makes it possible to perform tests not foreseen in the canned routines for ANOVA and regression. A separate facility for matrix manipulation is also provided for even lower-level control. These features make it possible for the knowledgeable user to perform tasks, such as log-linear analysis, that are not "x"ed in the tables.

Since they are similar in list price (although site licensing and discount deals make it hard to predict what your price might be) it is useful to compare StatView 512+ and STAT80. With its facilities for cluster analysis, canonical correlation analysis, and GLM, STAT80 has the clear edge in multivariate capabilities. On the other hand, STAT80's graphics capabilities are quite poor; its accuracy on the Longley benchmark was disappointing; and it is harder for the novice user to negotiate for quick results. Nevertheless, of the packages reviewed up to this point, STAT80 comes the closest to making it possible to use the Mac in place of a mainframe.

SYSTAT

This last package is in a class by itself, both in terms of power (see Tables 3 - 6) and in terms of price (Table 1). In its very similar IBM manifestation the current version, 3.0, has received high marks in recent reviews (e.g., Infoworld September 1, 1986: 32 - 34) as an "analytical powerhouse" and as the top contender, along with SPSS/PC+ (not available for the Mac) as the "best [microcomputer] statistics package." Unlike SPSS and STAT80, which made the move from mainframes to micros, SYSTAT, first developed for microcomputers, has been moving in the opposite direction.

SYSTAT offers several advantages over STAT80, including an internal full-screen data editor (painfully slow for scrolling through large data sets); much greater accuracy (to the 9 decimal places I could display) for the Longley benchmark and presumably elsewhere (although the package is written in FORTRAN, and therefore cannot access the Standard Apple Numeric Environment, all arithmetic is double-precision); and a number of additional procedures, including multidimensional scaling, log-linear analysis, time-series analysis, and non-linear estimation. And if that's not enough, there is also an internal interpreted programming language, SYSTAT BASIC. As with STAT80 (although STAT80 achieves the same end differently) there is a way to move from module to module without exiting to the desktop.

The current version of the documentation, dated 1985, is much better than previous versions, which were frequently reviewed as too terse, even enigmatic. Leland Wilkinson, author of the documentation and much of the code, is a professional statistician who thoroughly engages his readers with cautions about analytical routines such as stepwise regression, which gets a deserved drubbing. However, there is no mention whatsoever in this manual of SYSTAT's implementation on the Macintosh, since the Mac version was first released in Spring 1986. It may be unreasonable to expect a completely rewritten manual for the Mac, but a preface, as offered by STAT80 to its Mac users, seems highly desirable. Thankfully, given the lack of Mac-specific documentation, there is a phone number for technical support. This is not an 800 number, but calls are returned.

In general I had little difficulty figuring out what to do with SYSTAT on the Mac. SAS users will note some familiar features, such as pointer directions on formatted input statements. On-line help can be referenced either through pull-down menus or through keyboard-issued commands. There are many examples throughout the manual and several example sessions in an appendix. The commands and the output produced by them can be re-examined by scrolling backwards through a session. Commands can be saved by themselves to an external file and resubmitted batch-style if desired. (However, activities carried out through the menu system, such as accessing a file on the external disk rather than on the internal disk, are not tracked successfully by the command log, and files saved through the menu do not have the same suffixes automatically applied to them as do files saved from the keyboard.) Printing is supported to the Imagewriter but not to the LaserWriter in this release; the present release accesses only 512K while the next release will access the full RAM available in a Mac Plus. As of December 1986 the upgrade policy to the next release for owners of the present version had not been decided.

Both the Macintosh and the MS-DOS versions of SYSTAT are limited to 200 variables having as many cases as can be handled by the amount of disk space available. Additional SYSTAT products available for MS-DOS machines (Report Writer, STAT/TRANSFER, PROBIT, LOGIT, and others) are not available for the Mac version, at least at this time, but anyone can buy the handsome white sweatshirt with green SYSTAT logo for only $12.00.

SYSTAT contains 12 modules, one of which, DATA, is used for keyboard data entry and reading ASCII files (once files are translated into SYSTAT's internal format, they can be read directly by any of the other 11 modules). I had some problems reading SYSTAT-written ASCII files having more than 6 variables with other packages. SYSTAT writes these files as 12-column fields separated by commas, but records longer than 80 columns are broken by carriage returns and line feeds. Thus, from the point of view of other packages, handling of divisions between variables is inconsistent. Perhaps an LRECL specification for output records similar to that available for input ASCII records longer than 80 columns could be implemented to overcome this problem.
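The wrapping problem can be worked around on the receiving side. If each field is 12 columns wide and the separating commas are counted in addition, 6 variables occupy 6 x 12 + 5 = 77 columns and fit on one 80-column record, while 7 variables need 7 x 12 + 6 = 90 columns and are wrapped, which agrees with the threshold noted above. The following Python sketch rejoins wrapped records when the number of variables is known. It is only an illustration, not part of SYSTAT or any package reviewed here; the function name rejoin_records is hypothetical, and the sketch assumes that no field is blank and that a comma may or may not be left dangling at a line break.

```python
def rejoin_records(lines, n_vars):
    """Rebuild logical records of n_vars fields from wrapped physical lines.

    Assumptions (not taken from the SYSTAT manual): fields are separated by
    commas, no field is empty, and a record may be split over several lines.
    """
    records = []
    current = []
    for line in lines:
        # Drop CR/LF, split on commas, and trim the padding of each field.
        fields = [f.strip() for f in line.rstrip("\r\n").split(",")]
        # Ignore empty pieces left by a comma that fell at a line break.
        current.extend(f for f in fields if f)
        # Emit a record every time n_vars fields have accumulated.
        while len(current) >= n_vars:
            records.append(current[:n_vars])
            current = current[n_vars:]
    if current:
        raise ValueError(f"{len(current)} leftover fields; check n_vars")
    return records
```

The rejoined records can then be written back out one case per line for packages that expect a consistent delimiter between variables.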

Conclusions

Statistical packages for microcomputers have come a long way in the last year. Software for the Macintosh has participated and in some cases led in these advances. Nevertheless, and despite some claims to the contrary, none of these packages can fully replace a large system such as SAS running on a mainframe for range of features and ability to handle large data sets; nor can we legitimately expect such performance.

There is no single package here that is best for all uses. In the mid-to-upper range of capabilities, Data Desk, STAT80, and StatView 512+ (at its commercially discounted price) are good values for the features they offer in relation to their price. SYSTAT is presently alone in its combination of power, speed, and accuracy, though Mac Plus users considering this package might wait for the next release (probably Spring 1987) and some Mac-specific documentation. For certain specialized uses on 128K machines one of the Lionheart offerings might be the package of choice. Finally, for infrequent users with no need for complex procedures, StatWorks or even WormStat are plausible choices.

Acknowledgments

I thank the manufacturers and distributors whose products are reviewed here for supplying software and documentation. All of them had an opportunity to review a draft of this article to help spot errors of fact, and special thanks are due to True BASIC, BrainPower, StatWare, and SYSTAT for doing so. I am also grateful to the School of American Research and its Weatherhead Scholar program for patience during this brief excursion from southwestern archaeology.

Notes

Current address: School of American Research, P.O. Box 2188, Santa Fe, NM 87504-2188.

Commonly quoted prices as of December 1986 through discount houses.

When 800K is specified, it may be in either one or two drives.

A Student Version, lacking importing/exporting of data and 3-D plot rotation, and restricted to about 15 variables and 1,000 total values, can be purchased by students for $35.00.

Newest release, not used in this review, is 1.1.

StatView, a version not requiring the 512K Mac, is available for $99.95.

For prof. version on 5 disks, including multivariate statistics; $249.00 for standard version, on 4 disks, w/o multivariate statistics. Site licenses, up to 100 copies, $2,500; up to 1,000 copies, $5,000.

Quantity discounts available; site license for university, $3000.

Educational price, $39.00; site license, $1500.00.

A 19-page "Program Notes" section at the end of the manual serves much the same function, however.

Program segments can be run from two 400K disks, or can be linked with an included program called Joiner and run from a single 800K disk.

The most commonly used segments can be combined on two 800K disks.

A quick reference card is also included, a feature unique to SYSTAT.

These are example programs rather than just datasets.

For the 512K version; 71K for the 128K version; both versions shipped on same disk.

Stapled; no binding.

Up to 3 variables may be plotted to the same scale in a single window.

Similar to scatterplot, except that a line connects the data points from left to right across the x-axis.

Square only

This and some of the other capabilities below are available not as "canned statistics" but through programming in SYSTAT BASIC. I have mentioned such capabilities only where examples are given in the manual.

95% shaded confidence intervals for medians may overlay boxes.

With rotation and many options.

A pseudo-3-D plot is available where the z-dimension is expressed as different letters for symbols plotted on the x-y plane. There is no visual sense of depth as in Data Desk and StatWorks.

For these and the other nonparametric statistics below, only the critical values for these tests are given; actual calculation of the test statistic is by the user.

A statistic similar to the Wilcoxon signed-rank test.

Ascending only; numeric only.

Where separate, explanatory labels can be linked to real or integer values.

Where separate, explanatory labels can be assigned to variables, whether or not the variable names are themselves cryptic (VAR01) or self-explanatory.

Not available except through programming in GLMODEL, but limited ability to test association in n-way tables is provided by a Mantel-Haenszel Chi-square test.

Performed as a Pearson's product-moment correlation on ranks.

By-group-like processing is possible through the "split columns" facility.

Lead variables facility also available.

Using MREG; 17 using MVREG. Neither was accurate beyond the 4th digit.

Not in the standard regression program, but programmable through GLMODEL, a general linear modeling program.

Includes a 3-second disk access. Once the appropriate program segment is in RAM (e.g., during a second use of the regression segment), the Longley timing drops to 6 seconds.

References

Feldesman, Marc R. 1986 A review of selected microcomputer statistical software packages: A cautionary tale. Computer-Assisted Anthropology News 2(1):3 - 26.

Longley, James W. 1967 An appraisal of least squares programs for the electronic computer from the point of view of the user. Journal of the American Statistical Association 62:819 - 841.

Siegel, Jay B. 1985 Statistical software for microcomputers. North-Holland, New York.

Ward, Terry A. 1986 Stat's the answer. MacUser 1(15): 91 - 94.

Table 1: Price, Availability, and Hardware Requirements


Name            Version    Supplier                      Price        Requires
                                                         List/Disc.   (RAM/Disk)

Business        4.0        Lionheart Press, Inc.         $145         128K/400K
  Statistics               PO Box 379, Alburg, VT
                           05440; (514) 933-4918

Data Desk       1.02       Data Description, Inc.        $175 / na    512K/800K
                           PO Box 4555, Ithaca, NY
                           14852; (607) 257-1000

Multivariate    4.0        Lionheart Press, Inc.         $150 / na    128K/400K
  Analysis                 PO Box 379, Alburg, VT
                           05440; (514) 933-4918

StatView 512+   1.0        BrainPower, Inc., 24009       $350 / $179  512K/800K
                           Ventura Blvd., Calabasas, CA
                           91302; (818) 884-6911

StatWorks       1.2        Cricket Software              $125 / $79   128K/400K
                           3508 Market St., Suite 206
                           Philadelphia, PA  19104

STAT80          Ver. 1     StatWare, PO Box 510881       $399 / na    512K/800K
                Rel. 2.10  Salt Lake City, UT  84151
                           (801) 521-9309

SYSTAT          3.0        SYSTAT, Inc.                  $595 / $459  512K/800K
                           2902 Central St.
                           Evanston, IL  60201
                           (312) 864-5670

TrueSTAT        1.0        True BASIC, Inc.              $80 / $36    512K/400K
                           39 S. Main
                           Hanover, NH  03755
                           (603) 643-3882

WormStat        1.01       Small Business Computers      $79 / na     128K/400K
                           of New England, Inc.
                           PO Box 397, 4 Limbo Lane
                           Amherst, NH  03031

View Tables 2 to 6 in PDF format


HUNTING AND GATHERING TALES

ROUNDTABLE SYMPOSIUM ON MICROCOMPUTERS IN ANTHROPOLOGY

A roundtable on microcomputers in anthropology was held in March, 1987, at the Southern Anthropological Society conference in Atlanta, Georgia. The aim was to focus on informal presentations, the sharing of ideas, and applications in teaching, research, and administration. The organizers were David "Megabyte" Johnson, Dept. Sociology and Social Work, NC A & T State University, Greensboro, NC, 919-334-7894 and Kim McBride, The Museum, Michigan State University, East Lansing, MI 48824, 517-332-4830.

COMPUTERS IN SOUTHEAST ASIA

The Joint Committee on Southeast Asia of the Social Science Research Council and the American Council of Learned Societies is interested in fostering discussion on the rapidly increasing use of computers in Southeast Asia. Particularly since the advent of microcomputers, Southeast Asian businesses, government offices, research institutes, and schools have undergone significant changes. What are the effects of the new technology (and differential access to it) on changing patterns of work, literacy, and education; on the acquisition, processing, and communication of information; on power relationships? How are the vast majority of people still without access to computers influenced by "computer culture" and the "information revolution"? As an initial step toward developing a project on the social, cultural, economic, and political impact of computers in the region, the Committee would like to establish contact with social scientists and humanists who are contemplating or who have already conducted research on some aspect of these issues. At a minimum, the Committee hopes to facilitate the formation of a network of such researchers. If warranted, it may also attempt to sponsor meetings on the subject, and to fund empirical research as well. Interested persons should write to Dr. Toby Alice Volkman, Social Science Research Council, 605 Third Avenue, New York, New York 10158.

1987 AAA SESSION ON COMPUTERS

Jeffrey Ward (Sybase) and Gurcharan Khanna (UCB) are organizing a session on computer applications in anthropology for the coming AAA meetings. It will focus on unique and novel applications that are at the forefront of computer methodologies. Some of the topics they hope to cover are the management of unstructured databases, image processing in archaeology, expert AI systems to model cultural knowledge, the simulation and analysis of networks, and the mingling of text and graphics in databases. Ward's address is Sybase, 2910 Seventh St., Suite 110, Berkeley, CA 94710.

ABBS CHANGES PHONE NUMBERS

ABBS, The Anthropologists Bulletin Board (CAAN 2(2):25-26), acquired a new phone number in January: 503 229-3912. Ma Bell will change the exchange during July, 1987. After that the phone number will be 503 279-3912.

Marc Feldesman, the SYSOP of ABBS and Chairman of the Anthropology Department at Portland State, says to wait until the first number gives you a sweet recorded voice and then use the second number from then on. ABBS is open to anyone belonging to any professional anthropological organization. It can be used for posting notices or for discussing anthropological topics. It is jointly sponsored by the Society for Applied Anthropology and Portland State University. Job notices coming to the attention of the SfAA appear there.

COMPUTER ACCESS FOR LONDON'S COMMUNITIES

Stella Mascarenhas-Keyes has been appointed research coordinator of a large project called COMPUTER ACCESS FOR LONDON'S COMMUNITIES. The two-year project will investigate whether disadvantaged groups such as women, black and ethnic minorities, and the disabled in London are getting equal access to new technology training and jobs. Anyone interested in being considered for involvement in the project should write to Stella at London Voluntary Service Council, 68 Charlton Street, London NW1 1JR, ENGLAND.


PRODUCT ANNOUNCEMENTS

COHORTS

Daniel Gross

I have written a BASIC program which may be useful to anthropologists interested in comparing the age/sex pyramids of different populations. It converts census data on the number of people in each age/sex cohort into a convenient table and a pyramid graphic on the screen of an MS-DOS, graphics-equipped microcomputer. The graph can be printed using a dot-matrix printer. COHORTS will accept input from a disk file or from the keyboard, using prompts. The user can specify the cohort interval (in whole years) and the input prompts are adjusted accordingly. For a copy, send a blank 5-1/4 inch double-sided, formatted diskette with a self-addressed, stamped envelope, or $5.00 US to Daniel R. Gross, 5313 Briley Place, Bethesda, Maryland 20816.
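For readers who want to see the kind of transformation involved, here is a minimal Python sketch that turns cohort counts into a character-based pyramid. It is not the COHORTS program and makes no claim about how that program works; the function name pyramid, its parameters, and the use of "#" bars are illustrative assumptions. Counts are given as (males, females) pairs, youngest cohort first, and bars are scaled to the largest single count.

```python
def pyramid(cohorts, interval=5, width=20):
    """Return a text age/sex pyramid, oldest cohort on the top row.

    cohorts  -- list of (males, females) counts, youngest cohort first
    interval -- cohort width in whole years
    width    -- number of characters given to the longest bar
    """
    # Scale every bar against the largest count (avoid dividing by zero).
    biggest = max(max(m, f) for m, f in cohorts) or 1
    rows = []
    for i, (males, females) in reversed(list(enumerate(cohorts))):
        low = i * interval
        label = f"{low:>3}-{low + interval - 1:<3}"
        # Male bars grow leftward from the label, female bars rightward.
        left = ("#" * round(males * width / biggest)).rjust(width)
        right = "#" * round(females * width / biggest)
        rows.append(f"{left} {label} {right}".rstrip())
    return "\n".join(rows)
```

Calling print(pyramid(counts, interval=5)) on two census lists makes differences in the shape of the pyramids easy to compare by eye.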

WORDTREE

Henry Burger

Univ. of Missouri - Kansas City

A basic problem in computerizing language has been the relation between the several million words in the vocabulary of any language. The alphabet, which codifies virtually all word books including the dictionary, is of course a historical accident. Roget's Thesaurus grouped all words into 1,000 categories. But Roget published seven years before Darwin, hence organized concepts pre-evolutionarily. Thus, the rather primitive characteristic of Excitability received the high (complex) number 825--but the advanced human capacity of prediction got the lower (simpler) number of 511.

A possible breakthrough has come with Henry G. Burger's following the insight of Niels Bohr in physics, that things may be perceived in either waves or particles, but not both simultaneously. Burger, anthropology professor at University of Missouri-Kansas City, decided to segregate the vocabulary into either structural words (cf. nouns) or processual words (cf. verbs). He then gradated those verbs which impact the environment--which connect cause and effect; these are English's transitive verbs. Over 27 years, his project found that each, except the most primitive like INCREASE/DECREASE, can be defined as the addition of the next simpler process plus the addendum. Thus, to RIDE and to MANEUVER = to JOCKEY. Again, to HOLD and to STAY = to FASTEN. To GROUND and to FASTEN = to STAKE. To STRETCH and to FASTEN = to RACK, etc. The entirety is processed by a 5,000-line program. Among other actions, it cross-references each of the three parts to the other two. On the "F" pages at FASTEN, for example, are distinguished some four dozen methods of fastening.

The laser printout, plus instruction, a source for each verb, and so on, have now been published as "The Wordtree" (R). It is a neat, 380-page, hardcover book, cross-referenced both alphabetically and hierarchically. Included are a quarter-million word listings, with 30% more transitives than in the total Oxford English Dictionary. Cost is $149.00.

Dr. Burger explains that this hard-copy version must disseminate before he can consider offering it online. He can be contacted on Bitnet at: BURGER@UMKCVAX1. "The Wordtree"(R) is published by a company of the same name at 10876 Bradshaw, Overland Park, KS 66210-1148, U.S.A. The system is explained in a free, four-page brochure.

NEEDLE IN A HAYSTACK

Needle in a Haystack speeds up the search for useful data in the computerized reams of verbiage faced by researchers today: the lawyer's legal briefs, the social scientist's cache of notes, the computer buff's downloaded text files, even the anthropologist's fieldnotes. Needle automates the production of a two-level index to text, obviating the need for manual searches for information or the necessity of manually inserting special indexing markers in the file. Unlike the single-level index produced by other programs, which requires a user to examine every occurrence of an indexed word in the text itself to determine whether it is relevant, Needle's two-level index allows a user to go directly to those occurrences most likely to be of use.

Text can derive from any source which produces an ASCII file, including most word processing programs, optical character readers, or modems. It can reside in one or more files on as many disks as necessary. Running the three associated programs comprising Needle in a Haystack is simple. Prepare a master file containing the names of all the text files to be indexed as a unit, run Number to number the paragraphs and sentences, run BuildDBA to do the actual indexing, then Printout to output the index to screen, file, or printer. Users have the option of indexing by paragraph or sentence number, and of producing a long comprehensive index or a shorter compact index including only entries which occur two or more times in the text. The output index is formatted into two columns, each consisting of primary words with secondary words and location numbers indented beneath them. A file of paragraph first lines with their associated numbers and sentence numbers is also produced, facilitating reference to the main text.

Needle in a Haystack has several features not apparent at first sight which contribute to its utility: it indexes by logical units, sentences or paragraphs, not by "lines"; the two levels of index are generated by using adjacent words only within sentences, maximizing the likelihood of a meaningful relationship between the pair; insignificant words are automatically skipped; unformatted text may be used because page numbers are not required.
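The idea of a two-level index built from adjacent words can be illustrated with a short Python sketch. This is not Needle's code, and the details are assumptions made for illustration: the stop-word list is a tiny sample, sentences are split on ".", "!" and "?", adjacency is judged after insignificant words have been skipped, and each pair is filed only under its first word. The function name two_level_index is hypothetical.

```python
import re
from collections import defaultdict

# A very small illustrative list of "insignificant" words.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "was", "on", "at"}

def two_level_index(text):
    """Map primary word -> secondary word -> list of sentence numbers."""
    index = defaultdict(lambda: defaultdict(list))
    # Number the sentences from 1; empty fragments are discarded.
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    for number, sentence in enumerate(sentences, start=1):
        words = [w for w in re.findall(r"[a-z']+", sentence.lower())
                 if w not in STOP_WORDS]
        # Pair each significant word with its neighbour in the same sentence.
        for primary, secondary in zip(words, words[1:]):
            if number not in index[primary][secondary]:
                index[primary][secondary].append(number)
    return index
```

Looking up index["pottery"] then lists the words that follow "pottery" together with the sentences where each pairing occurs, so a reader can go straight to the pairing of interest instead of checking every occurrence.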

Needle is not designed to search for specific words input by the user one at a time. Because the index becomes outdated after changes to the text file, neither is it designed for use on a text undergoing continuous modification although new text appended to the data file could, of course, be indexed as a separate unit. Needle is designed for the researcher with an unwieldy body of text, and in that environment it is a valuable tool.

SORT BLOCKS BY FIELDS

James Dow

Put some new clothes on your computer for fashionable field work this summer. Announcing SBF to help you look good in the field. SBF is a general purpose sorting program written for MS-DOS computers, including the PC family. It is licensed for free use by all anthropologists and students of anthropology.

Advantages over ordinary sort programs: (1) SBF sorts blocks of arbitrary size ending with a string of characters defined by the user. (2) The blocks are compared by looking at the first string in the block that begins with a sub-string also defined by the user. (3) There are no limits on the size of the file sorted. (4) A quicksort algorithm is used to speed operation. (5) SBF runs in multitasking operating systems. It uses only 64K of RAM while running. (6) Parameters may be given on the command line or after a menu. (7) SBF will sort bibliographies written in a variety of styles including that of the American Anthropologist. (8) Duplicate blocks can be deleted.

Current limitations: (1) To maintain memory integrity in multitasking systems, blocks of over 3K may sometimes be truncated. A warning tail is written on the block if this happens. (2) Null characters, ASCII zeros, are removed from the input file. (3) The standard ASCII sorting sequence is used. (4) Order is not maintained in sequential sorts of the same file. (5) SBF uses disk space to write temporary files. Free space up to twice the size of the input file may be needed to write temporary and output files. SBF works best with hard disks.

SBF stands for Sort Blocks by Fields. The user defines a block mark that separates the blocks to be sorted. Other field marks indicate where the comparison field is to be found in each block. One can create a text database by selecting a block mark to act as a record separator and field marks that designate each field in a record. It is a good idea to include a non-alphabetic character in the marks so that common text is not recognized as a mark.
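The block-and-field scheme can be sketched in a few lines of Python. This is an in-memory illustration of the idea only, not SBF itself: it ignores the temporary files and memory limits described above, it assumes the comparison field runs from the field mark to the end of that line, and the name sort_blocks and its parameters are hypothetical. Python compares strings by character code, which matches the standard ASCII sequence for ASCII text.

```python
def sort_blocks(text, block_mark, field_mark, drop_duplicates=False):
    """Sort blocks terminated by block_mark on the text after field_mark."""
    # Split on the block mark and give each block its terminator back.
    blocks = [b + block_mark for b in text.split(block_mark) if b.strip()]
    if drop_duplicates:
        # dict.fromkeys keeps the first copy of each identical block.
        blocks = list(dict.fromkeys(blocks))

    def key(block):
        start = block.find(field_mark)
        if start < 0:
            return ""  # blocks without the field mark sort first
        start += len(field_mark)
        end = block.find("\n", start)
        return block[start:] if end < 0 else block[start:end]

    return "".join(sorted(blocks, key=key))
```

For a bibliography in which each entry ends with a line reading "@@" and the author line begins with "%A ", sort_blocks(text, "@@\n", "%A ") would return the entries ordered by author. The marks in this example contain non-alphabetic characters, following the advice above.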

A do-it-yourself database like this is not a substitute for a sophisticated database program. However, it does have some advantages. All the data is in an ordinary text file that can be edited and changed by any editing program. Pieces of the database can be split off to be worked on by text editors and then reattached to the original database by simply appending the text. New databases can be generated by splitting and editing the original. If duplicate records are created, they can be eliminated with the /e argument. The database might be formatted for a print formatter like ROFF4, NROFF, TEX, PRT, etc. so that it can be printed out neatly at any time. The big advantage of a do-it-yourself database like this is that it is very flexible and creative. One is not tied down by the rigidities and complexities of structured databases.

One disadvantage of a do-it-yourself SBF database is the time that it might take to sort it if it gets very large. Therefore, SBF is most useful when one can work on sections of the database most of the time, and do a big sort only once in a while to get things in order. SBF has been designed for multitasking systems, so for a massive historical research project, for example, one might push the sort operation to the background while using the computer for something else.

SBF was written to aid anthropological research and may be used by, and copied for use by, anthropologists and student anthropologists who are members of professional anthropological organizations. They may obtain a copy of the program and documentation on a 5-1/4" disk by sending $5.00 US to cover costs to the author, James Dow, 572 McGill Dr., Rochester, MI 48309, USA.

DIGSITE: COMPUTER SIMULATION OF AN ARCHAEOLOGICAL EXCAVATION

George Rapp, Jr. and Susan Mulholland

Univ. of Minnesota-Duluth

DIGSITE is the software designed to accompany Simulated Excavation, exercise 7 of Laboratory Exercises for Introduction to Archaeology (Rapp, Mulholland, Gifford, and Aschenbrenner, 1984). The program provides a realistic interactive simulation of a "dig" at Ali Kosh, a tell in Iran. The large data base contains features and artifacts buried at various locations and depths. The "site" is based on the real site of Ali Kosh; data was simplified from the site report by Hole, Flannery, and Neely (Prehistory and Human Ecology of the Deh Luran Plain, Memoirs of the Museum of Anthropology, University of Michigan, No. 1).

DIGSITE is designed for use on the IBM PC equipped with two disk drives, 256K memory and a printer. DIGSITE consists of two floppy disks, one for the program and one for the data base. The program disk must be initiated by your IBM DOS disk (version 2.0 or later). The printer is not necessary to run the program but is essential to record data from the excavated units. No graphics are produced by the program at this date. A demonstration disk, DIGDEMO, is available for $5.

Simulated Excavation is designed to allow students to plan and carry out an excavation without actually having to excavate. It is not intended to replace field training; nothing can teach how to excavate like being out in the field. Instead, this exercise tries to emphasize strategy and analysis. Students can try their hand at planning an excavation in response to a particular research problem (chronology, diet, or population size). Analysis of the materials found will indicate what can be done in the way of interpretation. DIGSITE is not the primary focus of this exercise; it is merely a tool to allow information retrieval between the more important steps of planning excavation strategy and analysis of the artifacts and features. Simulated Excavation has been taught at the freshman college level with good results but could be used with other grades. For more information, write to the authors at the Archaeometry Laboratory, University of Minnesota-Duluth, 10 University Drive, Duluth, MN 55812.


HOW TO SUBSCRIBE TO CAAN

Subscription requests should be sent to the Computer-Assisted Anthropology News at any of the addresses which follow. The four issues of Volume 2 (1986-1987) are $6.00 US for individuals and $12.00 US for institutions. Back issues of Vol. 1 (Nos. 1, 2, 3, and 4) and Vol. 2 (Nos. 1 and 2) are available at $1.50 US apiece. All of Vol. 1 is available on 5 1/4 inch disk: $10.00 for an MS-DOS disk and $15.00 for other formats. We would like to be able to produce and send you CAAN without charge, but in order to have it survive there seems to be no other means than to make it self-supporting.

HOW TO SUBMIT MATERIALS TO CAAN

Material should be submitted to James Dow, Editor, CAAN, at any of the addresses which follow. CAAN has a magazine style in which news, letters, notes, and articles are welcome. We prefer material submitted electronically, on diskettes or via network mail. Most 5 1/4 inch disk formats can be read. To avoid the problems created by the ever-increasing number of word processor formats, text on 5 1/4 inch disks should be in plain ASCII format with CR, LF, and TAB as the only control characters. Do not use any printer controls in the text. Tables should contain one and only one tab character per interior column. Text on 3 1/2 inch disks should be in MacWrite format.
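Contributors who want to check a file before mailing it could use something like the following Python sketch, which reports every byte that falls outside plain ASCII text with CR, LF, and TAB as the only control characters. It is offered only as an illustration of the rule above; the function name bad_bytes is hypothetical, and treating DEL and all 8-bit characters as disallowed is an assumption about what "plain ASCII" is meant to exclude.

```python
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D}  # TAB, LF, CR

def bad_bytes(data: bytes):
    """Return (offset, byte value) pairs for bytes that are not plain text."""
    return [
        (offset, value)
        for offset, value in enumerate(data)
        # Reject DEL and 8-bit characters, and any control character
        # other than TAB, LF, and CR.
        if value > 0x7E or (value < 0x20 and value not in ALLOWED_CONTROLS)
    ]
```

An empty result means the file contains no printer controls or word processor codes of the kind that cause trouble; otherwise the offsets show where to look.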

CAAN ADDRESSES

US MAIL: Dept. of Soc. and Anth., Oakland University, Rochester, Mich. 48309-4401, USA
TELEPHONE: (313) 370-2430, 375-1258
FIDOMAIL: James Dow, Fido #5 Net 120
INTERNET(ARPA): James_Dow@UM.CC.UMICH.EDU
BITNET: JDOW@WAYNEST1
COMPUSERVE: 70150,266
UUCP: dow@cideq3.cidnet.com