Computer-Assisted Anthropology News

Edited by James Dow

Vol. 2, No. 4, November 1987


CONTENTS OF THIS ISSUE

SOME TOOLS FOR THE MANAGEMENT AND ANALYSIS OF TEXT: A PEDAGOGIC REVIEW
NEEDLE IN A HAYSTACK: A REVIEW
COMPUTER CATALOGUING AN ARCHAEOLOGICAL MUSEUM
COMPUTER INDEX OF CLASSICAL ICONOGRAPHY
NEW PHONE NUMBER FOR ABBS
HOW TO GET NEEDLE IN A HAYSTACK
SECOND ANNUAL SUMMER INSTITUTE ON RESEARCH METHODS IN CULTURAL ANTHROPOLOGY
INFORMATION NEEDED ON ANTHROPOLOGY DEPARTMENTS WITH COMPUTER CONCENTRATIONS
CALL FOR PAPERS: DIRECTIONS AND IMPLICATIONS OF ADVANCED COMPUTING
HOW TO DEBUG A FIELD COMPUTER
EDITORIAL POLICY
CONTENTS OF PAST ISSUES


SOME TOOLS FOR THE MANAGEMENT AND ANALYSIS OF TEXT: A PEDAGOGIC REVIEW

John J. Wood

Department of Anthropology, Northern Arizona University

Introduction

The impetus for this paper was the interest in using computers for the management and analysis of ethnographic data shown in a session on "Using Computer Packages in Applied Anthropology Projects" held at the 1987 Annual Meeting of the Society for Applied Anthropology. With one exception, the papers, including my own, emphasized quantitative uses of computers in anthropology. Yet, if memory serves, the appeal of using computers to assist ethnographic research with qualitative data stimulated a disproportionately large number of questions. The questions were about specific products and procedures, and they seemed to indicate less familiarity with these than with the quantitatively-oriented programs and procedures discussed in most of the papers. Hence, the idea for this review arose.

Four products were mentioned in the discussions: THE ETHNOGRAPH, Notebook II, askSam, and Needle-In-A-Haystack. In addition, James Dow, our discussant, told me about a program he had recently developed, called Sort Blocks by Fields. I decided to concentrate on these programs, since they were mentioned, and since all are reasonably priced. It turns out that all are very helpful too. Two additional programs, Concord and KWIC, were subsequently brought to my attention by colleagues James Dow and Robert Trotter and included in the review; they complement the other programs nicely, and they are in the public domain. All of the programs run on IBM compatible microcomputers.

I am interested in how computer programs can help in anthropological research, as, I assume, are most of the readers of CAAN; therefore, the focus of the review is on the use of the programs in the process of research rather than on the programs themselves as products. I transcribed a segment of an "unstructured interview" (Bernard and others 1986) from my own research to evaluate and illustrate the use of the programs, because these kinds of data are very common in ethnography, and because the best way to learn about the strengths and limitations of computer programs for research is to use them with your own data. Discussion of the programs, using these data for illustration, is preceded by a synopsis of definitions and personal perceptions of issues in the management and analysis of text.

(Figure 1) (Figure 2)

Definitions

Using computers to assist in managing and analyzing text has been described as "word crunching" (Dennis 1984) after the familiar appellation for quantitative work, "number crunching." I am using text loosely here, as "words as written, printed, or stored electronically." This includes oral discourse, and observations when transcribed. The description of structures and explication of meanings of words in text and in context are ethnography's basic building blocks. An organization of research information into chunks creates a "database." For computer work, the most inclusive chunks are usually called files. Files, in turn, are divided into records. Most database management tools are designed to manage hierarchical structures, so records may contain smaller chunks of information called fields. Database management tools generally include ways of storing, searching, sorting, concatenating, reformatting, displaying, and transmitting files, records, fields and words. The ethnographer may use these procedures to assist in analysis as well.

The interview segment illustrated in Figure 1 was transferred to computer storage as a file. The file may contain just one record, the file itself, or it may contain several records, such as question-answer segments, or major subject clusters. Records may include fields, like the text following "Interp:" (for interpreter), or an identification field, such as the text following "ID:", which lists a tape catalog entry to search for specific information on the context of and participants in the interview.
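
The file, record, and field hierarchy just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the blank-line record separator, the continuation-line rule, the function name parse_records, and the sample text are all invented for the example; only the "ID:" and "Interp:" labels come from the interview segment discussed above.

```python
import re

# Assumption: records are separated by blank lines, and a field is any line
# that starts with a short label followed by a colon (e.g. "ID:", "Interp:").
FIELD = re.compile(r"^(\w+):\s*(.*)$")

def parse_records(text):
    """Split a file's text into records, each a list of (label, content) fields."""
    records = []
    for chunk in re.split(r"\n\s*\n", text.strip()):
        fields = []
        for line in chunk.splitlines():
            match = FIELD.match(line)
            if match:
                fields.append((match.group(1), match.group(2)))
            elif fields:
                # Continuation line: append it to the most recent field.
                label, content = fields[-1]
                fields[-1] = (label, content + " " + line.strip())
        records.append(fields)
    return records

# Invented sample text, for illustration only.
sample = """ID: Tape 12, side A
Interp: She says it is a bitter medicine,
 used for the stomach.

ID: Tape 12, side A
Interp: This one is salt."""

for record in parse_records(sample):
    print(record)
```

Treating the whole file as a single record, as some analysts would prefer, amounts to skipping the split on blank lines.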

Issues in the Management and Analysis of Text

The use of computers to assist in the management and analysis of text raises several important issues, conceptual as well as pragmatic: some are perennial research issues, others are points of theoretical contention, and still others arise from the constraints of specific tools.

The most important conceptual and theoretical matters for reflection are coding and database design, and the research process itself.

Coding and Database Design

Coding is a theoretical act and it begins with transcription. Speech and observation cannot be reproduced exactly as text. Some conventional selection and classification must take place. "Verbatim" interviews, for example, are often transcribed as normative discourse, without indicating emphasis, intonation, hesitation, interruption, pauses, and countless other discourse features (Mishler 1984, 1986). Several discourse features are encoded in the transcription in Figure 1 for illustration; for instance, dots bordered by spaces indicate relative length of pauses, colons are used when words are drawn out, and parentheses enclose inferred or unclear utterances. The relative emphasis on how things are said and what is said depends upon the purposes of the transcription and one's theoretical orientation (see Tyler and Tyler 1986).

The next coding decision is the structure of the database. Text is often transcribed in chunks that make useful and meaningful file structures: an interview session, an observation period, answers to open-ended questions in questionnaires, a day's journal entry, a folktale, a myth, a printed document, are a few typical samples. Every piece of text from a project could be entered in a file, assuming that there is sufficient storage available, but direct management and analysis of large files is pragmatically difficult and cumbersome, and "putting everything in one pot" suggests a rather uncritical and unreflective view of one's data.

A file may or may not be coded into records. An analyst interested in the social construction of meaning in, say, an interview might prefer not to divide the file prematurely, preserving the context and structure of the interview (Mishler 1984). In the interview segment illustrated in Figure 1, there is what Mishler (1984) calls a "discourse cycle" of the informant's explanation of a particular medicine, the interpreter's translation, the anthropologist's questions, the interpreter's answer, and further explanation by the interpreter and informant. This cycle is clear from the segment as a whole: it might be more difficult to discover if the segment were divided, uncritically, into records.

If the research interests were in the medicines themselves, in this example, the cycles illustrated could serve as boundaries of meaningful records. The first cycle deals with bitter medicine, and the second is a discussion of salt. In the case of more structured interviews or in questionnaires, question and answer frames are good candidates for record boundaries.

Field codes may also help structure files and records to facilitate research. They can provide context such as characteristics of actors, activities, and places; opportunities for comment, observation, and interpretation; and conceptual classifications of content. Field code design, like the design of any coding system, should be a response to the question: "How must I query and retrieve information from my database to answer my research questions, and what should be there to help me interpret my data?"

The Research Process

In a foundationalist world view, research is reconstructed as a linear process (Rescher 1979). Theory is the foundation for generation of hypotheses, which are tested with objective data. Data are decontextualized: "Scientific thinking dictates that models be unambiguous and replicative, requirements which have been met by stripping natural models of extraneous elements, and by reducing to a minimum the number of assumptions inherent in them; that is, by a process of further simplification and abstraction" (Dyke 1981:201-202). In work with text, this is accomplished by coding, as in a typical social survey, or by isolating words, as in early work in content analysis.

Ethnographic research, as it is usually practiced, is a process of engagement with data in an iterative cycle of construction, analysis, interpretation, and synthesis, reflecting a "coherentist" world view, to use Rescher's (1979) term. Codes may have to be revised since they are, in reality, hypotheses. Meaning is meaning in context.

Computer-assisted management and analysis of text can go either way, and, again, how the research process is viewed influences the choice.

Constraints of Computer Software and Hardware

In the best of possible worlds, theoretical and conceptual concerns should determine how we do research, but there are always constraints. Some of the most important constraints in computer-assisted research are limitations of available hardware configurations, and software design. Most programs have a limit on record size, the number of fields permissible, and the arrangement of records. And every computer has a limit on storage, number of files that may be open at once, and speed. These are important considerations in choosing particular tools, as they limit research alternatives as much as our theoretical and conceptual baggage.

The Tools

Word Processors

Many of the tools reviewed assume that files already exist on some electronic medium. A word processor is an essential tool since it is the primary means of entering text and creating files. Also, much of the work of management and analysis of text can be done with word processors (Bernard and Evans 1983). Most full-featured word processors are capable of simple searches and sorts, concatenation, "cutting-and-pasting," editing, reformatting, and working with the same or different text in separate "windows." Many have the capability of automating procedures by playing back long sequences of commands with a few keystrokes (called "macros") or programming sequences of commands (Gillespie 1986). There are also several useful "macro" packages that work with word processors and other programs. Finally, a word processor can serve as the editing and reformatting bridge between programs with different data format requirements.

For all of these reasons, I recommend a word processor for initial data entry and for basic management and analysis. Database management programs usually have limited word processing capabilities, and most can import and export files; however, none can match the utility of a full-featured word processor. The basic word processor requirement is the ability to create and store files in ASCII format without page breaks.

Indexing Programs

Indexing programs search files and create a list of words, or character strings, with their locations. One of the earliest uses of computers for text analysis was the creation of concordances of narratives, using indexing programs. Later, under the rubric of content analysis, word frequency counts were added to the analysis of text. Content analysis today "...covers a wide assortment of approaches and techniques, from word counts to complex examination of themes, from totally automated systems to those that produce simple word sorts" (Wood 1980:284), but indexing still plays an important role.

Ethnographic analysis that is lexically based, such as the methodologies of Spradley (1979,1980) and Werner and Schoepfle (1987), can also make good use of indexing programs (also see Werner 1982), especially when used in tandem with a word processor. Nevertheless, there are some qualifications.

Agar expresses the major qualification very concisely: "...the problem is that people seldom cooperate by expressing their ideas in a small number of linguistic forms" (1983:22). Ethnoscience methodologies answer this problem, of course, by constructing question frames that tend to elicit a small number of linguistic forms that are contextually grounded in semantic space, taxonomies for example. But the point is: There are many ways of expressing concepts in text, and a simple index may be too coarse-grained to catch some of them.

Three indexing programs are reviewed here: Concord, KWIC, and Needle in a Haystack. The first two are in the public domain and the third is a commercial product. Concord and KWIC work with standard ASCII files, and Needle in a Haystack can work with standard ASCII or WordStar (c) formats.

Concord

Concord was written by Walter Mebane and Bernard Tiffany; the version I have is dated 1986. It comes with a clearly written documentation file with examples of the many options available, and an executable program file. The options are invoked on the "command line"; that is, following the name of the executable file, CONCORD.

Concord produces a sorted single level index, which may be directed to the screen, a printer, or a file (Figure 2b). The index can include only words and numbers, or any string of characters bounded by blanks. It can optionally consider the case of the characters when constructing the index, include word frequency counts, write line indexes for each word, use line numbers included in the input file, or use text as "line numbers". A very useful feature is the option of creating a file of words to exclude from the index, or creating a file of words to include in the index.

The program is easy to use, fast, and reliable. The ability to include any string of characters bounded by blanks is especially useful in indexing encoded discourse features. There are no program limits in the documentation, but I would guess that the program allocates memory dynamically, and that 256K would probably be more than adequate for most files (the number of different words in a body of text almost never exceeds 20,000).
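
The kind of single-level index described above can be sketched in Python. This is not Concord's code, which I have not seen; it is a minimal sketch assuming an in-memory list of lines, and the function name build_index, its parameters, and the sample text are invented for the illustration. It covers the word frequency counts, line indexes, case option, exclude list, and the option of indexing any blank-delimited string.

```python
import re
from collections import defaultdict

def build_index(lines, exclude=(), case_sensitive=False, any_string=False):
    """Return {word: [line numbers]} for the given lines of text.

    any_string=True indexes every blank-delimited string (useful for encoded
    discourse features); otherwise only runs of letters and digits are kept.
    """
    pattern = r"\S+" if any_string else r"[A-Za-z0-9]+"
    excluded = set(exclude) if case_sensitive else {w.lower() for w in exclude}
    index = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        for token in re.findall(pattern, line):
            key = token if case_sensitive else token.lower()
            if key not in excluded:
                index[key].append(number)
    return dict(index)

# Invented sample text, for illustration only.
text = [
    "The medicine is bitter",
    "the salt is not bitter",
]
index = build_index(text, exclude=["the", "is"])
for word in sorted(index):
    # Word, frequency count, and the lines on which the word occurs.
    print(f"{word:<10} {len(index[word]):>3}  lines {index[word]}")
```

Because the index holds one entry per distinct word rather than per occurrence, its size grows with vocabulary, which is consistent with the modest memory needs guessed at above.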

KWIC

KWIC stands for Key-Word-In-Context. A KWIC index is a sorted index of words that includes some of the text following and preceding the word (Figure 2a). Its obvious advantage over a sorted list of words is that it brings along some of the context, which provides information about meaning and syntax (Wood 1984; Eguchi 1987).

The KWIC package reviewed here was written by Lee Sailer. It consists of two executable files which must be used with an intervening sort program. The first "rotates" all of the words so that they can be sorted, and the second "unrotates" them back in context. Options include printing the file name and line numbers and setting the page width. Options are specified on the command line, as in "Concord." It uses the standard ASCII collating sequence, which is case sensitive, and key words are made up of letters or numbers and embedded punctuation; it does not key any string of characters separated by blanks.

KWIC is somewhat more difficult to use than Concord. The documentation is sparse, and using the program requires knowledge of MS-DOS input-output redirection, and pipes if sort is used. James Dow's "Sort Blocks by Fields" program (discussed under Database Managers) is easy to use with KWIC, does not require piping, works extremely well with KWIC rotated output, and it does not have the memory limitations of MS-DOS sort.
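
The rotate, sort, and unrotate sequence can be sketched in Python as follows. This is an illustration of the general technique rather than Lee Sailer's implementation: the function names, the tuple layout, and the sample lines are assumptions, and for brevity key words here are runs of letters and digits only, without the embedded punctuation that KWIC allows.

```python
import re

def rotate(lines):
    """Yield one (key word, line number, start offset) entry per key word."""
    for number, line in enumerate(lines, start=1):
        for match in re.finditer(r"[A-Za-z0-9]+", line):
            yield (match.group(), number, match.start())

def unrotate(entries, lines, width=20):
    """Put each sorted key word back in context, aligned in a center column."""
    out = []
    for word, number, start in entries:
        line = lines[number - 1]
        left = line[:start][-width:]               # context before the key word
        right = line[start:][:width + len(word)]   # key word plus context after
        out.append(f"{number:>4} {left:>{width}}{right}")
    return out

# Invented sample text, for illustration only.
lines = ["The medicine is bitter", "Salt is not bitter"]

# sorted() compares the key words by code point, which for ASCII text is the
# case-sensitive ASCII collating sequence: "Salt" sorts before "bitter".
entries = sorted(rotate(lines))
for row in unrotate(entries, lines):
    print(row)
```

Here the intervening sort is a single in-memory call; with the actual package that step is a separate program, which is why redirection or a sort utility such as SBF is needed.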

Needle in a Haystack (Version 1.33)

Needle is conceptually somewhere in between Concord and KWIC. It creates a two-level index by considering "significant" leading and trailing words (words other than articles, prepositions, and conjunctions) (Figure 2c). It is described as a "relational" indexing program, because it can index up to 10 different files in the same operation.

The program requires a color graphics card, at least 256K, and two floppy drives, or one floppy drive and one hard disk. A hard disk is highly recommended, since its sorting, merging, and indexing are disk intensive.

Needle is case insensitive, works with letters or numbers only, and filters out articles, conjunctions, and prepositions. It can index by sentence, paragraph, or page, and it produces a reference file that relates all three. It does not index by line number, but it creates a file with sentence, paragraph, and page number clearly indicated so that finding a reference is relatively easy. Like KWIC, it does not have the option of using include or exclude files; however, an alphabetic search range can be specified.

Optionally, it creates "Regular" or "Compact" indexes. The latter includes only those first-level entries with at least two occurrences of a second-level entry. This, and the search range option are helpful analysis features. Needle is extremely easy to use, and the user's manual is helpful and comprehensive. The program is menu-driven, and largely self-explanatory, especially when used in conjunction with the memory-resident "preliminary instructions" and the context-sensitive help feature. It seems to work well, although I did not push it to the limits, and it is a serviceable word crunching tool.
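
One way to read the two-level index and its "Compact" form is sketched below in Python. This is my interpretation of the description, not Needle's algorithm: the stop list, the choice of immediately adjacent significant words as second-level entries, the function names, and the sample sentences are all assumptions made for the illustration.

```python
import re
from collections import Counter, defaultdict

# Assumption: a tiny illustrative stop list of articles, conjunctions, and
# prepositions; the list used by the actual program is not known to me.
STOP = {"a", "an", "the", "and", "or", "but", "of", "in", "on", "for", "to", "with"}

def two_level_index(sentences):
    """Map each significant word to a Counter of its significant neighbors."""
    index = defaultdict(Counter)
    for sentence in sentences:
        words = [w for w in re.findall(r"[a-z0-9]+", sentence.lower())
                 if w not in STOP]
        for i, word in enumerate(words):
            if i > 0:
                index[word][words[i - 1]] += 1   # leading significant word
            if i + 1 < len(words):
                index[word][words[i + 1]] += 1   # trailing significant word
    return index

def compact(index):
    """Keep first-level entries having a second-level entry seen at least twice."""
    return {word: seconds for word, seconds in index.items()
            if any(count >= 2 for count in seconds.values())}

# Invented sample text, for illustration only.
sentences = ["The bitter medicine is for the stomach.",
             "Bitter medicine and rock salt.",
             "Rock salt is not a medicine."]

index = two_level_index(sentences)
for word in sorted(compact(index)):
    print(word, dict(index[word]))
```

Note that compact() returns an empty dictionary, rather than looping, when no second-level entry occurs twice.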

There are two irritating features and at least one bug. I would like to turn off the sound effects, and when the program asks for the name of the "numbered" file to index, it expects a standard file extension (.num) which should be appended by the program, rather than having the user remember it. The bug appears when the Compact index option is chosen, and there are no second level entries that occur at least twice; the program loops, endlessly I presume, until you break out or reboot your computer.

Database Managers

Database managers are programs that manage data that are coded in records and fields, by searching, sorting and reporting, and, in some cases, tallying. They are invaluable tools at that stage in the research process when the researcher is ready to classify concepts in the data and to begin to collect instances and juxtapose units of meaning.

The four packages reviewed are designed to facilitate text management (not all database managers have this capability); records and code fields may be easily added and changed in each; and all can work together and with the indexing programs, with a word processor bridge. These features, and others, encourage a rich, iterative and contextual approach to data analysis.

The differences among the four packages are in their conceptual design and in some specific features and capabilities. They range from structured to unstructured, from menu-driven to command-driven, and from simple search-and-rearrangement features to the ability to generate elaborate queries and reports.

THE ETHNOGRAPH (Version 2.0, and Version 3.0 (test copy))

THE ETHNOGRAPH is a relatively free-form and extremely flexible database manager, which automates the paper-and-pencil, cut-and-paste, manual approach to the analysis of text, and is somewhat phenomenological in concept (Seidel and Clark 1984; Brent 1984). Version 2.0 is fully operational, but Version 3.0, at this time, is still being tested, and it lacks one feature, facesheet codes, promised in the upgrade.

Data or records in THE ETHNOGRAPH are the same as files and have to be prepared with a word processor in a specific, but unconstraining, ASCII format. The data in Figure 1 were prepared in THE ETHNOGRAPH format (except in Figure 1, the text is placed side-by-side). "Speaker identifications" and "contextual comments" (prefaced by a +) begin in column 1, and the rest of the text must be offset at least one column. The text must not extend beyond column 40. Speaker identifications and contextual comments provide helpful context since they remain with their included segments when the segments are coded and collated. The space from column 40 on is for numbering lines and coding segments of text.
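
Since the file is prepared with a word processor, it may help to check the layout before numbering. The Python sketch below is a hypothetical checker, not part of THE ETHNOGRAPH: it assumes that a line is classified solely by its first column (a "+" for a contextual comment, any other non-blank character for a speaker identification, a blank for offset text) and it flags lines that extend beyond column 40. The function name and sample lines are invented.

```python
def check_layout(lines, max_col=40):
    """Classify each line by its first column and flag lines past max_col."""
    report = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            kind = "blank"
        elif line.startswith("+"):
            kind = "comment"      # contextual comment, prefaced by a +
        elif line[0] != " ":
            kind = "speaker"      # speaker identification in column 1
        else:
            kind = "text"         # ordinary text, offset at least one column
        fits = len(line.rstrip()) <= max_col
        report.append((number, kind, fits))
    return report

# Invented sample lines, for illustration only.
sample = [
    "+Interview at informant's home",
    "Interp:",
    "  She says this medicine is bitter.",
    "  This line of text is deliberately too long for the column limit.",
]

for number, kind, fits in check_layout(sample):
    print(number, kind, "ok" if fits else "extends past column 40")
```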

After the text is prepared, the next step is to use THE ETHNOGRAPH to number the data file(s). Version 2.0 can work with up to 40 files, and Version 3.0 can access 80 for numbering, searching, modifying, and printing. Version 3.0 also can operate with sets of files by creating catalogs. The numbered file is then printed to the screen, or printer (Version 2.0), or a file (Version 3.0) for viewing, editing, or coding.

Coding consists of associating concepts with contiguous segments (lines) of text (See Figure 3). Segments may be overlapping or nested, up to a maximum of 7 levels, and up to 12 code words of no more than 10 characters each can define a segment. These "code fields" or segments and their associated words are entered using THE ETHNOGRAPH, and the coded file is stored for searching and reference (Figure 4a). Coding schemes can be easily modified, with some global features.

(Figure 3)

Searching for coded segments using THE ETHNOGRAPH is the last step. As mentioned, a large number of files can be searched in the same operation. THE ETHNOGRAPH permits simple Boolean "and" and "not" searches for multiple codes, and the search results may be displayed on the screen, or copied to a printer or a file (Version 3.0). In addition to displaying the coded segments, the search procedure includes contextual information such as the file, speaker identification, contextual comments, and the relationships of the segment to other overlapping or nested codes (Figure 4b).
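
The logic of coding and searching can be sketched in Python. This is an illustration of the idea only, not THE ETHNOGRAPH's file format or code: the Segment structure, the function names, the file name "intv01", and the code "illness" are hypothetical. A coded segment is treated as a range of numbered lines with a set of code words, and a search keeps the segments that carry every required code and none of the excluded ones.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    file: str
    first: int          # first numbered line of the segment
    last: int           # last numbered line (inclusive)
    codes: frozenset    # code words attached to the segment

def search(segments, require=(), exclude=()):
    """Return segments carrying every code in require and none in exclude."""
    require, exclude = set(require), set(exclude)
    return [s for s in segments
            if require <= s.codes and not (exclude & s.codes)]

def overlapping(segment, segments):
    """Other segments in the same file that overlap or nest with this one."""
    return [s for s in segments
            if s is not segment and s.file == segment.file
            and s.first <= segment.last and segment.first <= s.last]

# Invented coded segments, for illustration only.
segments = [
    Segment("intv01", 1, 14, frozenset({"bittermed", "illness"})),
    Segment("intv01", 5, 9, frozenset({"bittermed", "witchcraft"})),
    Segment("intv01", 15, 22, frozenset({"salt"})),
]

# "bittermed" and not "witchcraft"
hits = search(segments, require=["bittermed"], exclude=["witchcraft"])
for hit in hits:
    print(hit.file, hit.first, hit.last, sorted(hit.codes))
    print("  overlaps:", [(s.first, s.last) for s in overlapping(hit, segments)])
```

Reporting the overlapping and nested segments alongside each hit is what keeps the retrieved text tied to its context.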

(Figure 4)

THE ETHNOGRAPH is disk intensive, rather than memory intensive, so a hard disk is recommended. The maximum file size is 9999 lines, although the practical limits are much less if used with floppy disks. It is menu-driven and self-documented, so that it is quite easy to learn and use. There are no help screens, but the program traps errors well and presents clear prompts. The manual is very comprehensive and well-illustrated.

THE ETHNOGRAPH is a powerful, if somewhat unorthodox, database manager for text. It has some strong points in its favor: the coding scheme is flexible; several files can be processed in the same operation; procedures stress the importance of context; and treating the entire file as a unit facilitates the discovery of overall structure in the text. Records can be defined implicitly by coding meaningful segments, such as the bitter medicine and rock salt segments in the example. Text cannot be searched using THE ETHNOGRAPH; however, since files are in ASCII format, they can be searched or indexed with other programs.

AskSam (Version 3.11)

AskSam is an acronym for "Access Stored Knowledge via Symbolic Access Method." It is more familiar in concept than THE ETHNOGRAPH, in that it processes records, "documents," and fields of data. Within that genre of database managers, it is one of the most flexible and powerful. The basic unit in askSam is the record, which cannot be larger than twenty lines of 80 characters. Any number of records may be linked together to form logical documents for update and search, so the record size is not really a limitation. Records (and documents) in the same file may be different lengths, and the size of a file is limited only by the size of the disk.

Fields have no fixed location within a record and can be in a different order in different records. Fields are of three kinds: implied, explicit, or contextual. An explicit field associates a name with a string of text. Words with unique initial characters can be used as implied fields ($, for example), and any word can define a contextual field. Fields may even have the same name within the same record. Anything can be changed with askSam's screen-oriented text editor.

Searching and collating are contextual, so there are very few constraints on data structure, as noted, and databases created for use with askSam can be very nearly "free-form." This is an ideal feature for the management and analysis of text, since text comes in many different forms.

Records and documents can be created within askSam, using its text editor and optional key templates ("macros"), or ASCII files can be imported. The example data were structured for importation by using a word processor to insert a character between segments that I wished to define as documents; nothing else was changed (Figure 5).

(Figure 5) (Figure 6)

The highest level operations or procedures are accessed from a main menu, and each of these operations has a screen with a menu, and usually a command line. Figure 5, for instance, shows the DOCUMENT QUERY screen.

Query and retrieval features are robust in askSam. There are a variety of request types (including arithmetic requests) implemented on the command line using a simple but potent language. Requests may use wildcards (sal* will find salt, sale, but not saw) and the Boolean operators "and," "or," and "not." AskSam's "vicinity" command selects records with words or strings that are within a stated number of words, lines, or sentences from each other. The vicinity command used with Boolean operators is a sophisticated text processing feature. Foreign characters are entered with the usual alternate key plus number code.
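
The effect of a wildcard combined with a vicinity test can be sketched in Python. This does not reproduce askSam's query language, whose syntax is not shown here; the function names, the word-based distance measure, and the sample records are assumptions for the illustration. Wildcards are matched with the standard library's fnmatch module, and patterns are expected in lower case because the text is lowercased before matching.

```python
import re
from fnmatch import fnmatchcase

def positions(words, pattern):
    """Indexes of words matching a wildcard pattern such as 'sal*'."""
    return [i for i, w in enumerate(words) if fnmatchcase(w, pattern)]

def vicinity(text, pattern_a, pattern_b, within):
    """True if matches of the two patterns occur within `within` words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    a, b = positions(words, pattern_a), positions(words, pattern_b)
    return any(abs(i - j) <= within for i in a for j in b)

# Invented sample records, for illustration only.
records = [
    "Rock salt and bitter medicine.",
    "The sale was held long after the medicine had been used up.",
    "He saw the medicine.",
]

# Records in which a word starting with "sal" falls within four words of "medicine".
found = [r for r in records if vicinity(r, "sal*", "medicine", within=4)]
print(found)
```

A Boolean "and" of two terms is the special case in which the distance is unlimited, which shows how much sharper a vicinity request can be.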

Requests optionally are directed to the screen, printer, or to a file. When directed to the screen, search results are highlighted. If a request is used repeatedly, it can be stored as a program record and executed by reference to the program name. To demonstrate, I used askSam to manage and print the bibliography at the end of this article. There are three types of records, with different structures: books, articles in collections, and journal articles. The print program for this database uses askSam's "if/else" structure to decide what format to use for each of the three types of records.

The variety of field types, the free-form structure, and the contextual nature of search and retrieval, open up the possibilities for coding text segments. In the example shown in Figure 5, for instance, codes could be entered to the right of the appropriate segments, interactively in "Update" mode. Or a key word field or fields could be placed anywhere in the record or document. Function key templates make it easy to insert standard information, should that be of interest.

Bernard and Evans (1983) suggest that a good use for database managers is as an index to field notes. The field notes are typed with a word processor, and codes and page numbers become the keys to be managed, sorted, and retrieved in the database program. It is a waste of resources to use askSam in that way, but I want to use this simple task to show the program's power and flexibility.

Figure 6 shows a RECORD QUERY screen with two records (actually they are permutations of the same record for illustration purposes), and the results of a request (shown on the command line). File and Page are explicit fields, and the codes, such as bittermed, are contextual fields followed by segment line numbers. The request asks for records that mention both bittermed and witch* (wildcard) and to display the file, page number, and what comes after bittermed and witch* (the line numbers). Note that Witchcraft is used twice in the first record.

The vital functions of askSam are memory resident and memory intensive, and, consequently, fast, but the program requires only 128K of memory. It operates very well with floppy disks. The manual is superb, both in content and layout. Help is available as an askSam database, to be queried like any other database. There are limitations. AskSam is not a relational database so it can work with only one file at a time, and it lacks the convenience of indexing for working with different views of one's data, but the program's craftsmanship, flexibility, and power recommend it highly as a tool for the management and analysis of text.

Notebook II (Version 2.31)

Notebook II is the most structured database manager reviewed, and, like askSam, it processes data records and fields; but unlike many other structured database managers, it can process large text files. Notebook is almost completely menu-driven, and each menu has a context-sensitive help screen. Consequently, the program is easy to learn and use, especially for persons new to computers, and for infrequent users.

Each record can hold from 20 to 50,000 characters, depending upon available memory, and the number of records is limited by disk size. Foreign characters have to be assigned by number to alternate plus function key sequences using Notebook's "Utilities" module. Each record may contain up to 50 fields of any size as long as the total number of characters in the record does not exceed the maximum. Fields and records expand as text is entered, fields may be rearranged (globally), and new fields are easy to add.

Record templates are established for each database before data are entered, using the program's full-featured, screen-oriented editor, or imported. The record structure must be exactly the same for each record. Figure 7 shows Notebook's Edit screen view of an example record. Note that each field is identified by a heading to the left of the vertical line separating the headings from the field contents. These headings have to be the same for each record, although the length of each field can vary.

(Figure 7) (Figure 8) (Figure 9)

In order to import ASCII files, such as the example in Figure 8, specific record delimiters and at least one field heading, which must match an existing database structure, have to be added to every record with a word processor. Since a colon terminates a field heading in Notebook, the punctuation of the speaker identifications had to be changed to another character.

Querying and reorganizing the database are done from menus. Records can be reordered (sorted) on the first 20 characters of any field. Queries may be divided into two kinds: The first is a simple literal search for text. The second selects records with specified characteristics and creates a "view" (actually an index) of the database which may be "viewed" from the program, merged with other views, or used to create another database. The "Select" menu does not permit wildcard selection, but its selection criteria are extensive, flexible, and include Boolean "and" and "or." The "View" feature is especially helpful in researching a database. Notebook, like askSam, can only work with one file at a time.

Notebook has no macro feature or language for automating queries; everything is done from menus, which is both a strength and a weakness. It is a strength because menus are easier to use and remember, and a weakness because a query language can be more flexible and powerful. Notebook does have a good report writer for automating that function.

Notebook has another useful feature for managing and analyzing text, a simple indexing procedure. The Key function in the Update module creates, displays, or prints an alphabetized list of key words and their frequencies from any field. A special exclude file of up to 100 words can be created with a word processor.

Notebook II has a well-indexed manual with a tutorial and good reference sections. The program requires 256K, and is rather disk-intensive. A hard disk is not required, but it helps in speed and disk capacity. Notebook works with two other programs from the same company, Bibliography and Convert. Both are described in the Notebook II manual. Bibliography is designed for helping with bibliographies and citations, and Convert for converting on-line database formats to Notebook compatible files. Bibliography could be used with a word processor to relate text in a file to a Notebook "dictionary," and be applicable to text analysis in a number of ways.

Notebook II is an excellent package for working with text. I find its rather tight structure and menu orientation confining, but others may see these as strengths.

Sort Blocks by Fields (Version 1/1/87)

Sort Blocks by Fields, called SBF by its author, James Dow, is a general purpose sort program that can be used with a word processor to manage a "do-it-yourself" database, to use Jim's apt phrase. SBF basically rearranges blocks of text or other data of any size, and allows the user to specify the characters that define the blocks to rearrange and the "fields" to compare. Figure 9 shows how to use SBF to put the "block" in Figure 8 that discusses salt at the beginning of a file. I mentioned using SBF with the KWIC indexing program earlier.

SBF uses the standard ASCII collating sequence for comparison. Options are entered on the command line or following prompts, as in Figure 9.

SBF is fast, and can work with large files since it uses a sort/merge algorithm. The sort/merge algorithm uses temporary files, so a hard disk may be necessary for very large files. SBF is easy to use, and it comes with good, concise documentation on disk. I would encourage anyone who uses a computer, and especially someone who works with text, to include this versatile program in their toolkit.
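The idea of sorting blocks on a field can be sketched as follows. This is a simplified illustration under stated assumptions, not SBF: it sorts in memory rather than with a sort/merge algorithm, and the function `sort_blocks`, its parameters, and the sample notes are invented for the example. SBF's actual options are those shown in its own documentation.

```python
def sort_blocks(text, block_sep="\n\n", field_start=0, field_len=10):
    """Sort blocks of text on a fixed slice (the "field") of each block."""
    blocks = [b for b in text.split(block_sep) if b.strip()]
    # Python compares strings by code point, which matches ASCII order
    # for plain ASCII text (uppercase letters sort before lowercase).
    blocks.sort(key=lambda b: b[field_start:field_start + field_len])
    return block_sep.join(blocks)

# Invented sample: three blocks separated by blank lines.
notes = "maize: planted in May\n\nSalt: traded inland\n\nbeans: dried on roofs"
result = sort_blocks(notes)
```

Note that the block beginning with an uppercase "S" moves to the front, a reminder that an ASCII collating sequence places capitals before lowercase letters.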

Conclusions

The programs mentioned by the participants whose comments instigated this review, and those provided by my colleagues James Dow and Robert Trotter, are, without exception, good and useful tools for the management and analysis of text. I have tried to summarize some of the issues in managing and analyzing text, and describe and evaluate the programs in that light. I have preferences for certain programs and features, which I have indicated, sometimes explicitly and sometimes implicitly, but the choices between, say, THE ETHNOGRAPH, askSam, and Notebook II, are matters of personal choice and the project at hand.

The use of computers in anthropological research, from field work to analysis, seems to be well-entrenched in the discipline, especially using the kinds of tools reviewed here. Three of the programs reviewed were actually written by anthropologists: KWIC by Lee Sailer, Needle in a Haystack by Loren Pahlke, and SBF by James Dow.

But, to quote Gerson (1986), "Where Do We Go From Here?" Gerson thinks, and I agree, that we need more concern with analytical strategies in software development. This development will have to come from persons who are familiar with the conduct of qualitative research, such as anthropologists and sociologists. Sailer, Pahlke, and Dow, and probably others whom I do not know about, are good examples of the productive combination of programmer and anthropologist, and the useful, rather unorthodox features of THE ETHNOGRAPH are due to the research knowledge of its social science developers.

The public domain programming language "Icon" is designed specifically for work with text, and for writing quick and versatile utility programs (Griswold and Griswold 1983). It builds upon the strengths and avoids some of the weaknesses of its predecessor, SNOBOL4, which linguistically oriented readers will recognize, as may others. Wider familiarity with this tool for non-numeric programming would facilitate software development for text analysis. Finally, burgeoning research in Artificial Intelligence and Knowledge Structures is bound to bear fruit for computer-assisted analysis of text. Brent (1984, 1986) shows how some of this research may be applied in qualitative data analysis. For a simple and comprehensive introduction to this exciting field, see Frenzel's (1987) "Crash Course in Artificial Intelligence and Expert Systems."

Acknowledgements

I wish to thank Seaside Software, Pro/Tem Software, Inc., and Aurora Software for providing review copies of, respectively, askSam, Notebook II, and Needle in a Haystack (even if Loren Pahlke said I could not use Needle in a Haystack for anything else without paying for it). I already owned THE ETHNOGRAPH. Thanks also to Jim Dow and Bob Trotter for their advice and help.

Bibliography

Agar, M. 1983 Microcomputers as Field Tools. Computers and the Humanities 17:19-26.

Becker, H. S. 1986 Teaching Fieldwork with Computers: Computers in Qualitative Sociology. Qualitative Sociology 9(1):100-103.

Bernard, H. R. and M. J. Evans 1983 New Microcomputer Techniques for Anthropologists. Human Organization 42:182-185.

Bernard, H. R., P. J. Pelto, O. Werner, J. Boster, A. K. Romney, 1986 The Construction of Primary Data in Cultural Anthropology. Current Anthropology 27(4):382-396.

Brent, E. 1984 Qualitative Computing: Approaches and Issues. Qualitative Sociology 7(1-2):34-60.

1986 Knowledge-Based Systems: A Qualitative Formalism. Qualitative Sociology 9(3):256-282.

Choueka, Y. 1980 Computerized Full-text Retrieval Systems and Research in the Humanities. Computers and the Humanities 14:153-169.

Conrad, P. and S. Reinharz, eds. 1984 Computers and Qualitative Data. Qualitative Sociology (special issue) 7(1-2).

Dennis, D. L. 1984 "Word Crunching": An Annotated Bibliography on Computers and Qualitative Data Analysis. Qualitative Sociology 7(1-2):148-156.

Drass, K. A. 1980 The Analysis of Qualitative Data: A Computer Program. Urban Life 9:332-353.

Dyke, B. 1981 Computer Simulation in Anthropology. In B. J. Siegel, A. R. Beals, S. A. Tyler, eds. Annual Review of Anthropology. Vol. 10. Pp. 193-207. Palo Alto, CA: Annual Reviews Inc.

Dyson-Hudson, R. and N. Dyson-Hudson 1986 Computers for Anthropological Fieldwork. Current Anthropology 27(5):530-531.

Eguchi, P. K. 1987 Fieldworker and Computer: An End User's View of Computer Ethnology. In Toward a Computer Ethnology. Senri Ethnological Studies No. 20. J. Raben, S. Sugita, M. Kubo. Pp. 165-174. Osaka, Japan: National Museum of Ethnology.

Frenzel, L. E., Jr. 1987 Crash Course in Artificial Intelligence and Expert Systems. Indianapolis, IN: Howard W. Sams & Co.

Gerson, E. M. 1986 Where Do We Go From Here? Qualitative Sociology 9(2):208-212.

Gillespie, G. W., Jr. 1986 Using Word Processor Macros for Computer-Assisted Qualitative Analysis. Qualitative Sociology 9(3):283-292.

Griswold, R. E. and M. T. Griswold 1983 The Icon Programming Language. Englewood Cliffs, NJ: Prentice-Hall.

Guillet, D. 1985 Microcomputers in Fieldwork and the Role of the Anthropologist. Human Organization 44(4):369-371.

Kirk, Rodney C. 1981 Microcomputers in Anthropological Research. Sociological Methods and Research 9:395-536.

Lipkin, J. and B. S. 1978 Data Base Development and Analysis for the Social Historian. Computers and the Humanities 12:113-125.

Mishler, E. G. 1984 The Discourse of Medicine: Dialectics of Medical Interviews. Norwood, NJ: Ablex Publishing Corporation.

1986 Research Interviewing: Context and Narrative. Cambridge: Harvard University Press.

Ogilvie, D. M., P. J. Stone, and E. F. Kelly 1982 Computer-aided Content Analysis. In A Handbook of Social Science Methods, Volume 2, Qualitative Methods. R. B. Smith and P. K. Manning, eds. Pp. 219-245. Cambridge, MA: Ballinger Publishing.

Podolefsky, A. and C. McCarty 1983 Topical Sorting: A Technique for Computer Assisted Qualitative Analysis. American Anthropologist 85(4):886-890.

Rescher, N. 1979 Cognitive Systematization: A Systems-Theoretic Approach to a Coherentist Theory of Knowledge. Totowa, NJ: Rowman and Littlefield.

Sailer, L. ed. 1984 Computer-Assisted Anthropology. Practicing Anthropology 6(2):5-27.

Seidel, J. V. and J. A. Clark 1984 THE ETHNOGRAPH: A Computer Program for the Analysis of Qualitative Data. Qualitative Sociology 7(1-2):110-125.

Spradley, J. P. 1979 The Ethnographic Interview. New York: Holt, Rinehart & Winston.

1980 Participant Observation. New York: Holt, Rinehart & Winston.

Sproull, L. and R. F. 1982 Managing and Analyzing Behavioral Records: Explorations in Non-numeric Data Analysis. Human Organization 41:283-290.

Tyler, M. G. and S. A. Tyler 1986 The Sorcerer's Apprentice: The Discourse of Training in Family Therapy. Cultural Anthropology 1(2):238-256.

Weinberg, D. 1974 Computers as a Research Tool. Human Organization 33:291-302.

Weinberg, D. and G. M. Weinberg 1972 Using a Computer in the Field: Kinship Information. Social Science Information 11:37-59.

Werner, O. 1982 Microcomputers in Cultural Anthropology: APL Programs for Qualitative Analysis. BYTE 7(7):250-280.

Werner, O. and G. M. Schoepfle 1987 Systematic Fieldwork. 2 vols. Beverly Hills, CA: Sage Publications.

Wood, M. 1980 Alternatives and Options in Computer Content Analysis. Social Science Research 9:273-286.

1984 Using Key-Word-in-Context Concordance Programs for Qualitative and Quantitative Social Research. Journal of Applied Behavioral Science 20(3):289-297.

Software Availability

Concord:
NAPA/NAU Bulletin Board
(602)-523-7473

KWIC:
NAPA/NAU Bulletin Board
(602)-523-7473

Needle in a Haystack:
Aurora Software
Drawer C
12591 Beachcomber Dr.
Anchorage, AK 99515
$59.95

THE ETHNOGRAPH:
Qualis Research Associates
611 E. Nichols Drive
Littleton, CO 80122
$150.00

askSam:
Seaside Software
P.O. Box 31
Perry, FL 32347
$200.00

Notebook II:
Pro/Tem Software Inc.
814 Tolman Dr.
Stanford, CA 94305
Notebook II - $189.00
Bibliography - $75.00

Sort Blocks by Fields:
James Dow
572 McGill Dr.
Rochester, MI 48309
$5.00 (to cover costs)

Icon:
Ralph E. Griswold
Icon Project
Department of Computer Science
University of Arizona
Tucson, AZ 85721


NEEDLE IN A HAYSTACK: A REVIEW

Benjamin F. Crabtree, Ph.D.

Department of Family Medicine University of Connecticut

Pertti J. Pelto, Ph.D.

Department of Anthropology University of Connecticut

It is heartening to see that our colleagues with programming skills are producing more and more software for shaping up our field notes and other loose-jointed qualitative data files. Seidel and colleagues have Ethnograph, Sailer produced BBU, and there is Notebook II (cf. CAAN, vol.2.2, Sept. 1986). Needle in a Haystack, which runs on IBM and IBM-compatible machines under MS-DOS, is intended to help with our field note management through a two-level index building program. It is designed to take text files or any ASCII file and build an index of all words in relation to their surrounding words. This great searching power is meant to provide "aid and comfort to computer users who need to carefully research documents, transmitted modem files, or any ASCII files." (Page 1 of user's manual.)

The description goes on to note that "NEEDLE searches for every word, not just a select few." But there's the rub. Are we sure we really want an index that has every word and its immediate environment? Think a moment. In pages, such an index always runs longer than the original manuscript! In fact, the index for a four-page manuscript runs around 8 pages. That includes a lot of redundancy; for example, the phrase "the storm delayed testing services until Wednesday" would include in the index:

  testing delayed services        storm
delayed storm   delayed delayed
services        testing Wednesday ...etc.
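The source of the redundancy is easy to see in a small Python sketch of a word-in-context index. This is an assumption about the general technique, not Needle's actual algorithm or output format; the function `context_index` is invented for the illustration.

```python
from collections import defaultdict

def context_index(sentence):
    """Map each word to the (previous, next) words found around it."""
    words = sentence.split()
    index = defaultdict(list)
    for i, word in enumerate(words):
        before = words[i - 1] if i > 0 else ""
        after = words[i + 1] if i + 1 < len(words) else ""
        index[word].append((before, after))
    return dict(index)

idx = context_index("the storm delayed testing services until Wednesday")
```

Every word appears once as an entry and again as a neighbour of the words on either side of it, so an index of this kind is bound to be several times longer than the text it describes.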

Some archaeologists, some anthropologists, and who knows what other types of scholars, may need just such an exhaustive indexing (the creator, Loren Pahlke in Anchorage, Alaska, is an anthropologist).

Operation of Needle involves 3 basic steps: (1) preparing your file by numbering all the sentences, paragraphs and pages; (2) building the index, which can reference by the sentence, the paragraph, or the page; and (3) outputting the index to the screen, a disk file, or a printer. These three steps are basically simple enough, and the program is mainly menu-driven. However, the 8-page manual (and the menus) have some flaws that take a few minutes to ponder.

When using floppy diskettes, neither menu nor manual tells you when to insert your diskette with the text file into the drive after the initial boot of the system. This is a bit irritating, because the operations require a lot of shuffling and shifting of floppies in the course of preparation and indexing. You shift diskettes about six times. We noted, however, that after running the program a number of times disk swapping became fairly routine. The manual also instructs you to format and label 3 blank diskettes needed in the operation (that's a total of 5 with the program and the text file). It says to label one diskette DATA, but the menu refers to that diskette as PAGES. There are a few other similar minor irritations for floppy users; with a hard disk these problems disappear, of course.

To select which words get indexed, the menu allows one to select any first letter (A to Z) as the start, and any letter to finish. That means we could index H through M if we wanted health, hospitals, measles, malaria and mumps (and everything in between). Otherwise the program gives us no further options.

One other output bears noting: the program prepares a "first line index" with a format like this:
Page   Paragraph   First Line                   Sentences inc.
1      1           Late afternoon in Kingston   1 to 1
1      2           I am reading Saul Be         2 to 13
.      .           ........                     .....
etc.
2      8           I'm really beginning         51 to 53
2      9           But my style isn't total     54 to 61

This sort of index, from some of our rambling fieldnotes, might be a lot more useful than pages and pages of listing every word in our data file. The index of first lines for a four page manuscript ran slightly less than a page--with short sentences.
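A first-line index of this kind is simple enough to sketch in Python. The sketch below is an approximation under stated assumptions, not Needle's procedure: paragraphs are taken to be separated by blank lines, sentences are split crudely on end punctuation, page numbers are omitted, and the function `first_line_index` and most of the sample text are invented.

```python
import re

def first_line_index(text, width=20):
    """List (paragraph no., truncated first line, first and last sentence no.)."""
    rows, sentence_no = [], 0
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    for para_no, para in enumerate(paragraphs, start=1):
        # Crude sentence split: ., ! or ? followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", para.strip()) if s]
        first, last = sentence_no + 1, sentence_no + len(sentences)
        rows.append((para_no, para.strip()[:width], first, last))
        sentence_no = last
    return rows

# Invented sample of two short paragraphs.
sample = "Late afternoon in Kingston.\n\nI am reading. It rains. I write."
rows = first_line_index(sample)
```

Each paragraph contributes a single row, which is why this index stays short no matter how many words the notes contain.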

The manual's suggestion of indexing transmitted modem files may be useful. Most "Modem files" we deal with are "formatted" text files (in ASCII), and as such, we are able to index them with the program. Occasionally, however, we also get "unformatted" files created by word processing programs (mostly from mainframes) that could not be indexed unless somehow converted to WordStar or ASCII files.

We feel that the author(s) of Needle should work on some modifications to bring the program into a broader range of usefulness:

1. The manual is too brief and cryptic. We found it especially difficult in the matter of setting printer commands (when our printer was not a plain-vanilla, IBM-compatible machine). We ran the program on various IBM and IBM-compatible machines, including a 2 drive IBM-PC, an IBM-XT, a 2 drive Compaq, and a clone of an IBM-AT. The program ran well on all machines (although much more easily with a hard drive), but caused considerable problems with printing. With an Epson letter-quality printer on the AT clone, after half an hour of trying we were able to get the margins set correctly with a typed-in initialization string, but gave up on the complex details of getting the tabs set. Even on an IBM graphics dot matrix (and an IBM-PC) the tab settings did not always work correctly. Brief, succinct, how-to-do-it examples would make matters easier. It would be best, however, to have a printer configuration menu which would pick out the correct initialization string for common printers.

2. Each time the program was run it was necessary to tell the program whether the user was using a floppy drive or hard drive system. It should be fairly simple to have the program remember whether it is configured for a floppy system or a hard disk. It's annoying to have to indicate "floppies" each time we run the preparation sequence, although with a hard disk you can hit F2 to accept the default hard disk.

3. We ran Needle with WordStar, WordPerfect, IBM Writing Assistant, IBM Pie Writer (a PC program similar to Waterloo Script on mainframes), as well as standard ASCII files, written with a general editor (Kedit), and as output by word processing programs. The program functioned as advertised with WordStar and ASCII files. It also worked fairly well with IBM Writing Assistant, although control characters such as those used to indicate an underscore were interpreted (and indexed) as a nonsense string of letters. With WordPerfect and IBM Pie Writer, however, Needle could not distinguish paragraph breaks and would index the document as if it contained only one paragraph. This could be rectified somewhat by outputting ASCII files onto a disk (rather than sending them to the printer) and then running Needle on the formatted output file. To be more functional, however, the program should be expanded to work with WordPerfect and a few other common processors, and a list of such programs should be available in the manual.

4. The choice of a first letter alphabetic system for indexing should be made more flexible. The ideal would be to have the ability to enter a list of 30 or 40 words, and limit the indexing to those, on occasion; or, to select H and M only--because we wanted health, hospital, malaria, measles and mumps (instead of H through M as now designed). To run just H and M, one currently needs to make two runs of the "build" step (one for each beginning letter), or has to be content to index all words beginning with the selected range of letters. While making two runs is not all that great a problem, the file naming procedures make it necessary to complete one "build" all the way through to the printing process or to rename files (requiring a DOS command not available from within the program) before indexing the second letter. This procedure could be facilitated if it were possible to temporarily break out of the program to give necessary DOS commands.

5. For floppy-users it would be handy if we didn't have to manage quite so many switches of diskettes in the process.
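The more flexible selection asked for in point 4 can be sketched in Python. This is a hypothetical design offered for illustration, not a feature of Needle; the function `select_words` and the sample vocabulary are assumptions.

```python
def select_words(words, letters=None, word_list=None):
    """Keep words by initial letter set, by explicit word list, or both."""
    chosen = []
    for w in words:
        lw = w.lower()
        by_letter = letters is not None and lw[:1] in letters
        by_list = word_list is not None and lw in word_list
        if by_letter or by_list:
            chosen.append(w)
    return chosen

# Invented sample vocabulary.
vocab = ["health", "hospital", "kinship", "malaria", "measles", "mumps", "storm"]
h_through_m = select_words(vocab, letters=set("hijklm"))     # the current range style
h_and_m = select_words(vocab, letters={"h", "m"})            # two letters in one run
only_listed = select_words(vocab, word_list={"malaria", "mumps"})
```

With a set of letters rather than a start and finish, the H-and-M case needs one run instead of two, and a word list narrows the index further still.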

At $59.95 (suggested retail price), Needle in a Haystack is not terribly expensive, but its usefulness seems limited, compared to Ethnograph and Notebook II, for example. On the other hand, certain specialized users may find the hyper-extensive indexing just the thing they need, especially if they currently use WordStar and have a hard disk.


COMPUTER CATALOGUING AN ARCHAEOLOGICAL MUSEUM

Ellen N. Barcel

Southold Indian Museum P.O. Box 268 Southold, N.Y. 11971

I am working with an archaeological museum on Long Island which has a large collection of primarily lithic material. Most of the collection has never been adequately catalogued. Work that has been done is inconsistent, depending primarily on the interests of the collector. Thus, some parts of the collection have extensive documentation while other parts have none at all, with the entire range in between represented.

The museum (The Southold Indian Museum, owned and operated by the Incorporated Long Island Chapter of the New York State Archaeological Association) recognizes the need to fully catalogue its collection and believes that a computer would be extremely useful in collection management and possibly in collection-based research. We have had to face many practical as well as theoretical problems in our cataloguing project.

The collection is an estimated 200,000 to 400,000 Indian artifacts. We began the cataloguing project in the summer of 1985 by inventorying the entire museum in a cursory way. The following year, the Board of Trustees (with help from Dr. Phil C. Weigand and Dr. Kent Lightfoot, Anthropology Dept., State University of New York at Stony Brook) developed a detailed cataloguing system which uses 27 fields of data.

We believe this system covers all important information on each object. Complicating the problem at this phase was the fact that while most material was archaeological in nature and from Long Island, we also owned some non-local archaeological material and some ethnographic material as well. We had to design our system keeping all these artifacts in mind.

At present, we plan to enter only key information for collection management into the computer due to the constraints of disk storage. The following is a listing of the fields covered by our cataloguing system. Starred items only will be utilized in the first phase of data entry.

  1.*     Catalogue Number
HISTORY:
2.      Donated by
3.      Date donated
4.*     Collected by
5.      Date collected
6.      How collected
7.*     Location in Museum
DESCRIPTION:
8.*     Item name
9.      Number of items per catalogue number
10.     Material
11.*    Condition/Preservation priority
MEASUREMENTS:
12-16.  Dimensions generally in cm., (5 fields have been allowed in order to deal with odd-shaped pieces such as pottery and to possibly include weight).
PROVENIENCE:
17.     Country
18.     State
19.     County
20.     USGS #
21.     Locality
22.     Site
23.     Feature
24.     Level
INTERPRETATION:
25.     Time period (Paleo, Archaic, etc.)
26.     Cultural Affiliation (Levanna point,
        Orient Focus, Sioux, etc.)
27.     Type of site

We felt that the above system covered the material that has extensive provenience, and, at the same time, fields could be left blank if data were unavailable or non-applicable. Although we are a museum and have to consider record keeping as our primary task at this initial phase of the project, we wished to include as much field data as possible without getting bogged down and losing sight of our first goal--the management of the collection. We would appreciate comments from readers regarding our choice of data.
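The split between the full catalogue record and the starred fields entered in the first phase can be pictured with a short Python sketch. It is illustrative only: the museum is considering a database package, not Python, and the field identifiers, the function `phase_one_record`, and the sample artifact are assumptions. Only a subset of the 27 fields is listed.

```python
# (number, name, entered_in_phase_one) for a subset of the 27 fields.
FIELDS = [
    (1, "catalogue_number", True),
    (2, "donated_by", False),
    (4, "collected_by", True),
    (7, "location_in_museum", True),
    (8, "item_name", True),
    (10, "material", False),
    (11, "condition_priority", True),
]

PHASE_ONE = [name for _, name, starred in FIELDS if starred]

def phase_one_record(full_record):
    """Keep only starred fields; missing values become empty strings."""
    return {name: full_record.get(name, "") for name in PHASE_ONE}

# Hypothetical artifact, invented for illustration.
artifact = {"catalogue_number": "0001", "item_name": "projectile point",
            "material": "quartz", "location_in_museum": "Case 3"}
entry = phase_one_record(artifact)
```

Fields with no data simply stay blank, which matches the intention that undocumented collections can still be listed with their museum location.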

We would prefer that the person entering data into the computer be able to type in whole words or phrases. This would allow a larger number of museum personnel access to the information--they would not need a code book or special instructions to locate information. However, the mass storage needs here would then be astronomical. Thus, we are left with using abbreviations (standard whenever possible) and a code book to enter data, and with limiting the number of fields actually entered into the computer.

We have estimated that even if we code our data, we will need 25 meg of storage should the estimate of 400,000 artifacts prove correct and we choose to enter all fields for all items. This, of course, is one of our major problems. We are most grateful to Lynn Sullivan, the computer expert at Dr. Robert Funk's (New York State Archaeologist) office, for her help. She provided us with the answers to a number of technical questions. She pointed out, for example, that three times the expected data storage was needed to manipulate files. If we had 5 meg of data to be stored, we would need another 15 meg or so to manipulate these files, an important consideration in designing our system. Our 25 meg of data storage would then require an added 75 meg for file manipulation. This is far more disk storage than we can afford at the present time.
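The arithmetic behind these figures can be laid out in a few lines of Python. The numbers are those given above; the only assumption is that a "meg" is taken as one million bytes, and the variable names are invented for the sketch.

```python
ARTIFACTS = 400_000          # upper estimate of the collection size
DATA_MEG = 25                # coded data, all fields, for all items
WORK_FACTOR = 3              # working space needed, as a multiple of the data

BYTES_PER_MEG = 1_000_000    # assumption: decimal megabytes

bytes_per_artifact = DATA_MEG * BYTES_PER_MEG / ARTIFACTS   # 62.5
working_meg = DATA_MEG * WORK_FACTOR                         # 75
total_meg = DATA_MEG + working_meg                           # 100
```

On these assumptions each artifact is allowed about 62 bytes, or a little over two bytes per field across 27 fields, and the data plus working space come to roughly 100 meg in all.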

We are, therefore, considering setting up different files for discrete large collections. This would allow us to enter extensive data for collections with documentation and abbreviated data for collections without documentation, saving us disk storage space.

We considered the possibility of entering into the computer only the artifacts that have extensive data. A collection of unknown projectile points, donated at an unknown time, by an unknown person, from an unknown site, from a researcher's point of view might be almost useless; hence, have no need of computer storage. However, from the point of view of collection management, all artifacts should be at least listed in the computer with their museum location.

While we will be using the computer initially for collection management, we hope that our setup will be useful to researchers who wish to study the collection. The question of whether a system can be set up which will serve both purposes is one we have attempted to address, we hope successfully. We are aware of the fact that we cannot anticipate every researcher's requests. Someone doing specific research might wish to include additional fields of data, but we felt that our overall system was extensive enough to at least give all researchers a sizable start on their studies. The program we are considering, dBASE III, does allow for the addition of fields at a later date, should the need arise.

As for hardware, we are contemplating using the following (although at this writing we have not actually purchased the computer): an IBM-PC-XT or a compatible of similar capabilities, and a hard disk drive, possibly one with interchangeable cartridges, such as the Bernoulli Box. The use of interchangeable cartridges would eliminate the need for a tape or floppy backup. However, if this is not economically feasible, then we would also need to purchase some type of backup. Also to be purchased would be a printer, probably a standard dot matrix.

While we could certainly use a bigger system, given the size of our collection, unfortunately, budgetary constraints prevent our purchasing same. The suggestion that we contract with someone owning a larger system was considered and rejected for a number of reasons. Data security was one consideration. The cost of renting computer time was another. With this, came the real concern that a computer service could raise their rates unexpectedly and drastically. The final consideration was the inconvenience of having our data stored off premises.

The question of whether to use an optical scanning device or a data entry clerk is another one with which we have wrestled. At present, we are leaning toward the data entry clerk for several reasons:

1) The optical scanning forms are a nuisance to fill out and the degree of accuracy in entering information on them tends to fall dramatically in proportion to the number of forms one must fill out at any given time.

2) We have no storage facilities for the enormous number of forms which would be needed, assuming one form for each artifact.

3) Due to the large number of fields needed for some collections, we would require large forms (8-1/2 by 11 in.) necessitating a very expensive machine. Also, the forms would probably have to be specifically designed for our purposes, an added expense.

4) It is probable that some volunteer museum members can be trained to enter data.

Our next problem involved the paper records from which we would work. We decided to use a standard 5 x 8 in. index card format for the accession record of an individual collection. However, if we do not use optical scanning forms, what exactly would we use for a data entry person to work from? A ledger-type sheet with coded data was considered but rejected since several board members felt that having the information only in coded form initially was not a good idea.

An index card system (one card per artifact) was suggested but rejected due to bulk. Thousands of cards presented the same storage problem as thousands of "opscan" forms. Our solution was to develop a 5 x 8 in. paper (rather than card stock, to reduce bulk) form printed on both sides for maximum utility of space. On the left side of the form information is written out. There is a narrow column on the right side of the form to enter the information in coded form for the data entry clerk to use. This eliminates the need for duplication of effort and paper. Our one form can serve as a cataloguing worksheet, the present record of each artifact and the source from which a data entry clerk works. These data entry forms would be filled in by museum volunteers. A data entry clerk would then enter the information into the computer, and, finally, volunteers will check the printouts for data entry errors. This phase of the project is estimated to take between 5 and 10 years.

As discrete sections of the collection are fully catalogued and entered into the computer, we plan to have printouts of basic information. While we have not completely decided on the exact nature of these printouts, we hope at least to print out the entire collection by catalogue number and by location in the museum.

This past spring, we began the detailed cataloguing of our collection, using the previously described worksheets. We started this procedure with new accessions and a small sample of artifacts which had been in storage. During this summer, we will devote most of our time to continuing the actual work of cataloguing.

We have dealt with some difficult questions, both theoretical and practical. Frequently, the practical problems were the overriding consideration in making a decision. Complicating the entire matter, for example, is the sheer size of our collection. Solutions which would work for smaller collections are not always feasible for us.

We would be most grateful to hear from anyone who has addressed these issues (either successfully or unsuccessfully) in the past. Any suggestions or comments would be most appreciated.


COMPUTER INDEX OF CLASSICAL ICONOGRAPHY

Jocelyn Penny Small

From the time when writing was invented, the problems of storage and retrieval have loomed large. It matters not whether the media are clay tablets written in cuneiform, Greek rolls, Latin codices, or 5-1/4 inch, double-sided, double-density diskettes in ASCII. The archive of the U.S. Center of the Lexicon Iconographicum Mythologiae Classicae (LIMC) in the Rutgers University Library is no exception. Each country participating in the LIMC has assumed responsibility for cataloging the relevant classical objects (800 B.C.-A.D. 400) in its public and private collections, and sends that information to the Central Editorial Office in Basel, Switzerland. The archives that have been developed as a result of this worldwide survey have become important in their own right as unique sources of primary information about antiquity for both the specialist and the layman.

Authority Lists For Ancient Art

The U.S. Center has currently recorded about 7,500 objects, a manageable size to use in its development of a computerized index to subjects illustrated on classical objects. The Center is extremely grateful to the National Endowment for the Humanities, Division of Research Programs-Research Tools, for a two-year pilot grant for computerization. The Center's system is particularly significant, because it does not limit itself to just titles of scenes like other projects, but analyzes the components, the figures, and their attributes, that make up the representations. At the same time the Center is working on establishing controlled vocabularies not only for the figures and the titles of scenes, but also for the information that places an object in its context, such as materials and techniques. These authority lists, virtually non-existent for ancient art, are essential for consistent retrieval.

The choice of programs was simple. We found only one, Revelation. Although biblically rather than classically named, Revelation stores in plain ASCII, the most common format. It has an automatic transfer routine to import and export data, and all fields are variable in length with up to 65,000 characters per record. The number of records per file is limited by storage alone. Other features that proved absolutely essential are the multi-valued or repeating fields, which allow for hierarchically related information; the capability of changing any or all information that defines the fields within a file in any way whatsoever without destroying or reloading existing data; calculated or symbolic fields to do basic computation and a wide variety of other tasks, such as automatic dating of records or underlining in print-outs; and, finally, the possibility of an easy translation, in the conventional sense, into foreign languages.

Once the main software package had been selected, the other parts of the system were easier to determine. The U.S. Center is using an enhanced IBM PC/AT with a 20-megabyte hard disk, 512K RAM, and the 80287 math co-processor (which Revelation uses). A monochrome monitor and a high-quality, dot-matrix printer were purchased.

How the System Works

As an example of how the entire system works, consider an elaborately decorated vase with a unique representation in its main scene from South Italy and now in the Virginia Museum of Fine Arts, Richmond. The "manual" catalogue card has two types of information: fill-in-the-blank, primarily in the upper third of the card, and a long description of the three scenes in the middle.

The fill-in-the-blank data convert quite easily to a computerized system, since both the computer form and the card work in the same manner with one significant exception. Each bit of information on the card must be reduced to its least common denominator or most irreducible form, which is then allocated its own special slot or field in the computer record. An expansion of fields is inevitable during conversion from a manual to a computerized system.

More difficult and far more troublesome is the computerization of the description of the figured scenes. Simply entering the straight text of the description is not sufficient, because computer programs do not fully understand English. Obvious problems are indefinite antecedents or the beginning sentences in the description of the upper register of the Virginia vase where similarly portrayed figures are united in the narrative, though separated visually by a great distance. Accordingly, an abstraction of the essential information has to be made, and entered separately.

The Revelation program comes with a mechanism for recording the nub of this information: the multi-valued or repeating field. One field is allocated for all figures with each figure listed separately. The Virginia lekythos has twenty-two elements, as noted in brackets above Elements. In addition, a multi-valued field can be associated with other multi-valued fields, which can in turn have associated sub-values. Elements of the scene are directly linked to the three fields of Type, Dress, and Attributes with the result that, while all the male figures in the Rape of the Leucippidae wear mantles, Idas alone is correctly associated with the stele (or gravestone).
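The associated multi-valued fields described above can be pictured with a small sketch. It is written in Python rather than in Revelation, and it is only an illustration under stated assumptions: the class name Element, the helper functions, and the second row of the grid are hypothetical; only Idas, the mantle, and the stele come from the description of the Virginia vase.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """One value of the multi-valued Elements field plus its associated values."""
    name: str
    type: str = ""                                       # associated field: Type
    dress: List[str] = field(default_factory=list)       # associated field: Dress (sub-values)
    attributes: List[str] = field(default_factory=list)  # associated field: Attributes (sub-values)


# Hypothetical two-row excerpt of the 22-element grid. Only Idas, the mantle,
# and the stele come from the text; the second row is a placeholder.
elements = [
    Element("Idas", type="male", dress=["mantle"], attributes=["stele"]),
    Element("second male figure (placeholder)", type="male", dress=["mantle"]),
]


def wearing(rows, garment):
    """Names of all elements whose Dress sub-values include the garment."""
    return [e.name for e in rows if garment in e.dress]


def with_attribute(rows, attribute):
    """Names of all elements whose Attributes sub-values include the attribute."""
    return [e.name for e in rows if attribute in e.attributes]


print(wearing(elements, "mantle"))
print(with_attribute(elements, "stele"))
```

Because Dress and Attributes are stored per element rather than per scene, a search for the mantle finds every figure wearing one, while a search for the stele finds Idas alone.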

There are three core files: OBJECTS, with information that pertains to the object as a whole; SCENES, with information related only to individual scenes; and FIGURES, with the grid of Elements that describes the components of a scene. The three are integrated so that movement between them is automatic.
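A rough Python sketch of how three such files might be tied together follows. The record keys, the dictionary layout, and the function name are assumptions made for illustration; the actual key scheme used by the Center is not described here, and the titles of two of the three scenes are left as placeholders.

```python
# Hypothetical record keys and layout, for illustration only.
OBJECTS = {
    "V1": {
        "collection": "Virginia Museum of Fine Arts, Richmond",
        "scenes": ["V1-1", "V1-2", "V1-3"],  # links into SCENES
    },
}

SCENES = {
    "V1-1": {"object": "V1", "title": "Rape of the Leucippidae"},
    "V1-2": {"object": "V1", "title": "(placeholder)"},
    "V1-3": {"object": "V1", "title": "(placeholder)"},
}

FIGURES = {
    # Keyed by scene; an excerpt of the grid of Elements for one scene.
    "V1-1": {"elements": ["Idas"]},
}


def elements_of_object(object_key):
    """Follow the links OBJECTS -> SCENES -> FIGURES for one object."""
    result = {}
    for scene_key in OBJECTS[object_key]["scenes"]:
        title = SCENES[scene_key]["title"]
        grid = FIGURES.get(scene_key, {"elements": []})
        result[scene_key] = (title, grid["elements"])
    return result


for key, (title, names) in elements_of_object("V1").items():
    print(key, title, names)
```

Starting from a single object record, the function reaches every scene and every recorded element without the user having to look up keys by hand, which is the sense in which movement between the files is automatic.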

In addition to the core files, 30 satellite files perform a variety of tasks. These files check the spelling of the words entered. For example, the CULTURES file lists all the cultures, like Greek, Etruscan, or Roman, and is specified as the verification file for the field Culture in the file OBJECTS on Record1. It makes sure that Greek is spelled correctly, and that American, for example, cannot be entered. CULTURES is also used to check the same field in the satellite file for ARTISTS, which, in turn, is linked to the Artist field (Record2) in the core file, OBJECTS. In other words, satellite files are linked not just to the core files, but to each other. They can also verify each other and not the core files, as STATES checks the spelling for the state where each collection is located.
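The verification step can be sketched in a few lines of Python. This is not Revelation's own mechanism, only an illustration of the idea: the table VERIFY, the function check_entry, the file name COLLECTIONS, and the contents of STATES are hypothetical, while CULTURES, OBJECTS, ARTISTS, and the Greek and American examples come from the description above.

```python
# Satellite files modeled as sets of permitted values.
CULTURES = {"Greek", "Etruscan", "Roman"}
STATES = {"Virginia", "New Jersey"}  # hypothetical excerpt

# Which satellite file verifies which field of which file.
# COLLECTIONS is a hypothetical name for the satellite file checked by STATES.
VERIFY = {
    ("OBJECTS", "Culture"): CULTURES,
    ("ARTISTS", "Culture"): CULTURES,   # a satellite file checked by another
    ("COLLECTIONS", "State"): STATES,
}


def check_entry(file_name, field_name, value):
    """Return the value if it passes verification; otherwise raise ValueError."""
    allowed = VERIFY.get((file_name, field_name))
    if allowed is None:
        return value  # no verification file specified for this field
    if value not in allowed:
        raise ValueError(
            f"{value!r} is not a permitted entry for {file_name}.{field_name}"
        )
    return value


print(check_entry("OBJECTS", "Culture", "Greek"))
for bad in ("Greeek", "American"):
    try:
        check_entry("OBJECTS", "Culture", bad)
    except ValueError as err:
        print(err)
```

A misspelling and a culture outside the list are rejected in the same way, since both fail the lookup in CULTURES.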

This description is just an overview of the U.S. Center's computer-index of classical iconography, its structure and components. Since January 1985, when the system arrived, the database being developed has evolved into a more complicated system than originally planned. Yet, not quite paradoxically, its very complexity is gradually making it easier to use, for all the information needed to catalogue an object accurately is available in one place and, what is more, instantaneously. The whole story of iconography will soon be at our fingertips.

For further information contact: Jocelyn Penny Small, Director, U.S. Center of the Lexicon Iconographicum Mythologiae Classicae, Rutgers University, New Brunswick, NJ 08903.


NEW PHONE NUMBER FOR ABBS

The exchange for the Anthropologist's Bulletin Board System (ABBS) has been changed by the phone company. The new phone number is (503) 464-3912. See CAAN Vol. 2, No. 2, page 25 for a description of ABBS and information on how to use it.

HOW TO GET NEEDLE IN A HAYSTACK

We regret that the last issue of CAAN did not include information on how to obtain a copy of Needle in a Haystack. Needle In a Haystack was written by anthropologist Loren Pahlke and retails for $59.95 from Aurora Software, Drawer A, 12591, Beachcomber, Anchorage, AK 99515.

SECOND ANNUAL SUMMER INSTITUTE ON RESEARCH METHODS IN CULTURAL ANTHROPOLOGY (Supported by the National Science Foundation)

June 10 to July 1, 1988 at the University of Florida

PURPOSES:

1. To disseminate state-of-the-art knowledge of field data-gathering techniques in cultural anthropology.

2. To develop syllabus materials and bibliographies; to facilitate broader dissemination of the newest techniques of field research.

3. To encourage more widespread teaching of these methodological skills in programs in anthropology (and related fields).

CONTENTS:

1. Key informant interviewing and related qualitative data-gathering. Focus on the uses of open-ended interviewing for development of quantitative instruments.

2. Structured interviewing, including both survey interviews and specialized interviewing on cultural domains using triad sorts, rating/ranking, and other tools.

3. Principles of direct observation in natural settings.

4. Coding and organization of both qualitative and quantitative data. Data transfer to microcomputers. Data storage and analysis.

WHO SHOULD APPLY:

The institute is intended for Ph.D. anthropologists who are now teaching, or are likely to be teaching, field research methods in graduate and undergraduate programs. Individuals in applied research with responsibility for gathering primary data in agriculture, health care, education, and other domains are also encouraged to apply.

INSTRUCTORS:

H. Russell Bernard (Univ. of Florida), Pertti J. Pelto (Univ. of Connecticut).

STIPENDS AND EXPENSES:

Stipends are provided by the institute for participants' lodging, food, and instructional expenses. Participants are expected to arrange their own funding for travel to and from Gainesville, Florida.

CONCERNING USE OF MICROCOMPUTERS:

One feature of the training will be an introduction to a variety of microcomputer programs for use in ethnographic field work and data analysis. These include programs facilitating multidimensional scaling and clustering, and programs for managing and analyzing qualitative field notes. Applicants are urged to develop familiarity with word processing before the training begins. Participants with advanced knowledge of microcomputers and data analysis will benefit from special sessions.

FOR FURTHER INFORMATION CONTACT:

H. Russell Bernard, Anthropology, 1350 Turlington Hall, University of Florida, Gainesville, Florida 32611, (904) 392-2031, BITNET CY$EFH3@NERVM

INFORMATION NEEDED ON ANTHROPOLOGY DEPARTMENTS WITH COMPUTER CONCENTRATIONS

As part of their upcoming edited volume, COMPUTER APPLICATIONS FOR ANTHROPOLOGISTS, Margaret Boone and John Wood are preparing an appendix which lists computer resources for anthropologists: electronic newsletters, bulletin boards, journals, and a list of training programs. Please send notices of Departments of Anthropology which offer concentrations, certificates, or specializations in computer applications (quantitative or qualitative), and notes on other computer resources for anthropologists to: Margaret S. Boone, 4501 Arlington Blvd. No. 727, Arlington, VA 22203-2747. (MSBUHT@GWUVM.BITNET)

CALL FOR PAPERS: DIRECTIONS AND IMPLICATIONS OF ADVANCED COMPUTING

Nancy Leveson (nancy@murphy.uci.edu), DIAC-88, St. Paul, Minnesota, August 21, 1988

The adoption of current computing technology, and of technologies that seem likely to emerge in the near future, will have a significant impact on the military, on financial affairs, on privacy and civil liberty, on the medical and educational professions, and on commerce and business. The aim of the symposium is to consider these influences in a social, economic, and political context as well as a technical one. The directions and implications of current computing technology, including artificial intelligence and other areas, make attempts to separate science and policy unrealistic. We therefore solicit papers that directly address the wide range of ethical and moral questions that lie at the intersection of science and policy.

Within this broad context, we request papers that address the following suggested topics. The scope of the topics includes, but is not limited to, the sub-topics listed.

RESEARCH DIRECTIONS: Ethical Issues in Computing Research; Sources and Effects of Research Funding; Responsible Software Development.

DEFENSE APPLICATIONS: AI and the Conduct of War; Limits to the Automation of War; Automated Defense Systems.

COMPUTING IN A DEMOCRATIC SOCIETY: Community Access; Computerized Voting; Civil Liberties; Risks of the New Technology; Computing and the Future of Work.

COMPUTERS IN THE PUBLIC INTEREST: Computing for the Handicapped; Resource Modeling; Arbitration and Conflict Resolution; Software and the Professions; Software Safety.

Submissions will be read by members of the program committee, with the assistance of outside referees. The program committee includes Steve Berlin (MIT), Jonathan Jacky (U. WA), Richard Ladner (U. WA), Bev Littlewood (City U., London), Nancy Leveson (UCI), Peter Neumann (SRI), Luca Simoncini (U. Reggio Calabria, Italy), Lucy Suchman (Xerox PARC), Terry Winograd (Stanford), and Elaine Weyuker (NYU).

Complete papers, not exceeding 6000 words, should include an abstract and a heading indicating the topic to which they relate. Reports on work in progress or suggested directions for future work will be given equal consideration with completed work. Submissions will be judged on clarity, insight, significance, and originality. Papers (4 copies) are due by April 1, 1988. Notices of acceptance or rejection will be mailed by June 1, 1988. Camera-ready copy is due by July 1, 1988. Send papers to Professor Nancy Leveson, ICS Department, University of California Irvine, Irvine, CA 92717.

Proceedings will be distributed at the symposium, and will be available during the 1988 AAAI conference. The DIAC-87 proceedings are being published by Ablex. Publication of the DIAC-88 proceedings is planned. The program committee will select a set of papers to be considered for publication in a special section of the Communications of the ACM. For further information contact Nancy Leveson (714-856-5517) or Doug Schuler (206-865-3226). Sponsored by Computer Professionals for Social Responsibility, P.O. Box 717, Palo Alto, CA 94301.


HUNTING AND GATHERING TALES

HOW TO DEBUG A FIELD COMPUTER

Napoleon Chagnon and Raymond Hames have been using portable computers in an Amazonian tropical forest environment for the last three years, and they have accumulated much valuable information on how to avoid problems. The environment is a difficult one in which to work, and often leads to hardware bugs. They suggest a very ingenious method of dealing with these bugs, but it requires equipment that can only be obtained in industrialized countries. In particular they have found that duct tape and peanut butter, two products found only in the most advanced industrialized cultures, can be utilized advantageously in dealing with hardware bugs. You place a ten-centimeter strip of duct tape on the top of the computer with the sticky side up and one of the ends pointing toward the nearest open floppy drive slot. Then you place a spot of peanut butter in the center of the tape. Leave the computer in a quiet place overnight, and the bugs will come out to eat the peanut butter and get stuck on the tape. Bugs and tape can be disposed of in the morning. The technique can probably be improved by putting the computer under its own mosquito netting in the evening so that none of the other wandering cockroaches that happen to come by can interfere with the debugging process. Seriously, this is a real problem in the tropical forests, and, for that reason, we recommend portable computers with disk drives that close.


EDITORIAL POLICY

James Dow, CAAN Editor

I have adopted the policy of publishing CAAN whenever enough good manuscripts have accumulated to make a mailing economical. As computer use in anthropology is increasing, I expect to publish it more regularly in the future. For this reason, the next issue, Vol. 3 No. 1, will be coming out soon. We are seeking articles for issues beyond that. News and announcements will be published as quickly as possible. Back issues, whose contents are listed at the end of this issue, have also been reprinted and are available. Most subscriptions end with this issue, so we have included invoices for subscribers who have not yet paid for Volume 3. Thank you very much for your support.

CONTENTS OF PAST ISSUES

Vol. 1, No. 1: Managing field notes; Apples and archaeology; 3-D archaeology program; Data for community workers; Demographic programs; Using Superfile; Management of ecological data in the field.

Vol. 1, No. 2: Setting up and operating a computer under adverse field conditions; Courseware; User supported software; Apples and archaeology; Hardware; Center for Computer Applications in the Humanities; Network models and database management; Teaching with computers; Visual bracket plotting with SAS/ETS.

Vol. 1, No. 3: A computer approach to a precolonial language problem; The anthropology of computers; World Cultures Database; Electronic networking update; Hardware and software; The world of kinship; Bibliographies; Computers for other cultures; Statistical entailment analysis; Package for data analysis and matrix manipulation.

Vol. 1, No. 4: Archaeological cataloging systems for personal computers; Socnet and Polinet; Freeware vs. moneyware; Computers in human services; Fido; Center for archaeology field training; Rugged field computer.

Vol. 2, No. 1: A review of selected microcomputer statistical software packages.

Vol. 2, No. 2: Using a computer in the field; Database management systems for variable length texts; MINARK database; Statistical packages for microcomputers; Cultural anthropology database; Electronic mail; Glossary of electronic communication terms; NAPA bulletin board; The Anthropologist's Bulletin Board System.

Vol. 2, No. 3: Statistical programs for the Macintosh; Roundtable symposium on microcomputers in anthropology; Computers in Southeast Asia; COHORTS; WORDTREE: Needle in a Haystack; Sort Blocks by Fields; DIGSITE: Computer simulation of an archaeological excavation.