Typology, Machine Learning, And The Study Of Archaeological Artifacts

Published by Norman MacLeod

The Natural History Museum, Cromwell Road, London, SW7 5BD, UK

These findings are described in the article entitled “The quantitative assessment of archaeological artifact groups: Beyond geometric morphometrics”, recently published in the journal Quaternary Science Reviews. This work was conducted by Norman MacLeod of The Natural History Museum, London, University College London, and the Nanjing Institute of Palaeontology and Stratigraphy, Chinese Academy of Sciences.

Archaeology is usually defined as the study of human cultural development through the analysis of materials that have been produced by human cultures in both time and space.

While not restricted to the consideration of artifacts of any particular age, archaeologists have focused their attention largely on the study of prehistoric materials, that is, materials produced before the advent of written records. This interval runs from the first appearance of human-made artifacts such as stone tools (c. 3.3 million years before present, or ybp), through evidence of habitations (c. 300,000 ybp), pictographs (c. 77,000 ybp), landscape modifications, especially via animal extinctions (c. 50,000 ybp), and evidence of animal domestication (c. 11,000 ybp), to the appearance of writing such as cuneiform tablets (c. 3,700 BC) and Chinese bone scripts (c. 3,000 ybp).

While the literature devoted to historically recorded time dwarfs that devoted to prehistory, the prehistoric period encompasses over 99 percent of human history. Almost all our understanding of what humans were like, how and where they lived, and what they accomplished during the overwhelming majority of human history comes from their artifacts (see Fig. 1), along with data from comparatively small, but no less important, collections of their skeletal remains. Since the analysis of these materials has, for the most part, been undertaken in a comparative context focused on documenting patterns of similarity and difference, an alternative definition of archaeology would be the study of patterns in the products of human activity, especially if human bodies are considered part of this category. But how do archaeologists assess and study these patterns, and what effect do researchers’ decisions about what constitutes a pattern have on the results they produce?

A primary goal of archaeological research is to know what was in the minds of the creators of these artifacts. What cultural traditions were they working under? Where did those traditions come from? What constraints were imposed on these artisans by their societies, their cultural histories, and their materials? How, and why, did these factors change over time? Where and when did ancient artisans transcend local traditions or constraints? How did this transcendence come about? And, ultimately, what effect did the innate capacity for intellectual and technological innovation have in enabling ancient human cultures to meet the many challenges they faced? But the only evidence available for analysis consists of the artifacts themselves, which are often frustratingly simple in their design (especially when all that is preserved is a fragment) yet, at the same time, intriguingly complex in terms of their material composition, techniques of fabrication, purpose, cultural status, and symbolic value.

The structure of similarities and differences within and between archaeological artifacts allows them to be placed into classes or categories that not only exhibit internal coherence but can also be used to assign particular sorts of artifacts to particular intervals of human history and to the geographic regions where they were made, used or traded. Characteristic individual examples of these categories are referred to as “types”, which are taken to possess or represent the “essence” of the category to which they belong. Types are useful in establishing a physical standard against which materials of unknown provenance can be compared in order to determine whether they belong to a previously established conceptual category or constitute a new one.

As far back as the 1500s, antiquaries (the forerunners of modern archaeological researchers) used obvious differences in the size and shape of the bricks, or in the poses of decorative effigies, employed in the construction of buildings, bridges and monuments to distinguish truly ancient structures from those of more recent vintage. By the 1600s, regularities in the style and decorative form of Gothic buildings had been combined into an evolutionary sequence through which Medieval buildings could be dated. These determinations were based on qualitative assessments of design style, and the essential aspects of these stylistic types were often organized, discussed and understood as descriptive lists of characteristic features.

Later, archaeologists were swept up in the biometric revolution of the mid-twentieth century and, especially in the post-WWII decades, began collecting quantitative data in the form of linear distance measurements, taken in 2D and 3D, as a way to describe patterns of variation in archaeological artifacts (Fig. 2). These quantitative data were often combined with semi-quantitative, numerically coded data (e.g., 0 = absent, 1 = present) to transform artifact descriptions into datasets that could be analyzed using computerized data-analysis procedures. This allowed patterns of similarity and difference between artifact groups to be represented visually using a variety of graphic techniques (e.g., scatterplots, histograms, pie charts, dendrograms). Once artifacts had been described in a (semi)quantitative manner, apparent differences between conceptual artifact categories could be tested for significance against the null hypothesis of no difference using formal statistical procedures.
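To make the idea of testing “against the hypothesis of no difference” concrete, here is a minimal sketch using a permutation test, one of many formal procedures that could be used and not necessarily one employed in the studies discussed here. All group values are hypothetical. The same function handles a measured variable and a 0/1 presence/absence code; for the latter, the difference in means is simply a difference in proportions.

```python
import random
from statistics import mean

# Hypothetical example data (not from any study cited here):
# maximum length (mm) of points from two conceptual artifact groups.
group_a = [52.1, 48.3, 55.0, 50.7, 53.4, 49.9]
group_b = [58.2, 61.0, 57.5, 59.8, 62.3, 56.9]

# A presence/absence trait coded 0 = absent, 1 = present for the same points.
notch_a = [0, 0, 1, 0, 0, 0]
notch_b = [1, 1, 0, 1, 1, 1]


def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test of the hypothesis of no difference in means.

    Group labels are shuffled repeatedly; the p-value is the share of shuffles
    whose absolute mean difference is at least as large as the observed one.
    Returns (observed difference b - a, estimated p-value).
    """
    rng = random.Random(seed)
    observed = mean(b) - mean(a)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the Monte Carlo estimate away from exactly zero.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


obs, p = permutation_test(group_a, group_b)
print(f"length: difference in means = {obs:.2f} mm, p ~ {p:.4f}")

# For 0/1 codes the difference in means is a difference in proportions.
obs_n, p_n = permutation_test(notch_a, notch_b)
print(f"notch: difference in proportions = {obs_n:.2f}, p ~ {p_n:.4f}")
```

A permutation test is used here because it makes no assumption of normality, which small artifact samples rarely justify; a t-test or chi-squared test would be equally standard choices.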

In a general sense, the goal of artifact analysis is to use the objects or structures produced by ancient artisans to reconstruct the “mental templates” under which they worked and, in so doing, to gain access to the conceptual landscapes that prehistoric people used to operate in, and understand, their world. Since each object or structure produced by these artisans represents an imperfect realization of its maker’s mental template, the template can only be reconstructed by assembling and studying a collection of artifacts, and any such reconstruction is valid only for the templates (or sets thereof) represented in the collection. Nevertheless, the existence of a category or type was often interpreted as direct evidence that ancient artisans held mental templates whose structure coincided with the boundaries between artifact categories. In some favorable circumstances, the conceptual development of these mental constructs might even be reconstructed by tracing how the structure of artifact categories (and so the mental templates of their creators) changed over time.
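The reasoning that a template can only be recovered from a collection, and only for the templates present in it, can be illustrated with a toy simulation. This is a deliberately simplified sketch that assumes each artifact equals a hypothetical template plus independent random error, which ignores the multi-step production processes discussed below; the template values are invented for illustration.

```python
import random

# Hypothetical "mental template": five target measurements (mm) an artisan aims for.
TEMPLATE_A = [60.0, 25.0, 8.0, 18.0, 12.0]
# A second, hypothetical template used by a different group of artisans.
TEMPLATE_B = [45.0, 30.0, 6.0, 22.0, 10.0]


def make_artifact(rng, template, noise_sd=2.0):
    """One imperfect realization: each target value plus independent Gaussian error."""
    return [x + rng.gauss(0.0, noise_sd) for x in template]


def estimate_template(artifacts):
    """Estimate a template as the per-measurement mean of a collection."""
    n = len(artifacts)
    return [sum(column) / n for column in zip(*artifacts)]


def max_error(estimate, template):
    """Largest absolute difference between an estimate and a template."""
    return max(abs(e - t) for e, t in zip(estimate, template))


rng = random.Random(42)

# Larger collections made under one template tend to recover it more closely.
for n in (1, 10, 100, 1000):
    collection = [make_artifact(rng, TEMPLATE_A) for _ in range(n)]
    print(n, round(max_error(estimate_template(collection), TEMPLATE_A), 3))

# A collection that mixes two templates yields an average matching neither.
mixed = ([make_artifact(rng, TEMPLATE_A) for _ in range(500)]
         + [make_artifact(rng, TEMPLATE_B) for _ in range(500)])
pooled = estimate_template(mixed)
print(round(max_error(pooled, TEMPLATE_A), 2), round(max_error(pooled, TEMPLATE_B), 2))
```

The second half of the sketch shows why the grouping decision matters: averaging a collection that unknowingly mixes two traditions produces a “template” that no artisan ever held.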

However, the implicit equation of artifact category types with the “mental templates” of ancient craftsmen has long been controversial among archaeologists. Since there is an essentially unlimited number of ways to subdivide any set of objects into groups, there is an essentially unlimited number of possible type-based artifact categorizations, and no certain method for determining which of them the artisan may have had in mind when the artifact was made. In this sense, the specification of a category type from preserved material tells us as much (likely more) about the “mental templates” of the archaeologists investigating the artifacts as about the artisans who are nominally these investigations’ ultimate subjects.
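Strictly speaking, the number of ways to split a finite collection into groups is finite, but it grows so fast that “essentially unlimited” is fair. For n distinct objects the count is the Bell number B_n. The sketch below computes it with the Bell triangle; note that it counts every possible grouping, sensible or not, so it measures only the size of the space of possible classifications.

```python
def bell_numbers(n_max):
    """Return [B_0, B_1, ..., B_n_max], where B_n counts the ways to partition
    n distinct objects into non-empty groups (computed with the Bell triangle)."""
    bells = [1]
    row = [1]
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row ...
        new_row = [row[-1]]
        # ... and each further entry is its left neighbour plus the entry
        # directly above that neighbour.
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


for n in (5, 10, 20, 30):
    print(f"{n} artifacts: {bell_numbers(n)[n]:,} possible groupings")
```

Even a modest collection of 10 artifacts admits 115,975 distinct groupings, and at 20 artifacts the count exceeds 51 trillion.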

Figure 2.
Two different ways of representing the form of a prehistorical artifact. Left: aspects of the morphology in lateral view are represented by a set of five, standard, straight-line distances. Right: variation of all visible aspects of the same artifact as represented by a matrix of picture element (pixel) color values. In this case, the pixel matrix contained 635 rows and 305 columns of pixels with each pixel being represented by red, green and blue (RGB) color values. Note how much more information is provided by the pixel matrix representation, as opposed to the linear distance representation, of the artifact’s form. Machine learning procedures can analyze both sorts of representations, but the use of full pixel-matrix representations is relatively new in archaeological research. Image courtesy Norman MacLeod.
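The contrast drawn in the caption can be quantified, crudely, by counting how many numbers each representation hands to an analysis. Raw counts are only a rough proxy for information, and flattening the pixel array row by row is one common way of turning an image into a feature vector, assumed here for illustration rather than taken from the published workflow.

```python
# Representation 1 (left panel): five standard straight-line distances.
N_DISTANCES = 5

# Representation 2 (right panel): a 635 x 305 pixel matrix with RGB values,
# using the dimensions quoted in the caption.
ROWS, COLS, CHANNELS = 635, 305, 3


def flatten_image(image):
    """Flatten a nested image[row][col][channel] into a single feature vector,
    row by row, as is commonly done before feeding pixels to a classifier."""
    return [value for row in image for pixel in row for value in pixel]


n_pixel_values = ROWS * COLS * CHANNELS
print(n_pixel_values)                   # 581025
print(n_pixel_values // N_DISTANCES)    # 116205

# Tiny illustration of flattening a 1 x 2 RGB image.
print(flatten_image([[(255, 0, 0), (0, 128, 255)]]))  # [255, 0, 0, 0, 128, 255]
```

The pixel representation supplies 581,025 values per specimen, over 100,000 times as many as the five distances.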

In the absence of guidance from some other source of information, arbitrary decisions are usually taken regarding which aspects of an artifact need to be described, coded or measured in order to summarize its critical aspects. These decisions are often made at the outset of analysis before the investigator has developed any detailed understanding of the aspects that exhibit the most variation in a sample or that covary with other aspects of the cultural system under investigation. Owing to a contemporary preference for data that can be quantified, and so subjected to numerical analysis, standard measurement systems have been developed for many artifact groups. But the existence of a commonly-used set of standard observations or measurements does not ensure they are appropriate for any particular investigation. Moreover, many of these observation/measurement systems were developed originally when the technology for collecting quantitative data from artifacts was less flexible and sophisticated than it is today.

As serious as these practical issues are, the notion that an artisan’s mental template forms the core of an essentially one-step process, in which a relatively simple set of rules guides artifact production, has also been challenged. In most cases, artifacts are produced by multi-step procedures, with traditions, technologies, and cultural factors influencing the result at each step. It is doubtful that simple, linear mathematical procedures based on small, fixed numbers of observations or measurements can recover the intricacies of such a complex system of influences, even for artifacts that appear to exhibit simple morphologies.

In addition, there are problems associated with the samples of artifacts to which archaeologists have access. Ideally, collections of artifacts for scientific investigation should be uniformly well preserved and assembled via random selection from a large population whose scope encompasses the entire temporal interval and/or spatial region of interest, with each member of the population having an equal chance of being selected for inclusion in the sample. Needless to say, very few collections of archaeological artifacts meet such stringent criteria.
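Why equal selection chances matter can be shown with a small simulation. This sketch assumes a hypothetical population of artifacts from two regions that differ in average length, and compares a simple random sample with a “convenience” sample drawn from the one region that happens to be accessible; all numbers are invented.

```python
import random
from statistics import mean

rng = random.Random(1)

# Hypothetical population: 1,000 artifacts from two regions whose maximum
# lengths (mm) differ on average.
population = ([("north", rng.gauss(50.0, 3.0)) for _ in range(500)]
              + [("south", rng.gauss(60.0, 3.0)) for _ in range(500)])


def simple_random_sample(items, k, rng):
    """Draw k items without replacement; every item has the same chance."""
    return rng.sample(items, k)


def convenience_sample(items, k, region):
    """Take the first k items from a single accessible region only."""
    return [item for item in items if item[0] == region][:k]


def mean_length(items):
    return mean(length for _, length in items)


true_mean = mean_length(population)
srs_mean = mean_length(simple_random_sample(population, 100, rng))
conv_mean = mean_length(convenience_sample(population, 100, "north"))
print(f"population {true_mean:.1f}, random sample {srs_mean:.1f}, "
      f"one-region sample {conv_mean:.1f}")
```

The random sample lands close to the population mean, whereas the one-region sample is biased by roughly the regional difference, regardless of how carefully each specimen is measured.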

How can these problems be addressed? One new approach, becoming popular in a wide variety of fields, including many outside the sciences, is to subject the artifacts to analysis by machine-learning algorithms and so let the artifacts themselves tell the investigator whether regularities in their composition and/or design exist that would make category- or type-based groupings possible. Computer systems designed to find, learn and utilize patterns in a wide variety of data types without being explicitly programmed to do so have been available since the 1950s, but only in the last decade have they reached a level of sophistication and accuracy that meets or surpasses the performance of human experts on some tasks. Because these algorithms are based on advanced mathematical concepts and implemented using high-level computer programming techniques, they are only now beginning to be incorporated into the research programs of investigators whose expertise lies outside computer science. But a few impressive examples of machine learning-based approaches to archaeological artifact analysis have already appeared, and more are on the way.
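One simple way of letting unlabeled measurements suggest groupings is clustering. The sketch below implements plain k-means (Lloyd’s algorithm) on hypothetical length and width data; it is offered only as an example of the general idea, not as the method used in any study mentioned here. Note that the analyst must still choose the number of groups k and the variables, so decisions about what counts as a pattern do not disappear entirely.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm). Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each artifact joins its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # centroids have stopped moving
        centroids = new_centroids
    return labels, centroids


# Hypothetical (max. length, max. width) measurements in mm; no group labels given.
points = [(50, 20), (52, 21), (49, 19), (51, 22),
          (80, 35), (82, 34), (79, 36), (81, 33)]
labels, centroids = kmeans(points, k=2)
print(labels)
```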

Perhaps the first of these examples was published by Brendan Nash and Elton Prewitt in 2016 in the journal Lithic Technology. This demonstration study focused on geographic patterning in the design of projectile points based on Prewitt’s typological system, in which the artifacts in question were represented by a set of nominal descriptors (e.g., barbs: absent, weak, abrupt, short, flared) and linear measurements (e.g., max. width, max. length, max. thickness).
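Before nominal descriptors and measurements can be passed to a neural network, they must be combined into a single numeric vector. A common approach is one-hot encoding, sketched below; the exact encoding Nash and Prewitt used is not described here and may differ, and `encode_point` is a hypothetical helper.

```python
# Nominal states for the "barbs" descriptor, taken from the example in the text;
# a full descriptor set would include further variables.
BARB_STATES = ["absent", "weak", "abrupt", "short", "flared"]


def one_hot(value, states):
    """Encode a nominal value as 0/1 indicators, one per possible state."""
    if value not in states:
        raise ValueError(f"unknown state {value!r}; expected one of {states}")
    return [1.0 if value == state else 0.0 for state in states]


def encode_point(barbs, max_width, max_length, max_thickness):
    """Hypothetical helper: one numeric feature vector per projectile point,
    the one-hot barb state followed by three measurements (mm)."""
    return one_hot(barbs, BARB_STATES) + [max_width, max_length, max_thickness]


print(encode_point("flared", 24.0, 61.5, 7.2))
# [0.0, 0.0, 0.0, 0.0, 1.0, 24.0, 61.5, 7.2]
```

One-hot encoding avoids implying that, say, “short” lies numerically between “abrupt” and “flared”, as a single 0 to 4 code would. In practice the measurements would usually also be rescaled so that no variable dominates simply because of its units.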

These data were submitted to an artificial neural network to determine (1) whether it could find the same patterns in a test set of Prewitt data that traditional approaches to the analysis of this type of data had recovered, (2) whether the relative importance of individual variables to overall type discrimination could be estimated, (3) whether the automated system could successfully identify projectile points considered transitional between type-based categories, and (4) whether the system could allocate projectile points not used in training to the correct group. In each trial the artificial neural network produced results identical, or very close, to those of traditional methods of data analysis. These results were all the more remarkable because the number of artifact type categories was fairly large (15) and the number of training-set specimens remarkably small (26–30).

Figure 3.
Three different modes of shape variation in the collection of Clovis-style artifacts, ordered from left to right by their contribution to the overall pattern of shape variation these artifacts exhibit across North America. These modes form part of a series that demonstrates that machine-learning methods can be used to test hypotheses about the loci and nature of artifact variation in space and time. Image courtesy Norman MacLeod, published with permission from NHM.

In a more recent analysis, published in the journal Quaternary Science Reviews, I compared three different datasets (including the direct analysis of digital images; Fig. 2) and three different data-analysis procedures (including a Naïve Bayes machine-learning classifier) for their ability to identify patterns of regional shape difference in a large set of fluted, Clovis-style projectile points. In this test, the statistically least convincing results were delivered by the current standard approach: collecting coordinate-point data from the artifact outlines and subjecting these data to linear discriminant analysis. The statistically most significant results were provided by the machine-learning algorithm operating on data derived from generic digital images of the Clovis point artifacts (Fig. 3).
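The published analysis is not reproduced here, but the core of a Naïve Bayes classifier is compact. The following is a generic, textbook Gaussian Naive Bayes sketch on hypothetical two-feature data: within each group every feature is treated as an independent normal distribution (the “naïve” assumption), and a specimen is assigned to the group with the highest posterior probability. In an image-based analysis the features might be derived from pixel values; that is an assumption about preprocessing, not a description of the study’s workflow.

```python
import math
from collections import defaultdict


def _variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes classifier. Within each class, every feature
    is modelled as an independent normal distribution."""

    def __init__(self, var_smoothing=1e-9):
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        # Small variance floor (relative to the largest feature variance) so that
        # features that are constant within a class do not cause division by zero.
        max_var = max(_variance(col) for col in zip(*X))
        eps = self.var_smoothing * max_var if max_var > 0 else self.var_smoothing
        rows_by_class = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)
        self.params = {}
        for label, rows in rows_by_class.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            variances = [_variance(c) + eps for c in cols]
            self.params[label] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def _log_posterior(self, row, label):
        """Unnormalized log posterior: log prior + sum of per-feature log likelihoods."""
        log_prior, means, variances = self.params[label]
        total = log_prior
        for x, m, v in zip(row, means, variances):
            total += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return total

    def _predict_one(self, row):
        return max(self.params, key=lambda label: self._log_posterior(row, label))

    def predict(self, X):
        return [self._predict_one(row) for row in X]


# Hypothetical two-feature shape descriptors for points from two regions.
X_train = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [3.1, 4.0], [2.9, 4.2], [3.0, 3.9]]
y_train = ["east", "east", "east", "west", "west", "west"]

model = GaussianNaiveBayes().fit(X_train, y_train)
print(model.predict([[1.1, 2.0], [3.0, 4.1]]))  # ['east', 'west']
```

Linear discriminant analysis, the comparison method named above, instead assumes a covariance structure shared across groups; it is not shown here.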

Machine-learning approaches to archaeological artifact analysis are unlikely to provide a panacea for all problems associated with the interpretation of archaeological artifacts.

As noted by Dwight Read over 40 years ago …

“Our ability to discover and infer properties of a bygone culture using archaeological data is proportional to our ability to classify data in a fashion such that the patterning derived from the categories that have been distinguished has this property of paralleling the patterning in the behavior that gave rise to these data” (Read, 1974, p. 221).

The appropriateness of present archaeological samples as a basis for certain types of inference also remains problematic in many cases, as do uncertainties regarding whether artifact categories should be defined “top-down” (grouping artifacts by their conformance to the characteristics of an explicitly defined class) or “bottom-up” (inferring the characteristics of an implicitly defined class from patterns of similarity within a collection of artifacts). However, machine-learning procedures represent a powerful new addition to the artifact archaeologist’s toolkit. They can be applied to very generalized data to test hypotheses about whether differences between artifact groups actually exist, to identify the location and character of such differences, and (either through direct experimentation or simulation) to infer the range of physical, conceptual, and/or cultural influences that might have been responsible for their creation and development.

The puzzle of human history remains far from solved. But archaeologists have been given access to an extensive set of very powerful new tools that, when used correctly, will greatly assist their efforts to link the pieces of prehistoric archaeological evidence together so they may better interpret the picture that emerges.


  1. MacLeod, N., 2018, The quantitative assessment of archaeological artifact groups: Beyond geometric morphometrics: Quaternary Science Reviews, v. 201, p. 319–348, doi:10.1016/j.quascirev.2018.08.024.
  2. Nash, B.S., and Prewitt, E.R., 2016, The use of artificial neural networks in projectile point typology: Lithic Technology, v. 41, p. 194–211, doi:10.1080/01977261.2016.1184876.
  3. Read, D.W., 1974, Some comments on typologies in archaeology and an outline of a methodology: American Antiquity, v. 39, p. 216–242, doi:10.2307/279584.