Who Were the Proto-Indo-Europeans?

Language, Archaeology, and the Deep Past

At a deep structural level, languages as diverse as English, Farsi, Russian, Hindi, Spanish, and Welsh - among many others - share common sounds, words, and characteristics. They have these deep similarities because they’re related to one another: All of them belong to the Indo-European language family, members of which are spoken all over the world, from Australia to Pakistan to South Africa to Canada.

We know that the Romance languages are all descendants of Latin. That’s straightforward, because practically every stage of their evolution from Latin into their present-day forms shows up in written texts. There are a few gaps - it’s hard to see precisely what happened in the early Middle Ages, before the very first appearances of Early Romance - but it’s not too difficult to reconstruct what happened in the interim. The same process works for the modern varieties of English, going backward to the time of Shakespeare, then Chaucer, and finally Beowulf.

But what if we don’t have writing to cover the periods we want to know about? What then?

That’s where the comparative method shines. English, Dutch, and German all sound fairly similar; the word for cheese in Dutch is kaas, and it’s Käse in German, for example. Especially between the last two, that’s not too far off, right? When we compare these three languages systematically, we can be certain that they’re all fairly closely related.

That’s not surprising: They’re effectively adjacent languages, with all of their ancestral territories clustered around the rim of the North Sea. Linguists affirm this: They belong to a single branch of the Germanic language family, the western branch. Swedish, Danish, and Norwegian belong to the same family but constitute a separate branch, the North Germanic languages. When we compare their earliest written forms to each other - Old Norse and Old English, for example - their similarities become even more apparent. They all descend from a common tongue, even if we don’t have written evidence of it, and through rigorous comparisons we can reconstruct the common ancestor’s vocabulary and grammatical structures. We call that reconstructed ancestor a proto language.
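For the curious, here is a rough sketch of the bookkeeping behind that kind of systematic comparison. It is a toy illustration rather than how historical linguists actually work: it uses spelling as a stand-in for sound, looks only at the first consonant of each word, and the word list and function names are my own choices for the example. The point is simply that the matches are regular: where English and Dutch have t, German has z, and where they have p, German has pf.

```python
from collections import Counter

# Cognate sets as (English, Dutch, German).
# Assumption: ordinary spelling is used as a crude proxy for pronunciation.
COGNATES = [
    ("ten", "tien", "zehn"),
    ("two", "twee", "zwei"),
    ("tongue", "tong", "Zunge"),
    ("pound", "pond", "Pfund"),
    ("path", "pad", "Pfad"),
    ("pepper", "peper", "Pfeffer"),
]


def initial(word):
    """Return the word's initial consonant, treating German 'pf' as one unit."""
    w = word.lower()
    return "pf" if w.startswith("pf") else w[0]


def correspondences(cognates):
    """Count how often each pattern of initial consonants recurs across the sets."""
    return Counter(tuple(initial(w) for w in row) for row in cognates)


if __name__ == "__main__":
    for pattern, count in correspondences(COGNATES).most_common():
        print(" : ".join(pattern), f"({count} sets)")
```

A pattern that recurs across many unrelated words is unlikely to be a coincidence, and recurring patterns like these are the raw material for reconstructing the ancestral sound.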

When we follow the written evidence and proto languages backward in time, we eventually reach the common ancestor of all of them: Slavic and Germanic, Celtic and Italic, Greek and Armenian, Indo-Iranian and Albanian, back to the original language. This was Proto-Indo-European.

Our knowledge of that never-written language is decidedly incomplete, for a number of reasons. Not every descendant tongue (or even most of them) has survived into the written record for us to make the comparisons, and entire branches have been lost. Our portrait of Proto-Indo-European also covers a vast amount of variation within the language itself, spanning hundreds or even thousands of years and great distances, rather than a snapshot of the language as any individual person spoke it. As a result, our understanding of Proto-Indo-European’s regional dialects and changes over time is necessarily limited.

But Proto-Indo-European isn’t just a construct, a theoretical product of systematic comparison between present-day and historical languages; it was a real language, even if we can never grasp all of it. If it was a real language, then it must have had real speakers, people who lived in a distinct time and place.

Can we ever know who they were?

The question has a long and not entirely pleasant history. Some racial theorists of the 19th and 20th centuries argued that the Proto-Indo-Europeans were an Aryan master race, the original Caucasians, who stood at the top of a global racial hierarchy and were responsible for the highest forms of civilization. The Nazis were particularly fond of that idea, with tragic consequences.

This is pretty obviously nonsense; the entire notion of a racial hierarchy is poisonously absurd by today’s standards, and the Proto-Indo-Europeans weren’t the progenitors of “civilization,” whatever that’s supposed to mean. They were simply a group of people who lived at a specific time in the past, and whose language produced a great many descendants.

Scholars today have two main theories about when and where Proto-Indo-European was spoken. The first is the Anatolian hypothesis. According to this line of reasoning, Proto-Indo-European or a form ancestral to it was the language of the first farmers who spread from the Fertile Crescent into Anatolia and then Europe, and then afterward into the Eurasian steppe, the Iranian Plateau, and India. This idea makes sense for three main reasons. First, large language families often do seem to be associated with early farming, because early farming tended to produce population explosions and migrations. Second, the earliest-attested and most archaic Indo-European languages - Hittite the best-known of them - were spoken in Anatolia. Linguists usually expect to find the most archaic and divergent languages within a family around its origin. Finally, statistical methods of comparison (glottochronology) produce dates for Proto-Indo-European that line up with the early expansion of farming, around or before 6000 BC.

But the Anatolian hypothesis, though it has its supporters, is at this point a minority position. Most of those working on the topic point to a region north of Anatolia, and a couple of thousand years later. For whatever my opinion is worth, this is the one I favor.

There are a lot of reasons for this preference, ranging from the concepts and technologies embodied in the language itself to recently published genetic data, and the weight of all that evidence points strongly in this direction: toward the Pontic-Caspian steppe, the western fringe of the great Eurasian grassland between the Ural and Carpathian Mountains north of the Black Sea, between 4500 and 2500 BC.

Why there, and why then? Let’s start with the language itself, working from the assumption that the words in a language bear some relationship to the world in which it was spoken. I’ve spent a lot of time living in deserts over the course of my life, so I have a lot of specific words in my vocabulary related to different kinds of scrub, cactus, rock formations, and the like. When people say the Inuit have many different words for snow and ice, this is the root concept at play: Language reflects, to some extent, the lived reality of its speakers.

When we apply this to Proto-Indo-European, we see words for wheels, riding in wagons, and wool. That technological complex - wheeled vehicles and keeping sheep for their wool, rather than meat - didn’t appear until around 4000 BC, much too late for the Anatolian hypothesis. This gives us the earliest possible date for Proto-Indo-European.

But we can go further than this. Taken in sum, the language suggests a grassy, open homeland with big skies, populated by mobile herders of sheep and cattle rather than sedentary agriculturalists. Beyond that, the vocabulary and conceptual language contained within Proto-Indo-European point to a deeply hierarchical and patriarchal society bound together by oaths of submission to a superior, rooted in systematized inequality. This was a society of warriors, mounted on horseback, who accumulated wealth from stock-breeding, rustling, and warfare. They cared deeply about glory obtained in war, and sang praise poems at their elaborate funerals.

We can even see some hints of their deities: a sky god called Dyḗws Ph₂tḗr (the weird-looking h is a consonant called a laryngeal, since lost over the millennia), his consort the earth mother Dʰéǵʰōm, their children the Divine Twins, and a few others that left traces in the descendant languages. Some traces of their shared mythology likewise survive, including a creation myth involving a pair of brothers, one of whom dies (like Romulus and Remus), and a watchdog guarding the underworld, which has to be reached by crossing a river.

All of this fits the Bronze-Age society of the western steppes, which buried its exalted leaders with great riches in earthen tombs called kurgans (like the villain in the first Highlander movie). Gods of the all-encompassing sky and forbidding rivers make sense if you live on a vast grassland.

Around the middle of the fourth millennium BC, a new archaeological culture appeared on the steppes, a cluster of traits and ways of life focused around mobile, wagon-dwelling stock-raising and kurgan burials. This is known as the Yamnaya Horizon, and the people who lived in the wagons and built the kurgans between roughly 3300 and 2600 BC are the likeliest candidates for the speakers of mature Proto-Indo-European.

(I say “mature,” because Proto-Indo-European didn’t show up one day out of the blue for the people of the Yamnaya Horizon; instead, it evolved from an earlier and more archaic language, probably one with closely related tongues that haven’t survived. The Anatolian languages, the first Indo-European languages to appear in writing around 1500 BC, probably descended from an archaic variety of Proto-Indo-European rather than this later, classic version.)

So we have an archaeological culture that seems to match the traits contained within the reconstructed proto language. Recent genetic evidence, extracted from ancient remains, supports this theory as well. Without getting into the complex and contentious details, ancestry (especially in the male line, but also throughout the genome) associated with Yamnaya burials also shows up later, in communities of mobile herders who migrated out of the steppe and into Europe, central Asia, and beyond. These groups of migrants were presumably the speakers of the ancestral languages that later developed into the various branches of the Indo-European family.

If you’d like to learn more about the Proto-Indo-Europeans, check out this week’s episode of Tides of History.


The best book about this topic, one on which I’ve drawn heavily here, is David Anthony, The Horse, the Wheel, and Language, which I highly recommend.