A Phonetic Alphabet in the Abyss: What Sperm Whales Can Teach Us About the Origins of Language
- Bryan White

- Apr 20

Introduction
The evolutionary trajectories of terrestrial primates and marine cetaceans diverged more than ninety million years ago, driven by vastly different ecological pressures and environmental mediums.1 Despite this deep temporal and physiological separation, modern bioacoustic research is uncovering extraordinary structural convergences between human speech and the vocal communication systems of sperm whales (Physeter macrocephalus).2 Until the 1950s, the scientific community lacked the technological capability to even confirm that sperm whales vocalized.1 Today, propelled by the integration of massive acoustic datasets, advanced artificial intelligence, and non-invasive bio-logging robotics, researchers are delineating a sophisticated, combinatorial communication network that exhibits properties long assumed to be the exclusive domain of human language.4
In April 2026, a landmark study published in the journal Proceedings of the Royal Society B, led by linguists and marine biologists from the University of California, Berkeley, and the Cetacean Translation Initiative (Project CETI), fundamentally reshaped the comparative analysis of non-human cognitive systems.1 The research provided empirical evidence that sperm whales manipulate the timing and spectral frequencies of their clicks to form a structured phonetic alphabet.1 Furthermore, these vocalizations exhibit phonological rules, durational contrasts, and spectral properties that closely mirror specific structural mechanics found in human languages such as Mandarin, Latin, and Slovenian.1
This report synthesizes the current state of sperm whale bioacoustics. It analyzes the anatomical mechanisms responsible for deep-sea sound production, details the combinatorial variables that constitute the proposed cetacean phonetic alphabet, and evaluates the spectral properties described as analogous to human vowels. It also places these findings within ongoing methodological debates over acoustic signal processing, the application of artificial intelligence in bioacoustics, and the legal and behavioral implications of finding language-like structure in the marine environment.
The Anatomical Engine and Acoustic Physics of the Deep
To rigorously evaluate claims of cetacean phonology, it is necessary to first understand the unique physiological apparatus that sperm whales use to generate acoustic signals. Unlike humans, who rely on a larynx and vocal folds situated within a highly flexible vocal tract, the sperm whale produces sound via a highly specialized and massive nasal complex.6 The head of a mature male sperm whale can account for up to one-third of its total body length and houses an acoustic generation system adapted for the extreme hydrostatic pressures of the deep ocean.8
The primary unit of sperm whale vocalization is the "click," a broadband, transient acoustic impulse.6 The production of this impulse is localized near the blowhole at a structure known as the phonic lips, or monkey lips.10 According to the traditional understanding of cetacean acoustics, air is forced through these pneumatic lips, causing them to snap shut and generate an initial acoustic transient.10 The resulting sound energy is then propagated backward through the spermaceti organ—an immense cavity containing up to two thousand liters of wax-like spermaceti oil.9 Upon reaching the frontal sac at the skull, the sound reflects forward, traveling through the "junk" compartment, which consists of wafer-like connective tissue lenses and fat bodies, before finally being emitted into the surrounding water.9
This anatomical configuration operates as a highly efficient, directional biological sonar, enabling the whales to emit some of the loudest sounds recorded from any biological source, primarily utilized for echolocation during deep-sea foraging for giant squid.4 However, beyond echolocation, sperm whales utilize these same clicks for complex social communication.5 During social interactions, whales emit rapid, rhythmic sequences of clicks termed "codas".4
For decades, acoustic models such as the bent-horn theory posited that the structural variation in these clicks was largely a passive reflection of the animal's physical size and internal anatomy.9 However, the advent of continuous acoustic monitoring has demonstrated that these codas are not rigid biological artifacts, but rather actively controlled sequences that constitute the foundation of a highly structured communication system.11
The Combinatorial Phonetic Alphabet
Historically, marine biologists classified sperm whale codas using a rudimentary taxonomy based strictly on the number of clicks and the temporal spacing between them, known as the inter-click interval.12 A coda might be categorized as a "5R," denoting five clicks separated by regular, evenly spaced intervals, or an "8D," denoting eight clicks with decreasing intervals.12 These patterns were frequently compared to Morse code, serving primarily as identifiers for specific individuals or social clans.2
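The traditional naming scheme can be made concrete with a few lines of Python. This is a minimal sketch of the idea only: the function names, the 15 percent tolerance, and the click times are illustrative assumptions, not the procedure used by any of the cited studies.

```python
def inter_click_intervals(click_times):
    """Return the gaps (in seconds) between consecutive click onsets."""
    return [later - earlier for earlier, later in zip(click_times, click_times[1:])]


def classify_coda(click_times, tolerance=0.15):
    """Label a coda '<n>R' (regular), '<n>D' (decreasing), '<n>I' (increasing) or '<n>?'.

    tolerance is a hypothetical threshold: intervals within 15% of their
    mean are treated as evenly spaced.
    """
    n = len(click_times)
    if n < 2:
        raise ValueError("a coda needs at least two clicks")
    icis = inter_click_intervals(click_times)
    mean_ici = sum(icis) / len(icis)
    if all(abs(ici - mean_ici) <= tolerance * mean_ici for ici in icis):
        return f"{n}R"
    if all(later < earlier for earlier, later in zip(icis, icis[1:])):
        return f"{n}D"
    if all(later > earlier for earlier, later in zip(icis, icis[1:])):
        return f"{n}I"
    return f"{n}?"


# Made-up click onset times in seconds.
regular = [0.0, 0.2, 0.4, 0.6, 0.8]
decreasing = [0.0, 0.30, 0.55, 0.75, 0.90, 1.00, 1.06, 1.09]

print(classify_coda(regular))     # five evenly spaced clicks
print(classify_coda(decreasing))  # eight clicks with shrinking gaps
```

A rule this simple captures only click count and interval trend, which is exactly the limitation the later combinatorial analysis addresses.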
However, this linear classification vastly underestimated the communicative bandwidth of the species.14 In 2024, researchers affiliated with the Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory and Project CETI analyzed an extensive dataset of 8,719 codas recorded from the Eastern Caribbean sperm whale clan.5 Applying sophisticated pattern recognition and classification algorithms to this dataset, the research team demonstrated that coda production exhibits profound contextual and combinatorial structure.5
The researchers identified four distinct, independent variables that the whales dynamically manipulate, collectively forming a "sperm whale phonetic alphabet".4
Phonetic Component | Definition in Sperm Whale Acoustics | Role in Communication Structure |
Rhythm | The normalized sequence and proportion of inter-click intervals within a coda. | Functions as the base structural framing or "root" of the vocalization. |
Tempo | The overall absolute duration of the coda; the speed of rhythmic execution. | Provides a context-independent variation applied over the rhythm. |
Rubato | The smooth, dynamic variation of coda duration based on conversational context. | Enables continuous pacing adjustments, allowing whales to synchronize or alter flow in real-time. |
Ornamentation | The systematic addition of extra clicks (often acting as a suffix) to an existing rhythm. | Modifies the baseline coda, adding context-sensitive layers to the transmission. |
The discovery of this combinatorial structure represents a watershed moment in comparative cognition. The analysis revealed that whales do not simply broadcast from a closed repertoire of fixed calls; instead, they freely combine rhythm, tempo, rubato, and ornamentation to generate an expansive inventory of distinguishable signals.15 The researchers successfully cataloged 156 distinct combinations resulting from these variables alone.5
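The split between rhythm and tempo defined in the table above can be sketched in Python. The helper names, the matching tolerance, and the click times below are assumptions made for illustration; the published analysis relied on its own clustering methods.

```python
def rhythm_and_tempo(click_times):
    """Split a coda into (rhythm, tempo).

    rhythm: inter-click intervals divided by total duration, so they sum to 1.
    tempo:  total duration in seconds, from first click to last click.
    """
    icis = [later - earlier for earlier, later in zip(click_times, click_times[1:])]
    tempo = click_times[-1] - click_times[0]
    rhythm = [ici / tempo for ici in icis]
    return rhythm, tempo


def same_rhythm(r1, r2, tol=0.02):
    """True when two normalized rhythms agree within tol (a hypothetical threshold)."""
    return len(r1) == len(r2) and all(abs(a - b) <= tol for a, b in zip(r1, r2))


# Two made-up codas with identical proportions; the second is played 25% slower.
fast = [0.0, 0.1, 0.2, 0.5, 0.8]
slow = [t * 1.25 for t in fast]

rhythm_fast, tempo_fast = rhythm_and_tempo(fast)
rhythm_slow, tempo_slow = rhythm_and_tempo(slow)

print(same_rhythm(rhythm_fast, rhythm_slow))  # the rhythm is shared
print(tempo_slow / tempo_fast)                # the tempo differs
```

Normalizing by total duration is what lets the same rhythm be recognized at different tempos, so the two features can vary independently.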
Furthermore, the utilization of rubato and ornamentation demonstrates intense conversational awareness.17 Acoustic data indicates that whales make sub-second adjustments to their tempo to match a conversational partner during an exchange.19 They also append ornamental clicks systematically depending on the specific social context of the interaction.19 This flexibility provides compelling evidence for the linguistic concept of "duality of patterning"—a property where intrinsically meaningless elements are systematically combined to construct larger, meaningful units, a feature previously hypothesized to be unique to human language.5
Spectral Properties and Deep-Sea Vowels
While the 2024 findings elucidated the temporal sequencing of the phonetic alphabet, subsequent research pivoted to analyzing the internal acoustic texture of the individual clicks. In 2025 and 2026, Gašper Beguš, the linguistics lead for Project CETI, and his colleagues published data indicating that sperm whales actively modulate the spectral frequencies of their clicks.14 This discovery introduced a new, previously unappreciated layer of complexity orthogonal to the timing-based alphabet.14
By examining the fast Fourier transform spectrum of high-resolution acoustic recordings, the researchers identified distinct, structured spectral peaks.22 In the acoustic analysis of human speech, such resonant frequency peaks are known as formants, and the active manipulation of these formants allows humans to articulate different vowel sounds.6 Beguš et al. proposed that sperm whale codas could be effectively conceptualized using the same "source-filter theory" applied to human speech production.7
Under this theoretical framework, the sperm whale's phonic lips function as the acoustic source (analogous to the vibration of human vocal folds), dictating the pitch and overall duration based on the number and timing of the clicks.7 The distal air sac and the surrounding cranial anatomy function as the acoustic filter (analogous to the human vocal tract), selectively amplifying specific frequencies to shape the spectral properties of the emission.7
Acoustic Parameter | Source-Filter Role | Human Speech Analogue | Sperm Whale Analogue |
Excitation | Source | Vocal fold vibration | Phonic lip pneumatic closure |
Duration | Source | Vowel temporal length | Absolute number of clicks |
Pitch Contour | Source | Fundamental frequency manipulation | Inter-click interval spacing |
Resonant Filter | Filter | Vocal tract shape and volume | Distal air sac and nasal complex |
Spectral Output | Filter | Vowel quality (formant peaks) | Coda vowel type (a-coda vs i-coda) |
The analysis identified two discrete, recurrent spectral categories that appear consistently across individual whales and traditional coda types, designated as "a-codas" and "i-codas".12 The classification is binary and depends on the number of spectral formants present within a specific frequency range: a-codas are characterized by a single prominent spectral peak, whereas i-codas exhibit two distinct spectral peaks.12 The distribution of these qualities is highly organized; an individual coda is typically homogeneous, with almost no mixing of a-quality and i-quality clicks within the same sequence.12
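The one-peak versus two-peak distinction can be illustrated with a toy spectral peak counter. This sketch runs a plain discrete Fourier transform on synthetic damped sinusoids; it is not the pipeline used in the study. The resonance frequencies (4 kHz and 9 kHz), the decay constant, the frequency grid, and the 30 percent threshold are all assumptions chosen for the demonstration.

```python
import cmath
import math

SAMPLE_RATE_HZ = 120_000  # one of the tag sampling rates quoted in this article
WINDOW_S = 0.0035         # a 3.5 ms analysis window


def synth_click(resonances_hz, decay_s=0.0004):
    """Synthetic stand-in for a click: a sum of exponentially damped sinusoids."""
    n_samples = int(SAMPLE_RATE_HZ * WINDOW_S)
    samples = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE_HZ
        envelope = math.exp(-t / decay_s)
        samples.append(envelope * sum(math.sin(2 * math.pi * f * t) for f in resonances_hz))
    return samples


def magnitude_spectrum(samples, freqs_hz):
    """Direct DFT magnitude evaluated at the requested frequencies."""
    spectrum = []
    for f in freqs_hz:
        step = -2j * math.pi * f / SAMPLE_RATE_HZ
        total = sum(x * cmath.exp(step * n) for n, x in enumerate(samples))
        spectrum.append(abs(total))
    return spectrum


def count_spectral_peaks(samples, rel_threshold=0.3):
    """Count local maxima that reach rel_threshold times the strongest bin."""
    freqs = list(range(500, 15_001, 250))
    mags = magnitude_spectrum(samples, freqs)
    floor = rel_threshold * max(mags)
    return sum(
        1
        for i in range(1, len(mags) - 1)
        if mags[i] >= floor and mags[i] > mags[i - 1] and mags[i] > mags[i + 1]
    )


def coda_vowel_label(samples):
    """Map peak count to the article's categories: one peak 'a', two peaks 'i'."""
    return {1: "a", 2: "i"}.get(count_spectral_peaks(samples), "unclassified")


print(coda_vowel_label(synth_click([4000])))
print(coda_vowel_label(synth_click([4000, 9000])))
```

Real recordings are far noisier than these synthetic signals, so any practical classifier would need smoothing and validation that this sketch omits.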
Moreover, the research team identified dynamic formant trajectories in certain individual codas, noting the presence of rising, falling, rising-falling, and falling-rising spectral patterns.22 These dynamic shifts are acoustically analogous to human diphthongs, where a speaker glides from one vowel sound to another within a single syllable.21 The presence of these vowels and diphthongs, actively utilized in conversational dialogues, suggests a level of deliberate articulatory control that drastically elevates the estimated sophistication of cetacean communication.13
Phonological Convergence with Human Languages
The 2026 study published in Proceedings of the Royal Society B extended the analysis beyond acoustic similarities, explicitly mapping the behavioral patterning of sperm whale codas to the phonological rules governing specific human languages.2 The researchers identified five specific dimensions along which sperm whale vocalizations structurally converge with human phonetics.26
First, there is a distinct statistical correlation between the spectral vowel quality and the timing-based coda type.12 In standard Slovenian, the vowel 'ə' surfaces preferentially with a high tonal pitch, while the vowel 'ε' is preferred with a low tone.12 Similarly, sperm whales exhibit non-random preferences for pairing specific vowel qualities (a or i) with specific rhythm types (e.g., 5R or 7D).12
Second, the analysis revealed intrinsic durational differences between vowel types.12 Across human languages, low or open vowels, such as the "a" sound, require a wider jaw opening and therefore tend to last longer than high, closed vowels like the "i" sound.12 Fitting the acoustic data to a mixed-effects linear regression model, the researchers found that sperm whale a-codas are, on average, significantly longer in duration than i-codas.12 This mirrors a biomechanical tendency observed in human phonetics, despite the whales utilizing an entirely different anatomical apparatus.12
Third, the whales exhibit contrastive length within their vowel categories.12 In human languages such as Arabic, Latin, and Hungarian, the duration of a vowel can alter the semantic meaning of a word entirely—for example, the Hungarian word "bor" means wine, while "bór" means boron.12 The durational measurements of sperm whale i-codas demonstrate a strict bimodal distribution, revealing a distinct structural contrast between short i-codas and long ī-codas.12 This distribution strongly suggests that vowel length carries semantic or contextual weight in their exchanges.12
Fourth, the manipulation of inter-click intervals functions analogously to human tonal languages, such as Mandarin.1 In Mandarin, altering a word's pitch contour from rising to falling changes its meaning.1 By altering the temporal intervals between clicks, whales generate rising or falling contours.1 A coda with decreasing intervals (such as an 8D) functions similarly to a rising tone, whereas a coda with increasing intervals (such as an 8i) mimics a falling tone.12 If whales do differentiate signals through these rising and falling temporal contours, that would underscore the density of their communication.2
Finally, the acoustic data exhibits a phenomenon resembling human coarticulation.12 In fluent human speech, the articulation of a specific sound is mechanically influenced by the sounds immediately preceding or following it. The researchers observed that the "edge clicks" located at the very beginning or end of a coda sometimes depart from the vowel quality of their parent coda and instead match the vowel quality of the adjacent coda in the sequence.12 This indicates that sperm whale codas are not generated as isolated, discrete packets of sound, but are integrated into a continuous stream in which neighboring codas are anticipated.12
Methodological Controversies: The Transient Fallacy
The publication of vowel-like spectral properties in non-human vocalizations triggered intense methodological scrutiny within the broader bioacoustics community.27 Applying linguistic and phonetic frameworks to animal communication carries the inherent risk of scientific anthropomorphism—the unwarranted projection of human physical constraints onto alien biologies. In December 2025, acoustician Fatima C. Spisländer published a severe methodological critique of the Beguš et al. findings, titled "The Transient Fallacy: A Methodological Critique of 'Vowel-and Diphthong-Like Spectral Patterns in Sperm Whale Codas'".6
The core of the debate centers on the physics of signal processing. Spisländer argues that the original study committed a fundamental category error by applying steady-state analytical tools to transient acoustic impulses.6 Human vowels are quasi-periodic, steady-state signals that typically last between 50 and 200 milliseconds, allowing sufficient time for resonance to fully develop within the vocal tract.6 In stark contrast, a sperm whale click is an explosive transient impulse lasting approximately 100 microseconds—up to two thousand times shorter than a human vowel.6
The original researchers utilized Linear Predictive Coding to identify the spectral peaks they defined as formants.6 Spisländer contends that applying Linear Predictive Coding to a microsecond transient is acoustically invalid. The original study utilized a 3.5-millisecond analysis window for an event lasting only 0.1 milliseconds.6 Consequently, the critique argues that 97 percent of the data captured in the analysis window was not the sound-generating articulatory event, but post-impulse reverberation and passive echoes reflecting off the internal structures of the whale's head.6 Spisländer draws a parallel to clapping hands inside a large cathedral: the echo will contain distinct spectral peaks dictated by the geometry of the building, but it is scientifically untenable to claim the room is "speaking vowels." The peaks represent a passive impulse response, not active phonology.6
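The figures quoted in the critique follow from simple arithmetic on the durations given above. The short Python check below uses only those numbers; the variable names are illustrative.

```python
CLICK_DURATION_S = 100e-6           # ~100 microseconds per click
VOWEL_DURATIONS_S = (0.050, 0.200)  # 50 to 200 ms per human vowel
ANALYSIS_WINDOW_S = 3.5e-3          # 3.5 ms analysis window

# How many times longer a human vowel is than a single click.
shortest_ratio = VOWEL_DURATIONS_S[0] / CLICK_DURATION_S
longest_ratio = VOWEL_DURATIONS_S[1] / CLICK_DURATION_S

# Share of the analysis window that falls after the impulse itself.
post_impulse_fraction = (ANALYSIS_WINDOW_S - CLICK_DURATION_S) / ANALYSIS_WINDOW_S

print(f"vowel/click duration ratio: {shortest_ratio:.0f}x to {longest_ratio:.0f}x")
print(f"window share after the impulse: {post_impulse_fraction:.1%}")
```

The ratio spans 500 to 2,000, which is where the "up to two thousand times shorter" figure comes from, and 3.4 of the 3.5 milliseconds in the window (about 97 percent) fall after the click has ended.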
Furthermore, the critique challenges the anatomical tenability of the source-filter model proposed for the sperm whale.6 To mathematically justify the specific resonant frequencies observed in the data, Beguš et al. calculated that the whale's internal filter would require an effective tubular length of roughly 2.2 centimeters.6 Spisländer dismisses this parameter as "anatomically fictitious," pointing out that the actual distal air sac of an adult sperm whale measures several decimeters across and functions structurally as a disk, not a narrow tube.6
Most critically, the critique offers a purely physical explanation for the variance in the spectral patterns, focusing on hydrostatic pressure.6 The original study noted a significant statistical correlation between the acoustic spectral peaks and the depth at which the whale was vocalizing, yielding a correlation coefficient of 0.59.6 While the original authors interpreted this correlation as evidence of whales actively changing their vowels based on the behavioral context of diving, Spisländer points to basic fluid dynamics.6 As a whale dives, hydrostatic pressure increases roughly linearly with depth, by about one atmosphere for every ten meters of seawater. Because the air sacs within the whale's head are compressible, the external pressure physically reduces their internal volume, which by Boyle's law shrinks in inverse proportion to the absolute pressure. According to the physics of acoustic resonance, reducing the volume of a cavity will inherently raise its resonant frequency.6 Thus, the critique concludes that the shift in spectral "vowels" is not an intentional linguistic articulation, but the inescapable, passive consequence of hydrostatic compression acting upon the whale's anatomy.6
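The compression argument can be put into numbers with a toy model. The sketch below assumes standard values for seawater density and gravity, treats the trapped air as an ideal gas at constant temperature (Boyle's law), and assumes a Helmholtz-style scaling in which resonant frequency is proportional to one over the square root of cavity volume. None of these modeling choices come from the critique itself, and a real whale can also shift air between sacs, which the model ignores.

```python
import math

ATM_PA = 101_325.0         # surface pressure in pascals
SEAWATER_DENSITY = 1025.0  # kg/m^3, a typical value
GRAVITY = 9.81             # m/s^2


def absolute_pressure_pa(depth_m):
    """Hydrostatic pressure: surface pressure plus the weight of the water column."""
    return ATM_PA + SEAWATER_DENSITY * GRAVITY * depth_m


def relative_air_volume(depth_m):
    """Boyle's law at constant temperature: V / V_surface = P_surface / P."""
    return ATM_PA / absolute_pressure_pa(depth_m)


def relative_resonance(depth_m):
    """Assumed Helmholtz-style scaling: f proportional to 1 / sqrt(V)."""
    return 1.0 / math.sqrt(relative_air_volume(depth_m))


for depth in (0, 10, 100, 500, 1000):
    print(
        f"{depth:>5} m  volume x{relative_air_volume(depth):.3f}  "
        f"resonance x{relative_resonance(depth):.2f}"
    )
```

Under these assumptions a fixed quantity of air resonates at a steadily higher frequency as the whale descends, with no behavioral change required, which is the core of the passive explanation.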
Aspect of Debate | The "Vowel" Hypothesis (Beguš et al.) | The "Transient Fallacy" Critique (Spisländer) |
Nature of the Acoustic Signal | Formants exhibiting active articulatory control, structurally comparable to human vowels. | Passive impulse responses generated by transient clicks reflecting within fixed cranial structures. |
Validity of Analytical Tool | Linear Predictive Coding accurately captures and maps the spectral peaks of the coda. | Linear Predictive Coding is invalid for 100-microsecond transients; captures 97% post-impulse reverberation. |
Anatomical Modeling | The distal air sac functions acoustically as a 2.2-centimeter resonant tube filter. | The model is anatomically fictitious; the actual sac is decimeters wide and disk-shaped. |
Depth Correlation Rationale | Indicates active acoustic modulation related to the specific behavioral context of diving. | Purely passive physical mechanics: hydrostatic pressure compresses the air sac, raising the resonant frequency. |
This methodological clash highlights the profound complexities of decoding non-human communication. Decoupling the deliberate intent of a marine mammal from the severe biological and physical constraints of deep-ocean fluid dynamics remains a central challenge in modern bioacoustics.
Machine Learning Infrastructure and the WhAM Model
The scale and resolution of the data required to uncover the phonetic alphabet and spectral properties would have been impractical to handle without modern artificial intelligence.2 Processing continuous acoustic data from marine environments, with their overlapping vocalizations, extreme background noise, and thousands of hours of recordings, far exceeds what human analysts can do manually.5
Project CETI's computational approach traces back to a chance encounter at the Radcliffe Institute for Advanced Study in 2017.31 Marine biologist David Gruber was listening to sperm whale codas in his office when Shafi Goldwasser, an MIT cryptographer, noted the structural similarity between the rhythmic clicks and Morse code.31 This observation catalyzed a large interdisciplinary effort, recruiting experts in natural language processing, cryptography, and robotics to decode the cetacean signals.31
Data collection was revolutionized by non-invasive bio-logging tags, utilizing clingfish-inspired suction cups designed by Harvard robotics researchers to safely adhere to the whales' skin.32 These tags are equipped with three synchronized, high-bandwidth hydrophones capable of capturing ultra-high-resolution acoustic data, recording at sampling rates of 120 or 125 kilohertz with 16-bit resolution.12
To process this volume of data, computer scientists developed a specialized neural network called WhAM (Whale Acoustics Model).33 WhAM is a transformer-based audio-to-audio model designed to synthesize and analyze sperm whale codas and to acoustically translate other sounds into coda-like audio.34 Built upon VampNet, a masked acoustic token model originally designed for generating and modeling musical audio, WhAM was fine-tuned using over ten thousand highly contextualized coda recordings collected over two decades.33
Operating through iterative masked token prediction, WhAM offers several capabilities. Foremost is its generative capacity. WhAM can synthesize novel, high-fidelity "pseudocodas" that closely preserve the temporal and spectral acoustic textures of wild sperm whale recordings.9 In evaluations that combined the Fréchet Audio Distance metric with double-blind perceptual studies involving expert marine biologist listeners, the synthetic codas were found to be practically indistinguishable from natural biological recordings.9
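The Fréchet Audio Distance compares the statistics of embeddings computed from real and generated audio, fitting a Gaussian to each set and measuring the Fréchet distance between the two. The sketch below is a deliberately simplified one-dimensional version, where the distance reduces to the squared difference of means plus the squared difference of standard deviations. A real evaluation uses multidimensional embeddings from a pretrained audio model and full covariance matrices; the function name and the numbers here are made up for illustration.

```python
import statistics


def frechet_distance_1d(reference, generated):
    """Fréchet distance between Gaussians fitted to two 1-D samples.

    In one dimension the general formula
        |mu_r - mu_g|^2 + Tr(S_r + S_g - 2 * sqrt(S_r * S_g))
    simplifies to (mu_r - mu_g)^2 + (sd_r - sd_g)^2.
    """
    mu_r, mu_g = statistics.fmean(reference), statistics.fmean(generated)
    sd_r, sd_g = statistics.pstdev(reference), statistics.pstdev(generated)
    return (mu_r - mu_g) ** 2 + (sd_r - sd_g) ** 2


# Made-up scalar "embedding" values standing in for audio features.
real_codas = [0.90, 1.00, 1.10, 1.00]
close_synthetic = [0.95, 1.00, 1.05, 1.00]
poor_synthetic = [2.00, 2.50, 3.00, 2.50]

print(frechet_distance_1d(real_codas, close_synthetic))
print(frechet_distance_1d(real_codas, poor_synthetic))
```

A lower score means the generated distribution sits closer to the reference one, which is why the metric complements, rather than replaces, listening tests with human experts.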
Secondly, WhAM functions as an acoustic translation engine.9 Operating entirely at the signal level, the model can accept an arbitrary audio prompt—such as human speech, a finger snap, or ambient noise—and perform a cross-domain style transfer, re-encoding the input audio into the distinct acoustic texture of a sperm whale coda.9 While this does not yet achieve semantic translation (the model does not understand the meaning of the synthesized coda), it establishes the required technological foundation for bi-directional communicative interfaces.9
Finally, despite being trained primarily for generative tasks, the internal learned representations of WhAM excel in downstream classification challenges.9 The neural network can autonomously categorize rhythm types, reliably classify the newly debated a-coda and i-coda spectral patterns, and even identify the specific social units of the broadcasting whales.9 According to Project CETI's scientific roadmap, published in the journal iScience in 2022, these computational models are critical milestones toward a five-year goal of validating the higher-level language structures of the whales through interactive playback experiments in the wild.30
Behavioral Ecology, Social Complexity, and Efficiency Laws
If the acoustic analyses correctly identify a highly complex phonological system, it demands an examination of the ecological and social pressures that necessitated such an evolutionary adaptation. The "Social Complexity Hypothesis" asserts that animal species living in intricate, hierarchically organized societies require concomitantly sophisticated communication systems to mediate cooperative behaviors, coordinate movements, and transmit cultural knowledge.17
The extensive fieldwork required to contextualize these recordings relies heavily on the Dominica Sperm Whale Project (DSWP), founded in 2005 by biologist Shane Gero.39 Building upon earlier frameworks established by Hal Whitehead, Gero's continuous observation of the Eastern Caribbean populations revealed that sperm whales organize into tight-knit, matrilineal family units.38 These family units nest within larger, culturally distinct populations known as clans, which operate on ocean-basin scales.40
Crucially, these clans are demarcated not by genetic boundaries, but by shared vocal dialects.41 For instance, the Eastern Caribbean clan (EC1) utilizes a highly specific coda—the "1+1+3"—as a distinct cultural identifier.42 Conversely, a smaller, sympatric clan expresses its identity via a long, slow articulation of evenly spaced clicks, denoted as "1+1+1+1+1".42 At the individual level, specific codas such as the "5R" function as personal acoustic badges.41 These distinct dialects allow individuals to broadcast their personal, familial, and clan affiliations across miles of ocean, acting in essence as proclamations of cultural identity.42
Social Organization Level | Biological Description | Primary Acoustic Signifier |
Individual | A specific whale within a matrilineal unit. | Individual-specific rhythmic variations (e.g., specific execution of a 5R coda). |
Social Unit | Tight-knit, matrilineal family groups. | Shared 4-click coda types distinctive to the specific unit. |
Vocal Clan | Ocean-basin scale populations sharing a culture. | Standardized, dialectal identity codas (e.g., the 1+1+3 coda for the EC1 clan). |
The evolutionary timeline supporting this acoustic culture is staggering. David Gruber suggests that the ancestral lineage of these whales may have been passing complex acoustic information continuously from generation to generation for over twenty million years.1 To contextualize this timeline, anatomically modern humans have existed for roughly three hundred thousand years. The realization that an oceanic species has maintained a continuous, evolving cultural and communicative lineage for twenty million years fundamentally reorients humanity's perceived uniqueness.1
The profound depth of their communal lives was documented in unprecedented detail in 2026, with findings published in the journals Science and Scientific Reports.40 Researchers analyzing over six hours of synchronized aerial drone footage and underwater acoustic arrays documented an entire sperm whale unit participating in a collaborative birth off the coast of Dominica.40 The footage revealed adult female whales—comprising grandmothers, aunts, and even genetically unrelated females from different matrilines—acting effectively as midwives.40 These individuals engaged in coordinated physical lifting to support the laboring mother and keep the newborn calf above the surface to breathe.43
Crucially, the high-resolution acoustic data recorded during the birth revealed distinct, organized shifts in coda vocal styles during key physiological events.44 This suggests the whales were actively coordinating their physical caregiving efforts through high-bandwidth acoustic exchanges.40 Such elaborate, non-kin cooperation and communal babysitting require a communication system capable of expressing nuance, strategy, and mutual intent—traits perfectly aligned with the combinatorial phonetic alphabet decoded by the MIT and CETI models.2
This structural complexity is not entirely isolated within the animal kingdom, but it appears to represent a pinnacle of acoustic efficiency. To provide a comparative baseline, birdsong has long been studied for its encoded information. The Savannah sparrow, for example, utilizes a song where the early part identifies the individual, the middle part identifies the population dialect, and the overall theme identifies the species.45 However, birdsong remains fundamentally different from the generative, combinatorial structure of sperm whale codas, which more closely approximate the grammatical building blocks of human language.45
Further analysis across multiple species supports this efficiency. Research assessing vocal sequences across 51 human languages and 16 different cetacean species found that whales adhere to the same underlying mathematical laws of communicative efficiency found in human speech.46 Menzerath’s law (under which longer sequences tend to be made up of shorter individual elements) and Zipf’s law of abbreviation (under which more frequently used elements tend to be shorter) are demonstrably present in cetacean vocalizations.46 In certain analytical metrics, the whale data exhibited even greater statistical effect sizes for these efficiency laws than human language datasets.46 This is consistent with a sperm whale communication system that is optimized to transfer information at low energetic cost, a plausible evolutionary necessity for a species that must balance acoustic broadcasting with the metabolic demands of deep-ocean breath-holding.46
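Both efficiency laws predict a negative relationship that is easy to test on toy data. The sketch below uses a plain Pearson correlation on made-up durations and counts; the cited research worked with real recordings and more careful statistics, so this shows only the shape of the test, and every name and number in it is an assumption.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def menzerath_correlation(sequences):
    """Correlate sequence length with mean element duration (negative supports the law)."""
    lengths = [len(seq) for seq in sequences]
    mean_durations = [sum(seq) / len(seq) for seq in sequences]
    return pearson(lengths, mean_durations)


def zipf_abbreviation_correlation(counts, durations):
    """Correlate usage frequency with element duration (negative supports the law)."""
    return pearson(counts, durations)


# Made-up element durations (seconds) for four vocal sequences of growing length.
toy_sequences = [
    [0.40, 0.38],
    [0.30, 0.32, 0.29],
    [0.22, 0.25, 0.21, 0.24, 0.23],
    [0.15, 0.18, 0.16, 0.17, 0.15, 0.16, 0.17],
]
# Made-up usage counts and typical durations for five element types.
toy_counts = [120, 80, 40, 15, 5]
toy_durations = [0.10, 0.14, 0.20, 0.31, 0.45]

print(menzerath_correlation(toy_sequences) < 0)
print(zipf_abbreviation_correlation(toy_counts, toy_durations) < 0)
```

On real data the sign of the correlation is only the starting point; effect sizes and significance tests against shuffled sequences are what allow comparisons across species.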
Ecological and Legal Paradigms
The revelation that humanity shares the planet with a species possessing a parallel linguistic structure has triggered immediate philosophical, ecological, and legal ramifications. As the scientific consensus shifts toward recognizing sperm whales not merely as biological resources or endangered fauna, but as highly cognitive, communicative entities with rich cultures, the legal frameworks governing ocean management are being forced to adapt.47
In early 2026, interdisciplinary efforts spearheaded by New York University’s More-Than-Human Life (MOTH) Program and Project CETI published foundational legal frameworks exploring the impact of AI-assisted bioacoustics on nonhuman animal law.47 Co-authored by legal scholars such as César Rodríguez-Garavito alongside linguists and marine biologists, a pivotal paper in the Ecology Law Quarterly explored a straightforward but radical premise: if artificial intelligence proves that cetaceans possess a verifiable capacity for language, it permanently disrupts the centuries-old linguistic theories that confine language, and thereby legal personhood, exclusively to humans.47
This theoretical groundwork rapidly manifested in global policy. In February 2026, an unprecedented Indigenous treaty was established recognizing whales directly as legal rights-holders.44 This landmark declaration directly influenced national legislation in New Zealand and sparked broader international diplomatic efforts to translate Indigenous ecological customs into binding, actionable legal protections.44 Proving that sperm whales communicate in complex phonetic structures provides unprecedented leverage to shield their habitats from anthropogenic threats such as deep-sea mining, ship strikes, and catastrophic noise pollution.26 The eventual capability to actively monitor or translate whale communication opens the door to dynamically responsive oceanic policies, where human commercial activity could be legally required to halt or reroute based on the real-time acoustic monitoring of deep-sea cultural exchanges.37
Conclusion
The intersection of advanced robotics, artificial intelligence, marine biology, and computational linguistics has catalyzed a profound expansion in our understanding of non-human cognition. The large-scale acoustic analysis of sperm whale vocalizations reveals a communication system driven by a combinatorial phonetic alphabet consisting of rhythm, tempo, rubato, and ornamentation. Furthermore, the identification of spectral properties analogous to human vowels and diphthongs—encompassing intrinsic duration, contrastive length, and tonal manipulation—suggests that the physical mechanics of sound production and the intense social pressures of communal living drive the convergent evolution of phonology across entirely disparate biological domains.
While rigorous methodological debates, exemplified by the Transient Fallacy critique, rightly demand strict adherence to acoustic physics and caution against anthropomorphic interpretations of deep-sea hydrostatics, the weight of the behavioral and temporal data points to a striking level of cognitive complexity. The sperm whales of the Eastern Caribbean, and their kin across the global oceans, operate within a deeply entrenched acoustic culture that may be twenty million years old. As transformer-based neural networks like WhAM continue to parse these ancient signals, preparing the groundwork for interactive playback experiments, science edges closer to an unprecedented frontier: the recognition that the deep ocean may host an ancient, parallel intelligence, communicating in a structure remarkably similar to our own.
Works cited
Sperm whales' communication closely parallels human language, study finds - The Guardian, accessed April 15, 2026, https://www.theguardian.com/environment/2026/apr/15/sperm-whales-alphabet-vocalizations-similar-humans
Sperm Whale Coda Analysis Reveals Structural Similarities to Human Speech, accessed April 15, 2026, https://www.asatunews.co.id/en/sperm-whale-speech-human-similarities
Sperm Whales Have a Language and Its Structure Is Remarkably Like Our Own, accessed April 15, 2026, https://www.zmescience.com/science/oceanography/sperm-whales-have-a-language-and-its-structure-is-remarkably-like-our-own/
Scientists Discover a 'Phonetic Alphabet' Used by Sperm Whales, Moving One Step Closer to Decoding Their Chatter - Smithsonian Magazine, accessed April 15, 2026, https://www.smithsonianmag.com/smart-news/scientists-discover-a-phonetic-alphabet-used-by-sperm-whales-moving-one-step-closer-to-decoding-their-chatter-180984326/
Exploring the mysterious alphabet of sperm whales | MIT News ..., accessed April 15, 2026, https://news.mit.edu/2024/csail-ceti-explores-sperm-whale-alphabet-0507
The Transient Fallacy - A Methodological Critique of 'Vowel-and Diphthong-Like Spectral Patterns in Sperm Whale Codas' - ResearchGate, accessed April 15, 2026, https://www.researchgate.net/publication/398869943_The_Transient_Fallacy_-_A_Methodological_Critique_of_'Vowel-and_Diphthong-Like_Spectral_Patterns_in_Sperm_Whale_Codas'
Vowel- and Diphthong-Like Spectral Patterns in Sperm Whale Codas - PMC, accessed April 15, 2026, https://pmc.ncbi.nlm.nih.gov/articles/PMC12594577/
Podcast: What Has the Whale to Say? / with David Gruber, Project CETI - YouTube, accessed April 15, 2026, https://www.youtube.com/watch?v=NeVMg7DG1Do
WhAM: Towards A Translative Model of Sperm Whale Vocalization - arXiv, accessed April 15, 2026, https://arxiv.org/html/2512.02206v1
Sperm whale sound production studied with ultrasound time/depth-recording tags, accessed April 15, 2026, https://www.researchgate.net/publication/11298193_Sperm_whale_sound_production_studied_with_ultrasound_timedepth-recording_tags
Sperm Whale Clicks Aren't Random — They Follow Human‑Like Sound Rules, accessed April 15, 2026, https://www.discovermagazine.com/sperm-whale-clicks-aren-t-random-they-follow-human-like-sound-rules-48969
The phonology of sperm whale coda vowels | Proceedings B | The ..., accessed April 15, 2026, https://royalsocietypublishing.org/rspb/article/293/2069/20252994/481340/The-phonology-of-sperm-whale-coda-vowels
UC Berkeley and Project CETI study shows sperm whales communicate in ways similar to humans | Letters & Science, accessed April 15, 2026, https://ls.berkeley.edu/news/uc-berkeley-and-project-ceti-study-shows-sperm-whales-communicate-ways-similar-humans
Scientists Discover Vowel- and Diphthong-like Patterns in Sperm Whale Communication, accessed April 15, 2026, https://www.prnewswire.com/news-releases/scientists-discover-vowel--and-diphthong-like-patterns-in-sperm-whale-communication-302612877.html
Contextual and combinatorial structure in sperm whale vocalisations - PubMed, accessed April 15, 2026, https://pubmed.ncbi.nlm.nih.gov/38714699/
Contextual and Combinatorial Structure in Sperm Whale Vocalisations - bioRxiv, accessed April 15, 2026, https://www.biorxiv.org/content/10.1101/2023.12.06.570484v1
Contextual and combinatorial structure in sperm whale vocalisations - PMC - NIH, accessed April 15, 2026, https://pmc.ncbi.nlm.nih.gov/articles/PMC11076547/
Political Possibilities of Whale Translation - Berggruen Institute, accessed April 15, 2026, https://berggruen.org/news/the-political-possibilities-of-whale-translation
Project CETI •-- Blog --• Sperm Whale Phonetic Alphabet Proposed for the First Time, accessed April 15, 2026, https://www.projectceti.org/blog-posts/sperm-whale-phonetic-alphabet-proposed-for-the-first-time
How scientists are piecing together a sperm whale 'alphabet' | National Geographic, accessed April 15, 2026, https://www.nationalgeographic.com/premium/article/sperm-whales-alphabet-communication
Sperm whales use vowels like humans, new study finds - Popular Science, accessed April 15, 2026, https://www.popsci.com/environment/sperm-whale-language-vowels/
Vowel- and Diphthong-Like Spectral Patterns in Sperm Whale Codas, accessed April 15, 2026, https://pubmed.ncbi.nlm.nih.gov/41210602/
Phorum 2024 - UC Berkeley Linguistics, accessed April 15, 2026, https://lx.berkeley.edu/phorum-2024
Vowels and Diphthongs in Sperm Whales - OSF, accessed April 15, 2026, https://osf.io/preprints/osf/285cs
Vowel- and Diphthong-Like Spectral Patterns in Sperm Whale Codas - MIT Press Direct, accessed April 15, 2026, https://direct.mit.edu/opmi/article/doi/10.1162/OPMI.a.252/133906/Vowel-and-Diphthong-Like-Spectral-Patterns-in
The phonology of sperm whale coda vowels - bioRxiv, accessed April 15, 2026, https://www.biorxiv.org/content/10.1101/2025.06.09.658556v1.full-text
6324 PDFs | Review articles in SPERM WHALE - ResearchGate, accessed April 15, 2026, https://www.researchgate.net/topic/Sperm-Whale/publications
Vowel- and Diphthong-Like Spectral Patterns in Sperm Whale Codas - ResearchGate, accessed April 15, 2026, https://www.researchgate.net/publication/397481767_Vowel-_and_Diphthong-Like_Spectral_Patterns_in_Sperm_Whale_Codas
Ronald L. Sprouse's research works | University of California, Berkeley and other places, accessed April 15, 2026, https://www.researchgate.net/scientific-contributions/Ronald-L-Sprouse-2268120499
Cetacean Translation Initiative: a roadmap to deciphering the communication of sperm whales - ResearchGate, accessed April 15, 2026, https://www.researchgate.net/publication/350992250_Cetacean_Translation_Initiative_a_roadmap_to_deciphering_the_communication_of_sperm_whales
Harvard Researchers on Speaking to Whales, accessed April 15, 2026, https://www.harvardmagazine.com/harvard-researchers-language-of-whales
SETI/CETI Tricorder Tech: Tapping Into Whale Talk - Astrobiology Web, accessed April 15, 2026, https://astrobiology.com/2025/12/seti-ceti-tricorder-tech-tapping-into-whale-talk.html
We Are One Step Closer to Understanding Whales. What Now? - Atmos Magazine, accessed April 15, 2026, https://atmos.earth/science-and-nature/we-are-one-step-closer-to-understanding-whales-what-now/
WhAM: Towards A Translative Model of Sperm Whale Vocalization - OpenReview, accessed April 15, 2026, https://openreview.net/forum?id=IL1wvzOgqD&referrer=%5Bthe%20profile%20of%20Shafi%20Goldwasser%5D(%2Fprofile%3Fid%3D~Shafi_Goldwasser2)
Project-CETI/wham - a Whale Acoustics Model - GitHub, accessed April 15, 2026, https://github.com/Project-CETI/wham/
Cetacean Translation Initiative: a roadmap to deciphering the communication of sperm whales - arXiv, accessed April 15, 2026, https://arxiv.org/pdf/2104.08614
New Methods for Whale Tracking and Rendezvous Using Autonomous Robots, accessed April 15, 2026, https://www.projectceti.org/blog-posts/new-methods-for-whale-tracking-and-rendezvous-using-autonomous-robots
Scientists Found Human Speech-Like Patterns in Sperm Whale Clicks - Science Alert, accessed April 15, 2026, https://www.sciencealert.com/scientists-found-human-speech-like-patterns-in-sperm-whale-clicks
Studying Sperm Whale Communication in the Caribbean - Carleton University, accessed April 15, 2026, https://carleton.ca/ci/sperm-whale-communication/
Sperm Whale Birth Recording Shows Caregiving Takes a Pod | AMNH, accessed April 15, 2026, https://www.amnh.org/explore/news-blogs/sperm-whale-birth
Individual, unit and vocal clan level identity cues in sperm whale ..., accessed April 15, 2026, https://pmc.ncbi.nlm.nih.gov/articles/PMC4736920/
Behavior Communicating - Sperm Whales, accessed April 15, 2026, https://spermwhalesdominica.com/communicating/
Studies documenting rare sperm whale birth and ancient cooperative care released, accessed April 15, 2026, https://www.eurekalert.org/news-releases/1120823
Project CETI •-- News & Insights, accessed April 15, 2026, https://www.projectceti.org/news-insights
Whales speak in dialects and elephants have names for each other: The incredible secrets of animal language - BBC Wildlife Magazine, accessed April 15, 2026, https://www.discoverwildlife.com/animal-facts/animal-communication
Language-like efficiency in whale communication - PMC - NIH, accessed April 15, 2026, https://pmc.ncbi.nlm.nih.gov/articles/PMC11797547/
MOTH x CETI: Exploring How Understanding Sperm Whale Communications Can Be a Force for Good, accessed April 15, 2026, https://mothlife.org/news/moth-x-ceti-exploring-how-understanding-sperm-whale-communications-can-be-a-force-for-good/


