Normally when a human being learns a language he or she learns to speak and be spoken to. Sounds are produced and understood. An acoustic ability is acquired. But this is not always so: some people learn language (e.g. the English language) without the aid of sound. They neither hear sound nor produce it. Instead they rely upon vision and gesture (or writing). Their language ability is not notably inferior to that of those who speak and hear. What does this tell us about human language? What, in particular, does it show about the initial state of the human language faculty?
Presumably there is no analogous phenomenon in the case of other species that use language (or a communicative symbol system). A deaf and mute bird doesn’t cleverly exploit its eyes to substitute for hearing sounds, resorting to a sign language or the written word. Similarly for whales, dolphins, and bees. For these species, if you can’t speak (or, in the case of bees, dance) you don’t have a usable language. Their innate language faculty is specifically geared to speech, that is, to a particular sensory-motor system. There is no flexibility in mode of expression and reception, unlike with humans. Does this mean that human language ability is intrinsically purely cognitive? Is speech just a learned add-on to innate linguistic competence? We learn to speak in a particular accent in a particular language, but this is not a matter of innate endowment; is it the same for the sense modality we adopt? Each of us could have learned to communicate by sight and gesture, and without much difficulty, so is the human language faculty neutral with respect to sense modality? Is it just a convention or accident that we end up speaking language? Could it even be that our language faculty initially evolved as a visual-gestural system and only later became connected to our ears and vocal organs? What if most people used a non-auditory medium for language: wouldn’t we then suppose that this is the “natural” way to communicate? We have chosen the acoustic route, but we could have gone visual without loss or inconvenience. Is the language faculty inherently indifferent to its mode of externalization? It certainly isn’t indifferent to syntax and semantics, but phonetics seems like one option among others.
It seems true to say that human language (unlike the language of other species) is more of a cognitive phenomenon than a sensory-motor one. For one thing, we use language in inner monologue, not just in communication with others (I doubt this is true of bees and whales). The structure of language is a cognitive structure that can be present in a variety of sensory-motor contexts. But it would surely be wrong to suggest that we are not genetically disposed to speak: speech is biologically programmed in humans and it follows a fixed maturational schedule. Human speech organs are designed to aid speech; they are not just accidentally co-opted for this purpose. We don’t need to use these organs in order to master language, but it is surely natural that we do: it is certainly not a conscious choice! So is the human language faculty inherently acoustic or not? Neither alternative looks very plausible: it is possible (indeed easy) to learn language without sound, yet we are built to favor sound. One might suppose that the case is somewhat like walking on the hands when born without functioning legs, an option of last resort. Do the deaf feel an inclination to speak and listen as infants, but find they cannot, and so resort to sign language? That doesn’t appear to be the case: they take quite naturally to the visual medium. There is certainly something modality-neutral about human language. On the other hand, we are clearly designed to speak, in a way that we are not designed to play cricket.
Here is a possible theory: humans have two language faculties, one cognitive and the other sensory-motor. Call this the dual capacity theory. Both are innate and genetically coded, but they can be dissociated, as they are in the deaf. We are familiar with the idea of distinct components in language mastery (the semantic component, the syntactic component, and the phonetic component); the present suggestion is that there are actually two linguistic faculties coded into our genes. This idea will not surprise those who favor the notion of a language of thought: this language might exist in our mental economy separately from our language of communication. They might not even have the same grammar. What the dual capacity theory suggests is that the faculty we use when we speak is itself divided into two, and the deaf use one of them but not the other. They use the same innate grammar as the rest of us, but they don’t use the same sensory-motor system (though there is no reason to deny that it is programmed into their genes). The eliciting or triggering stimulus for normal language development isn’t operative in their case, but they use exactly the same internal schematism. This explains why their language skills are comparable to those of people who depend on sound, while not denying that speech is the natural human condition (in a non-evaluative sense). That is, we are born to speak, but we don’t have to speak in order to master communicative language. There are two separate psychological modules. It would be possible in principle to retain the sensory-motor module while lacking the cognitive module, so that articulate speech is possible but there is no real understanding of the principles of grammar (rather like “talking” parrots). Thus there can be double dissociation. Quite possibly the two modules evolved separately: maybe the cognitive module initially evolved as an intrapersonal aid to thought, to be followed later by a communicative faculty that recruited the older faculty.
We tend to speak of the language faculty as if we are dealing with a unitary structure, but in fact there are two of them: there is more structure here than we thought. The cognitive faculty has nothing intrinsically to do with speech, though it obviously gets hooked up to speech during ontogenesis, while the sensory-motor faculty has everything to do with speech. No such duality obtains in the case of other linguistic species, as is evidenced by the fact that deafness spells an end to language ability for them. At its core, we might say, human language is not a sensory-motor capacity, though there is nothing wrong with saying that speech embodies linguistic competence. We really have two kinds of competence (and two kinds of performance): competence in the universal principles of grammar, possessed by the hearing and the non-hearing alike; and competence in the production and perception of speech. The former has nothing intrinsically to do with the ears and vocal organs, while the latter is dedicated to that sensory-motor system. When it is said that a language is a pairing of sound and meaning, the claim is, strictly speaking, inaccurate (witness sign language), but it is true enough that the understanding of speech is such a pairing. Clarity is served by firmly distinguishing language and speech, but there is no need to deny that speech is the operation of a language faculty. To put it crudely, “language” is ambiguous.
The case might be compared to memory. We speak loosely of “the faculty of memory”, but enquiry reveals that different things might be meant: there is not a single faculty of memory. There is long-term memory and short-term memory (and maybe others): these memory systems operate differently, permit of double dissociation, and no doubt have different genetic bases. Both are rightly designated “memory” and they have clear connections, with neither deserving the name more than the other, but they are distinct psychological faculties. Similarly, “language” applies to two psychological faculties, which can be dissociated, and which recruit different kinds of apparatus. When someone makes a general statement about “language”, we do well to ask which human faculty is meant: speech, or the more general capacity possessed by the deaf. Indeed, even that is too parochial, since we can conceive of language users who lack sight as well as hearing and communicate by means of touch: they too have mastery of the grammar of human language (both universal and particular), but they neither hear nor see the words of language; they feel words (and cause others to feel them too). Their underlying linguistic competence is more “abstract” than any particular sense modality, but so is ours, despite our saturation in the acoustic. What is truly universal in human language is this abstract faculty, which exists in people with different modes of expressing it; universals of speech are relatively confined.
Once we have made this distinction we can distinguish different domains of study: are we studying the universal abstract language faculty or its expression in specific peripheral sensory-motor systems? What is called “psycholinguistics” could be about either of these. Which properties of language belong to which faculty? No doubt the type of externalization will impose specific conditions on the form of what is expressed, but there will probably be universal patterns found across all modes of externalization (subject-predicate structure, say). The temporal dimension of speech will affect its structure, along with the memory limits that accompany this, while the recursive property is likely to stem from the internal universal language. Combining phonemes is not the same as combining the lexical elements that constitute the common human language. Particularly intriguing is the question of maturation: do the two language faculties develop in the same way and at the same time? It could be that the internal language develops more rapidly and serves as the foundation for the development of speech (or sign language). It is not constrained by motor maturation and may be more “adult” than its external counterpart. If we think of language development as a process of differentiation, the internal language may differentiate at a different rate from external speech, and proceed from a different basis. It may permit inner speech before the onset of outer speech. We certainly can’t infer its maturational schedule just by observing the growth of outer speech. With respect to evolution, it may be that the cognitive language faculty evolved much earlier than the vocal language faculty, which is thought to be relatively recent (about 200,000 years ago). We might have been using language for much longer than we have been speaking it. The larynx is a late accretion to language use, and a dispensable one.
It is an open question how language-like the sensory-motor system would be without the backing of the cognitive system. Subtracting speech from the human subject leaves language intact, as shown by the deaf, but what if we subtract the internal language faculty from the activity of speech? Would we still have full productivity? Would grammar really exist for the sounds that emanate? This is an empirical question and not an easy one to answer. My suspicion is that we would get substantial degradation, but it may be that humans have evolved a good deal of autonomy in the speech centers of the brain, so that speech might exhibit many of the properties of the internal modality-neutral language faculty. Just as language ability is largely independent of general intelligence, so speech ability might be largely independent of cognitive-language ability. Certainly it is logically possible for there to be an autonomous faculty of productive grammatical speech in addition to a similar faculty for the inward employment of language: one faculty for speaking and another for thinking in words. The question is like the question of how much of perception would survive without cognition.
Chomsky makes this point. The internal language could be a lot simpler, structurally, than external speech, since it is free of the constraints that the sensory-motor system imposes on speech. There might be no gap between deep and surface structure in the internal language, with no transformations linking them.
 To simplify somewhat, there are three possible positions: language is only speech (traditional linguistics); language isn’t speech at all (Chomsky today); language is both speech and something else (an internal cognitive structure) (me). These questions remain murky and it is helpful to open up the theoretical options, though the speech-centric position is surely indefensible. (I’m grateful to Noam Chomsky for helpful comments.)