Karthik Durvasula: A definite problem with propriety

Preliminaries

Following Huddleston, Pullum, and Bauer (2002), I will use proper noun to refer to single-word names, and proper name to refer to name phrases.

Throughout, I will be ignoring the emphatic the, because that is a separate matter altogether. Furthermore, I will mostly focus on the definite determiner, as the distribution of the indefinite is slightly different. For example,

I saw the men.
*I saw a men

Next, a note about notation: I will use (*the) to mark cases where the sentence is unacceptable with the definite determiner, and *(the) to mark cases where the sentence is unacceptable without it.
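To make the notation concrete, here is a minimal Python sketch that expands one marker into the two sentence variants it stands for, together with their judgements. The function name `expand` is hypothetical, and the sketch assumes exactly one single-word marker per sentence; it is only meant to illustrate how to read the examples in this post.

```python
import re


def expand(example: str) -> dict:
    """Map each variant of a marked example to True (acceptable) or False.

    "(*X)" means the sentence is unacceptable WITH X.
    "*(X)" means the sentence is unacceptable WITHOUT X.
    """
    marker = re.search(r"\(\*(\w+)\)|\*\((\w+)\)", example)
    if marker is None:
        raise ValueError("no (*X) or *(X) marker found")
    word = marker.group(1) or marker.group(2)
    before, after = example[:marker.start()], example[marker.end():]
    with_word = before + word + after
    # Dropping the marker leaves a double space, so normalise whitespace.
    without_word = re.sub(r"\s+", " ", before + after).strip()
    bad_with_word = marker.group(1) is not None
    return {with_word: not bad_with_word, without_word: bad_with_word}


print(expand("I saw (*the) Harvard University."))
print(expand("I saw *(the) University of Michigan."))
```

So "I saw (*the) Harvard University." abbreviates an unacceptable sentence with the and an acceptable one without it, and "I saw *(the) University of Michigan." abbreviates the reverse.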

Finally, what I will argue: at least for the definite determiner facts, one needn’t invoke the concept of a “proper noun/name”. I will argue that the pattern is best accounted for by lexically-conditioned allomorphy of the definite determiner. And while I don’t extend the argument to the other restrictions related to proper names/nouns, I would guess that the other facts similarly fall out from other properties of proper nouns; if so, there is no need for the concept of proper nouns in natural language descriptions.

Some background

At a recent undergraduate conference, I noticed some weirdness related to the distribution of definite determiners in English, in the context of university names. I went down a rabbit hole over the last couple of days, and thought I’d write down what I have so far.

Let’s start with what will be a very rough first approximation of the generalisation about the restriction on (definite) determiners in English: Names don’t take determiners, but common nouns do. And this is roughly what Chomsky (1965) must have had in mind when he said proper nouns are “Nouns with no Determiners” (p. 100). Surely, this definition can’t be extended to other languages, where names of individuals routinely take determiners, but we will start at the beginning as responsible linear folks.

I saw (*the/*a) Karthik Durvasula
I saw a/*the man.

In contrast to the above, car names routinely take determiners.

I saw the/a Honda Civic
I saw the/a Toyota Camry

But one could argue that car names are not really proper names, but just common nouns. A version of this has been standard since at least Sloat (1969), who argued that trade names (such as Coca Cola, Purex,…) are common mass nouns because they don’t take determiners in the singular and are compatible with “low-stress some”, and they lose their semantic massness when pluralised.

He drank Coca-Cola
He drank some Coca-Cola
He drank two Coca-Colas.

But, the mass attribute doesn’t apply to all such trade names, as is the case with car names, which appear to be count nouns. So, trade/car names as a class could be common nouns, by this logic, because they pattern as common nouns.

Furthermore, as known since at least Sloat (1969), plural proper names behave regularly with respect to determiners, in that they allow (must have?) determiners.

I saw the Fockers.

Based on such facts and the facts immediately below, Sloat (1969) argues that,

“[t]he definite article will appear as zero before singular proper nouns, except when it is heavily stressed or they are preceded by restrictive adjectives or are followed by restrictive relative clauses”. (p. 28)

That’s not (*the) Fabian.
That’s not THE Fabian.
That’s not the famous Fabian.
That’s not the Fabian who sings.

Thus, with an Abneyan DP structure (Abney 1987)¹, a proper noun would have the following structure:

[Ø_D-Definite [ Name _NP] _DP]

One can ask, does this mean “proper name” or “proper noun” is part of the ontology of natural language? That is, is there some representation or structure or predicate unique to proper names such that it can be referred to as part of natural language phenomena?

So, let’s see if we can create a natural class that refers to proper nouns more precisely. This could logically be a phonological, morpho-syntactic or semantic natural class. If there is no such natural class, then one could (must?) conclude that the appearance of the definite determiner in English is perhaps lexically-conditioned allomorphy.

A phonological natural class of proper nouns?

It’s trivially obvious that any such natural class for proper nouns/names can’t be based on phonological characteristics. Both of the sentences below have the pronunciation [mɑɪk], but the first is a proper noun and the second isn’t. It’s easy to construct gazillions of such pairs in English and other languages. So, trying to describe proper nouns/names in terms of a phonological natural class is a non-starter. And I don’t believe anyone has been brave (or stupid) enough to try this either for exactly this reason.

Mike is big.
The mic is big.

A morpho-syntactic natural class of proper nouns?

In theory, it is possible that there is a functional projection (perhaps, a name-phrase) or a separate morpho-syntactic feature for proper nouns. But, to my knowledge, there is no language that has a special morpheme that is used only with names. Maybe, others would disagree, but it is difficult to see how a proper name could be part of the ontology of morpho-syntax as either a separate functional projection or as a feature, if there is no clear morpheme that marks it in any language.

A semantic natural class of proper nouns?

OK, let’s try a semantic definition. A proper noun is a morpheme or word that refers to a unique entity in this world.² Since car names describe a class/type of vehicle, i.e., they refer to a set of entities whose cardinality is more than 1, they are not proper names/nouns by this definition.

However, there is a complication that arises if we hold the above semantic definition of proper nouns. There are unique entities referred to by some words/names, which require a definite determiner, unlike other proper names/nouns. For example, “Tucker 48” was a prototype that never went into production, as per this Wikipedia page. There was only a single version of it (hence, a prototype). Crucially, it has to have a definite determiner in front of it. OK, maybe the game we have to play is one of possible worlds, and say that there are possible worlds where the Tucker 48 refers to more than one car or other cars. So, it’s not really a proper noun, which in contrast would have a unique (and identical) referent in every possible world that the object exists in (Kripke 1980).

I saw *(the) Tucker 48.

At this point, one should ask what the heck is possible for a “possible world”, if I can so cavalierly call it to my rescue? Are there any restrictions on what possible worlds are? In fact, as Heim and Kratzer (1998) mention,

“Scholars differ with respect to their views on trans-world identity of individuals. David Lewis has argued,³ for example, that individuals should not be assumed to exist in more than one possible world. Individuals may be related to very similar individuals in other possible worlds via a counterpart relationship.” (fn. 11 on p. 312)

To me, the concept of possible worlds sometimes feels very uncomfortable, and I wonder if it’s too rich a view. But very likely, this is my ignorance talking. Let’s put the worry about what possible worlds are aside, and say for now that a semantic definition of proper names/nouns in terms of a unique (and identical) entity across possible worlds is correct.

So, returning to the determiner pattern in English, we have the following generalisation: singular morphemes/words that refer to a unique (and identical) entity in this world (and every possible world that it exists in) don’t take determiners (if not preceded by an adjective or followed by a relative clause), but others can.

A trip down the rabbit hole

We now come to the really interesting cases: university names. Let’s start with the easiest cases. The following behave as expected — they refer to unique entities, and they disallow the definite article. That is, they behave like your vanilla proper names.

I saw (*the) Harvard University.
I saw (*the) Michigan State University.

But, the following two require the definite determiner.

I saw *(the) University of Michigan.
I saw *(the) University of Massachusetts.

In fact, a quick check will show that, in general, “_____ University” doesn’t take a definite determiner, but “University of _____” requires it.⁴ There are even some that alternate based on which construction they are in:

I saw *(the) University of Oxford.
I saw (*the) Oxford University.
I saw *(the) University of Cambridge.
I saw (*the) Cambridge University.

I am going to call this the University Definite Determiner Alternation (UDDA).⁵
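As a purely descriptive sketch, the alternation as stated so far can be written as a check on the shape of the name. The function name `udda_requires_the` is hypothetical, and the two patterns are just the two frames above; the sketch records the generalisation and says nothing about why it holds.

```python
import re
from typing import Optional


def udda_requires_the(name: str) -> Optional[bool]:
    """Return True if the frame requires "the", False if it disallows it.

    Returns None when the name fits neither university frame, since the
    descriptive generalisation makes no claim about such names.
    """
    if re.fullmatch(r"University of .+", name):
        return True   # "University of _____" requires the determiner
    if re.fullmatch(r".+ University", name):
        return False  # "_____ University" disallows the determiner
    return None


for name in ["University of Oxford", "Oxford University", "Michigan State University"]:
    print(name, udda_requires_the(name))
```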

The immediately obvious question is: why is there such an alternation? Maybe it is the fact that “University of _____” has a more complex/compositional structure in a way that “_____ University” doesn’t. Perhaps the second is behaving differently because it is a compound. Here, one could make an appeal to lexicalist hypotheses and say that if it is a proper name that is a compound, then it is syntactically non-compositional (albeit morphologically compositional), and therefore somehow more name-like, and so it doesn’t take a definite article; but if it has an of-phrase, then it is somehow phrasal, or is forced to be phrasal because it can be, and phrasal names allow for determiners. Call this the if-it-can-be-phrasal-then-it-must-be-phrasal hypothesis.

There is some evidence in favour of this hypothesis. If you add a relative clause, then suddenly a definite article is needed with “_____ University”. So, one could say that if the compound has a relative clause, then it is no longer a simple noncompositional structure in the syntax, but is instead phrasal, and so, there must be a determiner in front of it.

I saw *(the) Harvard University that you told me about.
I saw *(the) Oxford University that you told me about.

Similar to the above, the definite article is also realised when there is an adjective before the compound.

I saw *(the) wonderful Harvard University.
I saw *(the) famous Oxford University.

Theoretical worries to address

There are theoretical problems with invoking the distinction between compounds and phrases to account for this. First, it is not clear why compound names should resist determiners but phrasal ones shouldn’t, even if the former were non-compositional in the syntax — it is at best a description awaiting an explanation.

Second, the lexicalist hypothesis has been questioned by many espousing the view that the syntactic component is the single generative engine of structure in natural language (Halle and Marantz (1994) in Distributed Morphology, Starke (2009) in Nanosyntax, Collins and Kayne (2023) in Morphology as Syntax, Bruening (2017) in Consolidated Morphology,…). So, if both compounds and phrases are all (complex) structures, then what is the difference? Maybe, compounds introduce phases and the relevant phrases here don’t? But, then what’s happening with relative clauses? The embedded relative clause is a tensed CP, and a phase too, and if one says, aha, but the head of the relative clause isn’t part of the phase, then we are back to asking why the head compound doesn’t introduce a phase but a compound by itself does.

Empirical worries to address

There are also empirical problems with the generalisation. The first empirical issue is one of showing evidence that the “_____ University” cases are syntactically non-compositional while the “University of _____” cases are compositional. A potential test to see the structure is the one-replacement test. Bermúdez-Otero (2019) argues that we can use a one-replacement test to argue that what were previously thought to be bracketing paradoxes have vanilla non-paradoxical structures. For example, the phrases “nuclear physicist” and “transformational grammarian” have the following structures in standard work on the topic.

[nuclear physic]ist
[transformational grammar]ian

That’s why these are called bracketing paradoxes — as the class 1 suffixes -ist and -ian are attached to compounds (which are created later in lexical phonology). However, he argues that one-replacement is possible for the second member, as in the example from p. 16 of his handout.

He is a generative grammarian, but not a transformational one.
He is a nuclear physicist, not a quantum one.

Note: Such one-replacement sentences in the case of bracketing paradoxes sound unacceptable to me for the relevant meanings of “transformational one” and “quantum one”, but I will play ball with his judgements for now.

Based on such judgements, Bermúdez-Otero (2019) argues that the correct structures are,

[nuclear] [physicist]
[transformational] [grammarian]

For our purposes, if the one-replacement test works in such cases, it shows that the second member of the compound forms a separate NP constituent. So, we can ask if the same thing happens with the “_____ University” phrases we care about here. For example, can we do this?

I saw Harvard University but not the Oxford one.
I saw Harvard University, Princeton University, and Oxford University, but none of them is a truly conservative one.

They all sound terrible to me. OK, maybe, the second one isn’t that bad. But, if the above sentences are acceptable to others, then perhaps it shows that there is a phrasal structure in such “compounds”, too.

The second empirical issue is the following. Imagine “University of _____” was the name of a movie. Then, we get the null allomorph.

I saw (*the) University of Oxford

The third empirical problem with the compound vs. phrasal appeal is that there are other name compounds that routinely need a determiner. These are the ones that are “_____ Center”.

I saw *(the) Kennedy Center.
I saw *(the) Kennedy Space Center.
I saw *(the) Lincoln Center.

With respect to the above, we are now in a position to ask: which ones are the weird ones, “_____ University” or “_____ Center”? There are other cases that behave as “_____ University” does. For example, “_____ Square” below.

I saw (*the) Trafalgar Square.
I saw (*the) Leicester Square.
I saw (*the) Times Square.
I saw (*the) Copley Square.

But, there are others that behave like “_____ Center”. For example, “_____ Institute”, “_____ Gallery”, “_____ Museum”, “_____ Tower”.

I saw *(the) Tate Gallery.
I saw *(the) British Museum.
I saw *(the) Guggenheim Museum.
I saw *(the) Neon Museum.
I saw *(the) Georgia O’Keeffe Museum.
I saw *(the) CN Tower.

For the CN Tower example, my Indian English judgements actually allow both variants. In fact, I prefer the determiner-less variant in casual speech.

Perhaps, the clearest problematic case is “_____ Institute”. Institutes are a close parallel to universities, but they don’t have the UDDA, as can be seen below. So, it is unclear if there is any reasonable semantic way in terms of locations/places/… to cut the pie at this point.⁶

I saw *(the) Smithsonian Institute
I saw *(the) Pratt Institute
I saw *(the) Erikson Institute
I saw *(the) Virginia Military Institute
I saw *(the) Worcester Polytechnic Institute
I saw *(the) Howard Hughes Medical Institute
I saw *(the) Defense Language Institute

Note, “Institute of _____” behaves like the parallel university structure in requiring an overt definite determiner.

I saw *(the) Institute of Advanced Study
I saw *(the) Institute of Peace

Finally, with some, it looks like you can get both in American English. But, with these, I can’t tell if it is my Indian English judgements allowing for the determiner-less sentences here. That might very well be the case.

I saw/went to (the) New York Institute of Technology
I saw/went to (the) New Jersey Institute of Technology

Summary for the UDDA

I don’t believe there is any systematic pattern with the UDDA. Specifically, the appearance of the null allomorph of the definite determiner seems to be lexically-conditioned allomorphy of the definite determiner. To the extent that lexically-conditioned allomorphy is strictly structurally local (see Kalin (2017) for recent discussion), it allows us to immediately understand why the definite article is realised when an adjective or a relative clause is present. It’s not that such structures are phrasal, per se, that causes them to behave normally. Instead, it is that the structural locality to trigger the null allomorph is lost, so the default overt definite determiner allomorph the appears in such contexts.

I saw *(the) Harvard University that you told me about.
I saw the famous Harvard University.

Returning to names more generally

Once we allow ourselves to say that the null allomorph in UDDA is lexically-conditioned, there is nothing stopping us from extending the same analysis to all proper names/nouns. As has been known for a long time, many proper names/nouns appear with an overt definite determiner. For example, with some famous places, you must have the definite determiner even when it is a single morphologically simple name. By our analysis, there is nothing to be said about these, as this is the default allomorph.

I saw *(the) Guggenheim
I saw *(the) Louvre
I saw *(the) Smithsonian

Note, the default allomorph also happens with non-place names.

I saw *(the) Mona Lisa.

For such cases, Chomsky (1965) suggests that:

“[I]n the case of “The Hague,” “The Nile,” with a fixed Determiner that may just as well be taken as part of the Noun itself, rather than as part of a freely and independently selected Determiner system.” (p. 100)

Note, Huddleston, Pullum, and Bauer (2002) (p. 517) call such names weak proper names.

First of all, if we maintain Chomsky (1965)’s single-word analysis for such proper names, then the if-it-can-be-phrasal-then-it-must-be-phrasal hypothesis cannot be maintained for the UDDA in the first place, since, by that hypothesis, “The Hague” and “The Nile” must be phrasal.

Next, it is possible to argue that the determiner is present as a separate morpheme in such proper names, and is not just part of the proper noun. If the definite article were syntactically a single item along with the name that followed (“the noun” in Chomsky (1965)’s terms), then one should be able to get a separate definite determiner with relative clauses, as is the case with other names (shown above). So, the following sentences should be acceptable with an additional definite article. On the other hand, if there is a decomposition in such words, and a definite determiner is really present in such names, then one explains the gap with no further assumptions.

I saw the (*the) Nile that you told me about.
I saw the (*the) Guggenheim that you told me about.

Second, as pointed out in Huddleston, Pullum, and Bauer (2002) (p. 517), such names also “lose” the definite article when they are used as modifiers.

I saw a (*the) Nile crocodile.
I saw a (*the) Guggenheim ticket.
I saw a very different (*the) Thames from the one I remembered from my youth.⁷

Third, when there is a preceding adjective, again an overt definite determiner is not required.

A dangerous (*the) Nile crocodile passed before us.
I found a special (*the) Guggenheim ticket.

Here it is worth pointing out that Huddleston, Pullum, and Bauer (2002) claim that

“[i]t is virtually impossible, for example, to drop the article from the Hague: ∗two Hague councillors, ∗an impressively modernised Hague.” (p. 517)

But, looking online, “a Hague Convention” is not uncommon. In fact, this Google Trends page shows it to be not too far behind and pretty well correlated with “the Hague Convention”. Furthermore, if I were to make it an adjective, to me, it is clearly “a Haguean”, so I can get sentences such as,

I am aware of a (*the) Haguean Conspiracy.

So, perhaps, “the Hague” resists some alternation with the indefinite, but it falls in line with the rest for many cases.

Fourth, in plurals, we get the overt definite determiner, because that is the default anyway. Crucially, all four of the above facts parallel the UDDA and should really be captured in the same way: the lexically-conditioned null allomorph is blocked when the triggering context is not structurally local.

It gets more interesting. With some proper names, the overt determiner is required (or at least possible), but if the names are turned into initialisms or acronyms, then the definite determiner often surfaces as the null allomorph (in some contexts).

I saw (*the) NYIT
I saw (*the) NJIT
I saw the Howard Hughes Medical Institute
I saw (*the) HHMI
I believe in *(the) North Atlantic Treaty Organisation
I believe in (*the) NATO

But, even this is not true for other parallel cases. Again, this is expected, if the null allomorphy is lexically conditioned.

I believe the United Nations.
I believe the UN.

Note, familiarity, a usual trigger for the use of definite determiners (see Abbott (2006) for discussion), is not going to work as a dividing line when we look at NATO vs. the UN. The choice of overt vs. null seems to be arbitrary in this case.

Finally, as with the car name issue, if there is more than one branch/location of the school/university, then welcome back determiners, in full force! Note, IIT is a common initialism for the Indian Institute of Technology.

I saw the/an IIT.

Summarising the analysis

At this point, the only analysis I can see is that the null allomorph of the definite determiner appears in an arbitrary, lexically-conditioned fashion. That’s it, there is no real constraint against names and determiners in English. Furthermore, the overt allomorph “the” is the regular/default allomorph.

Structure: [D₀ [proper noun _NP] _DP]
D_definite -> Ø / {Karthik, Harvard University, Oxford University, ….}
D_definite -> the [elsewhere]
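The two rules, together with the structural locality condition discussed earlier, can be sketched in Python as follows. This is a toy illustration under stated assumptions, not a serious implementation: the class `DefiniteDP`, the function `spell_out`, and the contents of `NULL_TRIGGERS` are all hypothetical, and locality is crudely modelled as "no adjective and no relative clause".

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative subset of the nouns listed as triggering the null allomorph.
NULL_TRIGGERS = {"Harvard University", "Oxford University", "NATO"}


@dataclass
class DefiniteDP:
    noun: str
    adjective: Optional[str] = None        # sits between D and the noun
    relative_clause: Optional[str] = None  # assumed here to break locality too


def spell_out(dp: DefiniteDP) -> str:
    """Realise a definite DP with either the null or the overt allomorph."""
    # The null allomorph needs a listed trigger AND a structurally local D.
    is_local = dp.adjective is None and dp.relative_clause is None
    determiner = "" if (is_local and dp.noun in NULL_TRIGGERS) else "the"
    words = [determiner, dp.adjective, dp.noun, dp.relative_clause]
    return " ".join(w for w in words if w)


print(spell_out(DefiniteDP("Harvard University")))
print(spell_out(DefiniteDP("Harvard University", adjective="famous")))
print(spell_out(DefiniteDP("University of Oxford")))
```

The design choice worth noticing is that "the" is the elsewhere case: a name gets the overt determiner unless it is listed and the determiner is local to it, so nothing special needs to be said about names like the Louvre.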

So, any correlation we see with the null determiner and proper names/nouns is just that, a correlation, and not part of the grammar as such (except through lexical-conditioning). So, proper names/nouns are a morphological class, like gender or Bantu noun classes. They have some class resemblances in real world features, and maybe some of these cases can be condensed into more succinct cases,⁸ but they are ultimately arbitrary classes. Furthermore, given the class resemblances, they can incorporate new coinages too (much like gender).

All of this is actually good news, because other languages allow for determiners with proper nouns. So, English behaves like every other language, and there is no deep grammatical distinction (either syntactically or semantically) between proper nouns and common nouns.

So, at least based on the patterns considered in this post, we don’t need any proper-name-specific phrases. We also don’t seem to need the predicate “is a unique entity in this (or every possible) world”. Yes, such a predicate can be used to logically describe proper nouns/names (at least, to a first approximation; but, see the problem of the Tucker 48 raised above); however, the predicate doesn’t seem to be doing any linguistic work for us.

Final thoughts and questions

The most interesting cases are those where an indefinite determiner can appear before some proper names.

A stupid Karthik is a happy Karthik.

If Karthik refers to a unique entity, then it isn’t clear why it can be preceded by an indefinite only because an adjective appears. Is the adjective somehow forcing (coercing?) it to be non-unique or non-familiar?

There seem to be a lot of interesting and puzzling aspects of definites and indefinites (Abbott 2006). Maybe, I’ll work on that topic later in my career :).

Abbott, Barbara. 2006. “Definite and Indefinite.” Encyclopedia of Language and Linguistics 3 (392): 99.

Abney, Steven P. 1987. “The English Noun Phrase in Its Sentential Aspect.” PhD thesis, Massachusetts Institute of Technology.

Bermúdez-Otero, Ricardo. 2019. “Challenges to Stratal Phonology.” University of Leipzig Brugmann Fellow Lecture.

Bruening, Benjamin. 2009. “Selectional Asymmetries Between CP and DP Suggest That the DP Hypothesis Is Wrong.”

———. 2017. “Consolidated Morphology.” Ms., University of Delaware.

Chomsky, Noam. 1965. Aspects of the Theory of Syntax. Cambridge: MIT Press.

Collins, Chris, and Richard S Kayne. 2023. “Towards a Theory of Morphology as Syntax.” Studies in Chinese Linguistics 44 (1): 1–32.

Halle, Morris, and Alec Marantz. 1994. “Some Key Features of Distributed Morphology.” MIT Working Papers in Linguistics 21 (275): 88.

Heim, Irene, and Angelika Kratzer. 1998. Semantics in Generative Grammar. Blackwell Textbooks in Linguistics. https://books.google.com/books?id=jAvR2DB3pPIC.

Huddleston, Rodney D, Geoffrey K Pullum, and Laurie Bauer. 2002. The Cambridge Grammar of the English Language. Cambridge: Cambridge University Press.

Kalin, Laura. 2017. “The Ins and Outs of Allomorphy in Turoyo (Neo-Aramaic).” Talk Presented at Generative Linguistics in the Old World 41. https://static1.squarespace.com/static/586960e5197aea52834230a2/t/5a481bb241920273ab6a79bd/1514693232371/Harvard-Turoyo+allomorphy.pdf.

Kripke, Saul A. 1980. Naming and Necessity. Vol. 217. Springer.

Lewis, David K. 1968. “Counterpart Theory and Quantified Modal Logic.” The Journal of Philosophy 65 (5): 113–26.

Sloat, Clarence. 1969. “Proper Nouns in English.” Language 45 (1): 26–30. http://www.jstor.org/stable/411749.

Starke, Michal. 2009. “Nanosyntax: A Short Primer to a New Approach to Language.” Nordlyd 36 (1): 1–6.

¹ But, see Bruening (2009) for arguments to the contrary.
² A similar definition in terms of phrases can be extended to proper names, too.
³ See Lewis (1968).
⁴ Given this, Ohio State’s obsession with including the in its name is hilarious. They would have got it for free without looking like jerks even if they hadn’t insisted on it.
⁵ As in, this is utter insanity.
⁶ Some institute names were from this Wikipedia article.
⁷ Sentence modified from an example in Huddleston, Pullum, and Bauer (2002) (p. 517) to maintain the structure used throughout.
⁸ For example, instead of listing Harvard University, Oxford University,…, perhaps there is a structural position that the morpheme university appears in, in such cases, that can be relevant for the lexically-conditioned allomorphy.
