The Hexacoto

Listening to the sound of one hand clapping

Tag: linguistics

On copyrights and conlangs

One would imagine that someone who invented a language for use in a creative work would retain the copyright to that language, wouldn’t he or she? According to U.S. case law, that might not be so, in the opinion of the Language Creation Society (LCS), a nonprofit organisation created to promote and discuss constructed languages (conlangs). LCS submitted an amicus curiae brief (unsolicited advice to the court in a pending case) in Paramount v. Axanar, in which Paramount Pictures sued Axanar Productions for infringing various parts of its intellectual property in the fan-produced film Prelude to Axanar. The allegedly infringed elements include Vulcans, the likeness of Vulcans, Romulans, uniforms with gold shirts, triangular medals, and so on. The Hollywood Reporter has more information on how Axanar plans to respond to each claim.

What caught my interest was that Paramount laid claim to the Klingon language, which got me thinking: “Can conlangs created for creative works be copyrighted?” Paramount has claimed copyright over the Klingon language, its vocabulary and its graphemes, but LCS believes these claims will not hold up in a court of law. Indeed, anyone can claim copyright to anything; it is only when a copyright is challenged in court that it is found to be valid or otherwise.

Citing the 1879 case Baker v. Selden, LCS argued that while reproductions of creative works are protected by copyright law, no law prevents individuals from using a system and creating their own works with it. In Baker v. Selden, Selden had devised a bookkeeping system and written books about it, hoping to sell the system to counties and the government, but he was unsuccessful. Later on, Baker produced a book on bookkeeping with a system very similar to Selden’s. Selden’s estate sued Baker, but the U.S. Supreme Court ruled that:

“[W]hilst no one has a right to print or publish his book, or any material part thereof, as a book intended to convey instruction in the art, any person may practice and use the art itself which he has described and illustrated therein… The copyright of a book on book-keeping cannot secure the exclusive right to make, sell, and use account books prepared upon the plan set forth in such a book.”

Thus Klingon as a language cannot be copyrighted, especially since not only its creator, Marc Okrand, but also various scholars have contributed to its development and expansion. Once the linguistic rules of Klingon (syntax, phonology, morphology, etc.) are established, the language takes on a life of its own, and it would be unthinkable for a copyright holder to claim ownership of any and all subsequent works produced with that language. As proof that Klingon is robust enough that trying to enforce copyright on it would be unreasonable, LCS cites the Klingon translations of Shakespeare’s Hamlet and the Epic of Gilgamesh, not to mention Alec d’Armond Speers, who is (technically, at least) a native speaker of Klingon. It would be akin to enforcing a copyright on Esperanto today.

Moreover, Prelude to Axanar doesn’t copy existing Klingon dialogue verbatim; the fan film devises new dialogue using the Klingon language system. Besides Baker v. Selden, LCS cites many other cases in support of its argument that Klingon is a free language that cannot be copyrighted, and anyone interested should check out the amicus brief (linked at the top of this post and also embedded below). The brief is also replete with Klingon phrases written in Klingon script, with accompanying footnotes, which I think is hilarious but probably not very amusing to the presiding judge.

So what does this mean? Feel free to write your fan fiction in Quenya, Sindarin, Klingon, Na’vi, or even the Atlantean language that Marc Okrand also created for the Disney movie Atlantis: The Lost Empire. As long as one isn’t merely ripping off dialogue (reproduction) but can show that the text is original work derived by using the language’s system, copyright claims to the language would be unenforceable. Granted, this is merely my opinion, but other folks have voiced similar sentiments (I liked this and this from LCS member Sai), and they’re worth checking out if you’re interested in copyrights and conlangs. We’ll see on May 9th how the courts rule, and hopefully it’ll be in favour of language hobbyists and enthusiasts.

Notes on how to pronounce Malay words in ‘Semoga Bahagia’

Singaporean musical group The TENG Ensemble did a cover of one of my favourite childhood songs, “Semoga Bahagia,” which translates loosely from Malay into “wishing you happiness” or according to Wikipedia, “May you achieve happiness.” Their use of traditional Chinese instruments for the song makes for a wonderful arrangement, led by local indie singer Inch Chua, who sounds great.

However, as some have noted in comments on the ensemble’s Facebook post about the video, Chua doesn’t quite get the pronunciation of the words right. Her pronunciation maps Malay vowels and consonants onto a highly Anglicised/Americanised system, informed by her Chinese background. If the TENG Ensemble ever wishes to redo the video (and I think they’ve indicated interest in doing so), here are my notes on how to get the Malay words right. An understanding of IPA transcription will help with this post, but even without it, I’ll try to transcribe things in an easily understandable format.

Note 1: Malay uses the flapped/tapped r ([ɾ] in IPA), similar to the single r in Spanish “pero” or the r in Japanese, and found in many other languages of the world. It uses neither the English approximant r (/ɹ/) nor the rolled (trilled) r ([r]).

Note 2: Vowels in spoken Malay tend to keep their full quality and length, and are rarely reduced unless spoken very fast. Therefore words like “jiwa” should sound like “jee-wah” (/dʒiwa/) rather than “juh-wah” (/dʒəwa/). When singing, it’s especially important to preserve the vowels, since any reduction becomes very apparent.

Note 3: Malay consonants are not usually aspirated. Chua’s d’s in “pemudi-pemuda” are strongly aspirated, which is how we usually pronounce d’s in English. The Malay “d” thus sounds different from the “d” in English “dog,” which has an audible breathy release. (Contrast the d in “dog” with the d in “dandan 淡淡.”)

Note 4: In Malay orthography, “ng” is the velar nasal (/ŋ/), even when it sits between two vowels. Thus “dengan” is “duh-ng-an” (/dəŋan/), not “deng-gan.” Similarly, “c” is always a postalveolar affricate (/tʃ/), as in the “ch” of “church,” and never an “s” sound.
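Taken together, the four notes amount to a small set of spelling-to-sound rules. As a rough sketch (not a complete Malay phonology), they can be written as a greedy left-to-right converter. The tables `DIGRAPHS`, `SINGLES` and `EXCEPTIONS` are my own illustrative assumptions: they cover only the correspondences in Notes 1 to 4, use [ɾ] for the tapped r, and default every written “e” to schwa, because Malay spelling doesn’t show whether an “e” is /ə/ or /e/ (hence the hand-listed exception for “pembela”).

```python
# Hypothetical, simplified spelling-to-sound tables covering only Notes 1-4.
# Real Malay has more rules (ny, sy, kh, word-final k, diphthongs, ...).
DIGRAPHS = {"ng": "ŋ"}  # Note 4: always the velar nasal, never "ng-g"
SINGLES = {
    "c": "tʃ",  # Note 4: as in "church", never an "s"
    "j": "dʒ",
    "r": "ɾ",   # Note 1: tapped, not trilled and not English-style
    "e": "ə",   # default only; spelling doesn't mark /e/ vs /ə/
    "y": "j",   # the IPA symbol j is the "y" sound
}
# Other letters, including the full vowels a, i, u, o, pass through unchanged (Note 2).
EXCEPTIONS = {"pembela": "pəmbela"}  # words whose "e" is /e/, listed by hand


def malay_to_ipa(word: str) -> str:
    """Greedy left-to-right conversion: try a two-letter digraph first."""
    word = word.lower()
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    out, i = [], 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            out.append(DIGRAPHS[word[i:i + 2]])
            i += 2
        else:
            out.append(SINGLES.get(word[i], word[i]))
            i += 1
    return "".join(out)


for w in ["dengan", "jiwa", "cergas", "lengah", "pembela"]:
    print(f"{w} -> /{malay_to_ipa(w)}/")
```

This prints /dəŋan/, /dʒiwa/, /tʃəɾgas/, /ləŋah/ and /pəmbela/. The hand-listed exception is the weak point: without it, “pembela” would come out as /pəmbəla/, the same slip noted at [2:38] in the list below.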

And now for the second-by-second analysis! Each comment takes the form (observation); followed by a suggestion, if applicable.

  • Pandai cari [0:35] — rolled r; do flapped r instead.
  • pelajaran [0:39] — rolled r; do flapped r instead.
  • jaga diri [0:46] — rolled r; do flapped r instead.
  • kesihatan [0:49] — Chua said kAH-see-ha-tan; kuh-see-haa-tan, preserve e vowel, don’t aspirate t.
  • Serta sopan santun [0:53] — mispronounced, rolled r; SER-TA sopan santun.
  • dengan [0:58] — good job on de-NG-an!
  • bersih serta suci [1:23] — rolled r; do flapped r instead, preserve all r’s.
  • hormat dan berbudi [1:27] — rolled r; do flapped r instead.
  • jaga tingkah [1:29] — k consonant dropped; preserve the k sound in tingkah, but lightly released.
  • Capailah [1:38] — diphthong aɪ changed to vowel a; don’t drop the i in ca-pAI-lah.
  • pemudi-pemuda [1:42] — aspirated d’s; don’t aspirate d, especially audible in pemuDA.
  • kita ada harga [1:48] — rolled r; do flapped r instead.
  • di mata dunia [1:50] — good job on the d! This is the example of the unaspirated d.
  • kalau kita [2:10] — a vowel changed to e (/a/ to /ə/); preserve A vowel, kAH-lau instead of kUH-lau.
  • lengah [2:12] — added a g consonant; there is no g consonant, it’s pronounced le-ng-ah, not len-gah.
  • serta [2:13] — rolled r; do flapped r instead.
  • hidup [2:15] — Chua said he-daap (/hidap/); it’s pronounced he-doop (/hidup/).
  • sia-sia [2:16] — Chua said saya-saya (/saja-saja/); it’s see-ah see-ah (/sia-sia/).
  • jiwa [2:19] — Chua said juh-wah (/dʒəwa/); preserve all vowels, it’s pronounced jee-wah (/dʒiwa/).
  • besar sihat serta segar [2:19] — rolled r; do flapped r instead.
  • dengan [2:23] — good example of dengan.
  • perangai pemudi [2:28] — rolled r, aspirated d; do flapped r instead, don’t aspirate d.
  • cergas [2:31] – Chua said sergas, rolled r; it’s pronounced CHeRgas, with a ch consonant; do flapped r instead.
  • suka rela [2:35] — rolled r; do flapped r instead.
  • berbakti [2:37] — rolled r; do flapped r instead. This word is also an example of an unreleased ‘k.’
  • sikap yang pembela [2:38] — vowel was changed to p-uhm-bUH-la (/pəmbəla/); preserve vowel, it’s p-uhm-bAY-la (/pəmbela/).
  • berjasa [2:41] — rolled r; do flapped r instead.
  • capailah [2:44] — see above.
  • pemudi-pemuda [2:47] — see above.
  • rajinlah supaya berjasa [2:52] — rolled r; do flapped r instead.
  • semoga bahagia [2:56] — bahagia was pronounced as bUH-ha-g-i-a (/bəhagia/); preserve the A vowel in “ba,” making it bAH-ha-gi-a (/bahagia/). It’s ok to break “gia” into “gi-a” for stylistic purposes, but “ba” should remain “ba.” In Malay, “be” (/bə/) and “ba” (/ba/) are contrastive.

Hey, no one ever said Malay was easy, right?

Here’s an example of the song sung by a Malay person (I’m assuming) with all the right consonant sounds, although I find it interesting that he overdoes some of his r’s and turns them into trills [0:33].

It’s a lovely touch that gives the song a very folksy flavour, and something I’ve heard older Malay Singaporeans do more often than younger ones. It’s not dissimilar in Japanese, where the flapped r can become a trilled r, called makijita (巻き舌, “rolled tongue”), which is sometimes associated with rural communities and, interestingly, with the Yakuza and with war cries (“uorrrrrrryaa!”). Pay attention to how he pronounces his d’s, r’s, t’s, and k’s.

Hopefully this will help the TENG Ensemble and Inch Chua make a better second version of the song hitting all the right sounds of the wonderful Bahasa Melayu. After all, nobody wants to hear an ang-mor-cised version of our Chinese songs, do we?

 

(Woe…. da…. zheeeah.. gay woe! Eee shwaang zheeaan ding zhhh paaang….!!!)

So is Zuckerberg fluent in Chinese or not?

Image: Tsinghua University

Mark Zuckerberg recently made news because of a dialogue he gave at Tsinghua University in Beijing, China, where he spoke (Mandarin) Chinese. Here’s the link to the full video, 30 minutes long.

There has been plenty of news coverage of Zuckerberg’s Chinese language skills, from laudatory “Of course he speaks fluent Mandarin” headlines to less-than-favourable comparisons with a seven-year-old. The problem with the mixed coverage of Zuckerberg’s student dialogue is that the news and the public are often confused about what language “fluency” actually is.

Zuckerberg’s spoken Chinese, while mostly coherent despite some hesitation, mangled most of the tones that the Chinese language is widely known for. BBC’s coverage pointed out that his failure to produce the tones properly led him to claim that Facebook had just “11 mobile users” instead of “one billion.”

To understand how “fluent” one is when talking about mastery of a foreign language, one needs to understand the difference between “competence,” “performance,” “fluency,” and “proficiency” — four terms often used in such discussions. I’ll attempt to explain the differences between these four concepts.

Competence and performance are what people are usually talking about when they discuss a person’s L2 (second- or foreign-language) “fluency.” Simply put, competence is a person’s grasp of the language’s grammar, phonological/phonetic rules, and so on, while performance is how well the person actually produces the language when speaking or writing. Thus a person can be completely competent in a language but unable to perform as well. An example would be someone fully able to understand and speak Spanish, but unable to roll his or her r’s.

The distinction between competence and performance was famously made by Noam Chomsky in 1965. Dell Hymes expanded Chomsky’s notion of grammatical and phonological (linguistic) competence in 1966, and this broader notion came to include knowing which topics and levels of politeness are appropriate (sociolinguistic competence), knowing how to combine language structures into different oral and written text types (discourse competence), and knowing how to repair a communication breakdown in the presence of interference (strategic competence). Together, these came to be known in the literature as “communicative competence.”

Zuckerberg’s Chinese performance with his tones was bad, but he understood most of the questions asked. He was also able to answer simple to moderately complex questions in grammatically-sound Chinese. His use of humour and appropriate politeness further signals competence. It can be said that he’s mostly competent in Chinese.

Fluency is a measure of how easily one produces the language. It can be measured by speed, sustainment, and/or lack of breaks. The term most often refers to speech, though it can also be applied to reading, writing, and even listening. Thus a person could be illiterate and have a limited vocabulary, yet still be considered a fluent speaker if his or her speech flows at a smooth pace. Zuckerberg’s speech contains many pauses, at times to the point that it might be hard for a listener to follow what he’s trying to say. He may not be very fluent, but from context, listeners (especially native speakers) are mostly able to repair the content of his speech.
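The three yardsticks just mentioned (speed, sustainment, lack of breaks) are easy to make concrete if you assume word-level timestamps for a recording, for example from a forced aligner. The sketch below is only that: the `Word` timings are invented, the 0.5-second pause threshold is an arbitrary assumption, and “mean length of run” (words between pauses) is one common way of putting a number on sustainment, not the only one.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the recording
    end: float


def fluency_measures(words: list[Word], pause_threshold: float = 0.5) -> dict:
    """Speed, sustainment and breaks for one stretch of speech.

    A silence of at least `pause_threshold` seconds between two words
    counts as a pause and ends the current run of uninterrupted speech.
    """
    total_time = words[-1].end - words[0].start
    runs = [1]  # number of words in each pause-free run
    pauses = 0
    for prev, cur in zip(words, words[1:]):
        if cur.start - prev.end >= pause_threshold:
            pauses += 1
            runs.append(1)
        else:
            runs[-1] += 1
    return {
        "words_per_minute": len(words) * 60 / total_time,  # speed
        "mean_length_of_run": sum(runs) / len(runs),        # sustainment
        "pauses": pauses,                                   # breaks
    }


# Made-up timings: two quick words, a 1.1 s hesitation, then two more words.
sample = [Word("w1", 0.0, 0.4), Word("w2", 0.5, 0.9),
          Word("w3", 2.0, 2.4), Word("w4", 2.5, 3.0)]
print(fluency_measures(sample))
# {'words_per_minute': 80.0, 'mean_length_of_run': 2.0, 'pauses': 1}
```

Note that nothing here looks at tones or grammar: a speaker could score well on all three numbers while still saying “11 mobile users,” which is exactly why fluency and competence need to be kept apart.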

Lastly, proficiency is a measure of how well one uses the language, and it is usually assessed with tests such as the TOEFL or JLPT. One thing that was pointed out to me is that in language testing, proficiency is usually norm-referenced: test takers are assessed on how well they did in comparison with other test takers. This is different from criterion-referenced tests, such as the GCEs and GCSEs, where test takers are measured on whether they meet a set of pre-defined criteria. Once again, a speaker can be fluent in the language without necessarily being proficient.
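The difference between the two kinds of test is easiest to see side by side. In this sketch (the cohorts, section names and cutoffs are all made up, and no real exam is scored exactly this way), a norm-referenced result is a percentile rank against the other test takers, while a criterion-referenced result only asks whether each pre-defined cutoff was met.

```python
from bisect import bisect_left


def percentile_rank(score: float, cohort: list[float]) -> float:
    """Norm-referenced: percentage of the cohort who scored below `score`."""
    ranked = sorted(cohort)
    return 100 * bisect_left(ranked, score) / len(ranked)


def meets_criteria(scores: dict[str, float], cutoffs: dict[str, float]) -> bool:
    """Criterion-referenced: pass only if every section reaches its fixed cutoff."""
    return all(scores[section] >= cutoff for section, cutoff in cutoffs.items())


# Hypothetical candidate, cohorts and cutoffs, for illustration only.
candidate = {"listening": 80, "reading": 70}
overall = sum(candidate.values()) / len(candidate)  # 75.0

strong_cohort = [70, 78, 82, 85, 88, 90, 93, 95]
weak_cohort = [40, 45, 50, 55, 60, 62, 65, 68]

print(percentile_rank(overall, strong_cohort))  # 12.5
print(percentile_rank(overall, weak_cohort))    # 100.0
print(meets_criteria(candidate, {"listening": 60, "reading": 60}))  # True
```

The same candidate lands near the bottom or at the very top depending on who else sat the test, but passes the criterion check either way, because the cutoffs don’t move.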

Bot or Not? Poetry does not compute

Passing Time
Your skin like dawn
Mine like musk

One paints the beginning
of a certain end
The other, the end of a
sure beginning

Do you think the above poem was written by a human, or generated by a computer program? That’s what the site Bot or Not sets out to test: a Turing test where people try to tell computer-generated poems apart from human-written ones. By the way, the above poem, “Passing Time,” was written by a human: Maya Angelou.

I have previously written about the idea of programming code as poetry, and whether it’s possible to achieve “poetic beauty” in code. I posited that there might be an inherent “poetic beauty” in poetry that we recognise when we decide a piece of writing is a poem. Bot or Not takes the question of what makes poems poetry a step further: if we can identify whether a “poem” is human- or computer-generated, there must be something common to human effort that is visibly distinguishable.

That means any piece of writing seen as containing enough “poetic beauty” to count as a poem can be further subdivided into human effort or algorithm-derived. There must be something common to “computer poems,” and something common to “human poems,” that lets us decisively say, “Yes, this is the work of a human” or “This is clearly computer-generated.”

The Showcase

The site lists some of the top poems easily recognised as human or computer-written.

Poem

Generated by botpoet using JGnoetry (93% said “bot”)

Published on desserts and from pink. Symptoms
Start, 2013 as other poetry does anyone
Word in thailand and write reading one mother
Order they deserve. Well, 2013 recently released
My pants and serve throw from is a beautiful
Insane surreal once hours playing once.

Personal Space

Generated by botpoet using JGnoetry (87% said “bot”)

New poetry I might help you currently
Have been, snapping male, but it’s a rocket,
Kid man been forever idea of
My wearing a punk and brought old robot
Smog thing. Professional grown-up looking
In dusty old men remaining his tibia.

Bright

Generated by Jim Carpenter using Erica T Carter (85% said “bot”)

The name ghosts second, destroying.
The quite normal letter to the dutch throne after one year destroying the still stuck ridged snow, interpreted.
Undimmed radiance curves, primming like the column.
Lounging, as other as very high score.
Loafing a booby.
Lounging and spends.
Normal occasion joints.
Early letter gets the personal experience from the individual.
Getting, tarries however in cell.
Lounging.
Obsesses the disturbed surface.
Obsesses the abyss.

As can be seen, the above poems are mostly verbiage and make no sense. Compare them with the poems most often recognised as human-written.

The Fly

William Blake (87% said “human”)

Little Fly,
Thy summer's play
My thoughtless hand
Has brushed away.

Am not I
A fly like thee?
Or art not thou
A man like me?

For I dance
And drink, and sing,
Till some blind hand
Shall brush my wing.

If thought is life
And strength and breath
And the want 
Of thought is death;

Then am I
A happy fly,
If I live,
Or if I die.

Untitled

Shelby Asquith (86% said “human”)

His smile was loud,
and me in my silence—
I thought it was meant
for me. Momma warned
me about boys like him.
Told me that the kinds
of boys that shined a little 
too bright might just be 
trying to distract me from 
the balled fists, the fury.
I was a fly and he lured
me straight into the light.
And oh how he burned
me, how he burned me.

O Fool

Rabindranath Tagore (84% said “human”)

O Fool, try to carry thyself upon thy own shoulders! 
O beggar, to come beg at thy own door! 

Leave all thy burdens on his hands who can bear all, 
and never look behind in regret. 

Thy desire at once puts out the light from the lamp it touches with its breath. 
It is unholy---take not thy gifts through its unclean hands. 
Accept only what is offered by sacred love.

These top poems, Blake’s especially, display things that computers can scarcely replicate. Blake’s poem has a very strong meter and rhyme scheme. The other two read very coherently: the ideas in each line agree with one another, and a message flows from the poem to the reader. What does it take for a piece of writing to resemble human effort? Let’s look at computer-generated poems that people thought looked human.

#6

Generated by Janus Node using Janus Node (69% said “human”)

you

  are

      inscribed
          in the
           lines on the
     ceiling

      you

 are

   inscribed in
         the depths
   of
         the
    storm

A Wounded Deer Leaps Highest

Generated by Poets using Ray Kurzweil’s Cybernetic Poet (67% said “human”)

A wounded deer leaps highest,
I've heard the daffodil
I've heard the flag to-day
I've heard the hunter tell;
'Tis but the ecstasy of death,
And then the brake is almost done,
And sunrise grows so near
sunrise grows so near
That we can touch the despair and
frenzied hope of all the ages.

some men

Generated by Every Google User using Google Predictive Search (67% said “human”)

some men just want to watch the world burn 
some men just want to watch the world learn 
some men just want breakfast

In these instances, how did algorithmically generated sequences of words gain the verisimilitude of human effort, when the computer-generated attempts shown earlier failed? Finally, let’s look at poems written by humans that people thought were computer-generated.

Cut Opinions

Deanna Ferguson (76% said “bot”)

cut opinions tear tasteful
hungers huge ground swell
partisan have-not thought
green opinions hidden slide
hub from sprung in
weather yah
bold erect tender
perfect term transparent till
I two minute topless formed
A necessarily sorry sloppy strands
hot opinions oh like an apple
a lie, a liar kick back
filial oh well hybrid opinions happen
not stopped

Cinema Calendar Of The Abstract Heart – 09

Tristan Tzara (69% said “bot”)

the fibres give in to your starry warmth
a lamp is called green and sees
carefully stepping into a season of fever
the wind has swept the rivers' magic
and i've perforated the nerve
by the clear frozen lake
has snapped the sabre
but the dance round terrace tables
shuts in the shock of the marble shudder
new sober

Red Faces

Gertrude Stein (69% said “bot”)

Red flags the reason for pretty flags.
And ribbons.
Ribbons of flags
And wearing material
Reason for wearing material.
Give pleasure.
Can you give me the regions.
The regions and the land.
The regions and wheels.
All wheels are perfect.
Enthusiasm.

The Experiment

Now that we have seen the most human-like and the most computer-like poems, whether written by humans or computers, can we draw parallels for what constitutes human effort in poetry? On the technical side, can we say that, as in Blake’s poetry, prosodic and auditory cues such as stress, meter, and rhyme give poems a sense of human effort? When reciting “Tyger Tyger, burning bright, / In the forests of the night” from Blake’s “The Tyger,” we not only hear the rhyme but also feel a constant rhythm.

Surely that can only be achieved by humans? Not necessarily. Assuming a bot program has access to a dictionary with the stress, meter, and phonetics of every word mapped out, how hard would it be to code something that reads like human poetry? (The following is not real code, just an idea of how the code should behave.)

<write human poetry>
component: alternate stress, strong-weak
component: vowels at end of line match; SET1:AAB SET2:CCB
SET1
component structure line1: [noun][conjunction][noun]
component structure line2: [verb][preposition][location]
component structure line3: [verb][noun]
SET2
component structure line1: [noun][verb]
component structure line2: [conjunction][verb][noun]
component structure line3: [conjunction][noun][verb]
And I just described:

Jack and Jill
went up the hill
to fetch a pail of water.

Jack fell down
and broke his crown
and Jill came tumbling after.

A bot could browse through a dictionary and probably come up with something similar. Granted, I “retro-wrote” the code: I already had a poem in mind and wrote the “code” afterwards. But if I can break “Jack and Jill” down into a reproducible algorithm using auditory and prosodic cues, then surely those cues alone cannot be what determines human effort in poetry. And if a program relies solely on prosodic and auditory cues, what’s to prevent it from putting in random words that fit those cues but make no sense in sequence? For example:

Bird and ball
swirled by the mall
and cocked a round of seaweed

Truth flew out
where running lout
and cops were sniffing soft beads

The prosodic and auditory cues of the above poem match “Jack and Jill” yet it makes no sense, and it is likely that people would judge it to be written by a bot. So what else is required for poems to be recognised as human effort?
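To see why form alone isn’t enough, here is a deliberately naive sketch of such a template-filling bot, covering only the first two lines of SET1 (a name, “and,” a rhyming word; then a verb, a preposition and a location) plus the rhyme constraint. Everything here is an assumption for illustration: the word lists are tiny and hand-made, rhymes are hand-grouped instead of being looked up in a pronouncing dictionary, and stress is ignored entirely.

```python
import random

# Hypothetical toy lexicon. Rhymes are hand-grouped here; a real bot would
# look up pronunciations (and stress) in a pronouncing dictionary instead.
NAMES = ["Jack", "Bird", "Truth", "Moon"]
VERBS = ["went", "swirled", "flew", "ran"]
PREPOSITIONS = ["up", "by", "past", "through"]
RHYME_GROUPS = [
    ["Jill", "hill", "mill"],
    ["ball", "mall", "hall"],
    ["fox", "box", "rocks"],
]


def couplet(rng: random.Random) -> tuple[str, str]:
    """Fill '<name> and <X>' / '<verb> <preposition> the <Y>' with X and Y rhyming.

    Only the form is enforced; nothing checks whether the lines make sense.
    """
    group = rng.choice(RHYME_GROUPS)
    end1, end2 = rng.sample(group, 2)  # two different words from one rhyme group
    line1 = f"{rng.choice(NAMES)} and {end1}"
    line2 = f"{rng.choice(VERBS)} {rng.choice(PREPOSITIONS)} the {end2}"
    return line1, line2


rng = random.Random(42)
for _ in range(3):
    print("\n".join(couplet(rng)), end="\n\n")
```

Whatever the seed, every couplet fits the template and rhymes; whether it means anything is left entirely to chance, which is the “Bird and ball” problem in miniature.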

The other area where human-like poetry trumps computer-like poetry is coherence of ideas. In the poems that read as human, most of the ideas agree either with a general theme or with the lines preceding and following them. The ideas also show a progression, where something is being explored or developed. The computer-like poems tend to show disjointed ideas.

Perhaps humans are predisposed to read continuity and coherence as hallmarks of humanity.

Is coherence, then, unique to humans, or can computers imitate it as well? Let’s see if we can imitate coherence with an algorithm, on top of the prosodic and auditory cues. To do that, we need thematic cues. I’m going to use the opening stanza of William Wordsworth’s “I wandered lonely as a cloud.”

I wandered lonely as a cloud
That floats on high o’er vales and hills,
When all at once I saw a crowd,
A host, of golden daffodils;
Beside the lake, beneath the trees,
Fluttering and dancing in the breeze.

<write coherent poem>
component: alternate stress, weak-strong
component: SET last line: [alternate stress, weak-strong]=false
component: vowels at end of line match; SET:ABABCC
component: [CENTRAL X(n)] designate IDEA
component: [CENTRAL X(n)]; X=verb, noun, adverb, preposition, adjective
component: [CENTRAL X(n)] must agree with THEME
component: [CENTRAL X(n)] must agree with [CENTRAL X(n±≥1)]
component: [CENTRAL X(n)] either expand or progress [CENTRAL X(n±≥1)]
SET
component structure line1: [pronoun][CENTRAL verb(1)][adjective][preposition][CENTRAL noun2]
component structure line2: [conjunction][CENTRAL verb3][adjective][preposition][CENTRAL noun4][conjunction][CENTRAL noun5]
component structure line3: [conjunction][adverb][pronoun][CENTRAL verb6][noun]
component structure line4: [noun][adjective][CENTRAL noun7]
component structure line5: [preposition][CENTRAL noun8][preposition][CENTRAL noun9]
component structure line6: [CENTRAL verb9][conjunction][CENTRAL verb10][preposition][CENTRAL noun11]
THEME:nature
(1) verb-noun agrees with (2); (2) agrees with THEME
(3) verb-noun agrees with (2);(4),(5) agrees with (2); (4),(5) agrees with THEME
(6) verb-noun agrees with (7)
(7) agrees with THEME
(7) preposition agrees with (8),(9); (8),(9) agrees with THEME
(10),(11) verb-noun agrees with (7); (10),(11) agrees with THEME
IDEA1:[(1),(2)]–>IDEA2:[(3),(4),(5)]
IDEA3:[(6)(7)]–>IDEA4:[(8),(9),(10),(11)]

Does the above make any sense? It took me a while to break “I wandered lonely as a cloud” down into an algorithm vague enough that, in my opinion, it still represents the poem while hypothetically being able to produce another poem. Let me explain the above “code.”

The poem has various components, including a weak-strong stress meter, though the last line of the set breaks the meter. The final sounds of the lines must rhyme in a set pattern, in this case ABABCC. Within the poem, certain words are designated [CENTRAL X], where X can be a verb, noun, preposition, adverb, or adjective. These [CENTRAL X] words carry a contained IDEA, a sense of what that line means. Each [CENTRAL X] must agree with a preset THEME, in this case “nature,” meaning the words must be somehow relevant to nature, as “hills,” “daffodils,” and “cloud” all are. Not only must each [CENTRAL X] agree with THEME, it must also agree with one or more of the [CENTRAL X] words preceding or following it, not only grammatically but also by expanding or progressing them in a logical way.

IDEA1 contains [CENTRAL (1),(2)], which progresses into IDEA2, containing [CENTRAL (3),(4),(5)]. IDEA3 contains [CENTRAL (6),(7)] which progresses into IDEA4, containing [CENTRAL (8),(9),(10),(11)].
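Of these components, the mechanical ones are easy to check by machine; coherence and progression are not. This sketch checks a stanza against the ABABCC rhyme component and tests whether the [CENTRAL] nouns agree with THEME. Both `THEME_NATURE` and `RHYME_KEY` are hand-made assumptions standing in for real resources (a semantic lexicon and a pronouncing dictionary), and the checker says nothing about whether one IDEA progresses into the next.

```python
# Hypothetical stand-ins for real resources: a tiny "nature" lexicon and
# hand-assigned rhyme keys for the line-final words of this one stanza.
THEME_NATURE = {"cloud", "vales", "hills", "daffodils", "lake", "trees", "breeze"}
RHYME_KEY = {"cloud": "owd", "crowd": "owd", "hills": "ilz",
             "daffodils": "ilz", "trees": "eez", "breeze": "eez"}

STANZA = [
    "I wandered lonely as a cloud",
    "That floats on high o'er vales and hills,",
    "When all at once I saw a crowd,",
    "A host, of golden daffodils;",
    "Beside the lake, beneath the trees,",
    "Fluttering and dancing in the breeze.",
]
# The [CENTRAL] nouns (2), (4), (5), (7), (8), (9), (11) from the "code" above.
CENTRAL_NOUNS = ["cloud", "vales", "hills", "daffodils", "lake", "trees", "breeze"]


def rhyme_scheme(lines: list[str]) -> str:
    """Label each line A, B, C... by the rhyme class of its last word."""
    labels: dict[str, str] = {}
    scheme = []
    for line in lines:
        last = line.rstrip(",;.!?").split()[-1].lower()
        key = RHYME_KEY[last]
        labels.setdefault(key, chr(ord("A") + len(labels)))
        scheme.append(labels[key])
    return "".join(scheme)


def off_theme(words: list[str], theme: set[str]) -> list[str]:
    """Return the central words that fail to agree with THEME."""
    return [w for w in words if w not in theme]


print(rhyme_scheme(STANZA))                        # ABABCC
print(off_theme(CENTRAL_NOUNS, THEME_NATURE))      # []
print(off_theme(["cloud", "mall"], THEME_NATURE))  # ['mall']
```

Because the theme lexicon is a hand-made word list, the check is only as good as that list, and that is exactly where the hard part (deciding what “agrees with nature,” let alone what progresses an idea) gets hidden.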

You know, even after so much postulating, I’m still not sure I have successfully “retro-coded” Wordsworth’s poem. Maybe it is coherence of ideas that seems unique to human effort, and humans are predisposed to finding order in nature. My head hurts from trying to break poems down like that. Maybe someone else can do this better than I can. Feel free to leave comments.

Answering the question, not the counterfactual

The above image made its rounds on Reddit the other day. The question asks “If you choose an answer to this question at random, what is the chance you will be correct?” The options are:

a) 25%
b) 50%
c) 60%
d) 25%

Since randomly choosing one out of four answers gives a 25% chance, it’s a)… and d)? But then there are two correct answers out of four choices, which is 50%, so it’s b). But there’s only one b), which makes it 25%, so it’s a) and d)… ad nauseam.

STOP. You’re doing this wrong. Let semantics easily (and hopefully painlessly) tell you how to solve this question.

Let’s look at the question again.

“If you choose an answer to this question at random”

Let’s break it down:

IF [You] [choose 1 answer randomly] to [this] question, [percentage answer=TRUE?]

The secret is in the word “IF.” It summons a counterfactual version of you, so that you can discuss things in an “if” world without being constrained by that world’s rules when you answer. Thus, [counterfactual You] is supposed to pick 1 answer to [this], where [this] refers back to a world that has 2 correct answers out of 4. The answer is 50% for you in this world, not in the world [counterfactual You] inhabits.

Hence, answer from your own reality, not from that of the [counterfactual You] in the question: just answer the question that was asked about counterfactual you. Simple as that. An equivalent question, substituting a third person for counterfactual you, is:

Kevin has to randomly pick 1 answer out of four. However, 2 of the answers are identical and correct. What is the percentage that Kevin will pick a right answer?

Don’t sweat the counterfactuals, just stick with this reality. The right answer is B.

(No need to read the below if you don’t want technical explanations)

If you want a really convoluted discussion about semantics and counterfactuals, and why we can discuss counterfactuals without being constrained by counterfactual rules, it’s simple. In counterfactual semantics we often discuss the death of Aristotle (or was it Plato?), with sentences such as “Aristotle might not have been a philosopher if he had died as a kid.” This relates to the topic of indices and what names refer to, which has been researched and discussed by many linguists and philosophers, such as Kripke.

A quick answer, without going too in-depth, is that if we were bound by the indices of the counterfactuals we refer to, we would be unable to talk or respond, because the counterfactuals would be stuck in an infinite loop. Thus we can talk about Aristotle’s death without having to go back in time to kill him, or talk about what would happen at the end of the world without first destroying the world. Take the following sentence, with its multiple self-references:

If I were you, I would kill me

There are two people involved in the conversation, “you” and “me,” yet our minds seem to share a conventional understanding of what the sentence means: “I am such a terrible person that if there were another person, and that person were talking to me, he would hate me so much that he would kill me.” It takes a long sentence to spell out such a short one. Thank goodness for indices! This is how the above sentence works with indices:

IF [counterfactual I][sees]me, [counterfactual I][wants][kill] me.

There you go.

The persistence of comprehension

Some time ago, Instagram user jumppingjack posted the above image of a note she left for her mum. She said that her brother had secretly added extra strokes to the characters in the note. The result is interesting: even with the extra strokes, the note is still readable to most competent Chinese readers. The phenomenon is very similar to one that made the rounds in English not too long ago, dubbed “Typoglycemia” (a portmanteau of “typo” and “glycemia” and a pun on “hypoglycaemia”), where as long as the first and last letters of each word are preserved, the letters in between can be scrambled and the words remain understandable.

This is an interesting case in what I call persistence of comprehension, where comprehension of words persists despite efforts to thwart it.
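The English effect is easy to reproduce. A minimal sketch: shuffle the interior letters of every word longer than three letters, keeping the first and last letters in place. Punctuation and other non-letters are left alone, and the seeded `random.Random` just makes runs repeatable.

```python
import random
import re


def typoglycemia(text: str, rng: random.Random) -> str:
    """Shuffle the inner letters of each word, keeping its first and last letters."""
    def scramble(match: re.Match) -> str:
        word = match.group(0)
        if len(word) <= 3:  # at most one inner letter, so nothing to shuffle
            return word
        middle = list(word[1:-1])
        rng.shuffle(middle)
        return word[0] + "".join(middle) + word[-1]

    return re.sub(r"[A-Za-z]+", scramble, text)


print(typoglycemia("Reading scrambled words is surprisingly easy.", random.Random(1)))
```

Every word keeps its length, its letters and its two ends, which is all the English trick needs. A Chinese character has no such inner sequence to shuffle, which is why the brother’s trick took a different form.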

The Preamble

Unlike English, which uses an alphabetic system where each letter roughly corresponds to a phoneme, or Japanese kana, a syllabary where each character is a mora, Chinese uses a logographic system, using “pictures,” or logographs, to represent words. The other two systems have letters or kana to rearrange, but there is little to “scramble” in a picture; the closest equivalent is adding or removing strokes from a character, which is what jumppingjack’s brother did.

Before I go further, let me type out what the note intends to say:

妈妈,明天的午餐因为
人数不够,他们把这个
活动换到下一个星期
(可是下个星期六中午我的
公司有个午餐)

谢谢 🙂  (我明天应该有吃午餐)

Mum, for tomorrow’s lunch, because
there aren’t enough people, they have
moved this event to next week.
(But next Saturday at noon my
company has a lunch event.)
Thanks 🙂 (I should be having lunch tomorrow.)

So how does persistence of comprehension occur in Chinese? I shall illustrate some of the characters that are easily understood despite the scrambling, and the ones that threw me (and my friends) off the most. (Also note that jumppingjack herself miswrote the character 期 by swapping 月 and 其, which was not her brother’s doing, though her brother added an extra radical to it as well.)

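[Image: scrambled characters (left) beside the originals (right), sorted from easiest (top) to hardest (bottom) to understand]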

The image above sorts some of the characters in the note by persistence of comprehension, from easiest to understand despite the scrambling (top) to hardest (bottom). Each scrambled character is on the left and the original is on the right. Note that the bottom four scrambled characters are all real Chinese characters, which I will talk about shortly.

The scrambled 的 is one of the easiest to understand because, despite the additional stroke, it still mostly resembles its original and does not resemble any other character in the language. The added stroke is not a radical (a graphical component of a character that often carries meaning), unlike in the scrambled 明 discussed below. Similarly for 因, the added stroke turns the 大 inside 因 into 太, but overall the result is not a real character and mostly resembles its original.

Now we look at 明 (as in 明天, tomorrow). One added stroke turns its 日 (sun) radical, usually found in words related to the sun, time or weather, into the 目 (eye) radical, usually found in words related to vision. The result is still not a real character, but the morphing of one semantically relevant radical into another makes one pause when reading the sentence. Another added stroke turns the 月 (moon) component into 用 (use), making comprehension even more difficult.

Next down is 们, into the middle (haha) of which not a stroke but an entire character, 中 (middle), has been inserted. Some of my friends disagree that it is harder than the scrambled 明, and I’m inclined to agree; I’d call it a toss-up between the two. Still, I feel that inserting an entire character, as opposed to a stroke or radical, changes 们 so much that it no longer resembles its original, even though it does not resemble any other Chinese character either.

Lastly, the bottom four characters, 公, 午, 个 and 下, have had strokes and/or components added to them so that they turn into other real characters: 翁 (old man), 牛 (cow), 伞 (umbrella) and 卡 (card). With such resemblance to real characters, it is little wonder that people have difficulty understanding them as they read.

The Analysis

How is it that we are able to understand the note with little difficulty?

In the case of English “typoglycemia,” it has been suggested that we identify words not solely by the position of letters within a word, but also by context, the shape of the word and the position of the word in the sentence. I’d go as far as to suggest that in English, seeing the individual letters of a scrambled word draws upon our stored memory of the word, further aiding comprehension. Compare:

  1. Adcnirocg to rrasceeh at a ptaruilacr ureitnvisy
  2. Aoincdrg to rcseerh at a plaaicutr uesvtniiy
  3. Aroindg to rearech at a pluiraacr utrisveiy
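The three lines above can be checked mechanically. The sketch below (hypothetical helper names, in Python) confirms that every word in example 1 is a pure scramble of its original, and reports which letters examples 2 and 3 drop from each word.

```python
from collections import Counter

def is_scramble_of(original: str, scrambled: str) -> bool:
    """True if `scrambled` keeps the first and last letters of `original`
    and only rearranges the letters in between (case-insensitive)."""
    o, s = original.lower(), scrambled.lower()
    return (len(o) == len(s) and o[0] == s[0] and o[-1] == s[-1]
            and Counter(o[1:-1]) == Counter(s[1:-1]))

def dropped_letters(original: str, reduced: str) -> Counter:
    """Letters of `original` missing from `reduced`, ignoring order and case.
    Counter subtraction discards zero and negative counts."""
    return Counter(original.lower()) - Counter(reduced.lower())

ORIGINAL = ["according", "research", "particular", "university"]
EXAMPLE_1 = ["Adcnirocg", "rrasceeh", "ptaruilacr", "ureitnvisy"]
EXAMPLE_2 = ["Aoincdrg", "rcseerh", "plaaicutr", "uesvtniiy"]
EXAMPLE_3 = ["Aroindg", "rearech", "pluiraacr", "utrisveiy"]

# Example 1 should be a pure scramble of every word.
print(all(is_scramble_of(o, s) for o, s in zip(ORIGINAL, EXAMPLE_1)))
# Examples 2 and 3 are shorter, so report what each one left out.
for o, s2, s3 in zip(ORIGINAL, EXAMPLE_2, EXAMPLE_3):
    print(o, dict(dropped_letters(o, s2)), dict(dropped_letters(o, s3)))
```

Run as written, it reports that example 2 drops c, a, r and r, while example 3 drops cc, s, t and n from “according,” “research,” “particular” and “university” respectively.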

Example 1 is classic “typoglycemia,” and persistence of comprehension is strong. Example 2 removes one non-essential letter from each word, and persistence of comprehension is still relatively strong. Example 3 removes what I consider an essential component of our memory of each word; such components are usually consonants rather than vowels. Take this example:

  • I cnt blve u dd tt!
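The example above was abbreviated by hand. A crude mechanical approximation, assuming a simple rule of our own (drop every vowel that is neither the first nor the last letter of a word; the function names are hypothetical), comes close but keeps a few letters the hand-made version drops, such as the “y” in “you” and the “h” in “that”:

```python
import re

VOWELS = set("aeiouAEIOU")

def drop_interior_vowels(word: str) -> str:
    """Remove vowels between the first and last letters; very short words are kept whole."""
    if len(word) <= 2:
        return word
    interior = "".join(ch for ch in word[1:-1] if ch not in VOWELS)
    return word[0] + interior + word[-1]

def disemvowel(sentence: str) -> str:
    # Apply word by word; spaces, apostrophes and punctuation are left untouched.
    return re.sub(r"[A-Za-z]+", lambda m: drop_interior_vowels(m.group(0)), sentence)

print(disemvowel("I can't believe you did that!"))  # I cn't blve yu dd tht!
```

Applied to “according,” the same rule gives “accrdng.”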

In English, vowels can be removed quite easily while the word remains comprehensible. This suggests that consonants play a somewhat more important part in reading words. In that respect, reading written English has some similarity to reading written Arabic or Hebrew, where vowels are typically not written out (at least not in the way English writes them). Thus, it is harder to understand “aroindg” as “according” because:

  1. An essential component has been removed (three syllables; essential components: a-cc-r-dng). This might be because consonant representations are tied up with their phonetic properties. This is why “cc” has to be removed, as opposed to just “c,” for comprehension of “according” to fail: “cc” in the word corresponds to the /k/ sound in /əˈkɔː(ɹ)dɪŋ/, and even a single “c” is enough to clue us in that there might be a /k/ sound in the scrambled word.
  2. It is the first in the sequence of essential components, suggesting that perhaps we process essential components sequentially in our heads. It could be that when we see the word “according,” we draw upon the idea that “according” has the components “a-cc-r-dng” in that order. Hence removing the first component, “cc,” impedes comprehension, as nothing is left to give the subsequent components the context of what the word might be (compare how easily you understand “aroindg” (“cc” removed) with “aocrcdg” (“in” removed)).
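The sequential idea in point 2 can be turned into a toy measure. The sketch below is purely illustrative: the component list follows the a-cc-r-dng breakdown in point 1 (with the first letter, which scrambling never moves, left out), and the helper names are hypothetical. It reports the position of the earliest essential component whose letters no longer all appear in the scrambled word; under the hypothesis, an earlier position should make the word harder to recognise.

```python
from collections import Counter

# Hypothetical components of "according", following the a-cc-r-dng breakdown;
# the leading "a" is omitted because scrambling always keeps the first letter.
ACCORDING_COMPONENTS = ["cc", "r", "dng"]

def surviving_components(components, scrambled):
    """For each component, report whether all its letters still occur in `scrambled`.

    Components are checked independently against the letter counts of the
    scrambled word, ignoring letter order.
    """
    available = Counter(scrambled.lower())
    return [all(available[ch] >= n for ch, n in Counter(comp).items())
            for comp in components]

def first_missing_component(components, scrambled):
    """Index of the earliest component that did not survive, or None if all did."""
    for i, ok in enumerate(surviving_components(components, scrambled)):
        if not ok:
            return i
    return None

for candidate in ["aroindg", "aocrcdg"]:
    print(candidate, first_missing_component(ACCORDING_COMPONENTS, candidate))
```

For “aroindg” it returns 0 (the “cc” is gone), and for “aocrcdg” it returns 2 (only the “n” of “dng” is missing), which is the ordering the hypothesis would predict. It is a restatement of the idea, not evidence for it.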

How does this relate to Chinese? If we draw upon sequences of essential components when reading English, perhaps there is an equivalent in reading Chinese. I believe that when reading Chinese, we hold a stored visual memory of what a character looks like in general, as well as an idea of which strokes the character should contain (“legal strokes”) and which it should not (“illegal strokes”).

First, we address whether modifying a Chinese character sets off alarm bells for the reader. Legal strokes added to a scrambled character should stand out less, leading the reader to accept the character visually as real. The following example demonstrates this:

[Image: the scrambled 够 and 他 from the note]

In the note, a floating shuzhe (vertical-bend) stroke is added to the 够 character, and in Chinese there is no such thing as a floating shuzhe; it is always attached to other strokes, as in 喝 (with some exceptions, like 断, where it may or may not be attached). Visually alerted that something is wrong with the character, we immediately discount the scrambled 够 and are able to extract the original. In the case of 他, a pie (leftward-slant) stroke is added on top of the 亻 radical, creating the 彳 (step) radical, which does exist. Thus the scrambled 他 does not jump out at the reader the way the shuzhe stroke in 够 does; we are likely to gloss over it and accept it as it appears, and are less likely to question whether the character fits its context.

Next, adding a legal stroke to a character causes more confusion when the stroke turns the original into a semantically different character. The confusion is greater still when the meaning of the new character does not fit the context of the sentence, especially when the reader has already accepted it at face value, as explained in the previous paragraph. This can be seen in the following examples:

[Image: example sentences 1 to 3, each containing a scrambled character]

If my premises are right, then in example 1, readers should be able to identify the error most easily and still read the sentence with its original meaning. In example 2, they should gloss over the wrong character, and since it still closely resembles the original and is not some other real character, persistence of comprehension should remain strong. Example 3 is where comprehension begins to be thwarted: 他能够卡去吃午餐 (“He is able to card go eat lunch”) and 他能够下去吃牛餐 (“He is able to go down and eat cow meal”) don’t make any sense, because the scrambled characters contain only legal strokes, are real characters, and have meanings that are out of place in the sentence.
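The predictions above amount to a small decision rule. Here is a toy encoding of it in Python, with hypothetical names; it simply restates the hypothesis and is not a model of real reading behaviour. It also assumes, as in example 3, that a tampered character which turns into a real character is out of place in its sentence.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TamperedCharacter:
    description: str
    legal_strokes_only: bool  # do the added strokes occur like this in real characters?
    is_real_character: bool   # is the result itself an existing character?

def predict_reading(c: TamperedCharacter) -> str:
    """Restate the post's hypothesis as a three-way decision."""
    if not c.legal_strokes_only:
        # An illegal stroke (e.g. a floating shuzhe) is noticed, discounted, and read past.
        return "noticed; comprehension persists"
    if not c.is_real_character:
        # Legal strokes forming a non-character: glossed over, still close to the original.
        return "glossed over; comprehension persists"
    # Legal strokes forming a real character, assumed out of context:
    # accepted at face value, so the sentence stops making sense.
    return "accepted at face value; comprehension thwarted"

EXAMPLES = [
    TamperedCharacter("够 with a floating shuzhe", False, False),
    TamperedCharacter("他 with 亻 turned into 彳", True, False),
    TamperedCharacter("下 turned into 卡", True, True),
    TamperedCharacter("午 turned into 牛", True, True),
]

for example in EXAMPLES:
    print(f"{example.description}: {predict_reading(example)}")
```

A fuller version would need a third feature for whether the new character happens to fit the sentence, a case the post does not discuss.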

The Conclusion

The phenomenon I have dubbed “the persistence of comprehension” seems to be little researched in English, let alone Chinese. For English “typoglycemia,” I offered the following explanation: through a combination of context, word length, letter position, word shape, the word’s position in the sentence and (as I have illustrated with examples) the identification of letters, which draws upon phonetic representations of the word in our heads, we are able to read scrambled English.

In the more interesting case of Chinese, which is logographic, I posited that there are legal and illegal strokes that can be added to a character, and that legal strokes are less likely to be noticed than illegal ones. If the scrambled character is itself a real character, the legal strokes mask the fact that it has been scrambled, so when we read it, the sentence doesn’t make sense because we do not suspect that a character has been tampered with.

All in all, more extensive research is needed than this blog can provide. I don’t know if I will be able to do it, but if anyone wants to hear my notes on this topic, feel free to reach out to me at ws672[at]nyu[dot]edu.

Note: In the original version of this post, I wrote that jumppingjack was male, when she is female. Corrections have been made.

How do cows moo in a British accent?

Rachel, a listener to the podcast “How to Do Everything,” asked: “How would a person moo in a British accent?” She is from Nevada and professes to moo with an American accent.

What better way to find out than to ask someone close to the source? The kind folks at the show invited Sir Patrick Stewart to answer Rachel’s question, which had been on her mind for a couple of months. Patrick Stewart answers:

It’s not a straightforward, simple answer. Unlike, probably, many other countries, where a cow’s moo is a cow’s moo, in England, you understand, we are dominated by class, by social status, and by location. So, for example, a cow that’s in a field next to my house in West Oxfordshire would moo in one kind of way, and a cow in the field in the semi-industrial town I grew up in in the north of England would moo in another kind of way.

Patrick Stewart then demonstrates how the different cows would sound by bellowing himself. He describes the moos of the cows near his home, a long, protracted low, as “very conservative.” He goes on to explain why:

You must understand that I live in the constituency of David Cameron, our Prime Minister, who is a Tory (the Conservative Party). And I assume these cows voted for him. I don’t actually vote there, I vote in another place, in London.

If I’m at my home in Yorkshire, where I grew up, and not that there are many fields left where I grew up but I would find one and I would find some cows, what you hear would be something like this: mehhhhhhh. Well, this has all to do with environmental and cultural conditioning.

He emphasises that moos vary by location, and recommends that travellers talk to cows all over whatever country they are in, because “cows have a great deal to tell us.”

The hosts offered a bit of a cultural exchange, proposing to show Patrick Stewart what a Nevada cow sounds like, but Stewart surprised them all by saying that his wife is also from Nevada and that he has experience with Nevada cattle, and did his best impression of a Nevada cow. His Nevada cow is high-pitched and nasal, because “that’s the way you people (the hosts) talk. The cows are influenced by how you talk, as you are influenced by the cows.”

The hosts then asked how a Cockney cow would moo. Stewart replied:

You understand, Cockney cows are pretty rare these days. I mean, in Shakespeare’s days, there were cattle in the middle of London. But nowadays, generally speaking, the city of London doesn’t feel too good about having cattle in Piccadilly Circus, for example. But I can give you an idea of a Cockney cow: I’m old enough to remember when there were Cockney cows.

The resulting impersonation (incowation?) sounds like “mehh-aye!”, which Stewart describes as “more like a sheep than a cow.” The hosts point out that in Stewart’s tour of English cows so far, there seem to be not only different cow accents but also different cow attitudes throughout the country. Stewart elaborates:

You are absolutely right. What you just heard just now was an urban cow. All of us who live in big cities, we have to be watchful, we have to be on our guard. We have to be prepared for fight-or-flight at any moment, and it is the same with cattle.

Breeding is of utmost importance in humans, as it is in cattle. So what would a well-bred cow sound like?

We had a Prime Minister many, many years ago called Alec Douglas-Home (pronounced “hyoom”), and one of the wonderful things about Alec Douglas-Home, including his name by the way (you’d think that his name is probably spelled “H-double O-M-E” or “H-U-M-E” or something like that; his name was actually spelled “H-O-M-E”, home, but it was pronounced “hume,” and we do that mostly to confuse Americans, like Leicester Square and Lye-cester).

Anyway, the thing about Alec Douglas-Home was that he didn’t move his lips when he talked (an unintelligible mumble follows, as Stewart mimics talking without moving his lips; the words sound like: “and here’s an example, this is how he always talks. He didn’t actually move the lips.”), because moving your lips is in terribly bad taste. So, if Alec Douglas-Home had cattle, and I’m sure he did (he must have been a landowner, because I think he was actually Scottish), his cows would have mooed something like this: hrmmmmmm. Very refined, very sophisticated, very cultured. These cows had gone to Eton or Harrow (prestigious boarding schools), or at least the cow equivalent.

There you go: that is how British cows moo, and, as with humans, how they speak is affected by social ecownomics.

I’ll see myself out now.

Listen to the full podcast here, if not for the knowledge, then at least to hear Sir Patrick Stewart moo like a cow!