The Hexacoto

Listening to the sound of one hand clapping

Tag: linguistics

On copyrights and conlangs

One would imagine that if someone invented a language for use in a creative work, that person could retain copyright over the use of the language, could they not? According to U.S. case law, that might not be so, in the opinion of the Language Creation Society (LCS), a nonprofit organisation created for the promotion and discussion of constructed languages (conlangs). LCS submitted an amicus curiae brief — unsolicited advice to the court in a pending case — in Paramount v. Axanar, where Paramount Pictures sued Axanar Productions for infringing various parts of its intellectual property in the fan-produced film Prelude to Axanar. The claims cover Vulcans, the likeness of Vulcans, Romulans, uniforms with gold shirts, triangular medals, and so on; The Hollywood Reporter has more information on how Axanar is planning to respond to each claim.

What caught my interest was that Paramount laid claim to the Klingon language, which got me thinking: “Can conlangs created for creative works be copyrighted?” Paramount has claimed copyright to the Klingon language, its vocabulary, and its graphemes, but LCS believes the claim will not hold up in a court of law — indeed, anyone can claim copyright to anything; it is only when a copyright is challenged in court that it is found to be valid or otherwise.

Citing the 1879 case Baker v. Selden, LCS argued that while reproductions of creative works are protected by copyright law, no law prevents individuals from using a system described in a work and creating their own derivative works with it. In Baker v. Selden, Selden created a system of bookkeeping and wrote books about it, hoping to sell the idea to counties and the government, but was unsuccessful. Later, Baker produced a book on bookkeeping whose system was very similar to Selden’s. Selden’s estate sued Baker, but the courts ruled that:

“[W]hilst no one has a right to print or publish his book, or any material part thereof, as a book intended to convey instruction in the art, any person may practice and use the art itself which he has described and illustrated therein… The copyright of a book on book-keeping cannot secure the exclusive right to make, sell, and use account books prepared upon the plan set forth in such a book.”

Thus Klingon as a language cannot be copyrighted, especially given that not only its creator Marc Okrand but also various scholars have contributed to its development and expansion. Once the linguistic rules of Klingon — syntax, phonology, morphology, etc. — were established, the language took on a life of its own, and it would be unthinkable for a copyright holder to claim ownership of any and all subsequent works derived from using it. LCS cites the translations of Shakespeare’s Hamlet and the Epic of Gilgamesh into Klingon, not to mention the technically somewhat-valid native speaker of Klingon, Alec Speers (raised partly in Klingon by his father, linguist d’Armond Speers), as proof that Klingon is a robust-enough language and that trying to enforce copyright over it would be unreasonable. It would be akin to enforcing a copyright on Esperanto today.

Moreover, the makers of Prelude to Axanar aren’t lifting Klingon dialogue verbatim in their fan film, but devising dialogue using the Klingon language system. Indeed, besides Baker v. Selden, LCS cites many other cases to the effect that Klingon is a free language that cannot be copyrighted; anyone interested should check out the amicus brief (linked at the top of this post and also embedded below). The brief is also replete with Klingon phrases written in Klingon script, with accompanying footnotes, which I think is hilarious but probably not very amusing to the presiding judge.

So what does this mean? Feel free to write your fan fiction in Quenya, Sindarin, Klingon, Na’vi, or even the Atlantean language that Marc Okrand also created for the Disney movie Atlantis: The Lost Empire. As long as one isn’t merely ripping off dialogue (reproduction) but can show that the language used is the product of applying the language’s systems to derive original work, copyright over the language is unenforceable. Granted, this is merely my opinion, but other folks have voiced similar sentiments (I liked this and this from LCS member Sai), and they’re worth checking out if you’re interested in copyrights and conlangs. We’ll see on May 9th how the courts rule, and hopefully it’ll be in favour of language hobbyists and enthusiasts.


Notes on how to pronounce Malay words in ‘Semoga Bahagia’

Singaporean musical group The TENG Ensemble did a cover of one of my favourite childhood songs, “Semoga Bahagia,” which translates loosely from Malay into “wishing you happiness” or according to Wikipedia, “May you achieve happiness.” Their use of traditional Chinese instruments for the song makes for a wonderful arrangement, led by local indie singer Inch Chua, who sounds great.

However, as some have noticed and commented on the ensemble’s Facebook post about the video, Chua does not quite get the pronunciation of the words right. Her pronunciation follows a highly Anglicised/Americanised vowel/consonant map, informed by her Chinese background. If the TENG Ensemble ever wishes to redo the video — and I think they’ve indicated their interest in doing so — here are my notes on how to get the Malay words right. An understanding of IPA transcription will help in following this post, but even if you don’t read IPA, I’ll try to transcribe everything in an easily understandable format.

Note 1: The Malay language uses a flapped/tapped r (/ɾ/), similar to the r’s in Spanish, Japanese, and many other languages of the world. It does not use the rhotic r (/ɹ/) that English uses.

Note 2: Vowels in spoken Malay tend to be preserved at full length and are rarely reduced unless speech is very fast. Therefore words like “jiwa” should sound like “jee-wah” (/dʒiwa/) rather than “juh-wah” (/dʒəwa/). When singing, it’s especially important to preserve the vowels, since any reduction becomes very apparent.

Note 3: Malay does not usually aspirate consonants. There is strong aspiration in Chua’s d’s in “pemudi-pemuda,” which is how d’s are usually pronounced in English. The Malay “d” thus sounds different from the d in English “dog,” which has an audible breathy release on the initial consonant. (Contrast the d in English “dog” with the unaspirated d in Mandarin “dàndàn 淡淡.”)

Note 4: In Malay orthography, “ng” is the velar nasal (/ŋ/), even between two vowels. Thus “dengan” is “duh-ng-an” (/dəŋan/), not “deng-gan.” Likewise, “c” is a postalveolar affricate (/tʃ/), as in “church,” and never an “s” sound.

And now for the second-by-second analysis! My comments will be in the form of (observation); followed by suggestion if applicable.

  • Pandai cari [0:35] — rolled r; do flapped r instead.
  • pelajaran [0:39] — rolled r; do flapped r instead.
  • jaga diri [0:46] — rolled r; do flapped r instead.
  • kesihatan [0:49] — Chua said kAH-see-ha-tan; kuh-see-haa-tan, preserve e vowel, don’t aspirate t.
  • Serta sopan santun [0:53] — mispronounced, rolled r; SER-TA sopan santun.
  • dengan [0:58] — good job on de-NG-an!
  • bersih serta suci [1:23] — rolled r; do flapped r instead, preserve all r’s.
  • hormat dan berbudi [1:27] — rolled r; do flapped r instead.
  • jaga tingkah [1:29] — k consonant dropped; preserve the k sound in tingkah, but lightly released.
  • Capailah [1:38] — diphthong aɪ changed to vowel a; don’t drop the i in ca-pAI-lah.
  • pemudi-pemuda [1:42] — aspirated d’s; don’t aspirate d, especially audible in pemuDA.
  • kita ada harga [1:48] — rolled r; do flapped r instead.
  • di mata dunia [1:50] — good job on the d! This is the example of the unaspirated d.
  • kalau kita [2:10] — a vowel changed to e (/a/ to /ə/); preserve A vowel, kAH-lau instead of kUH-lau.
  • lengah [2:12] — added a g consonant; there is no g consonant, it’s pronounced le-ng-ah, not len-gah.
  • serta [2:13] — rolled r; do flapped r instead.
  • hidup [2:15] — Chua said he-daap (/hidap/); it’s pronounced he-doop (/hidup/)
  • sia-sia [2:16] — Chua said saya-saya (/saɪya-saɪya/); it’s see-ah see-ah (/sia-sia/)
  • jiwa [2:19] — Chua said juh-wah (/dʒəwa/); preserve all vowels, it’s pronounced jee-wah (/dʒiwa/).
  • besar sihat serta segar [2:19] — rolled r; do flapped r instead.
  • dengan (2:23) — good example of dengan.
  • perangai pemudi [2:28] — rolled r, aspirated d; do flapped r instead, don’t aspirate d.
  • cergas [2:31] — Chua said sergas, rolled r; it’s pronounced CHeRgas, ch consonant, do flapped r instead.
  • suka rela [2:35] — rolled r; do flapped r instead.
  • berbakti [2:37] — rolled r; do flapped r instead. This word is also an example of an unreleased ‘k.’
  • sikap yang pembela [2:38] — vowel was changed to p-uhm-bUH-la (/pəmbəla/); preserve vowel, it’s p-uhm-bAY-la (/pəmbela/).
  • berjasa [2:41] — rolled r; do flapped r instead.
  • capailah [2:44] — see before
  • pemudi-pemuda [2:47] — see before
  • rajinlah supaya berjasa [2:52] — rolled r; do flapped r instead.
  • semoga bahagia [2:56] — bahagia was pronounced bUH-ha-gi-a (/bəhagia/); preserve the A vowel in “ba,” making it bAH-ha-gi-a (/bahagia/). It’s ok to break “gia” into “gi-a” for stylistic purposes, but “ba” should remain “ba.” In Malay, “be” (/bə/) and “ba” (/ba/) are contrastive.

Hey, no one ever said Malay was easy, right?

Here’s an example of the song sung by a Malay person (I’m assuming) with all the right consonant sounds, although I think it’s interesting when he overdoes some of his r’s and turns it into a trill [0:33].

It’s a lovely touch that gives the song a very folksy flavour, one I’ve heard more often from older Malay Singaporeans than younger ones. Something similar happens in Japanese, where the flapped r can become a trilled r, called makijita (巻き舌, “rolled tongue”); it is sometimes associated with rural communities and — interestingly — with the yakuza and with war cries (“uorrrrrrryaa!”). Pay attention to how he pronounces his d’s, r’s, t’s, and k’s.

Hopefully this will help the TENG Ensemble and Inch Chua make a better second version of the song, hitting all the right sounds of the wonderful Bahasa Melayu. After all, nobody wants to hear an ang-mor-cised version of our Chinese songs, do they?

 

(Woe…. da…. zheeeah.. gay woe! Eee shwaang zheeaan ding zhhh paaang….!!!)

So is Zuckerberg fluent in Chinese or not?

Image: Tsinghua University


Mark Zuckerberg recently made news because of a dialogue he gave at Tsinghua University in Beijing, China, where he spoke (Mandarin) Chinese. Here’s the link to the full video, 30 minutes long.

There has been mixed news coverage of Zuckerberg’s Chinese language skills, from laudatory “Of course he speaks fluent Mandarin” headlines to less-than-favourable comparisons to the speech of a seven-year-old. It turns out there is a problem running through this mixed coverage of Zuckerberg’s student dialogue — the news and the public are often confused about what language “fluency” is.

Zuckerberg’s spoken Chinese, while mostly coherent despite some interruptions, mangled most of the tones that the Chinese language is widely known for. The BBC’s coverage pointed out that Zuckerberg’s failure to produce tones properly led him to claim that Facebook had just “11 mobile users” instead of “one billion.”

To understand how “fluent” one is when talking about mastery of a foreign language, one needs to understand the difference between “competence,” “performance,” “fluency,” and “proficiency” — four terms often used in such discussions. I’ll attempt to explain the differences between these four concepts.

Language competence and performance are the two biggest things people are actually talking about when discussing a person’s L2 (foreign-language) “fluency.” Simply put, competence is a person’s grasp of the language’s grammar, phonological/phonetic rules, and so on, while performance is how well they actually produce it. A person can thus be completely competent in a language yet unable to perform it as well. An example would be a person completely able to understand and speak Spanish, but unable to roll their R’s.

The distinction between competence and performance was famously made by Noam Chomsky in 1965. Chomsky’s notion of grammatical and phonological (linguistic) competence was then expanded by Dell Hymes in 1966 to include knowing the appropriateness of topics and politeness (sociolinguistic competence), knowing how to combine language structures into different oral and written text types (discourse competence), and knowing how to repair communication breakdowns in the presence of interference (strategic competence). These concepts came to be known in the literature as “communicative competence.”

Zuckerberg’s performance of his tones was bad, but he understood most of the questions asked, and he was able to answer simple to moderately complex questions in grammatically sound Chinese. His use of humour and appropriate politeness further signals competence. It can be said that he is mostly competent in Chinese.

Fluency is a measure of the ease of production of the language, gauged by speed, sustainment, and/or lack of breaks, and it covers not only speech but also reading, writing, and even listening. Thus a person could be illiterate and have a limited vocabulary, yet be considered fluent if their speech flows at a smooth pace. Zuckerberg’s speech contains many pauses, to the point that it can be hard for the listener to understand what he is trying to say. He may not be very fluent, but through the context of his speech, listeners (especially native speakers) are mostly able to repair the content.

Lastly, proficiency is a measure of how well one uses the language, and it can be tested by instruments such as the TOEFL or the JLPT. One thing that was pointed out to me is that in language testing, proficiency is usually norm-referenced: test takers are scored on how well they did in comparison to other test takers. This differs from criterion-referenced tests, such as the GCEs and GCSEs, where test takers are measured against a set of pre-defined criteria. Once again, a speaker can be fluent in a language without necessarily being proficient in it.

Bot or Not? Poetry does not compute

compupoem

Passing Time
Your skin like dawn
Mine like musk

One paints the beginning
of a certain end
The other, the end of a
sure beginning

Do you think the above poem was written by a human, or generated by a computer program? That’s what the site Bot or Not sets out to test: a Turing test where people try to distinguish computer-generated poems from human-written ones. By the way, the above poem, “Passing Time,” was written by a human, Maya Angelou.

I have previously written about reconciling the idea of programming code as poetry, and whether it’s possible to achieve “poetic beauty” in code. I posited that there might be an inherent “poetic beauty” in poetry that we recognise when we decide a piece of writing is a poem. With Bot or Not, the question of what makes poems poetry is taken a step further: if people can identify whether “poems” are human- or computer-generated, there must be something about human effort that is visibly distinguishable.

That means any piece of writing seen as containing enough of the essence of “poetic beauty” to count as a poem can be further divided into human effort or algorithm-derived output. There must be something that distinguishes “computer poems” from “human poems” clearly enough that we can decisively say, “Yes, this is the work of a human,” or “This is clearly computer-generated.”

The Showcase

The site lists some of the top poems easily recognised as human or computer-written.

Poem

Generated by botpoet using JGnoetry (93% said “bot”)

Published on desserts and from pink. Symptoms
Start, 2013 as other poetry does anyone
Word in thailand and write reading one mother
Order they deserve. Well, 2013 recently released
My pants and serve throw from is a beautiful
Insane surreal once hours playing once.

Personal Space

Generated by botpoet using JGnoetry (87% said “bot”)

New poetry I might help you currently
Have been, snapping male, but it’s a rocket,
Kid man been forever idea of
My wearing a punk and brought old robot
Smog thing. Professional grown-up looking
In dusty old men remaining his tibia.

Bright

Generated by Jim Carpenter using Erica T Carter (85% said “bot”)

The name ghosts second, destroying.
The quite normal letter to the dutch throne after one year destroying the still stuck ridged snow, interpreted.
Undimmed radiance curves, primming like the column.
Lounging, as other as very high score.
Loafing a booby.
Lounging and spends.
Normal occasion joints.
Early letter gets the personal experience from the individual.
Getting, tarries however in cell.
Lounging.
Obsesses the disturbed surface.
Obsesses the abyss.

As can be seen, the above poems are mostly verbiage and make no sense. Compare them with the poems most often recognised as written by humans.

The Fly

William Blake (87% said “human”)

Little Fly,
Thy summer's play
My thoughtless hand
Has brushed away.

Am not I
A fly like thee?
Or art not thou
A man like me?

For I dance
And drink, and sing,
Till some blind hand
Shall brush my wing.

If thought is life
And strength and breath
And the want 
Of thought is death;

Then am I
A happy fly,
If I live,
Or if I die.

Untitled

Shelby Asquith (86% said “human”)

His smile was loud,
and me in my silence—
I thought it was meant
for me. Momma warned
me about boys like him.
Told me that the kinds
of boys that shined a little 
too bright might just be 
trying to distract me from 
the balled fists, the fury.
I was a fly and he lured
me straight into the light.
And oh how he burned
me, how he burned me.

O Fool

Rabindranath Tagore (84% said “human”)

O Fool, try to carry thyself upon thy own shoulders! 
O beggar, to come beg at thy own door! 

Leave all thy burdens on his hands who can bear all, 
and never look behind in regret. 

Thy desire at once puts out the light from the lamp it touches with its breath. 
It is unholy---take not thy gifts through its unclean hands. 
Accept only what is offered by sacred love.

Especially in Blake’s case, these top poems display things that computers can scarcely replicate. Blake’s poem has a very strong meter and rhyme scheme. The other two read very coherently: the ideas contained within the lines agree with one another, and a message flows from the poem to the reader. What does it take for a piece of writing to resemble human effort? Next, we look at computer-generated poems that people thought looked human.

#6

Generated by Janus Node using Janus Node (69% said “human”)

you

  are

      inscribed
          in the
           lines on the
     ceiling

      you

 are

   inscribed in
         the depths
   of
         the
    storm

A Wounded Deer Leaps Highest

Generated by Poets using Ray Kurzweil’s Cybernetic Poet (67% said “human”)

A wounded deer leaps highest,
I've heard the daffodil
I've heard the flag to-day
I've heard the hunter tell;
'Tis but the ecstasy of death,
And then the brake is almost done,
And sunrise grows so near
sunrise grows so near
That we can touch the despair and
frenzied hope of all the ages.

some men

Generated by Every Google User using Google Predictive Search (67% said “human”)

some men just want to watch the world burn 
some men just want to watch the world learn 
some men just want breakfast

In these instances, how did algorithmically generated sequences of words gain the verisimilitude of human effort, where the computer-like computer-generated attempts shown above failed? Finally, we look at poems written by humans that people thought were computer-generated.

Cut Opinions

Deanna Ferguson (76% said “bot”)

cut opinions tear tasteful
hungers huge ground swell
partisan have-not thought
green opinions hidden slide
hub from sprung in
weather yah
bold erect tender
perfect term transparent till
I two minute topless formed
A necessarily sorry sloppy strands
hot opinions oh like an apple
a lie, a liar kick back
filial oh well hybrid opinions happen
not stopped

Cinema Calendar Of The Abstract Heart – 09

Tristan Tzara (69% said “bot”)

the fibres give in to your starry warmth
a lamp is called green and sees
carefully stepping into a season of fever
the wind has swept the rivers' magic
and i've perforated the nerve
by the clear frozen lake
has snapped the sabre
but the dance round terrace tables
shuts in the shock of the marble shudder
new sober

Red Faces

Gertrude Stein (69% said “bot”)

Red flags the reason for pretty flags.
And ribbons.
Ribbons of flags
And wearing material
Reason for wearing material.
Give pleasure.
Can you give me the regions.
The regions and the land.
The regions and wheels.
All wheels are perfect.
Enthusiasm.

The Experiment

Now that we have seen the most human-like and the most computer-like of both human- and computer-written poems, can we draw parallels for what constitutes human effort in poetry? On the technical side, can we say, as per Blake’s poem, that prosodic and auditory cues such as stress, meter, and rhyme give poems a sense of human effort — that in reciting “Tiger, tiger, burning bright / In the forests of the night,” we not only hear the rhyme but feel a constant rhythm to the poem?

Surely that can only be achieved by humans? Not necessarily. Assuming a bot program has access to a dictionary, with the stress, meter, and phonetics of every word mapped out, how hard would it be to code for something that reads like human poetry? (The following is not real code, but an idea of how such code should behave.)

<write human poetry>
component: alternate stress, strong-weak
component: vowels at end of line match; SET1:AAB SET2:CCB
SET1
component structure line1: [pronoun][conjunction][pronoun]
component structure line2: [verb][preposition][location]
component structure line3: [verb][noun]
SET2
component structure line1: [pronoun][verb]
component structure line2: [conjunction][verb][noun]
component structure line3: [conjunction][pronoun][verb]
And I just described:

Jack and Jill
went up the hill
to fetch a pail of water.

Jack fell down
and broke his crown
and Jill came tumbling after.

A bot could browse through a dictionary and probably come up with something similar. Granted, I “retro-wrote” the code — I already had a poem in mind and wrote the “code” after — but if I can break “Jack and Jill” down into a reproducible algorithm using auditory and prosodic cues, then surely it is not those cues alone that determine human effort in poetry. After all, if a program relies solely on prosodic and auditory cues, what’s to prevent it from slotting in random words that fit the cues but make no sense in sequence? For example:

Bird and ball
swirled by the mall
and cocked a round of seaweed

Truth flew out
where running lout
and cops were sniffing soft beads

The prosodic and auditory cues of the above poem match “Jack and Jill,” yet it makes no sense, and people would likely judge it to be written by a bot. So what else is required for a poem to be recognised as human effort?
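To make the point concrete, here is a minimal, runnable sketch of the template-filling idea: slot random words of the right part of speech into fixed line templates and retry until the line endings rhyme. The word lists and the crude last-three-letters rhyme test are my own illustrative assumptions, not a real poetry generator.

```python
import random

# A tiny hand-made lexicon grouped by part of speech (purely illustrative).
WORDS = {
    "pron": ["Jack", "Jill", "Truth", "Bird"],
    "conj": ["and", "where"],
    "verb": ["swirled", "went", "flew", "fell"],
    "prep": ["by", "up", "out", "down"],
    "noun": ["ball", "mall", "hill", "crown", "lout", "bead"],
}

def rhymes(a, b):
    """Crude orthographic stand-in for rhyme: shared last three letters."""
    return a[-3:].lower() == b[-3:].lower()

def fill(template, rng):
    """Slot a random word of the right part of speech into each position."""
    return " ".join(rng.choice(WORDS[pos]) for pos in template)

def couplet(rng):
    """Keep generating two template lines until their final words rhyme."""
    while True:
        line1 = fill(["pron", "conj", "pron"], rng)
        line2 = fill(["verb", "prep", "noun"], rng)
        if rhymes(line1.split()[-1], line2.split()[-1]):
            return line1, line2

rng = random.Random(42)
print("\n".join(couplet(rng)))
```

The output satisfies the structural cues — part-of-speech order and end rhyme — while the words are chosen blindly, which is exactly how a bot can sound metrically right and semantically wrong at the same time.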

The other thing you’ll notice where human-like poetry trumps computer-like poetry is coherence of ideas. In the poems that read as human, most of the lines carry ideas that agree with a general theme or with the lines preceding and following them. The ideas in each line also display progression: something is being explored or developed. The computer-like poems tend to show disjointed ideas.

Perhaps humans are predisposed to read continuity and coherence as hallmarks of humanity.

Is coherence then unique to humans, or can computers imitate it as well? Let’s see if we can imitate coherence with an algorithm, on top of the prosodic and auditory cues. To do that, we need thematic cues. I’m going to use the following stanza from “I wandered lonely as a cloud” by William Wordsworth.

I wandered lonely as a cloud
That floats on high o’er vales and hills,
When all at once I saw a crowd,
A host, of golden daffodils;
Beside the lake, beneath the trees,
Fluttering and dancing in the breeze.

<write coherent poem>
component: alternate stress, weak-strong
component: SET last line: [alternate stress, weak-strong]=false
component: vowels at end of line match; SET:ABABCC
component: [CENTRAL X(n)] designate IDEA
component: [CENTRAL X(n)]; X=verb, noun, adverb, preposition, adjective
component: [CENTRAL X(n)] must agree with THEME
component: [CENTRAL X(n)] must agree with [CENTRAL X(n±≥1)]
component: [CENTRAL X(n)] either expand or progress [CENTRAL X(n±≥1)]
SET
component structure line1: [pronoun][CENTRAL verb(1)][adjective][preposition][CENTRAL noun2]
component structure line2: [conjunction][CENTRAL verb3][adjective][preposition][CENTRAL noun4][conjunction][CENTRAL noun5]
component structure line3: [conjunction][adverb][pronoun][CENTRAL verb6][noun]
component structure line4: [noun][adjective][CENTRAL noun7]
component structure line5: [preposition][CENTRAL noun8][preposition][CENTRAL noun9]
component structure line6: [CENTRAL verb9][conjunction][CENTRAL verb10][preposition][CENTRAL noun11]
THEME:nature
(1) verb-noun agrees with (2); (2) agrees with THEME
(3) verb-noun agrees with (2);(4),(5) agrees with (2); (4),(5) agrees with THEME
(6) verb-noun agrees with (7)
(7) agrees with THEME
(7) preposition agrees with (8),(9); (8),(9) agrees with THEME
(10),(11) verb-noun agrees with (7); (10),(11) agrees with THEME
IDEA1:[(1),(2)]–>IDEA2:[(3),(4),(5)]
IDEA3:[(6)(7)]–>IDEA4:[(8),(9),(10),(11)]

Does the above make any sense? It took me a while to break “I wandered lonely as a cloud” down into an algorithm vague enough that, in my opinion, it still represents the poem while hypothetically being able to produce another. Let me explain the “code” above.

The poem has various components, including a weak-strong stress meter, though the last line of the set breaks the meter. The final phonetic features of certain lines must match — in this case ABABCC. Within the poem, certain items are designated [CENTRAL X], where X can be a verb, noun, preposition, adverb, or adjective. These [CENTRAL X] items make up a contained IDEA, a sense of what the line means. Each [CENTRAL X] must agree with a preset THEME — in this case “nature” — meaning the words must be somehow relevant to nature, the way “hill,” “daffodil,” and “cloud” all are. Not only must each [CENTRAL X] agree with the THEME, it must also agree with one or more of the [CENTRAL X] items preceding or following it, not just grammatically but by expanding or progressing the idea in a logical way.

IDEA1 contains [CENTRAL (1),(2)], which progresses into IDEA2, containing [CENTRAL (3),(4),(5)]. IDEA3 contains [CENTRAL (6),(7)] which progresses into IDEA4, containing [CENTRAL (8),(9),(10),(11)].

You know, even after so much postulating, I’m still not sure I have successfully “retro-coded” Wordsworth’s poem. Maybe it is coherence of ideas that seems unique to human effort, and humans are predisposed to finding order in nature. My head hurts from trying to break poems down like that. Maybe someone else can do this better than I can. Feel free to leave comments.
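One small piece of the pseudo-code above — checking the ABABCC end-rhyme component — is easy to make concrete. English spelling is a poor guide to rhyme, so this sketch cheats with a hand-labelled rhyme dictionary for the stanza’s end words; that dictionary is my own assumption, standing in for a real pronunciation lexicon such as CMUdict.

```python
import string

# Hand-labelled rhyme classes for the stanza's end words (an assumption
# standing in for a real pronunciation lexicon).
RHYME_CLASS = {
    "cloud": "AWD", "crowd": "AWD",
    "hills": "ILZ", "daffodils": "ILZ",
    "trees": "EEZ", "breeze": "EEZ",
}

def infer_scheme(lines):
    """Assign A, B, C... to lines by the rhyme class of their last word."""
    labels, seen = [], {}
    for line in lines:
        word = line.split()[-1].strip(string.punctuation).lower()
        cls = RHYME_CLASS[word]
        if cls not in seen:
            seen[cls] = "ABCDEFG"[len(seen)]
        labels.append(seen[cls])
    return "".join(labels)

stanza = [
    "I wandered lonely as a cloud",
    "That floats on high o'er vales and hills,",
    "When all at once I saw a crowd,",
    "A host, of golden daffodils;",
    "Beside the lake, beneath the trees,",
    "Fluttering and dancing in the breeze.",
]
print(infer_scheme(stanza))  # ABABCC
```

Given sound-level data for a whole dictionary instead of six hand-labelled words, the same loop could verify — or enforce — the rhyme scheme of any generated poem.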

Answering the question, not the counterfactual

The above image made its rounds on Reddit the other day. The question asks “If you choose an answer to this question at random, what is the chance you will be correct?” The options are:

a) 25%
b) 50%
c) 60%
d) 25%

Since randomly choosing one of four answers gives a 25% chance, the answer is a)… and d)? But since there are then two correct answers out of four choices, that’s 50%, which is b). But there’s only one b), so that’s 25% again, so it’s a) and d)… ad nauseam.

STOP. You’re doing this wrong. Let semantics easily (and hopefully painlessly) tell you how to solve this question.

Let’s look at the question again.

“If you choose an answer to this question at random”

Let’s break it down:

IF [You] [choose 1 answer randomly] to [this] question, [percentage answer=TRUE?]

The secret is in the word “IF.” It summons a counterfactual version of you, so that you can discuss things in an “if” world without being constrained to answer by “if” rules. [Counterfactual You] is the one who picks one answer to [this], where [this] is self-referential to a world that has two correct answers out of four. The answer, 50%, holds for you in this world, not in the world [counterfactual You] inhabits.

Hence, in your reality — not that of the [counterfactual You] in the question — just answer the question asked about counterfactual you, simple as that. An equivalent question, replacing counterfactual you with a third person, is:

Kevin has to randomly pick 1 answer out of four. However, 2 of the answers are identical and correct. What is the percentage that Kevin will pick a right answer?

Don’t sweat the counterfactuals, just stick with this reality. The right answer is B.
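If the semantics still feels slippery, the Kevin version can also be checked by brute force. This is a quick simulation of my own (the option labels are mine): two of the four slots hold the same correct answer, and Kevin picks one slot uniformly at random.

```python
import random

# Kevin picks one of four options at random; two of the slots, a) and d),
# hold the same correct answer, as in the puzzle.
options = ["a", "b", "c", "d"]
correct = {"a", "d"}

rng = random.Random(0)
trials = 100_000
hits = sum(rng.choice(options) in correct for _ in range(trials))
print(hits / trials)  # ~0.5, i.e. answer b)
```

The empirical frequency converges on 2/4 = 0.5, which is exactly the in-this-reality answer b).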

(No need to read the below if you don’t want technical explanations)

If you want a really convoluted discussion of semantics and counterfactuals, and of why we can discuss counterfactuals without being constrained by counterfactual rules, here it is. In counterfactual semantics we often discuss the death of Aristotle (or was it Plato?), in sentences such as “Aristotle might not have been a philosopher if he had died as a kid.” This relates to the topic of indices and what names refer to, researched and discussed by many linguists and philosophers, notably Kripke.

A quick answer, without going too in-depth: if we were bound by the indices of the counterfactuals we refer to, we would be unable to talk or respond at all, because the counterfactuals would loop infinitely. Instead, we can talk about Aristotle’s death without having to go back in time to kill him, or about what would happen at the end of the world without destroying the world to be able to talk about it. Take the following multiply self-indexed sentence:

If I were you, I would kill me

There are two people involved in the conversation, “you” and “me,” yet our minds converge on a conventional understanding of what the sentence means. It means something like, “I am such a terrible person that, if another person were in my position talking to me, he would hate me so much that he would kill me.” For such a short sentence, it takes a long one to elaborate. Thank goodness for indices! Here is how the sentence works with indices:

IF [counterfactual I][sees]me, [counterfactual I][wants][kill] me.

There you go.

The persistence of comprehension

chinesefup

Some time ago, Instagram user jumppingjack posted the above image of a note she left for her mum. She said that her brother had secretly added extra strokes to the characters in the note. The result is interesting: even with the extra strokes, the note is still readable to most competent Chinese readers. This phenomenon is very similar to one that made the rounds in English not too long ago, coined “typoglycemia” — a portmanteau of “typo” and “glycemia” and a pun on “hypoglycaemia” — in which, as long as the first and last letters of each word are preserved, the middle letters can be scrambled and the words remain understandable.
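The English version of the effect is easy to reproduce. Here is a minimal sketch that keeps each word’s first and last letters and shuffles the rest (the example sentence and the simple letters-only word pattern are my own choices):

```python
import random
import re

def typoglycemia(text, rng):
    """Shuffle the interior letters of each word, keeping first and last."""
    def scramble(match):
        word = match.group()
        if len(word) <= 3:          # nothing to shuffle in short words
            return word
        middle = list(word[1:-1])
        rng.shuffle(middle)
        return word[0] + "".join(middle) + word[-1]
    return re.sub(r"[A-Za-z]+", scramble, text)

rng = random.Random(1)
print(typoglycemia("Comprehension persists despite scrambled middles", rng))
```

Each output word is an anagram of the original with its first and last letters pinned in place, yet the sentence usually stays readable at a glance.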

This is an interesting case in what I call persistence of comprehension, where comprehension of words persists despite efforts to thwart it.

The Preamble

Unlike English, which uses an alphabetic system where each letter represents a phoneme, or Japanese, whose kana form a syllabary where each character represents a mora, Chinese uses a logographic system, with “pictures,” or logographs, representing words. Unlike the other two systems, which have letters to scramble, it is hard to “scramble” a picture; the closest equivalent is adding or subtracting strokes from a character, which is what jumppingjack’s brother did.

Before I go further, let me type out what the note intends to say:

妈妈,明天的午餐因为
人数不够,他们把这个
活动换到下一个星期
(可是下个星期六中午我的
公司有个午餐)

谢谢 🙂  (我明天应该有吃午餐)

Mum, for tomorrow's lunch, because
there aren't enough people, they have
changed this activity to next week.
(But next Saturday at noon my
company has a lunch event)
Thanks 🙂  (I should be eating lunch tomorrow)

So how does persistence of comprehension occur in Chinese? I shall illustrate some of the characters that are easily understood despite the scrambling, and the ones that threw me (and my friends) off the most. (Also, note that jumppingjack herself mis-wrote the character 期, switching the 月 and 其 around; that was not her brother's doing, though he did add an extra radical to it as well.)

[Image: chinesemessy]

The image above sorts some of the words in the note in order of persistence of comprehensibility from top to bottom, with top being easiest to understand despite scrambling and the bottom being the hardest. The scrambled word is on the left and the proper word is on the right. Note that the bottom four scrambled words are all actual Chinese words, which I will talk about shortly.

The scrambling of the 的 character is one of the easiest to understand: despite the additional stroke, it still mostly resembles its original character and does not resemble any other word in the language. The added stroke is not a radical (a graphical component of a word that is often semantic), unlike the scrambling of the character 明 (tomorrow). Similarly for 因: the added stroke turns the 大 inside 因 into a 太, but overall the result is not a real word and mostly resembles its original.

Now we look at the addition of a stroke in 明, turning the 日 (sun) radical, usually used for weather-related words, into a 目 (eye) radical, usually used for vision-related words. The resultant scrambled word is still not a word, but the morphing of a semantically-relevant radical into another makes one pause when reading the sentence. Also, the addition of a stroke to the 月 (moon) component turns it into a 用 (use) character, making comprehension even more difficult.

One step beyond the 明 character is 们, where not a stroke but an entire 中 (middle) character has been inserted into the middle (haha) of 们. Some of my friends disagree that it is harder than the scrambling of 明; I'm inclined to call it a toss-up between the two. Still, I feel that the insertion of an entire character, as opposed to a stroke or radical, morphs the word enough that it no longer resembles its original, while also not resembling any other word in Chinese.

Lastly, the four characters 公, 午, 个, and 下 have strokes and/or components added to them such that they actually resemble other words in the language: 翁 (old man), 牛 (cow), 伞 (umbrella), and 卡 (card). With such resemblance to real words, little wonder people have difficulty understanding them as they read.

The Analysis

How is it that we are able to understand the note with little difficulty?

In the English "Typoglycemia," it has been suggested that we identify words not solely by letter position within a word, but by context, the shape of the word, and the position of the word in the sentence. I'll go so far as to suggest that in English, seeing the individual letters of a scrambled word draws upon our stored memory of the word, further aiding comprehension. Compare:

  1. Adcnirocg to rrasceeh at a ptaruilacr ureitnvisy
  2. Aoincdrg to rcseerh at a plaaicutr uesvtniiy
  3. Aroindg to rearech at a pluiraacr utrisveiy

Example 1 is classic "typoglycemia," where persistence of comprehension is strong; example 2 removes one non-essential letter from each word, and persistence of comprehension is still relatively strong. Example 3 removes what I consider essential components of our memory of the word, which are usually consonants rather than vowels. Take this example:

  • I cnt blve u dd tt!

In English, vowels can be removed quite easily and comprehension of the word is still possible. This suggests that consonants play a slightly more important part in the reading of words. In that respect, comprehension of written English bears some similarity to comprehension of written Arabic or Hebrew, where vowels are typically not written out (at least not in the way English writes them). Thus, it is harder to understand "aroindg" as "according" because

  1. An essential component has been removed (three syllables; essential components: a-cc-r-dng). This might be so because consonant representations are tied up with their phonetic properties. This is why "cc," as opposed to just one "c," has to be removed from "according" for comprehension to fail: the "cc" in the word correlates to the /k/ sound in /əˈkɔː(ɹ)dɪŋ/, and even one "c" is sufficient to clue us in that there might be a /k/ sound in the scrambled word.
  2. It is the first consonant component in the sequence of essential components, suggesting that perhaps we process essential components sequentially in our heads. When we see the word "according," we could be drawing upon the idea that it contains the components "a-cc-r-dng" in that order. Hence removing the early component "cc" impedes comprehension, as it cannot give the subsequent components context for what the word might be (compare understanding "aroindg" ("cc" removed) with "aocrcdg" ("in" removed)).

How does this relate to Chinese? If we can say that we draw upon essential sequences of components in the comprehension of written English, perhaps there is an equivalent of that in the comprehension of Chinese. I believe that in reading Chinese, there is a stored visual memory of what the character looks like in general, and also an idea of what strokes the character should contain (“legal strokes”), and what it should not (“illegal strokes”).

First, we address whether modifying a Chinese character sets off alarm bells for the reader. Adding legal strokes to scrambled characters should stand out less, causing the reader to accept the character visually as a real word. The following example demonstrates this:

[Image: chinesemessy2]

In the note, a floating shuzhe (vertical-bend) stroke is added to the 够 character, and in written Chinese there is no such thing as a floating shuzhe; it is always attached to other strokes, as in 喝 (with some exceptions, like 断, where it may or may not be attached). Visually alerted that something is wrong with the character, we immediately discount the scrambled 够 and are able to extract the original word. In the example of 他, a pie (leftward-slant) stroke is added on top of the 亻 radical, creating a 彳 (step) radical, which does exist. So when reading the scrambled 他, it does not jump out at the reader the way the floating shuzhe in 够 does; we are likely to gloss over it, accept it as it appears, and not question whether the character is contextually out of place.

Next, adding a legal stroke to scrambled words causes more confusion when the stroke turns the original word into a semantically different word. There is extra confusion when the meaning of the new word does not fit in the context of the sentence, especially when the word has been accepted as it is, as explained in the previous paragraph. These can be seen in the following examples:

[Image: chinesemessy3]

If my premises are right, then in example 1 readers should be able to identify the error most easily, yet still read the sentence in its original sense. In example 2, they should gloss over the wrong character; since it still closely resembles the original and is not any other real word, persistence of comprehension should remain strong. Example 3 is where comprehension begins to be thwarted: 他能够卡去吃午餐 ("He is able to card go eat lunch") and 他能够下去吃牛餐 ("He is able to go down and eat a cow meal") make no sense, because the scrambled characters have legal strokes, are real words, and their meanings are contextually out of place in the sentence.

The Conclusion

What I have coined "the persistence of comprehension" is a seemingly little-researched area in English, much less Chinese. I offered reasons for the persistence of comprehension in English "typoglycemia": through a combination of context, word length, letter position, word shape, word position in the sentence, and (as I have demonstrated with examples) identification of individual letters, which draws upon phonetic representations of the word in our heads, we are able to read scrambled English.

In the more interesting case of Chinese, which is logographic, I posited that there are legal and illegal strokes that can be added to a character. Legal strokes are less likely to be noticed than illegal ones. If the scrambled character is an actual word, the legal strokes mask the fact that it has been scrambled; then, when the sentence fails to make sense, we struggle to recover the original because we do not suspect that a character has been tampered with.

All in all, this calls for more extensive research than a blog can provide. I don't know if I will be able to do it myself, but if anyone wants to hear my notes on this topic, feel free to reach out to me at ws672[at]nyu[dot]edu.

Note: In the original version of this post, I wrote that jumppingjack was male, when she is female. Corrections have been made.

How do cows moo in a British accent?

A listener to the podcast “How to Do Everything” named Rachel asked: “How would a person moo in a British accent?” The listener is from Nevada, and professes to moo with an American accent.

What better way to find out than to ask someone close to the source? The kind folks at the show invited Sir Patrick Stewart to answer Rachel's question, which had been on her mind for a couple of months. Patrick Stewart answers:

It's not a straightforward, simple answer. Unlike, probably, many other countries, where a cow's moo is a cow's moo, in England, you understand, we are dominated by class, by social status, and by location. So, for example, a cow that's in a field next to my house in West Oxfordshire would moo in one kind of way, and a cow in a field in the semi-industrial town I grew up in in the north of England would moo in another kind of way.

Patrick Stewart then goes on to demonstrate how the different cows would sound by bellowing himself. He describes the moos of the cows near his home, a long protracted low, as "very conservative." He goes on to explain why:

You must understand that I live in the constituency of David Cameron, our Prime Minister, who is a Tory (the Conservative Party). And I assume these cows voted for him. I don’t actually vote there, I vote in another place, in London.

If I’m at my home in Yorkshire, where I grew up, and not that there are many fields left where I grew up but I would find one and I would find some cows, what you hear would be something like this: mehhhhhhh. Well, this has all to do with environmental and cultural conditioning.

He emphasises that moos vary by location, and recommends that travellers talk to cows all over the country, in any country, because "cows have a great deal to tell us."

The hosts of the show offered a bit of a cultural exchange, showing Patrick Stewart what a Nevada cow sounds like, but Stewart surprises them all: his wife is also from Nevada, he has experience with Nevada cattle, and he does his best impression of a Nevada cow. His Nevada cow is high-pitched and nasal, because "that's the way you people (the hosts) talk. The cows are influenced by how you talk, as you are influenced by the cows."

The hosts went on to ask how a Cockney cow would moo. Stewart replies:

You understand, Cockney cows are pretty rare these days. I mean, in Shakespeare's day, there were cattle in the middle of London. But nowadays, generally speaking, the city of London doesn't feel too good about having cattle in Piccadilly Circus, for example. But I can give you an idea of a Cockney cow; I'm old enough to remember when there were Cockney cows.

The resulting impersonation (incowation?) sounds like a mehh-aye!, which Stewart describes as "more like a sheep than a cow." The hosts point out that in Stewart's walk-through of English cows so far, not only are there different cow accents, there also seem to be different cow attitudes throughout the country. Stewart extrapolates:

You are absolutely right. What you just heard just now was an urban cow. All of us who live in big cities, we have to be watchful, we have to be on our guard. We have to be prepared for fight-or-flight at any moment, and it is the same with cattle.

Breeding is of the utmost importance in humans, as it is in cattle. What would a well-bred cow sound like?

We had a Prime Minister many, many years ago called Alec Douglas-Home (pronounced hyoom), and one of the wonderful things about Alec Douglas-Home, including his name, by the way: you'd think that his name is probably spelled "H-double O-M-E" or "H-U-M-E" or something like that, but his name was actually spelled "H-O-M-E," home, yet pronounced "hume." And we do that mostly to confuse Americans, like Leicester Square and "Lye-cester."

Anyway, the thing about Alec Douglas-Home was that he didn't move his lips when he talked (an unintelligible mumble follows, because Stewart mimics talking without moving his lips, but the words sound like: "and here's an example, this is how he always talked. He didn't actually move the lips.") Because moving your lips is terribly bad taste. So, if Alec Douglas-Home had cattle, and I'm sure he did (he must have been a landowner, because I think he was actually Scottish), his cows would have mooed something like this: hrmmmmmm. Very refined, very sophisticated, very cultured. These cows had gone to Eton or Harrow (prestigious boarding schools), or at least the cow equivalent.

There you go: how British cows moo, and, like humans, how they speak is also affected by social ecownomics.

I’ll see myself out now.

Listen to the full podcast here, if not for the knowledge, then at least to hear Sir Patrick Stewart moo like a cow!

Insert: Code Poetry

First Stanford code poetry slam reveals the literary side of computer code

The high-tech poetry competition, which explored how computer code can be read as poetic language, is accepting submissions for the next competition.

By Mariana Lage
The Humanities at Stanford

Leslie Wu, a Stanford graduate student in computer science, presents her code poem, ‘Say 23,’ which won first place in the Stanford Code Poetry Slam. Image: Mariana Lage

Leslie Wu, a doctoral student in computer science at Stanford, took an appropriately high-tech approach to presenting her poem “Say 23” at the first Stanford Code Poetry Slam.

Wu wore Google Glass as she typed 16 lines of computer code that were projected onto a screen, while she simultaneously recited the code aloud. She then stopped speaking and ran the script, prompting the program to read a stream of words from Psalm 23 out loud three times, each in a different pre-recorded computer voice.

Wu, whose multimedia presentation earned her first place, was one of eight finalists to present at the Code Poetry Slam. Organized by Melissa Kagen, a graduate student in German studies, and Kurt James Werner, a graduate student in computer-based music theory and acoustics, the event was designed to explore the creative aspects of computer programming.

With presentations that ranged from poems written in a computer language format to those that incorporated digital media, the slam demonstrated the entrants’ broad interpretation of the definition of “code poetry.”

Kagen and Werner developed the code poetry slam as a means of investigating the poetic potentials of computer-programming languages.

“Code poetry has been around a while, at least in programming circles, but the conjunction of oral presentation and performance sounded really interesting to us,” said Werner. Added Kagen, “What we are interested is in the poetic aspect of code used as language to program a computer.”

Ian Holmes, a Stanford undergraduate studying computer science and materials and science engineering, explored Java language in a Haiku format. Image: Mariana Lage

Sponsored by the Division of Literatures, Cultures, and Languages, the slam drew online submissions from Stanford and beyond.

High school students and professors, graduate students and undergraduates from engineering, computer science, music, language and literature incorporated programming concepts into poem-like forms. Some of the works were written entirely in executable code, in languages such as Ruby and C++, while others were presented in multimedia formats. The works of all eight finalists can be viewed on the Code Poetry Slam website.

With so much interest in the genre, Werner and Kagen hope to make the slam a quarterly event. Submissions for the second slam are open now through Feb. 12, 2014, with the date of the competition to be announced later.

Giving voice to the code

Kagen, Werner and Wu agree that code poetry requires some knowledge of programming from the spectators.

“I feel it’s like trying to read a poem in a language with which you are not comfortable. You get the basics, but to really get into the intricacies you really need to know that language,” said Kagen, who studies the traversal of musical space in Wagner and Schoenberg.

Wu noted that when she was typing the code most people didn’t know what she was doing. “They were probably confused and curious. But when I executed the poem, the program interpreted the code and they could hear words,” she said, adding that her presentation “gave voice to the code.”

“The code itself had its own synthesized voice, and its own poetics of computer code and singsong spoken word,” Wu said.

One of the contenders showed a poem that was “misread” by the computer.

“There was a bug in his poem, but more interestingly, there was the notion of a correct interpretation which is somewhat unique to computer code. Compared to human language, code generally has few interpretations or, in most cases, just one,” Wu said.

Coding as a creative act

So what exactly is code poetry? According to Kagen, “Code poetry can mean a lot of different things depending on whom you ask.

“It can be a piece of text that can be read as code and run as program, but also read as poetry. It can mean a human language poetry that has mathematical elements and codes in it, or even code that aims for elegant expression within severe constraints, like a haiku or a sonnet, or code that generates automatic poetry. Poems that are readable to humans and readable to computers perform a kind of cyborg double coding.”
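The last of those definitions, code that generates automatic poetry, is the easiest to demonstrate concretely. A toy sketch in Python (the vocabulary and the three-line template are entirely my own invention, not from any slam entry):

```python
import random

# Tiny template-based poem generator: fill a fixed three-line
# frame with randomly chosen words from small word lists.
ADJECTIVES = ["quiet", "recursive", "pale", "endless"]
NOUNS = ["moon", "river", "loop", "silence"]
VERBS = ["drifts", "compiles", "returns", "sleeps"]

def tiny_poem(seed=None):
    rng = random.Random(seed)  # seeding makes a given poem reproducible
    return "\n".join([
        f"the {rng.choice(ADJECTIVES)} {rng.choice(NOUNS)}",
        f"{rng.choice(VERBS)} where the {rng.choice(NOUNS)} {rng.choice(VERBS)}",
        f"{rng.choice(ADJECTIVES)}, {rng.choice(ADJECTIVES)}",
    ])

print(tiny_poem(seed=23))
```

Whether the output counts as poetry is, of course, exactly the question the slam is asking.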

Werner noted that “Wu’s poem incorporated a lot of different concepts, languages and tools. It had Ruby language, Japanese and English, was short, compact and elegant. It did a lot for a little code.” Werner served as one of the four judges along with Kagen; Caroline Egan, a doctoral student in comparative literature; and Mayank Sanganeria, a master’s student at the Center for Computer Research in Music and Acoustics (CCRMA).

Kagen and Werner got some expert advice on judging from Michael Widner, the academic technology specialist for the Division of Literatures, Cultures and Languages.

Widner, who reviewed all of the submissions, noted that the slam allowed scholars and the public to “probe the connections between the act of writing poetry and the act of writing code, which as anyone who has done both can tell you are oddly similar enterprises.”

A scholar who specializes in the study of both medieval and machine languages, Widner said that “when we realize that coding is a creative act, we not only value that part of the coder’s labor, but we also realize that the technologies in which we swim have assumptions and ideologies behind them that, perhaps, we should challenge.”

Mariana Lage is a visiting doctoral student in the Department of Comparative Literature.

When I was younger, I scoffed at the idea of a programming language being a real language. "It is an efficient language, but lacks the capacity for beauty," I thought back then. That's why we can achieve poetry with living languages and not with programming languages.

Turns out people have been trying to prove that wrong. The folks at Stanford created a code poetry slam that attempts to bridge the gap between a language I (and many others) long derided as incapable of beauty and poetry, the very art of making language beautiful. It's interesting to see how one takes a language with "stripped-out" syntax, void of auxiliaries and other linguistic features, and works with its sparseness to turn it into poetry.

This begs the question of “What is beauty? What makes word poetry beautiful?”

We find beauty in poetry in a number of ways: Some find the words used per se beautiful — word-image evocation. Some find the conjured images from metaphors beautiful — visual/thematic image evocation. Some even find the structures used to arrange the words beautiful — structural inspiration. The bottom-line is, there is some sort of inspiration or reaction drawn from the reader by the poem, and this reaction is essentially what we call the “beauty” we find in poetry.

When I shared this article on Facebook, a friend who programs said that code can be beautiful too. To him, efficient code is beautiful code: if an algorithm can solve in 10 lines a problem that takes him 50, that code is beautiful. While there are similarities between the two, in that he is inspired by the efficiency of "beautiful code," that is not the same sense of beauty that exists in natural-language poetry, and the people at Stanford are trying to bridge that difference.

Where my friend equates efficiency with beauty, "code poetry" tries to go beyond mere efficiency. Efficient "beautiful code" is just efficient code; I think those at Stanford are trying to go beyond that, aiming for the same sense of "beautiful" that people find in the imprecise written word, achieved within the precise structure of code. The creativity of code poetry lies neither in the creative licence common to written poetry nor in the ingenuity of making code more efficient, but possibly in code that sits somewhere in between.

In that pursuit, a code poet might end up with slightly unwieldy, bulky code that programmers would call "ugly," but which, appreciated beyond its efficiency and through the image-evoking processes of natural-language poetry, lets the beauty of traditional poetry be seen in code. At times, the code need not even solve anything, and in programming that is just redundant code. But redundancy is very important in natural language, and only by breaking away from the strictures of what makes good code, and eschewing snobbish ideas of natural-language poetry's superiority, can we begin to see the start of a novel way of understanding how beauty and structure can co-exist hand in hand.

Of course, as highlighted in the article, there's the problem of access: those who do not understand programming cannot understand code poetry. Would this be a shortcoming? Who would then be the arbiters of what makes good code poetry? Would we need masters of both computing and natural language to decide which poems honour the sensibilities of both sides? I think not, actually. Take the Java haiku in the article above. I don't understand Java, but I think there's an element of beauty in it. Anyone, as long as they're willing to abandon what traditionally defines good code or good poetry and listen for what inspires, what is beautiful to the mind, can appreciate good code poetry.

Protip: On “cnidaria” pronunciation

Pacific sea nettles, scyphozoans, members of one of the four groups of Cnidaria (Image: Wikipedia)

If ever a discussion of jellyfish moves to the topic of "cnidaria" and the related "cnidocyte," the explosive cell mechanism by which they sting, containing the sub-cellular organelle called the "cnidocyst," and one is tempted to pronounce the "C" in the word;

Don’t.

Cnidaria –  /naɪˈdɛəriə/ (ny-DAIR-ee-uh)

Cnidocyte – /’naɪdoʊsaɪt/ (NAI-doh-site)

Cnidocyst – /’naɪdoʊsɪst/ (NAI-doh-sist)

Just sayin’.

Speaking fake English, or any other fake language

What qualifies the English language to sound "English" enough? Very often, people in the English-speaking world have impressions of what foreign languages sound like. Chinese (excluding stereotypical "ching-chong" variants) sounds like "Xie shi hao ni jing ling ping dao" to many English speakers, replete with its tonality; French has its uvular R's and lots of Z's and nasalities, "Le beton est un plus morraise il a son telle fusontique des mon"; Italian has its inflections on certain syllables; and so forth.

What about fake English? Were a foreigner to make fun of what English sounds like to them, how would they reconstruct it?

It turns out that faking a language requires at least a basic knowledge of the morphemic and phonetic structure of that language. Why do people go "ching-chong" when mimicking Chinese, and rattle their throats and noses when trying to speak fake French? Because they know these languages feature those consonant and vowel relationships.

Knowing the phonetic map is only one part of speaking a fake language; the other, to make the fake language sound convincing, is knowing how the sounds fit together to form words.

The video above speaks fake Chinese, and as a Chinese speaker, I find it very far off, simply because he does not understand the tonal system of Chinese, nor can he reproduce certain syllables.

The video below shows a somewhat convincing fake English, as it imagines what English might sound like to a foreign person who does not speak the language.

Any English speaker would realise that the clip actually uses a lot of real English words but is for the most part unintelligible, yet it still sounds distinctively English. I feel the writers of the script relied too much on real words, simply garbling the rest, when they could have pushed the boundaries further, changing words up using English phonomorphemic rules to create a convincing yet clearly fake English conversation.

I wrote previously that we can extract semantic meaning from nonsense words through parallel sounds and the morphemes attached to them. Likewise, to sound most convincing, fake English needs to preserve morphemes, because for some reason English morphemes sound very English to any English speaker. So much so that we attach them to foreign words when we Anglicise them: we can say a person "kamikaze'd," or that something is "taco-licious." What that means exactly, I'm not sure, but we often use English affixes to make foreign words fit into our language.

Likewise, if we were to create nonsensical, fake English conversation, we must preserve these affixes, for they give words their purposes. For example, we use "-tion" to turn something into a process, as with "crown" to "coronation" or "investigate" to "investigation." If I used a word like "hakilimation," chances are a competent English speaker could infer that the root word is "hakilimate." If I said a person is "taffing," the root verb is probably "to taff."
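That inference can even be mimicked mechanically: strip a recognised affix and reconstruct a plausible root. A toy sketch in Python (the rule list is deliberately tiny and simplistic, as real English morphology has far more patterns and exceptions, and "hakilimation" and "taffing" are the made-up words from above):

```python
# Toy morpheme stripper: guess an English root word from a few
# common affixes. (suffix, replacement) pairs are tried in order.
SUFFIX_RULES = [
    ("ation", "ate"),   # investigation -> investigate
    ("ing", ""),        # taffing -> taff
    ("-licious", ""),   # taco-licious -> taco
]

def guess_root(word):
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            return word[: -len(suffix)] + replacement
    return word  # no known affix: assume it is already a root

print(guess_root("hakilimation"))  # -> hakilimate
print(guess_root("taffing"))       # -> taff
```

The point is not the code itself but what it models: an English speaker runs something like these rules unconsciously, which is why nonsense words with real affixes still feel parseable.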

Here's my attempt at speaking fake English, using the rules I have highlighted. I think if someone weren't paying close attention and heard this in the background, it could pass for real English. Also included are fake Chinese and Japanese that, in my opinion, sound a lot more legit than attempts made without knowledge of how the languages are structured.

Here's an example of a Microsoft ad that uses fake Chinese convincingly. Granted, a lot of the words are slurred, given its conversational nature, but those who know the language can tease some actual Chinese out of that blur of words.