Sabotaged by Polish orthography (plover.com)
123 points by gbacon on Aug 3, 2017 | 132 comments


While we're in this thread: notice that pączki is a plural. Please remember this and don't say 'pierogies'—it's just pierogi.

There's no need to know the singular for pierogi, because no one has ever eaten just one.

In return, I promise to continue working to stop Polish people from pluralizing potato chips as "chipsy".


But at a certain point, a loan word becomes "adopted" by its language, right? And at that point, it doesn't make sense to carry over the foreign rules for pluralization, etc.

I'm not sure if "pierogi" is common enough yet for "pierogies" to become the valid way to pluralize it in English, but as a Russian speaker, "чипсы" sounds perfectly normal to me.

After all, if I were speaking Russian I wouldn't say "компьютерз" - I would say "компьютеры", using the correct pluralization in the language I'm speaking in.


Canadian, can confirm my family has been eating perogies for the past 30 years.


I'm trying to get them to say "czypy". I have so many white whales.


> I'm trying to get them to say "czypy"

You can try to interest "Rada Języka Polskiego" (Polish Language Council) in this topic. Just because we all know, every Polish speaker will follow their advice diligently :).

In the early '90s, after the COCOM embargo on importing computers into the then Eastern Bloc had been lifted, there were initiatives to translate English-sounding words into something more native to the Polish language.

An example of a word that badly needed such translation was 'interface', for which a proposed translation was 'międzymordzie' (literally: a thing between two faces) which sounded hilariously bad in Polish. There was another proposal to name 'computer mouse' as something even better, but I've forgotten the name since.

Fortunately, nobody followed, and 'computer mouse' is simply 'mysz' (mouse). I hear, that French had longer lasting successes with their translations.


> I hear, that French had longer lasting successes with their translations.

Kind of. Generally, ‶old″ words are consistently translated. For instance, we have souris for mouse (which is exactly the direct translation, because it conveys the same idea), ordinateur (which could be translated as ‶sorter/processor″) for computer, informatique (science of the information) for computer science, disquette (small disk) for floppy disk, logiciel for software, and so on.

However, newer ones (i.e. past ~1985) didn't catch on so much, mostly because the official translations were awful and/or came years too late. For instance, CD-ROM is supposed to be cédérom, i.e. the straight phonetic transliteration – losing all meaning in the process; tablet is supposed to be ardoise numérique (digital slate) – evokes school, came years too late and was too long in comparison to the colloquial tablette; fouineur (snooper) for hacker (no comment...); cybermonnaie (cyber-money) for cryptocurrency – losing half of the semantics, and so on.


I heard the mouse was supposed to be called "manipulator stołokulotoczny" (manipulator with a ball rolling on the table - due to the way old style mice were designed).


The two Polish sentences every respectful Pole knows:

W Szczebrzeszynie chrząszcz brzmi w trzcinie.

And:

Stół z powyłamywanymi nogami.

I'm sure YouTube will find you proper pronunciations.

Some called the mouse "gryzon" which means rodent, but that was very early days of computers in Poland - say 1990.


"Rodent" is occasionally used to reference the mouse in English hacker slang, as well. Example:

http://www.nongnu.org/ratpoison/


I often say rodent when I am trying to encourage my younger colleagues to use the command line.


"gryzoń"


> I promise to continue working to stop

A general campaign to stop non-natively-Anglophone Europeans from using "funny" to mean "lots of fun" would also be appreciated.


Yikes, that could cause some heartbreak. Somebody says it was funny to work with you, with a big grin on his face.


> In return, I promise to continue working to stop Polish people from pluralizing potato chips as "chipsy".

Finns are great at this as well. Some examples:

  chips - sipsit

  shorts - shortsit

  donut, donuts - donitsi, donitsit

  ribs - ribsit (as in the food)

  wings - wingsit (likewise)
and so on. These are all basically established loans by now.

Then there's the recent abomination that seems to be getting popular as there's no established word for a mobile app yet:

  app, apps - äpsi, äpsit


> There's no need to know the singular for pierogi, because no one has ever eaten just one.

That, friend, is hilarious, and so true.


'Pieróg' is also name of the shape, and some local dishes will be named after the shape, like 'Pieróg jeżycki' (small pizza folded in half).


Just like 'spaghetto'


> There's no need to know the singular for pierogi, because no one has ever eaten just one.

Ain't that the truth! My Polish friend also introduced me to 'pierogi rooski'? Apparently, a Russian take on pierogi, with more meat. Do I have the spelling correct?

> In return, I promise to continue working to stop Polish people from pluralizing potato chips as "chipsy".

As a Jamaican, where banana chips are like a national snack, it was amusing to see "chipsy bananowe" for sale. Fun times; I plan to go back.


> My Polish friend also introduced me to 'pierogi rooski'? Apparently, a Russian take on pierogi, with more meat.

If you mean "pierogi ruskie", they don't have any meat in them, just quark with potatoes and onion, though they're often served with bacon. And this dish comes from "Ruś" (now Ukraine), not "Rosja", otherwise it would be called "pierogi rosyjskie". I hear that in Ukraine they call this dish "Polish pierogi".

> As a Jamaican, [...], it was amusing to see "chipsy bananowe" for sale.

I believe "it" was not a Jamaican, however amusing it was.


> "pierogi ruskie" [...], though they're often served with bacon

Blasphemy!


Stop messing with my mind, they are called 'vareniki'.


No Russian take on anything has more meat.

In Poland, 'pierogi ruskie' are ones with cheese and potato. Also delicious!


Ah, I must have been mistaken. Dziękuję!


Adding to the other comments: the closest thing that can be described as "Russian take on pierogi" is pelmeni. In Ukrainian cuisine, it would be vareniki.

Confusingly, Russian does have the word "pirogi" (plural; singular "pirog") - but it's a kind of pie, not a dumpling.


most people don't wear pant.


I see your chipsy and raise you tipsy.

Not because there's anything wrong with it, that is.

It just always puts a smile on my face when I walk past a tipsy nail salon with a sign that, to my British ears, euphemistically means being on the path to getting properly drunk.


The old saw from too many bad Westerns of “I’ll see … and raise …” is an invalid string raise. Poker actions must be atomic.


Similar to Italian plurals - one "panino", many "panini", and "paninis" is not a thing.


Germans love to order espressos. If they say espressi it always sounds pretentious. Sometimes it's espressis. Then everyone who speaks Italian winces (especially the Italians).

Then again, when I'm ordering an espresso in Italy, my results with just "caffé" are also mixed because it turns into the poison scene from The Princess Bride:

- is the guy speaking Italian?
- is the guy just asking for a regular coffee because he doesn't speak Italian?
- do they maybe call it different in different parts of the country?

I will never solve this mystery, I guess.


Ok, so out of curiosity I googled "singular for pierogi", and it seems to be "pieróg". But there are a number of (seemingly) Polish people scattered around the Google results saying that this is never used, or incorrect, or refers to only the general form for filled dumplings.

Now I know that in real-world usage it would be unlikely to refer to one, but what if there was a need to? Let's say someone made a statue to pierogi, but due to budgetary problems only one of the several pierogi planned was constructed. A tourist asks you how many pierogi make up the statue. How would you respond? Would you use the singular, like "only one pieróg", or would you rework your response so there was no need for the singular, like "sadly only one of the several pierogi was built"? Which seems more natural?


More common than Pieróg is Pierożek - formally a little Pieróg, but most people would use this one to single out Pierożek in a plate of Pierogi.


Pieróg is correct.


I recently came across some "burekasim" in Israel. The double plural there has a taste of the entire mediterranean from turkish/greek/arabic/ladino/spanish/hebrew :) I guess it could be dialed up one more notch if an english-speaking tourist would ask for some burekasims.



Another Polish-speaker who read this article said she was delighted by my use of “ogoneks” instead of “ogoneki”.


For Polish speakers, your post is like watching someone dive from a high platform onto a cactus. You have our attention!


By the way, I was on the big island of Hawai‘i last month and I drove past Pszyk Road. As well as I can discover, this was named for a Polish immigrant whose descendants now have names like Harmony Wai‘olohe Pszyk.


This makes me extremely happy. I also think there should be exit visas based on last name. Let's not repeat the tragedy of this guy: http://www.thebaseballcube.com/players/profile.asp?P=Joe-Gwo...


My eldest daughter, a native Ponglish speaker, came out with Thankuję a few years back.

I've been dining off it since.


> Like watching someone dive from a high platform onto a cactus

This is a hilariously colorful analogy. Is it a translation of a Polish analogy, or are you just a very descriptive English speaker?


Maybe the former, definitely the latter; see http://idlewords.com/2007/04/the_alameda_weehawken_burrito_t... or any of the articles on his blog.


I am just a descriptive English speaker. Thank you for the kind words!


Thank you sir! I thought of you while I was writing it, and hoped that I would not make you feel embarrassed on my behalf.


It was an awesome post!


*"ogonki"


Right, thanks!


Ouch, that asterisk does some seriously confusing double-duty in a thread like this...

In Internet forum speak, the asterisk means "I meant to say" or "You meant to say".

In linguistics, the asterisk means "This form is not attested (we don't have examples of people using it)" or "This form would be considered mistaken by speakers".

That means that the forum usage of asterisks is something like "I should have said X" while the linguistics use of asterisks is something like "I shouldn't have said X". :-(

https://en.wikipedia.org/wiki/Asterisk#Linguistics

"In linguistics, an asterisk is placed before a word or phrase to indicate that it is not used, or there are no records of it being in use."

https://en.wikipedia.org/wiki/Asterisk#Typography

"Asterisks may denote corrections to misspelling or misstatements in previous electronic messages"


> In return, I promise to continue working to stop Polish people from pluralizing potato chips as "chipsy".

So, like "chipy"? I don't think it's a good idea :)


You never know. I have an old friend from college who has, in his entire life, eaten exactly one potato chip.


Don't forget panini!


Oh yeah! Fresh Italian Paninis! They're the best Paninis!


Polish orthography is a disaster partly because of the choice of Latin over Cyrillic. The latter would have been a much better fit for the sounds in the language.

Making matters worse, there are homophones (ż and rz, u and ó, ch and h) that depend on when the word entered the language.

Like so many things wrong with the country, you can comfortably blame the Catholic Church for this orthographic train wreck.


Polish letters on top of Latin are no worse than German (ä, ö, ü, ß), or French (é, à, è, ù, â, ê, î, ô, û, ë, ï, ü, ÿ, ç). Cyrillic is not the answer, as each Cyrillic country has own variations, own letters. See https://en.wikipedia.org/wiki/List_of_Cyrillic_letters

Regarding homophones, this is not 'when word entered the language' but rather, the change of pronunciation no longer matching the phonetic writing.

The 'ó' vs 'u' is homophone because the sound now is the same. However, historically it wasn't. Additionally, since historically it was different sound, it also had different rules for how it morphed during declension.

Similar situation is with 'ż' and 'rz'. There are even cool words that due to declension we can see where they originated from. For example two declensioned words have same pronunciation: 'każe' and 'karze'. The first comes from root word 'kazać' as in tell people what to do. The second comes from root word 'karać' meaning punish.

I would wager that 'ch' and 'h' is the most recent homophone. Some people still alive were taught to use correct hard 'H' vs soft 'CH' when pronouncing words.


Thanks for this correction!

I would say citing all those French diacritics is cheating a bit. The diaeresis is just used to indicate separate syllables, and ù and ÿ are not a thing in standard French.

Polish is uniquely bad not because of the diacritics, but because of the special phonetic behavior of digraphs ('si', 'ci', 'zi', 'rz', 'sz', 'cz', 'ch', 'dż', 'dź', 'dzi').


To strengthen your argument there is another digraph 'ni'.

But this is now approaching the discussion that writing doesn't match phonetics. Pronunciation shifts. Words change. The Brothers Grimm (famous for their fairy tales) were linguists studying how the use of consonants has changed. They describe how 'pater' (like paternity) shifted to 'father'. But Polish has its own such shifts. Take a word like 'jabłko', which many people pronounce 'japko', or 'śliwka', which is pronounced as 'ślifka'.

There is even a branch discussing how people using correct pronunciation are HyperCorrect https://en.wikipedia.org/wiki/Hypercorrection#Polish


I wouldn’t call it “using correct pronunciation”—the whole idea of hypercorrection is that the result is not correct, because some perceived rule was mistakenly over-applied, like in English * “He gave it to Sam and I”.

(Plus I’m guessing that sound change happened pretty early on, because as far as I know, all Polish consonant clusters obey that rule of voicing & devoicing obstruents.)


Digraphs aren't unique to Polish, though. Take Hungarian as an example ('cs' or 'zs' to start with).


I drove today next to the "nonstandard" Haÿ-les-Roses (south of Paris)


There's at least one word that uses ù in French: où (where, as opposed to ou, or)


The curious thing about Cyrillic is that, at least in Slavic languages, the correspondence between phonemes and letters is much more uniform than in Slavic languages that use Latin.

For example, "ч" is always the "ch" or "tsh" sound, whether we're talking about Russian, Ukrainian, Bulgarian or Serbian. OTOH, in Polish you use "cz" for it, while in Czech and Serbian it's "č".

It's not an inherent advantage of Cyrillic, of course, it's just that, by virtue of being designed specifically for Slavic languages, it had certain letters designated from the get go, while adopting Latin allowed for a lot more leeway in deciding how to represent sounds, and different languages did it differently.


It's not only that Cyrillic is designed specifically for Slavic languages, but also that the general tradition around Cyrillic is to make up new letters when adapting it to a new language, as opposed to the Latin alphabet where the two traditions are to use digraphs and to add diacritics. This choice between two possibilities is already causing the problem of "cz" vs. "č", but within those there are still the choices of "cz", "ch", "tsh", "tch", ... and "č", "ç", "ĉ", ...


I think the line between "making up new letters" and "adding diacritics" is rather blurry, and really hinges on your definition of "new letter", which is largely defined by convention. Most (all? I can't think of any that don't) Cyrillic alphabets have some letters with diacritics - ё and й in Russian; й, i and ї in Ukrainian; j in Serbian etc.

But speakers of those languages don't consider the diacritics to be a modifier in this case - we treat them as completely separate letters, that just have a disjoint element in their overall shape. I don't see why e.g. c vs č can't be treated in the same exact way as и vs й (and indeed, I wonder if Czech don't already do that?).


And yet - somewhat hilariously, Polish is one of the easiest languages out there in terms of knowing how to pronounce the written word.

Spend a week or two practising a pronunciation guide and that's pretty much well it.

Reading something out loud for the first time? Native level accuracy 95% of the time, easy.

Compared with the absolute crap shoot which is English or - my nightmare - French, reading Polish out loud is an absolute walk in the park.


English is indeed a crap shoot, but it's not entirely fair to put French in that bucket as well. Much as you describe for Polish, there is a highly consistent set of writing-to-sound rules for French that you can learn quickly. French's downfall is in the other direction: with multiple ways to write some phonemes and many contexts where letters are silent, hearing a word, even perfectly, is not nearly enough to know with any certainty how to spell it.

But reading French out loud? Anybody with a week or two of practice should definitely be able to know how to pronounce well over 95% of what they read, even if they don't know what they're saying.


You're completely right. There's also a predictable syllable stress pattern. I taught a friend once to read Polish out loud fluently without understanding it.


Irish and Korean are also extremely regular. Hangul (Korean) is probably the easiest alphabet in the world to learn to pronounce. Irish suffers from the use of digraphs in modern use, but the older style of placing a dot on top of the letter instead of an h afterwards is much clearer IMO.


My mouth actually hurts when I speak French. Such unusual shapes for a native English speaker :\


In the famous Polish novel "Krzyżacy", the French language is compared to the shaking of tin bowls, so even Polish speakers have trouble.


Who do we blame for Vietnamese adopting the Latin alphabet? That makes Polish seem pretty straightforward.


You can blame the French. It was a good choice though. Vietnamese and Korean are probably the easiest East Asian languages to learn to read.


I feel like Indonesian and Malay are far easier than either of those, at least for English speakers.


For some reason, they really didn't like China!


Could have done what Serbian (and formerly Serbo-Croat) does: freely added letters to the Latin alphabet to match the required set of phonemes, so that there's a one-to-one correspondence between the Latin orthography and the Cyrillic orthography.

(It doesn't solve the problem of missing letters and missing diacritics when trying to write your language using English letters only — but Serbs, Ukrainians etc. can't write their language correctly using only Russian letters either.)


We just have more of this crap than other Slavic languages. Specifically:

ś—sz

ć—cz

ź—rz/ż

dź—dż

Mandarin of all languages has this phonetic distinction, but I don't think any other Slavs kept the full set, on top of the nasals mjd wrote about.


Wikipedia says Mandarin has all of /t͡s/ ⟨z⟩, /ʈ͡ʂ/ ⟨zh⟩, /t͡ɕ/ ⟨j⟩, /t͡sʰ/ ⟨c⟩, /ʈ͡ʂʰ/ ⟨ch⟩, /t͡ɕʰ/ ⟨q⟩, /s/ ⟨s⟩, /ʂ/ ⟨sh⟩, and /ɕ/ ⟨x⟩, all contrastive. The fricative and affricate sounds in English that are kinda close to these are only /t͡ʃ/, /ʃ/, /s/, /t͡s/, /ʒ/, and /dʒ/.

I think these contrasts represent a very underacknowledged difficulty for English speakers learning Mandarin, because we're used to thinking of tones as the only hard thing. I'm still struggling to properly pronounce a friend's name that I think begins with /ʈ͡ʂʰ/. Maybe Polish speakers can deal with this easily!


My pronunciation of these consonants was improved a lot by reading the respective Wikipedia articles and imitating the described tongue position. That got me interested in the rest of the IPA, and for a while I spent idle time making strange sounds with my mouth.

Retroflex consonants are still awkward, but at least I can produce them reliably. What's really hard for me are uvulars, they feel a bit like choking every time.


I spent a long time learning to say [k'], and when I succeeded I had shirts made for myself and a friend who was practicing with me.

They say:

"I pronounced the velar ejective consonant [k'] and all I got was this [t']-shirt."


Everyone in this thread was a Linguistics major then?


I was a dropout, but I wish I had been a linguistics major!

There was a post on Language Hat or Language Log where a professor recounted meeting a student who had linguistics "as a hobby", which was astonishing to the professor -- and the student explained that this was possible nowadays because of Wikipedia. (To which I could add, for some people because of the conlang community, or because of linguistics blogs!)

Anyway, Wikipedia's coverage of linguistics topics is generally really excellent and detailed (maybe strongest in phonology, perhaps because of a few super-obsessed editors).


I "think" these should be pretty close to Cyrillic ш/щ, ч and ж.


Half of them are, the other half are the 'matching' pairs that don't exist in Russian (for example). Soft ш and ж, hard ч and щ.


It's a trap! Roughly speaking, the ones on the right are those. The ones on the left are different and they correspond historically to something like сь, чь, зь (though the pronunciation is rather different from standard Russian pronunciation).


well, sure, it's not uncommon even for ppl who speak the same language and use the same letters to pronounce things differently.

the point is I don't think Latin vs Cyrillic in Poland's case was driven by Polish language phonetics per se. I would venture to guess this had more to do with political/religious affiliations etc.


Absolutely. Mieszko I chose to convert himself and his realm to the Roman faith for both political and religious reasons. Had he chosen to accept Christianity from the East (remember, this was pre-schism), it would have entailed the adoption of Cyrillic.


Yes, Polish is the worst. Serbian has ć—č and the voiced equivalent spelled đ—dž, but it doesn't have ś—š and it doesn't have nasals.


Czech also did a great job of this (unfortunately the word Czech itself seems to come to English through some other spelling system). A few years ago I tried to learn a bit of the language before travelling to Prague, and I was pleasantly surprised by how easy it was to learn to read and write.


Prior to changes introduced by Jan Hus (in this case, the introduction of the háček), Czech shared digraphs that are still used in Polish. "Cz" was ultimately replaced with "č". It is possible that the archaic form somehow made it into English and survived (though my sources tell me that the spelling has only been present in English since the 17th century, or after Hus' reforms).


"The latter would have been a much better fit for the sounds in the language."

The Cyrillic alphabet as originally developed for Old Church Slavonic lacks several sounds of Polish, namely the velarized /l/ (which has now become a semivowel /w/ except in peripheral dialects), and the palatalized affricates /ź/ and /ć/.

If you are a Russian speaker, you might think that Cyrillic could represent palatalized sounds by use of the soft sign, but that is not what the soft sign was actually used for originally. It was meant to represent the front reduced vowel /ĭ/ before the fall of the yers. So, Russian chose to represent its phonology by extending the Cyrillic alphabet in a way that was originally unintended, while Polish chose to represent its phonology by extending the Latin alphabet. How is either of these choices better than the other?


Worth noting that Serbian did its own thing, and added new Cyrillic letters for palatalized L and N as separate letters Љ and Њ (which are obviously digraphs of Л and Н with Ь). Although that's a late 19th century invention.


Yes, modern Serbian orthography only did so in the end because Vuk Karadžić was intentionally following the example of Russian.


Ah, but it's not really an example of Russian in that sense, since he made them separate distinct letters rather than reusing Ь as a modifier, as Russian did. So semantically a rather different approach that just looks similar because it appropriated the shape.


Polish orthography is weird (subjective opinion!) mostly because of its weird digraphs, IMO. Czech and Serbian also use Latin, but they are easier to parse.

I'm not a linguist, but looking at the differences between the corresponding alphabet, it feels like Polish one had a stronger German influence. The use of W rather than V stands out in particular (and makes no sense in an alphabet that doesn't use V at all!). But also, Germans love their digraphs and trigraphs.


This may be true, but may I suggest two things:

1. Any suitable choice of orthography will become unsuitable eventually. Pronunciation is not static.

2. There is, and has been, quite a bit of dialectal variation within Poland and its diaspora. What may be suitable for one group may not be for another.


A small correction to the OP: paczki means packages and not boxes (pudełka). It used to be a much more common sign in Polish neighborhoods in the US before Poland leveled up to the first world.


... and then of course it depends how you want to use the objective boxes (pudełka).

I will throw you some pudełka. I am busy with those pudełkami. There is something wrong with those pudełkom. What's the size of these pudełek?

And it's funny: the newest iOS still doesn't have "ą"


"Parcels" is a better translation.


In comments Poles often omit those squiggles, which leads to interesting nuances:

"Laske mi robi." means "to condescend/deign to do something for somebody" (ł) or "to do a blowjob" (l).


Similar to how Spanish will not accent capital letters. The México passport for example says MEXICO on the front.


How strong a rule is this? Or is it regional?

http://buscon.rae.es/dpd/srv/search?id=BapzSnotjD6n0vZiTp is from the Diccionario panhispánico de dudas, 2005. It shows several examples of accented capital letters, like LA NACIÓN in the headline of a newspaper, and the phrase "ESTÁ PROHIBIDO FUMAR DENTRO DE LAS DEPENDENCIAS DE LA EMPRESA."

That said, La Nación itself doesn't use accents when capitalized. On the other hand, you can see "EL PAÍS" several times at https://elpais.com/ . On the third hand, the El País in Uruguay doesn't use the accent http://www.elpais.com.uy/ . But it does use ALCANZÓ in a headline http://a2010.kiosko.net/02/06/uy/uy_elpais.750.jpg .

http://procedimientospolicialesargentina.blogspot.de/2016/04... has an interesting mix with the blog title "DIA DE LA POLICIA DE LA NACION ARGENTINA" and a poster image saying "DIÁ NACIONAL DEL POLICÍA".


The reason it's similar is because it's not a rule. It's just sometimes left off.


My apology, I misinterpreted "will not accent" as implying some sort of rule.


Isn’t that just historically due to a lack of space on the line when typesetting?


Yes it is. For some reason many people strongly believe that the Royal Academy (which oversees the language) forbids the use of accents on capital letters, even when it explicitly claims that this has never been the case:

http://www.rae.es/consultas/tilde-en-las-mayusculas


French, or, if you will, FRANCAIS (note the lack of cedilla) is the same.


The rule is to write Ç with cedilla even when capitalized, but from what I gather French keyboards make it inconvenient, so people often neglect it. On the other hand, it is true that there are no above-letter accents for capitals (in France; Quebec uses them).


French keyboards indeed have diacritics available, such as é, è, à, ç, etc. but shift+those keys is another character.

On top of that, at least on Windows (I'm not sure about Mac), the caps lock key is not a caps lock, but a shift lock. Which means it doesn't capitalize characters, but does the same as pressing shift+key.

Funnily enough, in other French locales (IIRC at least Belgium), the caps lock key on Windows is a caps lock key.


For the benefit of French speakers, the ą in Polish seems to be pronounced like the "on" sound.

So "pączki" is pronounced "pon-tch-qui" while "paczki" is "patch-qui".


I lived in a Polish neighborhood in Brooklyn for a few years, and absorbed as much orthography and pronunciation as I could during that time. I ate a few pączki, too.

While I learned to speak a little bit, it was not enough to be functional beyond basic greetings and ordering food. It was always a lot of interesting fun, though.

The main outcome of all this is that for ten years, whenever I see a Toyota Camry, my weird brain thinks "hmm... tsahm-rih..." and I roll my eyes at it.


Ugh, this reminds me of character encoding issues... I hope I never meet the guy who invented ISO-8859-2 or CP1250 or other such nonsense for that guy's sake...


This brings the memory of Internet chats in the 90s when almost nobody used Polish letters with diacritics (because it is faster to type without them and there were a lot of different incompatible encodings). Usually you can understand the meaning from the context but sometimes the same kind of funny misunderstanding would happen as there are quite a few words that without diacritics become completely different but valid words.
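To make the "incompatible encodings" point concrete, here is a small Python sketch comparing the two encodings named above. As far as I know they disagree on only a few of the Polish letters, which is exactly what made the breakage so sneaky. The variable names are just for illustration.

```python
POLISH = "ąćęłńóśźż"

# Letters whose single-byte code differs between ISO-8859-2 and CP1250.
differing = [ch for ch in POLISH if ch.encode("iso8859_2") != ch.encode("cp1250")]
print(differing)

# Text written as ISO-8859-2 but read back as CP1250 comes out garbled.
garbled = "pączki".encode("iso8859_2").decode("cp1250")
print(garbled)
```

Both codec names are standard Python codecs, so this runs without any extra packages.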


Wikipedia gives "Zażółć gęślą jaźń" as an example containing all the Polish diacritic letters.

Is there a standard way to write this when limited to ASCII?

For example, Danish/Norwegian replace æ, ø, å with ae, oe, aa. It seems less likely that something could exist and be reasonably readable for Polish.

https://en.wikipedia.org/wiki/Polish_orthography


It is quite common to remove the diacritics if you are lazy or don't have access to the diacritics. That phrase becomes "Zazolc gesla jazn".

Most search engines find "Zazolc" and "Zażółć" equal because of that. This becomes a problem in case of the words like "paczki" (boxes) and "pączki" (donuts), which have their own separate meaning - as explained in the article.
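A minimal Python sketch of that kind of folding, for the curious. The helper name `fold_polish` is made up, and this is only an assumption about how a search index might normalize text, not a description of any real engine. One catch: "ł" has no Unicode decomposition, so it needs an explicit mapping, while the other letters split into a base letter plus a combining mark under NFD.

```python
import unicodedata

# "ł"/"Ł" have no canonical decomposition, so NFD alone leaves them intact.
_NO_DECOMPOSITION = str.maketrans({"ł": "l", "Ł": "L"})

def fold_polish(text: str) -> str:
    """Strip Polish diacritics (hypothetical helper, for illustration only)."""
    decomposed = unicodedata.normalize("NFD", text.translate(_NO_DECOMPOSITION))
    # Drop the combining marks (acute, ogonek, dot above) that NFD split off.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(fold_polish("Zażółć gęślą jaźń"))                  # Zazolc gesla jazn
print(fold_polish("pączki") == fold_polish("paczki"))    # True
```

The second line shows the collision: once folded, the two words are indistinguishable.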

In contrast to most European countries, in Poland we use American keyboard layout with "Polish (programmers) layout" keyboard setting in OS.

You press ALT+A, ALT+E, ALT+L, ALT+S, ALT+C, ALT+Z, ALT+X to write "ą", "ę", "ł", "ś", "ć", "ż", "ź", respectively.


Mostly no.

Some words written without Polish characters can become ambiguous without context. For example: word "łaska" - "mercy", written without Polish letter "ł" is "laska" - "stick".

Although, in old good times some people used British pound character in texts to express this letter, since '£' is visually similar to 'Ł' and more often available.

For other characters (ś,ź,ć,ż,ó,ą,ę) nothing like that was widely adopted. Workaround would be possible for 'ż' and 'ó' since they are (almost - see below) phonetically identical with 'rz' and 'u' respectively, but it wasn't popular, since most probably would be perceived as sign of very, very bad orthography.

*Almost, since some people claim they can distinguish these, but it's not popular ability.


When required, people just omit the diacritic signs and replace the letter with a regular ASCII letter, so for example: ż->z, ł->l, ó->o, etc.

When used in (computer) writing this is very readable, but no one would do this with handwriting. I think the most common cases nowadays would be SMS messages (especially on dumb phones) or some weird displays that are unable to properly render diacritic letters.


It's curious how we are simultaneously stubbornly holding onto diacritics (and current orthography in general) while also continuously having lots of issues actually reproducing/handling them. One would think that we either would have gotten better at dealing with them, or dropped them altogether. Especially considering that modern fixed orthographies are generally a relatively recent phenomenon.


Usually the diacritics exist because of something that is contrastive in the language's phonology. This post is mainly about a concrete example of this, where pączki and paczki refer to two different things that a shop could sell. So it's super-helpful that shops can make that distinction in writing!

Or for example

https://en.wikipedia.org/wiki/Minimal_pair#Stress

In Portuguese, my strongest foreign language, I can think of examples like

a 'the (feminine)' / à 'at the (feminine)'

nó 'knot' / no 'in the'

dá 'gives' / da 'of the'

nós 'we' / nos 'in the (plural)'

sê 'I should be' / se 'oneself' / sé 'see (Catholic)'

pode 'can' / pôde 'could (past)'

avô 'grandfather' / avó 'grandmother'

tem 'he/she has' / têm 'they have'

among many others.

At least a couple of these reflect vowel differences, although some would be homophones in speech. Every language that holds on to diacritics would have pairs like this where the diacritics make a difference to the meaning. So people may really appreciate having writing systems that can reflect these differences in order to avoid confusions that would otherwise occur.


Obviously, naively stripping diacritics won't work. But that doesn't mean diacritics are the only option for orthography; for example, in the Nordic languages ä and ö (or the equivalent Norwegian/Danish letters) could be substituted with ae and oe, because those sequences do not otherwise occur in the language, and so the orthography remains unambiguous.

Maybe a good example of a related development is the disappearance of Þ (thorn) from English orthography, where it was replaced with th.
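A rough Python sketch of that substitution, in case it helps. The `DIGRAPHS` table and `to_digraphs` name are hypothetical, and it takes at face value the premise that "ae" and "oe" never occur on their own in the language; real transliteration conventions vary.

```python
# Hypothetical mapping for illustration only.
DIGRAPHS = str.maketrans({"ä": "ae", "ö": "oe", "Ä": "Ae", "Ö": "Oe"})


def to_digraphs(text: str) -> str:
    """Spell ä/ö as two ASCII letters instead of dropping the marks."""
    return text.translate(DIGRAPHS)


# Swedish "här" (here) and "har" (has) stay distinct after conversion.
print(to_digraphs("här"), to_digraphs("har"))
```

Unlike plain stripping, the two words do not collapse into one spelling, which is the whole point of the digraph approach.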


> Maybe a good example of a related development is the disappearance of Þ (thorn) from English orthography, where it was replaced with th.

This didn't have quite the full benefit that you describe, though, because of compound words like "cathouse", "hothouse", "lighthouse", "outhouse", etc. And of course digraphs can be extra-risky whenever languages accept loanwords.


Fun fact: The Germanic umlaut symbol actually developed from placing a small "e" above the vowel.

https://en.wikipedia.org/wiki/Diaeresis_%28diacritic%29#Hist...


Nit: it's not pawnch-kee if you're speaking American English, it's more like pown-chkee. But still not quite.


Or, for those who are versed in IPA: /pɔ̃.tʂki/.

* There is no individually-pronounced "n" sound; it is built entirely into the nasalized vowel.

* The /ʂ/ sound is actually a laminal retroflex. This is best described as making a "sh" sound, but with the blade of the tongue.



Yeah, we Poles are as obsessed with our prononciation as Italians are with their food.

People from other countries just can't get it right :)


It is pronunciation with u, not prononciation...


I wouldn't be so sure about that.


The French `bON ton` is the closest to ą you can get.


This phoneme (or something close) is also common in Portuguese.


Yeah, I think you're closer, as long as you pronounce the own like owned rather than pow-n, as I read it at first.


In Michigan people will unabashedly say poonchkee. I've stopped arguing about it with people :).


This reminds me of what seems like a never-ending debate about the actual pronunciation of paczki. Is it 'pawnch-kee' or 'poonch-kee'? Haha.

Which led me to this interpretation: https://www.youtube.com/watch?v=zdNsFOPYMzE


It should be a vowel sound similar to the one in the English words "thought", "dawn", "fall", and "straw", except nasalized. So "pawn" is probably closer than "poon". (If any one of those has a different vowel sound than the others in your dialect, use the more popular version.)


FYI: The way I explain it to those who speak French is that "ą" is similar to the French "on".



