Better Language Scrambling?

Lisira · Post by **Lisira** » Tue Feb 21, 2012 10:46 pm

I should preface this by saying that I can barely script, let alone code, and have absolutely no sense of the resources (either man-hours to implement or load on the server) that might be required. My second idea will unequivocally be easier on both counts, but that's not going to stop me from voicing the first.

Ideally, it would be nice to have a small dictionary implemented (the length of a pocket dictionary, perhaps a few thousand words), with each word arbitrarily weighted against skill level. Instead of random scrambling, a word would only be scrambled if its weight were higher than the listening character's skill level. Articles and prepositions might be weighted at inept or amateur, allowing (likely erroneous) contextual interpretation, while more complex words or concepts might require adept- or expert-level knowledge of the language to understand. To avoid the need for too many fallbacks and contingencies, misspelled words might always be scrambled for characters below master/grandmaster, and rationalized IC as "local dialects."
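A minimal sketch of the dictionary idea in Python, assuming skill levels numbered 1 to 8 with 7 standing in for "master", a tiny made-up word list, and a plain letter shuffle in place of whatever scrambler the game actually uses (`translate`, `scramble_word`, `DICTIONARY` and `MASTER` are all hypothetical names):

```python
import random
import re
from typing import Optional

# ASSUMPTION: skill levels are numbered 1 (lowest) to 8 (highest), and 7 stands
# in for "master". The weights below are made-up placeholders, not real data.
MASTER = 7
DICTIONARY = {
    "the": 1, "a": 1, "in": 1, "to": 1, "is": 1,
    "there": 2, "great": 2, "forest": 3, "greetings": 3,
    "meeting": 4, "denouement": 7,
}
WORD = re.compile(r"[A-Za-z']+")


def scramble_word(word: str, rng: random.Random) -> str:
    """Shuffle the letters of one word (a stand-in for the game's real scrambler)."""
    letters = list(word)
    rng.shuffle(letters)
    return "".join(letters)


def translate(sentence: str, skill: int, rng: Optional[random.Random] = None) -> str:
    """Scramble only the words whose weight is higher than the listener's skill.

    Words missing from the dictionary (typos, rare words) get weight MASTER,
    so they stay scrambled for anyone below master level.
    """
    if rng is None:
        rng = random.Random()

    def replace(match):
        word = match.group(0)
        weight = DICTIONARY.get(word.lower(), MASTER)
        return scramble_word(word, rng) if weight > skill else word

    # Punctuation and spacing are left untouched; only word runs are replaced.
    return WORD.sub(replace, sentence)
```

With these placeholder weights, `translate("There is a great meeting in the forest", 3)` leaves every word except "meeting" (weight 4) intact, and a master-level listener also understands unlisted words such as typos.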

It seems a little silly when, as a Journeyman listener, a word as common as "Greetings" gets scrambled to "Ergetjlgu," just because it fell on the wrong side of a coin toss.

The second, much easier but much less interesting, implementation of this idea is to simply weight each word on the fly based on the number of characters in the word. Okay, "greetings" might still be scrambled, but with a few exceptions, there's a moderate correlation in the English language between the length of a word and its complexity. I might not expect a Novice speaker to know what "denouement" means, but "Come here!" probably shouldn't be scrambled for anybody but the most Inept students.
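The length-only alternative, sketched with an arbitrary rule (weight = word length minus 2, clamped to 1..8). The rule and the function names are assumptions for illustration, not the game's actual formula:

```python
# Punctuation stripped from the ends of a word before measuring it.
PUNCTUATION = ".,!?;:\"'"


def length_weight(word: str) -> int:
    """Estimate a word's difficulty (1-8) from its length alone.

    ASSUMPTION: an arbitrary linear rule, length minus 2, clamped to 1..8.
    """
    core = word.strip(PUNCTUATION)
    return max(1, min(8, len(core) - 2))


def understood(word: str, skill: int) -> bool:
    """A listener follows a word if its length-based weight is within their skill."""
    return length_weight(word) <= skill
```

Under that rule "Come here!" is lost only on a level-1 listener, while "denouement" needs level 8.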

This might be a bit "fluffy" (i.e. not adding much to coded gameplay), but I think it'd make some RP (read: eavesdropping) more interesting.

EDIT: Looking back over my logs, it appears scrambling already goes by word length, as anything under five characters is coming out fine, and five-letter words are translated about half the time. So, my second suggestion's kind of pointless.

Still, I think the first one (running speech through a dictionary) would be fun. If it's viable (i.e. it wouldn't take a toll on server response time), I'd be willing to compile and weight a dictionary (plain text, so it would still need parsing) to cut down on the man-hours required.

Mask · Post by **Mask** » Wed Feb 22, 2012 7:32 am

Awesome, if you put together the dictionary, I'll plug it in

Lisira · Post by **Lisira** » Wed Feb 22, 2012 7:52 am

Mask wrote: Awesome, if you put together the dictionary, I'll plug it in

Seriously?

I'll get on it this weekend.

Mask · Post by **Mask** » Mon Feb 27, 2012 11:09 am

Cool ideas for language scrambling:

1) Simplify the language spoken based on skill level, i.e. convert:

Excuse me, sir, but I would like to purchase your finest short blade

to:

Me want shortsword

2) For low skill levels, make people occasionally use the wrong words, i.e. convert:

I would like a bag of wool

to:

I would like to bag your bull

How could this be made to work? Hmm.

If we had a simple dictionary of words together with how common they are, like for modern English, we could probably tweak it a little to give it a more FK-ish feel. For example, I would say that we use the words 'fireball' and 'broadsword' a bit more often than modern English does...

If the dictionary had some simple markup for nouns, verbs, prepositions, etc., it would be cool to occasionally mix those up for lower-skilled speakers. Also, some very uncommon words would just be untranslatable, no matter how short they were.
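The "mix up words of the same type" idea could be sketched roughly like this, assuming a dictionary tagged with parts of speech. The tags, the `low_skill` threshold and the 20% swap rate are all hypothetical. Note it swaps same-type words rather than sound-alikes, so "I would like a bag of wool" might become something like "I would like a bull of wool" rather than a true mishearing:

```python
import random
from typing import Dict, List

# Hypothetical tagged dictionary: word -> part of speech (illustrative only).
TAGS: Dict[str, str] = {
    "bag": "noun", "wool": "noun", "bull": "noun", "sword": "noun",
    "like": "verb", "buy": "verb", "want": "verb",
    "of": "prep", "in": "prep", "to": "prep",
}

# Group words by part of speech so a replacement comes from the same group.
GROUPS: Dict[str, List[str]] = {}
for _word, _pos in TAGS.items():
    GROUPS.setdefault(_pos, []).append(_word)


def garble(words: List[str], skill: int, rng: random.Random,
           low_skill: int = 3, swap_chance: float = 0.2) -> List[str]:
    """For speakers at or below `low_skill`, sometimes replace a tagged word
    with a different word of the same part of speech."""
    if skill > low_skill:
        return list(words)
    out = []
    for word in words:
        pos = TAGS.get(word.lower())
        if pos is not None and rng.random() < swap_chance:
            candidates = [w for w in GROUPS[pos] if w != word.lower()]
            if candidates:
                word = rng.choice(candidates)
        out.append(word)
    return out
```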

Lisira · Post by **Lisira** » Mon Feb 27, 2012 4:48 pm

I had some ugly distractions this weekend, but I'm a little ways through this and making decent progress.

I tried looking for a comparable list for modern English (i.e. "how common words are"), but there's nothing out there that's comprehensive enough, even for a short dictionary. So, I'm suffering and doing it by hand. What you're going to get in the first draft is just a list of words with arbitrary numbers assigned, 1 to 8, corresponding to mastery levels. I want distribution among levels to be on a bell curve, but that might wait until the second draft.
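If the first draft is a plain-text file with one "word level" pair per line (a layout assumed here, not one settled in the thread), loading it and checking the spread across the eight levels could look like this; the function names are hypothetical:

```python
from collections import Counter
from typing import Dict, List


def parse_dictionary(text: str) -> Dict[str, int]:
    """Parse lines of the form 'word level' (level 1-8) into {word: level}.

    ASSUMPTION: hypothetical file layout; blank lines and lines starting
    with '#' are skipped.
    """
    entries: Dict[str, int] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'word level', got {raw!r}")
        word, level = parts[0].lower(), int(parts[1])
        if not 1 <= level <= 8:
            raise ValueError(f"line {lineno}: level {level} is outside 1-8")
        entries[word] = level
    return entries


def level_histogram(entries: Dict[str, int]) -> List[int]:
    """Words per level 1..8, to eyeball whether the spread looks like a bell curve."""
    counts = Counter(entries.values())
    return [counts[level] for level in range(1, 9)]
```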

Even just tagging words "noun," "verb," etc. would literally double the file size (and the amount of text to parse), but if that's not a problem it's a simple matter to do. The foreseeable problem is words that can be nouns, verbs, or other parts of speech depending on their context, like "walk." How will the scrambler know the difference between "I want to walk to Waterdeep" and "I went to Waterdeep Walk" when it tries to switch funny words around?
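One cheap way around the "walk" problem, sketched under the assumption that each line gains a tag column such as `walk 2 noun,verb`: let a word carry several tags, and simply never swap the ambiguous ones. The layout and the names `parse_tagged` and `swappable` are hypothetical:

```python
from typing import Dict, FrozenSet, Tuple

Entry = Tuple[int, FrozenSet[str]]


def parse_tagged(text: str) -> Dict[str, Entry]:
    """Parse 'word level tag[,tag...]' lines, e.g. 'walk 2 noun,verb'.

    ASSUMPTION: hypothetical layout; a word may carry several comma-separated tags.
    """
    entries: Dict[str, Entry] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        word, level, tags = line.split()
        entries[word.lower()] = (int(level), frozenset(tags.split(",")))
    return entries


def swappable(entries: Dict[str, Entry], word: str) -> bool:
    """Only swap words with exactly one part of speech: a bare word list can't
    tell 'walk' (verb) from 'Walk' (part of a place name)."""
    entry = entries.get(word.lower())
    return entry is not None and len(entry[1]) == 1
```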

Untranslatable words can just be removed from the dictionary, and then be treated however typos are treated.

I don't know why you'd want to bag my bull, but I think that would involve a rhyming dictionary, which is an entirely different beast. Also, I'm flattered.

Mask · Post by **Mask** » Mon Feb 27, 2012 7:14 pm

Wow, I think manually writing a dictionary is too awesome an undertaking. What if we were just to get a bunch of FR novels and run them all through a program which would just count the number of instances of each word in each book and use that as a commonality indicator based on firing them into a distribution and extracting different quantiles?

With the results of that, we could fire it through some other program which would find out the word-type from another dictionary and add that meta data in?

A selection of FR novels and a tool like this:

https://code.google.com/p/epub2txt/

And we have a dictionary! I might have a quick go at this and send you the result for review - PM me your email address.
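The counting-and-quantile step could be sketched as below, assuming the novels have already been converted to plain text (for instance with the epub2txt tool linked above). It cuts the frequency ranking into eight equal-sized bands rather than fitting a formal distribution; the function names are hypothetical:

```python
import re
from collections import Counter
from typing import Dict, Iterable

WORD = re.compile(r"[a-z']+")


def count_words(texts: Iterable[str]) -> Counter:
    """Count every word across all the (already plain-text) novels."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(WORD.findall(text.lower()))
    return counts


def levels_from_counts(counts: Counter, levels: int = 8) -> Dict[str, int]:
    """Rank words by frequency and cut the ranking into equal-sized bands.

    The most common band becomes level 1 and the rarest becomes `levels`.
    """
    ranked = [word for word, _ in counts.most_common()]
    n = len(ranked)
    return {word: rank * levels // n + 1 for rank, word in enumerate(ranked)}
```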

Gwain · Post by **Gwain** » Mon Feb 27, 2012 10:56 pm

http://www.lipsum.com/

Lorem ipsum is a bunch of rubbish text generated to fill paragraph forms before they are filled with words, phrases and actual text. This is a generator for it. I find it's a good source of readable gibberish similar to Latin.

Lisira · Post by **Lisira** » Mon Feb 27, 2012 11:42 pm

Mask wrote: Wow, I think manually writing a dictionary is too awesome an undertaking. What if we were just to get a bunch of FR novels and run them all through a program which would just count the number of instances of each word in each book and use that as a commonality indicator based on firing them into a distribution and extracting different quantiles?

With the results of that, we could fire it through some other program which would find out the word-type from another dictionary and add that meta data in?

A selection of FR novels and a tool like this:

https://code.google.com/p/epub2txt/

And we have a dictionary! I might have a quick go at this and send you the result for review - PM me your email address.

Oh, I'd already got the word list. It was just a matter of weighting it. XD

And, really, frequency of use is not an indicator of ease of use. People learn to count in their native tongue at, what, three years? Four? But how often do you think you'll find "eight" or "thirteen" printed in a novel? =/

Honestly, this is nothing difficult, it's just a little time-consuming. But, if you like, we can try it your way and see what comes up.

Pirro · Post by **Pirro** » Tue Feb 28, 2012 3:41 am

Mask wrote: Cool ideas for language scrambling:

1) Simplify the language spoken based on skill level, i.e. convert:

Excuse me, sir, but I would like to purchase your finest short blade

to:

Me want shortsword

This would be really awesome, though from experience I can tell you that it would also be really hard.

Something for the longer term (hoping not to hijack this thread): once Lisira's dictionary is in, I could generate lists of nonsense words that look "elvish", "orcish", etc. These could be substituted in for uncommon words, instead of just scrambling them. For example, instead of

An elf says "There is a great meeting in the forest" --> An elf says "There is a great lfdsknh in the fprdtt"

you might get

An elf says "There is a great meeting in the forest" --> An elf says "There is a great omentie in the lassiya"
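One way to get consistent "elvish" or "orcish" substitutes, sketched with invented syllable lists: seed a random generator from a hash of the English word (plus the language name), so "forest" always comes out as the same nonsense word for a given language. The syllables, `pseudo_word`, and the length rule are all assumptions:

```python
import hashlib
import random
from typing import List

# Hypothetical syllable inventories; real lists would be tuned per race.
ELVISH: List[str] = ["a", "la", "li", "en", "ta", "mir", "sil", "iya", "om", "tie", "ra", "nel"]
ORCISH: List[str] = ["gr", "uk", "mog", "thra", "zug", "ash", "kra", "dur"]


def pseudo_word(word: str, syllables: List[str], salt: str) -> str:
    """Map an English word to a consistent nonsense word built from `syllables`.

    Seeding from a hash of the word means the same English word always
    becomes the same made-up word, so repeated words stay recognisable.
    """
    digest = hashlib.sha256((salt + word.lower()).encode()).digest()
    rng = random.Random(digest)
    # ASSUMPTION: roughly one syllable per three letters, at least two.
    count = max(2, len(word) // 3)
    return "".join(rng.choice(syllables) for _ in range(count))
```

A stable hash is used because Python's built-in `hash()` of strings is randomized per process, which would make the substitutes change between reboots.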

Lisira · Post by **Lisira** » Tue Feb 28, 2012 4:38 am

Pirro wrote: Something for the longer term (hoping not to hijack this thread): once Lisira's dictionary is in, I could generate lists of nonsense words that look "elvish", "orcish", etc. These could be substituted in for uncommon words, instead of just scrambling them. For example, instead of

An elf says "There is a great meeting in the forest" --> An elf says "There is a great lfdsknh in the fprdtt"

you might get

An elf says "There is a great meeting in the forest" --> An elf says "There is a great omentie in the lassiya"

+1
