
Making a Text Generator Using Markov Chains

Having a computer generate text that sounds like a person feels like the future. There have been small proofs of concept, such as Gmail’s predictive text as you write an email, and the point where this concept is fully realized may arrive sooner rather than later.

A small project I completed, which doesn’t use neural networks, generates new Yelp reviews from existing ones using Markov Chains. The generated text, although not perfect, does an incredible job of capturing the context, sentiment, and style of a review written by a typical user.

Markov Chains are mathematical systems that move from one state to another. Two rules follow from this broad statement: the next state depends entirely on the current state, and the next state is chosen on a probabilistic basis.
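The two rules above can be sketched in a few lines of Python. The states and probabilities here (a toy weather model) are purely illustrative, not from the article:

```python
import random

# Each state maps to the probabilities of possible next states.
# The next state depends only on the current one.
transitions = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def next_state(current):
    """Pick the next state probabilistically from the current state."""
    states = list(transitions[current])
    weights = [transitions[current][s] for s in states]
    return random.choices(states, weights=weights)[0]

# Walk the chain for a few steps.
state = "sunny"
chain = [state]
for _ in range(5):
    state = next_state(state)
    chain.append(state)
print(chain)
```

Note that the walk never consults anything earlier than the current state, which is exactly the "memoryless" property the rules describe.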

Example image of a Markov Chain

To put this into the context of a text generator, imagine an article you recently read. That article contains some number of words, many of which are probably used multiple times. For each word in the article, the words that appear directly after it are grouped together, with words that occur more often weighted more heavily. When generating text, a random word is chosen, and then a successor is repeatedly selected from the current word’s list until the desired word count is reached.

In Python terms, you create a dictionary whose keys are every unique word in your corpus. The value for each key is a list of the words that appear after that word. The more often a word follows another, the more copies of it sit in the list, and the higher the probability that it will be selected by the text generator.

Sample text generated

As you can see above, at a high level the generated text looks as if a human being wrote it. Only when you look more closely at the words does it become evident that it doesn’t make sense holistically. One possible fix would be adjusting the weights of the probabilities used to select words.

While the concept is simple, the hardest part of creating a generator using Markov Chains is ensuring your corpus contains enough text that the generated output doesn’t end up repeating the same words over and over.

Markov Chains can work wonderfully for generating text that mimics a human being’s style. However, to generate text effectively, your corpus needs to be filled with documents that are similar. In the example above, I captured 3-star reviews from Yelp, but the corpus contains words like manure, office buildings, NFL, and theater. These are generally unrelated and would not all appear in a typical review. To correct this, you would keep documents discussing similar topics (e.g., pizza parlors) in the same corpus and use that for the Markov Chains, so all the generated text is pizza-related. That being said, for such a simple technique, the quality of the generated text is remarkable, and it is much easier to obtain than with heavily trained neural networks!
