PetBotAI: 1 Chatbot, 4 Clouds, 4 ML Models, Cocktails & Poop!

Paul Heath
7 min read · Mar 29, 2018
2 of the ML features: Word Association & Style Transfer

To dig deeper into the chatbot hype, I embarked on this wacky project of building one that took on the personality of anyone’s pet(s).

OK, the personality part is a stretch goal, but it forced more creativity. Some Machine Learning techniques were thrown in to provide “cool tricks” your hi-tech virtual pet might do, like Image Style Transfer and Word Association!

Had to learn enough about a few things to get rolling, e.g. Dialogflow, Node.js, AWS DynamoDB, Facebook Messenger Bots and Machine Learning options, and then integrate with various provider APIs.

The following online training was invaluable for building a foundation:

  • The Complete Node.js Developer Course, by Andrew Mead (udemy.com)
  • Messenger Chatbot — Dialogflow and Node.js, by Jana Bergant (udemy.com)
  • Machine Learning — Andrew Ng (Coursera.org).
  • Lots of reading — see reference list below

I have done development in the past (Java) but was pretty rusty after 7-ish years of no real coding. So the learning and re-engaging process was a lot of fun.

Trailer Video

The actual implementation has been removed from Facebook. Here is what it was:

3 Main Sections: Pets & Games above + Skills & Tricks

Goal

Audience: Create a fun, engaging chat interface for Pet owners to interact with their (smart) virtual pets. Endow the pets with super-power tricks that will amuse and inform their owners.

Me: Learn some cool new stuff, work & think through the process of building a chatbot with multiple personalities!

This is an ongoing experiment/learning process. As the underlying AI improves (e.g. music generation), the goal is to keep the chatbot/ML capabilities up to par.

Cocktails & Poop

They are in the title, so here ya go … I created a little space between them!

(1) Search & Random Cocktails (2) Varying amounts of poop (1–10)

Features

Create Pet(s)

Cat, Dog or Bird (so far). Create multiple pets. Define name, age, sex, personality type, accent, pet’s name for you etc. People have also added snakes and turtles … as a cat!

Skills & Tricks

  • “Paint” a picture (“style image”)
  • “surprise me” — randomly picks a quote, video, gif etc. to display.
  • Create a music clip (“compose music”)
  • Make a cocktail (“cocktail”)
  • Pet activities & schedule (based on local events API) — “schedule for today”.
  • Quotes (“quotes”)
  • Trump Quotes (“trump”)
  • Number Trivia (“trivia” or “trivia NNN”) where NNN is a number.
  • GIFs (“gifs”)
  • Poop (“poop”)
  • Play Word Association game (“play words”)
  • Read you a story (“read a story”)
  • Chit-chat (“chit chat”)
  • Weather (“weather for seattle”)
  • Animal Videos on YouTube (“animal videos”)
  • Search Web (Videos, News, Trending) (“search for XYZ”)
  • Audio greetings after setup, and subsequent logins / pet selections.
  • “commands” — list of commands to enter directly

Profanity Filter

Necessary! Gets tripped quite a bit. Approx. 450 words. Not that anyone else sees what each user writes, but the offender gets a gentle reminder when they use a bad word. Don’t want this turning into smut central.
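A word-list filter like this takes only a few lines. Below is a minimal sketch in Python (the bot itself ran on Node.js); the two-word list, the reminder wording and the function names are placeholders, not the actual implementation.

```python
import re
from typing import Optional

# Placeholder list; the real filter held roughly 450 words.
BLOCKED_WORDS = {"darn", "heck"}

# Hypothetical wording for the gentle reminder.
REMINDER = "Let's keep it friendly, please."


def contains_profanity(message: str) -> bool:
    """Return True if any whole word in the message is on the blocked list."""
    words = re.findall(r"[a-z']+", message.lower())
    return any(word in BLOCKED_WORDS for word in words)


def moderate(message: str) -> Optional[str]:
    """Return the reminder for a flagged message, otherwise None."""
    return REMINDER if contains_profanity(message) else None
```

Matching on whole words rather than substrings avoids flagging innocent words that happen to contain a blocked one.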

Easter Eggs

Triggered by several phrases. Hint: one results in some pet themed dance music — totally ridiculous, but fun.

4 Cloud Providers

Could have consolidated some of this, but decided to spread the Cloud love around.

Cloud 1: AWS

AWS — EC2

Houses 4 specific machine learning models with (Flask) API endpoints for:

  • Photo ‘Painting’ — Image Style Transfer via Convolutional Neural Network (CNN) with 6 reference images.
  • Word Association — Word Vectors (Google 300K word2vec).
  • Nonsense Chat — Recurrent Neural Network (RNN) seq2seq using Anna Karenina corpus. Pulled for now, pending improved training.
  • Music Generation — Google Magenta (using Nottingham music dataset)
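To give a feel for how Word Association works on word vectors: each word is a list of numbers, and the "closest" words are those whose vectors point in the most similar direction (cosine similarity). The sketch below uses made-up 3-dimensional toy vectors and hypothetical function names; the real model used pretrained word2vec vectors with hundreds of dimensions, and this is not the bot's actual code.

```python
import math

# Toy vectors invented for illustration only.
VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "dog": [0.1, 0.9, 0.0],
    "puppy": [0.2, 0.8, 0.1],
    "car": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def associate(word, top_n=2):
    """Return the top_n known words most similar to `word`."""
    if word not in VECTORS:
        return []
    target = VECTORS[word]
    scored = [
        (cosine(target, vec), other)
        for other, vec in VECTORS.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:top_n]]


print(associate("cat"))  # ['kitten', 'puppy']
```

With real vectors, a library such as gensim does the same lookup via `KeyedVectors.most_similar`.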

AWS — DynamoDB

Persist basic Owner and Pet information. Total overkill for the DB, just trying to be a NoSQL player.

AWS — Polly

Text to speech processing, in multiple accents (USA, British, Aussie & Indian). Used to create personalized welcome audio clips, using Owner & Pet names, read stories etc. Depending on the age of the pet, older/younger voices are picked.
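The voice selection can be sketched as a small lookup. The voice IDs below are real Amazon Polly voices for those accents, but which voice the bot assigned to which pet, the age cutoff and the greeting text are all assumptions made for illustration.

```python
# Accent -> Polly voice IDs. The young/adult split is an assumption.
VOICES = {
    "usa": {"young": "Ivy", "adult": "Joanna"},
    "british": {"young": "Emma", "adult": "Brian"},
    "aussie": {"young": "Nicole", "adult": "Russell"},
    "indian": {"young": "Aditi", "adult": "Raveena"},
}

YOUNG_PET_MAX_AGE = 2  # hypothetical cutoff, in years


def pick_voice(accent, pet_age):
    """Choose a voice ID from the pet's accent and age, defaulting to USA."""
    voices = VOICES.get(accent.lower(), VOICES["usa"])
    return voices["young"] if pet_age <= YOUNG_PET_MAX_AGE else voices["adult"]


def welcome_text(owner_name, pet_name):
    """Build the personalized greeting that would be sent to Polly."""
    return f"Hi {owner_name}, it's {pet_name}! Ready to play?"
```

The chosen voice ID and text would then go to Polly's SynthesizeSpeech operation, e.g. with boto3: `polly.synthesize_speech(Text=text, VoiceId=voice, OutputFormat="mp3")`.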

AWS — Comprehend

Determines key phrases and entities in text. Used to help out with generating responses, e.g. picking most relevant key phrase from a web search response.
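As a rough illustration of picking a key phrase: Comprehend's DetectKeyPhrases returns a list of phrases, each with a confidence score. The sketch below treats the highest-scoring phrase as the "most relevant" one, which is an assumption about how the bot chose; the function name, the threshold and the sample data are hypothetical.

```python
from typing import Optional


def best_key_phrase(response: dict, min_score: float = 0.5) -> Optional[str]:
    """Return the highest-scoring key phrase text, or None if none qualify."""
    phrases = response.get("KeyPhrases", [])
    candidates = [p for p in phrases if p["Score"] >= min_score]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p["Score"])["Text"]


# Sample shaped like a DetectKeyPhrases response, trimmed to the fields used.
sample = {
    "KeyPhrases": [
        {"Text": "the quarterback", "Score": 0.91},
        {"Text": "Tom Brady", "Score": 0.99},
        {"Text": "a long season", "Score": 0.42},
    ]
}

print(best_key_phrase(sample))  # Tom Brady
```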

Cloud 2: Azure Cognitive Services

Videos API, News API, Entities Search API, Web Search API.

Used to locate web content, e.g. “find Tom Brady” or “What’s the latest news”. Also used to manufacture responses when a direct answer is not available.

All called with the “SafeSearch” option, to limit inappropriate content.
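For illustration, here is how a SafeSearch request might be assembled. This is a sketch under assumptions: the endpoint shown is the Bing Web Search v7 URL as it was offered through Cognitive Services around the time of writing, "Strict" is one of the documented safeSearch values (the post does not say which level was used), and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint; check the provider's current documentation before use.
SEARCH_ENDPOINT = "https://api.cognitive.microsoft.com/bing/v7.0/search"


def build_search_request(query, api_key, count=5):
    """Return (url, headers) for a web search with SafeSearch enabled."""
    params = {"q": query, "count": count, "safeSearch": "Strict"}
    url = f"{SEARCH_ENDPOINT}?{urlencode(params)}"
    headers = {"Ocp-Apim-Subscription-Key": api_key}
    return url, headers
```

Building the URL and headers in one place means no call can accidentally go out without the filter.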

Cloud 3: Google (Dialogflow)

Dialogflow Conversation Agent Platform to manage Intents, Action Mapping and some default responses. The actions are coded in Node.js since they often require 3rd party API processing. Also included the prebuilt Dialogflow “Small Talk” intent package, with a few tweaks, to aid in chit chat.
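The intent-to-action mapping can be pictured as a dispatch table inside the webhook. This is a minimal sketch, written in Python for brevity although the bot's actions were coded in Node.js. The handler names and replies are hypothetical, and the field names (queryResult.action, queryResult.parameters, fulfillmentText) follow the Dialogflow V2 webhook format, which may not be the API version the bot used.

```python
def handle_trivia(params):
    """Reply for the number trivia action (a real bot would call an API)."""
    number = params.get("number", "a random number")
    return f"Here comes a fact about {number}."


def handle_poop(params):
    """Reply with 1 to 10 poops, clamping whatever amount was requested."""
    amount = max(1, min(10, int(params.get("amount", 1))))
    return " ".join(["poop"] * amount)


# Action name (as set on the Dialogflow intent) -> handler function.
ACTION_HANDLERS = {
    "trivia": handle_trivia,
    "poop": handle_poop,
}


def fulfill(request_body):
    """Route a webhook request to its handler and build the response."""
    query = request_body.get("queryResult", {})
    handler = ACTION_HANDLERS.get(query.get("action"))
    if handler is None:
        return {"fulfillmentText": "Woof? I don't know that trick yet."}
    return {"fulfillmentText": handler(query.get("parameters", {}))}
```

Adding a new trick then means writing one handler and registering it in the table.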

Cloud 4: Heroku

Main Node.js application deployed to Heroku, for simplicity. Could probably have used Cloud Functions/Lambda functionality in any of the previous 3 Cloud providers.

Components & Flow

Challenges

Flow

Maintaining a natural and credible open-ended interaction with any chatbot is a major challenge. In the event no mapped response is available to a user’s input, a few tricks are needed to manufacture responses, e.g. using the Word Association functionality and Web Search/Key Phrase extraction, when stumped. Sometimes that works well, sometimes it is comical and other times it makes no sense. In this context, since it is an animal “responding”, a bizarre response can be OK. This can definitely be improved by working harder to decipher more accurate context from the user.
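The "when stumped" logic amounts to a fallback chain: try each response strategy in order and use the first one that produces something. A minimal sketch, with stand-in strategies and hypothetical names and replies (the real strategies called the ML and search services):

```python
from typing import Callable, Optional, Sequence

Strategy = Callable[[str], Optional[str]]


def mapped_response(text: str) -> Optional[str]:
    """Stand-in for a reply already mapped to an intent."""
    return {"hello": "Meow!"}.get(text.lower().strip())


def word_association_response(text: str) -> Optional[str]:
    """Stand-in for the word-vector lookup."""
    associations = {"fish": "tuna"}
    for word in text.lower().split():
        if word in associations:
            return f"{word.capitalize()}? That makes me think of {associations[word]}."
    return None


def respond(text: str, strategies: Sequence[Strategy],
            default: str = "Purr... tell me more.") -> str:
    """Return the first non-empty reply, or a default when all fail."""
    for strategy in strategies:
        reply = strategy(text)
        if reply:
            return reply
    return default


chain = [mapped_response, word_association_response]
print(respond("I like fish", chain))  # Fish? That makes me think of tuna.
```

A web search plus key phrase strategy would slot into the same list, after the cheaper options.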

Content & The Public

Since this is open to the public, it’s obvious that some unexpected inbound subject matter would be received. I had to write new intents to handle incoming comments around porn, gender identity, racism and just flat out insults … we’re not in polite Enterprise-land anymore, Dorothy. I saw this as an opportunity to dial down the hostility of some of the angrier users.

For example, “You’re racist” resulted in an initial chatbot response of “Thanks, I try” — not good. This got generically changed to “I wish you only the best” for a variety of insults. Overall, the users are great, but the bad ones — oy!

Dialing in how people ask questions also takes work, e.g. “what ya doin” does not resolve to the predefined “what are you doing”, so the user doesn’t get the desired experience until “what ya doin” is mapped. Filling the holes.
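In Dialogflow, the fix is to add the new phrasing to the intent. The same idea can be shown in code as a normalization step that maps known variants to a canonical utterance before matching. This is only an illustrative sketch; the alias table and function name are hypothetical.

```python
import re

# Variants seen in the wild -> the phrasing the intent already understands.
ALIASES = {
    "what ya doin": "what are you doing",
    "whatcha doin": "what are you doing",
}


def normalize(utterance: str) -> str:
    """Lowercase, strip punctuation and extra spaces, then apply aliases."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", utterance.lower())
    cleaned = " ".join(cleaned.split())
    return ALIASES.get(cleaned, cleaned)


print(normalize("What ya doin'?"))  # what are you doing
```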

The Go Live & Feedback

Ran some $3-15/day advertising on and off over a few weeks on Facebook to entice Cat, Dog and Bird lovers to engage with the chatbot. Wanted to reach random nationwide (plus UK, Canada, India, Australia) users to solicit unvarnished feedback and interaction. Actually a cool and relatively inexpensive way to beta test. Made a bunch of adjustments based on watching user inputs. So far, approx. 400 users have engaged with the app. Much lower cost per result with the India ads — and they are generally more polite.

Initial Engagement via Facebook Ads

The “go-live” is essentially a rapid iterative deploy/learn/enhance process. Started with $3/day advertising (“very soft live”) — moving up to $15/day — I know, big spender, but it served its purpose well.

Initial takeaways included needing better hand-holding of the user to expose the available functionality. Also implemented some quick intent mapping changes in Dialogflow, i.e. user says “X”, map to action “Y”.

Overall users like Image Styling, Word Association, the Speak functionality and general Chit Chat. They don’t spend too long in Nonsense Chat, although it gets picked a lot, so that training/module needs to be improved, and maybe a different corpus, for more natural responses.

Nothing like real users … one guy uploaded a pic of his penis, next to a soda can (“for scale”), to be “styled”. Seriously?! Others like to use the Speak functionality to tell themselves how impressive they are, in multiple categories. Cracks me up. People are generally well behaved but approx. 10% put out some pretty salty language or inappropriate pics.

What Now?

This could be a never-ending project of upgrades, but the following come to mind:

  • Keep learning from the user interactions — tweak intents, utterances and responses to produce a more natural “banter”. Incorporate more sophisticated NLP where possible — needs more study on what’s do-able.
  • Upgrade the Machine Learning models to provide higher fidelity outputs — Music is the obvious one. Enhance Nonsense Chat by training on more corpuses, e.g. Shakespeare, Movie dialog database etc.
  • Incorporate more of the Pet Personality traits into the interactions, e.g. does a shy cat ever say “Boom Shakalaka” ?!
  • Improve stickiness, to include additional relevant API integration.
  • Architect to work on multiple chat platforms with high fidelity rich messages. Currently only works on FB Messenger.

The Service Providers

Had to weave together a number of services to pull this off, but the various REST APIs were generally easy to deal with.

Microsoft Cognitive Services: https://azure.microsoft.com/en-us/services/cognitive-services

Amazon Web Services: https://aws.amazon.com

Heroku: http://www.heroku.com/

Google Dialogflow: https://dialogflow.com

Facebook: https://developers.facebook.com/docs/messenger-platform/prelaunch-checklist/

EventFul: http://eventful.com/

Cocktail DB: https://www.thecocktaildb.com/

OpenWeather: http://openweathermap.org/

Rapid API: https://rapidapi.com/

YouTube: https://developers.google.com/youtube/

The Reference Sites

Other than the aforementioned training classes, the following proved very useful for specific and general knowledge on the various subjects.

  • Wild Week in AI
  • Siraj Raval (YouTube)
  • AI Trends
  • KDNuggets.com (Gregory Piatetsky)
  • Towards Data Science
  • Chatbots Magazine
  • Various Medium Posts

How to Connect

If you made it this far, Thank You for reading and indulging this mad undertaking. Definitely a fun, mind-stretching learning experience, and I look forward to continuing to push on it. No current plans to commercialize; mostly curious to see what makes it compelling enough for people to use. Also fascinating to see what users do.

Feel free to communicate (constructive) ideas or feedback to: pheath@yahoo.com

https://www.linkedin.com/in/pheath


Paul Heath

2x CTO Co-Founder from the travel world. Retired but can't help still coding & researching the cool tech.