Skippy

SkippyTalkBot is an AIM chatterbot. He runs in Python under Twisted and uses the Google Web Services API to obtain text. Skippy's conversation is never particularly coherent, and his design is incredibly basic, but he's fun to play with from time to time.
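Stripped of the AIM and SOAP plumbing, the reply logic comes down to three steps: search for whatever the user typed, pick one of the returned sentences at random, and echo the message back when nothing usable comes back. Here is a minimal Python 3 sketch of that loop. `make_reply` and the `search` callable are hypothetical names, and the search is stubbed with canned results so the example runs without network access or a Google API key.

```python
import random
from typing import Callable, Iterable


def make_reply(message: str,
               search: Callable[[str], Iterable[str]],
               rng: random.Random) -> str:
    """Return a random sentence produced by `search`, or echo `message`.

    `search` is a hypothetical stand-in for the Google lookup: any callable
    that yields candidate sentences for a query will do.
    """
    # Ignore empty strings, i.e. snippets that contained no usable sentence.
    candidates = [s for s in search(message) if s]
    if not candidates:
        # Nothing usable: echo the original text, which is what skippy.py
        # ends up doing when random.choice() raises IndexError.
        return message
    return rng.choice(candidates)


def fake_search(query):
    # Canned results so the sketch runs offline.
    return ["Technology?", "", "Linux is like a wigwam."]


rng = random.Random(0)
print(make_reply("Say hello world.", fake_search, rng))
print(make_reply("xyzzy", lambda q: [], rng))
```

Passing the random generator in explicitly keeps the replies reproducible when you want to test them.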

Skippy started life as GilesGoogleBot. My original plan had been to use Google to generate sentences written only by me. Unfortunately, I don't have enough suitable text on the Web. I also realized the robot was going to be kind of stupid, and I didn't want to make either Google or myself look stupid by association. Besides being kind of an idiot, the robot can also say offensive things from time to time, in a deranged, meaningless way (the other day it told me "I have discovered there are four different kinds of lesbians"), so I decided giving it its own name would be less slanderous to both me and Google.

More sophisticated versions of Skippy are on the way, but for now, here's a sample conversation.

------------------------------------------------
Start of SkippyTalkBot buffer: Mon Nov 07 10:35:12 2005
[10:33] dreamstooloud: Hi Skippy
[10:33] SkippyTalkBot: Thanks for a great plugin!
[10:33] dreamstooloud: Say hello world.
[10:33] SkippyTalkBot: Korean!
[10:33] dreamstooloud: Can you understand the things I say particularly well?
[10:33] SkippyTalkBot: What?
[10:34] dreamstooloud: I mean, can you understand me? Are you alive?
[10:34] SkippyTalkBot: I say something?
[10:34] dreamstooloud: Yes, you say something. Tell me if you understand me.
[10:34] SkippyTalkBot: I am.
[10:34] dreamstooloud: Holy shit.
[10:34] SkippyTalkBot: Objects from my collection of religious kitsch.
End of SkippyTalkBot buffer: Mon Nov 07 10:35:12 2005

OK, I don't know what to tell you. Sometimes that just happens. More often, though, Skippy doesn't make the tiniest bit of sense. I just woke him up fifteen seconds ago to generate some sample text, and this is what I got. WTF.

Here's a more representative sample, from last week.

[15:53] dreamstooloud: say hi to Kendall, Skippy
[15:53] SkippyTalkBot: Jacko.
[15:53] dreamstooloud: exactly.
[15:53] SkippyTalkBot: Technology?
[15:54] dreamstooloud: yes, Skippy! Technology! Technology is good.
                       We like technology.
[15:54] SkippyTalkBot: Dr.
[15:54] dreamstooloud: okay whatever
[15:54] SkippyTalkBot: I always read about the people being
                       executed going peacefully.
[15:54] dreamstooloud: wtf?
[15:54] SkippyTalkBot: Now?
[15:54] dreamstooloud: okay let's start over
[15:54] SkippyTalkBot: Yea!
[15:54] dreamstooloud: Skippy, this is Kendall. Say hi.
[15:54] SkippyTalkBot: Dr.
[15:55] dreamstooloud: say hi
[15:55] SkippyTalkBot: Fun.
[15:55] dreamstooloud: Kendall works for [name of company].
[15:55] SkippyTalkBot: Linux is like a wigwam.

Obviously, this really just happens in an AIM window (or actually a Trillian window), but I wanted to make it look old-school.

This is what it really looks like, except the text is extremely unusual. (Don't ask me why. Some fluke of statistics.)

Actually, it kind of freaked me out.

Anyway, here's the code. As should be pretty obvious, Skippy has less brains than a Pekinese. He consists of two files: one is adapted from the OSCAR (AIM) demo that ships with Twisted's IM package, and the other is adapted from a tutorial on using the Google Web Services API. To tell the truth, Skippy's code is pretty much as dumb as he is. I never bothered to clean it up; I just wanted a proof-of-concept bridge from AIM to web apps.

Skippy can very easily open URLs using either Twisted or Python's built-in libraries, which means you could do some pretty interesting things with the interface. The idea came to me during a meeting at a small business, where one person said they wanted office scheduling software that could run on an internal server, and another said they didn't think any of the technical users would bother to keep a browser window open just in case they needed that particular software. I said, well, why not just plug the software into an AIM account?
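For the AIM-to-web-app bridge, the Google lookup would be swapped for a step that turns the IM text into a request against an internal server. A minimal Python 3 sketch, assuming a hypothetical scheduling app at `http://intranet.example.com/scheduler` with made-up `book` and `list` endpoints. It only builds the URL; actually fetching it (for example with `urllib.request.urlopen`) is left out so the example runs offline.

```python
from urllib.parse import urlencode

# Hypothetical internal scheduling server; not a real service.
BASE_URL = "http://intranet.example.com/scheduler"


def command_to_url(message: str, sender: str):
    """Map an IM such as 'book Tuesday 3pm' to a URL, or None if unrecognized."""
    words = message.strip().split(None, 1)
    if not words:
        return None
    verb = words[0].lower()
    rest = words[1] if len(words) > 1 else ""
    if verb == "book":
        # Pass the sender's screenname and the free-text time slot as
        # query parameters; urlencode handles spaces and special characters.
        return BASE_URL + "/book?" + urlencode({"who": sender, "when": rest})
    if verb == "list":
        return BASE_URL + "/list?" + urlencode({"who": sender})
    return None


print(command_to_url("book Tuesday 3pm", "dreamstooloud"))
```

In a bot like Skippy, `receiveMessage` could call something along these lines instead of `modifyReturnMessage`, fetch the URL, and send the server's reply back over AIM.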

Anyway:

skippy.py

#!/usr/bin/python
from twisted.internet import default
default.install()
from twisted.protocols import oscar
from twisted.internet import protocol, reactor
import getpass
import re
import ZSI_x
import random


SN = "SkippyTalkBot"                         # screenname
PASS =  "******"                             # ghosted
hostport = ('login.oscar.aol.com', 5190)
icqMode = 0

debug = 0


class B(oscar.BOSConnection):
    capabilities = [oscar.CAP_CHAT]
    def initDone(self):
        self.requestSelfInfo().addCallback(self.gotSelfInfo)
        self.requestSSI().addCallback(self.gotBuddyList)
    def gotSelfInfo(self, user):
        if debug: print user.__dict__
        self.name = user.name
    def gotBuddyList(self, l):
        if debug: print l
        self.activateSSI()
        self.setProfile("SkippyTalkBot is by Giles Bowkett [dreamstooloud].")
        self.setIdleTime(0)
        self.clientReady()
    def receiveMessage(self, user, multiparts, flags):
        if debug: print user.name, multiparts, flags
        if debug: print "multiparts!! ", multiparts
        # auto messages should not be responded to. identify them by
        # the string auto, found in flags[0] (sometimes).
        try:
            auto = flags[0]
            if auto == "auto":
                return
        except IndexError:
            pass
        self.lastUser = user.name
        multiparts = self.modifyReturnMessage(multiparts)
        d = self.sendMessage(user.name, multiparts, wantAck=1,
                             autoResponse=(self.awayMessage is not None))
        d.addCallback(self.respondToMessage)
    def respondToMessage(self, (username, message)):
        # in the original Twisted AIM demo, this just printed out a message
        # indicating that the IM had been sent. Twisted requires Deferreds, but
        # writing up a new one isn't really necessary here.
        if debug: print "in respondToMessage"
        pass
    def receiveChatInvite(self, user, message, exchange, fullName, instance, shortName, inviteTime):
        pass
    def extractText(self, multiparts):
        # messages consist of HTML enclosing text. since a message can
        # probably include different HTML for different message styles, we
        # skip the HTML and pull out the text. one other thing, it looks as
        # if message itself is a one-element list containing a one-element
        # tuple. wtf? probably something to watch out for...
        message = multiparts[0][0]
        # find non-html surrounded by html; anything between > and < which
        # contains neither > nor <
        match = re.compile(">([^><]+?)<").search(message)
        if match:
            return match.group(1)
        else:
            return message

    def modifyReturnMessage(self, multiparts):
        # multiparts usually arrives as a list containing one element,
        # which is a tuple containing one element. I have no idea why
        # and it seems like an extraordinarily odd way to structure
        # data. thus this code is highly risky, but it basically just
        # googles the incoming text and replaces the message with a
        # randomly chosen sentence from the results (or echoes the
        # original text if Google comes back empty).
        if debug: print "in modifyReturnMessage"
        message_text = self.extractText(multiparts)
        snippets = []
        for snippet in ZSI_x.google(message_text):
            snippets.append(snippet)
            if debug: print "added snippet: ", snippet
        try:
            message_text = random.choice(snippets)
            if debug: print "message text: ", message_text
        except IndexError:
            # IndexError indicates no snippets returned by snippet code.
            # it shouldn't happen, but it shouldn't kill Skippy either.
            pass
        multiparts[0] = (message_text,)
        return multiparts

class OA(oscar.OscarAuthenticator):
    BOSClass = B

protocol.ClientCreator(reactor, OA, SN, PASS, icq=icqMode).connectTCP(*hostport)
reactor.run()
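`extractText` keeps only the first run of text between tags, so a message like `<B>Hi</B> Skippy` comes through as just `Hi`, and HTML entities such as `&amp;` stay escaped. Below is a minimal Python 3 sketch of an alternative (a swapped-in approach, not what skippy.py does): strip every tag, then decode entities with the standard library's `html.unescape`. It also treats a message as an auto-response if `"auto"` appears anywhere in `flags`, rather than only in `flags[0]`; that assumes the flags list looks like the one `receiveMessage` receives above.

```python
import html
import re

# Any HTML tag, including attributes such as BGCOLOR="#ffffff".
TAG = re.compile(r"<[^>]*>")


def extract_text(message_html: str) -> str:
    """Return all the visible text of an IM's HTML, with entities decoded."""
    text = TAG.sub("", message_html)   # drop tags (assumes inline tags only)
    text = html.unescape(text)         # &amp; -> &, &lt; -> <, and so on
    return " ".join(text.split())      # collapse runs of whitespace


def is_auto_response(flags) -> bool:
    """True if the message is an away/auto reply the bot should ignore."""
    return "auto" in (flags or [])


print(extract_text('<HTML><BODY><FONT>Hi <B>Skippy</B> &amp; co</FONT></BODY></HTML>'))
```

Removing tags outright means adjacent blocks like `</FONT><FONT>` can run words together; replacing tags with a space avoids that at the cost of splitting words that are formatted mid-word.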

ZSI_x.py

# code based very much on:
#   http://www.xml.com/pub/a/ws/2002/06/12/soap.html
# uses ZSI SOAP package to do Google search
import socket, cStringIO, httplib, re, descape
from ZSI import *

GoogleNS = "urn:GoogleSearch"
GoogleURL = "/search/beta2"
GoogleHost = 'api.google.com'

debug = 1

class Generic:
    def __init__(self, name):
        self.name = name

class tcDirCatArray(TC.Array):
    def __init__(self, pname=None, **kw):
        TC.Array.__init__(self,
            'DirectoryCategory', tcDirCat(), 'directoryCategories', **kw)

class tcSearchResult(TC.Struct):
    def __init__(self, pname=None, **kw):
        TC.Struct.__init__(self, Generic, [
            TC.String('summary', unique=1),
            TC.String('URL', unique=1),
            TC.String('snippet', unique=1),
            TC.String('title', unique=1),
            TC.String('cachedSize', unique=1),
            TC.Boolean('relatedInformationPresent'),
            TC.String('hostName', unique=1),
            tcDirCat('directoryCategory'),
            TC.String('directoryTitle', unique=1),
        ], pname, inorder=0, **kw)

class tcResultArray(TC.Array):
    def __init__(self, pname=None, **kw):
        TC.Array.__init__(self,
            'ResultElement', tcSearchResult(), 'resultElements', **kw)

class tcDirCat(TC.Struct):
    def __init__(self, pname=None, **kw):
        TC.Struct.__init__(self, Generic, [
            TC.String('fullViewableName', unique=1),
            TC.String('specialEncoding', unique=1),
        ], pname, inorder=0, **kw)

class tcGoogleSearchResult(TC.Struct):
    def __init__(self, pname=None, **kw):
        TC.Struct.__init__(self, Generic, [
            TC.Boolean('documentFiltering'),
            TC.String('searchComments', unique=1),
            TC.Iint('estimatedTotalResultsCount'),
            TC.Boolean('estimateIsExact'),
            tcResultArray('resultElements'),
            TC.String('searchQuery', unique=1),
            TC.Iint('startIndex'),
            TC.Iint('endIndex'),
            TC.String('searchTips', unique=1),
            tcDirCatArray('directoryCategories'),
            TC.Decimal('searchTime'),
        ], pname, inorder=0, **kw)

class tcGoogleSearch(TC.Struct):
    def __init__(self, pname=None, **kw):
        TC.Struct.__init__(self, Generic, [
            TC.String('key', unique=1),
            TC.String('q', unique=1),
            TC.Iint('start'),
            TC.Iint('maxResults'),
            TC.Boolean('filter'),
            TC.String('restrict', unique=1),
            TC.Boolean('safeSearch'),
            TC.String('lr', unique=1),
            TC.String('ie', unique=1),
            TC.String('oe', unique=1),
        ], pname, inorder=0, **kw)

class Search:
    typecode = tcGoogleSearch('g:doGoogleSearch', typed=0)

    def __init__(self, query, key):
        self.key = key
        self.q = query
        self.start = 0
        self.maxResults = 10 # 10 or less! otherwise Google barfs
        self.filter = 1
        self.restrict = ''
        self.safeSearch = 0
        self.lr = ''
        self.ie = 'latin1'
        self.oe = 'latin1'

def sendsearch(request):
    # POST the SOAP request to Google and yield one candidate sentence per
    # line of the response. lines without a snippet become "look out!
    # muppets!", which isolate_sentence() rejects (no capital letter), so
    # they come out as empty strings.
    conn = httplib.HTTPConnection(GoogleHost, 80)
    conn.connect()
    conn.putrequest('POST', GoogleURL)
    conn.putheader('Content-Length', '%d' % len(request))
    conn.putheader('Content-type', 'text/xml; charset="utf-8"')
    conn.putheader('SOAPAction', GoogleNS)
    conn.endheaders()
    conn.send(request)

    response = conn.getresponse()
    data = response.read()
    if debug: print "sendsearch obtained data: ", data
    # the snippet text sits inside a <snippet ...> element
    snippet_re = re.compile('<snippet[^>]*>([^><]+?)<')
    for line in data.splitlines():
        match = snippet_re.search(line)
        if match:
            snippet = match.group(1)
            if debug: print "new snippet: ", snippet
        else:
            snippet = "look out! muppets!"
            if debug: print "issues!! at muppet error message"
        snippet = descape.descape(snippet)
        snippet = isolate_sentence(snippet)
        yield snippet

def isolate_sentence(snippet):
    # return a grammatical sentence, or the empty string. note that "grammatical"
    # receives an extraordinarily loose definition here. it has to start with a
    # capital letter, and end with a ., ?, or !.
    if debug: print "isolating sentence from snippet: ", snippet
    match = re.compile('[A-Z][a-z, ;]+(\.|\?|\!)').search(snippet)
    if match:
        snippet = match.group(0)
        if debug: print "sentence! ", snippet
    else:
        snippet = ""
    return snippet

# google() is a generator that yields search result snippets.
# obviously this is NOT the best way to do it, but whatever.
def google(search_terms):
    if debug: print "in google"
    search_terms = massage_search(search_terms)
    if debug: print "new search terms: ", search_terms
    s = Search(search_terms, '*****Google API license goes here *********') 
    c = cStringIO.StringIO()
    SoapWriter(c, nsdict={'g': GoogleNS}) \
        .serialize(s, oname='doGoogleSearch')
    request = c.getvalue()
    if debug: print "about to try snippet loop"

    try:
        for snippet in sendsearch(request):
            if debug: print "obtained snippet from sendsearch: ", snippet
            snippet = massage_snippet(snippet)
            if snippet:
                if debug: print "googled snippet: ", snippet
                yield snippet
    except ImportError:
        if debug: print "obtained ImportError in google()"
        pass
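The snippet post-processing in `sendsearch()` and `isolate_sentence()` can be tried without the Google service at all. Here is a minimal Python 3 sketch of those two steps. It assumes each result's text sits in a `<snippet>` element with its HTML escaped (the fragment below is hand-written, not captured Google output), and it uses `html.unescape` in place of the `descape` module the listing imports, which is not part of the standard library. The sentence test is the same loose regex as `isolate_sentence()`, rewritten with a character class. It also hints at why "Dr." keeps turning up in the transcripts: a capital letter, one lowercase letter and a full stop are all the filter asks for.

```python
import html
import re

SNIPPET = re.compile(r"<snippet[^>]*>([^<]*)</snippet>")
# Same rule as isolate_sentence(): a capital, then lowercase letters, commas,
# spaces or semicolons, then '.', '?' or '!'.
SENTENCE = re.compile(r"[A-Z][a-z, ;]+[.?!]")


def isolate_sentence(text: str) -> str:
    """Return the first loosely 'grammatical' sentence in text, or ''."""
    match = SENTENCE.search(text)
    return match.group(0) if match else ""


def sentences_from_response(xml_text: str):
    """Yield one sentence per snippet, skipping snippets with none."""
    for raw in SNIPPET.findall(xml_text):
        # Snippets carry escaped HTML such as &lt;b&gt;: decode, then drop tags.
        text = re.sub(r"<[^>]*>", "", html.unescape(raw))
        sentence = isolate_sentence(text)
        if sentence:
            yield sentence


# Hand-written illustrative input, not real Google output.
SAMPLE = (
    "<item><snippet>... &lt;b&gt;Linux&lt;/b&gt; is like a wigwam. No Gates ...</snippet></item>\n"
    "<item><snippet>see Dr. Smith at 10am</snippet></item>\n"
    "<item><snippet>no capitals here</snippet></item>\n"
)
print(list(sentences_from_response(SAMPLE)))
```

Because the character class excludes capitals after the first letter, a line like "Kendall works for Google." comes back as just "Google.", which goes some way toward explaining Skippy's one-word replies.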