|
Skippy

SkippyTalkBot is an AIM chatterbot. He runs in Python under Twisted and uses the Google Web Services API to obtain text. Skippy's conversation is never particularly coherent, and his design is incredibly basic, but he's fun to play with from time to time.

Skippy started life as GilesGoogleBot. My original plan had been to use Google to generate only sentences written by me. Unfortunately, I don't have enough suitable text on the Web. I also realized the robot was going to be kind of stupid, and I didn't want to make either Google or myself look stupid by association. In addition to being kind of an idiot, the robot can also say offensive things from time to time, in a deranged, meaningless way -- the other day it told me "I have discovered there are four different kinds of lesbians" -- so I decided giving it its own name would be less slanderous to myself and Google.

More sophisticated versions of Skippy are on the way, but for now, here's a sample conversation.

------------------------------------------------
Start of SkippyTalkBot buffer: Mon Nov 07 10:35:12 2005
[10:33] dreamstooloud: Hi Skippy
[10:33] SkippyTalkBot: Thanks for a great plugin!
[10:33] dreamstooloud: Say hello world.
[10:33] SkippyTalkBot: Korean!
[10:33] dreamstooloud: Can you understand the things I say particularly well?
[10:33] SkippyTalkBot: What?
[10:34] dreamstooloud: I mean, can you understand me? Are you alive?
[10:34] SkippyTalkBot: I say something?
[10:34] dreamstooloud: Yes, you say something. Tell me if you understand me.
[10:34] SkippyTalkBot: I am.
[10:34] dreamstooloud: Holy shit.
[10:34] SkippyTalkBot: Objects from my collection of religious kitsch.
End of SkippyTalkBot buffer: Mon Nov 07 10:35:12 2005

OK, I don't know what to tell you. Sometimes that just happens. More often, though, Skippy doesn't make the tiniest bit of sense. I just woke him up fifteen seconds ago to generate some sample text, and this is what I got. WTF.

Here's a more representative sample, from last week.

[15:53] dreamstooloud: say hi to Kendall, Skippy
[15:53] SkippyTalkBot: Jacko.
[15:53] dreamstooloud: exactly.
[15:53] SkippyTalkBot: Technology?
[15:54] dreamstooloud: yes, Skippy! Technology! Technology is good. We like technology.
[15:54] SkippyTalkBot: Dr.
[15:54] dreamstooloud: okay whatever
[15:54] SkippyTalkBot: I always read about the people being executed going peacefully.
[15:54] dreamstooloud: wtf?
[15:54] SkippyTalkBot: Now?
[15:54] dreamstooloud: okay let's start over
[15:54] SkippyTalkBot: Yea!
[15:54] dreamstooloud: Skippy, this is Kendall. Say hi.
[15:54] SkippyTalkBot: Dr.
[15:55] dreamstooloud: say hi
[15:55] SkippyTalkBot: Fun.
[15:55] dreamstooloud: Kendall works for [name of company].
[15:55] SkippyTalkBot: Linux is like a wigwam.

Obviously, this really just happens in an AIM window (or actually a Trillian window), but I wanted to make it look old-school. This is what it really looks like, except the text is extremely unusual. (Don't ask me why. Some fluke of statistics.) Actually, it kind of freaked me out.

Anyway, here's the code. As should be pretty obvious, Skippy has less brains than a Pekinese. He consists of two files. One is adapted from an OSCAR (AIM) demo for the Twisted IM package; the other is adapted from a tutorial on using the Google Web Services API. Skippy's code is pretty much as dumb as he is, to tell the truth. I never really bothered to clean this up; I just wanted a proof-of-concept bridge from AIM to web apps.

Skippy can very easily open URLs using either Twisted or just Python's built-in libraries, which means that you could do some pretty interesting things in terms of the interface. The idea for this came to me during a meeting at a small business, where one person said they wanted office scheduling software that could run on an internal server, and another said they didn't think any of the technical users would bother to keep a browser window open just in case they needed to use that particular software. I said, well, why not just plug the software into an AIM account?

Anyway:

skippy.py

#!/usr/bin/python
from twisted.internet import default
default.install()
from twisted.protocols import oscar
from twisted.internet import protocol, reactor
import getpass
import re
import ZSI_x
import random
SN = "SkippyTalkBot" # screenname
PASS = "******" # ghosted
hostport = ('login.oscar.aol.com', 5190)
icqMode = 0
debug = 0

class B(oscar.BOSConnection):
    capabilities = [oscar.CAP_CHAT]

    def initDone(self):
        self.requestSelfInfo().addCallback(self.gotSelfInfo)
        self.requestSSI().addCallback(self.gotBuddyList)

    def gotSelfInfo(self, user):
        if debug: print user.__dict__
        self.name = user.name

    def gotBuddyList(self, l):
        if debug: print l
        self.activateSSI()
        self.setProfile("SkippyTalkBot is by Giles Bowkett [dreamstooloud].")
        self.setIdleTime(0)
        self.clientReady()

    def receiveMessage(self, user, multiparts, flags):
        if debug: print user.name, multiparts, flags
        if debug: print "multiparts!! ", multiparts
        # auto messages should not be responded to. identify them by
        # the string auto, found in flags[0] (sometimes).
        try:
            auto = flags[0]
            if auto == "auto":
                return
        except IndexError:
            pass
        self.lastUser = user.name
        multiparts = self.modifyReturnMessage(multiparts)
        self.sendMessage(user.name, multiparts, wantAck = 1, \
            autoResponse = (self.awayMessage!=None)).addCallback( \
            self.respondToMessage)

    def respondToMessage(self, (username, message)):
        # in the original Twisted AIM demo, this just printed out a message
        # indicating that the IM had been sent. Twisted requires Deferreds, but
        # writing up a new one isn't really necessary here.
        if debug: print "in respondToMessage"
        pass

    def receiveChatInvite(self, user, message, exchange, fullName, instance, shortName, inviteTime):
        pass

    def extractText(self, multiparts):
        # messages consist of HTML enclosing text. since a message can
        # probably include different HTML for different message styles, we
        # skip the HTML and pull out the text. one other thing, it looks as
        # if message itself is a one-element list containing a one-element
        # tuple. wtf? probably something to watch out for...
        message = multiparts[0][0]
        # find non-html surrounded by html; anything between > and < which
        # contains neither > nor <
        match = re.compile(">([^><]+?)<").search(message)
        if match:
            return match.group(1)
        else:
            return message

    def modifyReturnMessage(self, multiparts):
        # multiparts usually arrives as a list containing one element,
        # which is a tuple containing one element. I have no idea why
        # and it seems like an extraordinarily odd way to structure
        # data. thus this code is highly risky, but it basically just
        # creates a new message back promising to google for the text.
        if debug: print "in modifyReturnMessage"
        message_text = self.extractText(multiparts)
        snippets = []
        for snippet in ZSI_x.google(message_text):
            snippets.append(snippet)
            if debug: print "added snippet: ", snippet
        try:
            message_text = random.choice(snippets)
            if debug: print "message text: ", message_text
        except IndexError:
            # IndexError indicates no snippets returned by snippet code.
            # it shouldn't happen, but it shouldn't kill Skippy either.
            pass
        multiparts[0] = (message_text,)
        return multiparts

class OA(oscar.OscarAuthenticator):
    BOSClass = B

protocol.ClientCreator(reactor, OA, SN, PASS, icq=icqMode).connectTCP(*hostport)
reactor.run()
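
Stripped of the Twisted plumbing, Skippy's whole "brain" in skippy.py is three steps: pull the text out of the HTML, search for it, and say a random snippet back. Here is a minimal sketch of just that logic, so you can poke at it without an AIM account. It is written for current Python 3 rather than the Python 2 of the listing, and `extract_text`, `build_reply`, and `fake_search` are names made up for this illustration; `fake_search` is a canned stand-in for `ZSI_x.google`, not the real thing.

```python
import random
import re

# Same pattern as extractText in the listing: the first run of text that
# sits between a ">" and a "<" and contains neither character.
TEXT_BETWEEN_TAGS = re.compile(r">([^><]+?)<")


def extract_text(multiparts):
    """Return the plain text of an incoming message.

    Follows the listing's assumption that multiparts is a list whose first
    element is a tuple whose first element is the HTML message body.
    """
    message = multiparts[0][0]
    match = TEXT_BETWEEN_TAGS.search(message)
    return match.group(1) if match else message


def build_reply(multiparts, search, rng=random):
    """Swap the message text for a random snippet from search(text).

    `search` is a hypothetical stand-in for ZSI_x.google: any callable that
    takes a string and returns an iterable of strings. Unlike the listing,
    this returns a new list instead of modifying multiparts in place.
    """
    text = extract_text(multiparts)
    snippets = list(search(text))
    # With no snippets, fall back to echoing the extracted text, which is
    # what the listing ends up doing when random.choice raises IndexError.
    reply = rng.choice(snippets) if snippets else text
    return [(reply,)] + list(multiparts[1:])


def fake_search(query):
    """Canned results so the sketch runs without any network access."""
    return ["Linux is like a wigwam.", "Technology?"]


if __name__ == "__main__":
    incoming = [("<HTML><BODY><FONT>say hi</FONT></BODY></HTML>",)]
    print(extract_text(incoming))
    print(build_reply(incoming, fake_search))
```

Because `search` is just a callable, this is also roughly where a different backend would plug in, such as something that talks to an office scheduling app instead of Google. That part is an assumption about how you might wire it up, not something Skippy does today.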

ZSI_x.py

# code based very much on:
# http://www.xml.com/pub/a/ws/2002/06/12/soap.html
# uses ZSI SOAP package to do Google search
import socket, cStringIO, httplib, re, descape
from ZSI import *
GoogleNS = "urn:GoogleSearch"
GoogleURL = "/search/beta2"
GoogleHost = 'api.google.com'
debug = 1

class Generic:
    def __init__(self, name):
        self.name = name

class tcDirCatArray(TC.Array):
    def __init__(self, pname=None, **kw):
        TC.Array.__init__(self,
            'DirectoryCategory', tcDirCat(), 'directoryCategories', **kw)

class tcSearchResult(TC.Struct):
    def __init__(self, pname=None, **kw):
        TC.Struct.__init__(self, Generic, [
            TC.String('summary', unique=1),
            TC.String('URL', unique=1),
            TC.String('snippet', unique=1),
            TC.String('title', unique=1),
            TC.String('cachedSize', unique=1),
            TC.Boolean('relatedInformationPresent'),
            TC.String('hostName', unique=1),
            tcDirCat('directoryCategory'),
            TC.String('directoryTitle', unique=1),
            ], pname, inorder=0, **kw)

class tcResultArray(TC.Array):
    def __init__(self, pname=None, **kw):
        TC.Array.__init__(self,
            'ResultElement', tcSearchResult(), 'resultElements', **kw)

class tcDirCat(TC.Struct):
    def __init__(self, pname=None, **kw):
        TC.Struct.__init__(self, Generic, [
            TC.String('fullViewableName', unique=1),
            TC.String('specialEncoding', unique=1),
            ], pname, inorder=0, **kw)

class tcGoogleSearchResult(TC.Struct):
    def __init__(self, pname=None, **kw):
        TC.Struct.__init__(self, Generic, [
            TC.Boolean('documentFiltering'),
            TC.String('searchComments', unique=1),
            TC.Iint('estimatedTotalResultsCount'),
            TC.Boolean('estimateIsExact'),
            tcResultArray('resultElements'),
            TC.String('searchQuery', unique=1),
            TC.Iint('startIndex'),
            TC.Iint('endIndex'),
            TC.String('searchTips', unique=1),
            tcDirCatArray('directoryCategories'),
            TC.Decimal('searchTime'),
            ], pname, inorder=0, **kw)

class tcGoogleSearch(TC.Struct):
    def __init__(self, pname=None, **kw):
        TC.Struct.__init__(self, Generic, [
            TC.String('key', unique=1),
            TC.String('q', unique=1),
            TC.Iint('start'),
            TC.Iint('maxResults'),
            TC.Boolean('filter'),
            TC.String('restrict', unique=1),
            TC.Boolean('safeSearch'),
            TC.String('lr', unique=1),
            TC.String('ie', unique=1),
            TC.String('oe', unique=1),
            ], pname, inorder=0, **kw)

class Search:
    typecode = tcGoogleSearch('g:doGoogleSearch', typed=0)

    def __init__(self, query, key):
        self.key = key
        self.q = query
        self.start = 0
        self.maxResults = 10 # 10 or less! otherwise Google barfs
        self.filter = 1
        self.restrict = ''
        self.safeSearch = 0
        self.lr = ''
        self.ie = 'latin1'
        self.oe = 'latin1'

def sendsearch(request):
    conn = httplib.HTTPConnection(GoogleHost, 80)
    conn.connect()
    conn.putrequest('POST', GoogleURL)
    conn.putheader('Content-Length', '%d' % len(request))
    conn.putheader('Content-type', 'text/xml; charset="utf-8"')
    conn.putheader('SOAPAction', GoogleNS)
    conn.endheaders()
    conn.send(request)
    response = conn.getresponse()
    data = response.read()
    if debug: print "sendsearch obtained data: ", data
    for line in data.splitlines():
        match = re.compile("^\
|