Advanced usage: The TwitterSearchOrder class

This class mainly acts as a plain container for configuration all parameters currently available by the Twitter Search API. There are several parameters which can easily be set and modified by methods in TwitterSearchOrder.

The only parameter with a default value is count with 100. This is because it is the maximum of tweets returned by this very Twitter API endpoint. In most cases you’d like to reduce traffic and the amount of queries, so it makes sense to set the biggest possible value by default. Please note that this endpoint has a different maximum size than the one used in TwitterUserOrder.

Be aware that some parameters can be ignored by Twitter. For example currently not every language is detectable by the Search API. TwitterSearch is only responsible for transmitting values according to the Twitter documentation.

API Parameter Type Modifying methods Example
q *required* add_keyword(<string>), add_keyword(<list>, or_operator=True), set_keywords(<list>) add_keyword('#Hashtag'), set_keywords(['foo','bar'])
q optional set_link_filter(), remove_link_filter() set_link_filter(), remove_link_filter()
q optional set_question_filter(), remove_question_filter) set_question_filter(), remove_question_filter)
q optional set_source_filter(<string>), remove_source_filter) set_source_filter('twitterfeed')
q optional set_positive_attitude_filter(), set_negative_attitude_filter, remove_attitude_filter) set_positive_attitude_filter()
geocode optional set_geocode(latitude<float>, longitude<float>, radius<int,long>, imperial_metric=<True,False> set_geocode(52.5233,13.4127,10,imperial_metric=True)
lang optional set_language(<ISO-6391-string>) set_language('en')
locale optional set_locale(<ISO-6391-string>) set_locale('ja')
result_type optional set_result_type(<mixed,recent,popular>) set_result_type('recent')
count optional set_count(<int>[Range: 1-100]) set_count(42)
until optional set_until(<datetime.date>) set_until(datetime.date(2012, 12, 24))
since optional set_since(<datetime.date>) set_since(datetime.date(2012, 12, 24))
since_id optional set_since_id(<int,long>[Range: >0]) set_since_id(250075927172759552)
max_id optional set_max_id(<int,long>[Range: >0]) set_max_id(249292149810667520)
include_entities optional set_include_etities(<bool,int>) set_include_etities(True), set_include_etities(1)
callback optional set_callback(<string>) set_callback('myMethod')

If you’re not familiar with the meaning of the parameters, please have a look at the Twitter Search API documentation. Most parameter are self-describing anyway.

Advanced filtering

There are certain types of filters the Twitter Search API can handle. All of them are listed in the Twitter documentation. Unfortunately such advanced filters can be quite messy. Read this chapter if you’d like to use advanced queries of the Twitter API. All filters not based on keyword can be removed by using TwitterSearchOrder.remove_all_filters().

Keywords with spaces

Twitter does search for keywords separately by default. This means looking for James Bond will return tweets containing the words James and Bond like in There was a bond of friendship between James and Amy. However, sometimes you might like to look for reviews of the newest James Bond movie and therefore such tweets won’t be much of a help for you. In this case make sure to add a keyword surrounded by ". Twitter will take such keywords as one phrase resulting in tweets actually containing James Bond like in My name is Bond ... James Bond.

TwitterSearch will handle those issues for you by default as calling add_keyword("James Bond") will look for "James Bond" as one full phrase while add_keyword(["James","Bond"]) will result in a search for James AND Bond.

Excepting keywords

Sometimes you might like to look for tweets containing specific words but not containing a different one. It’s easy to except a certain keyword by applying a dash prefix (-) to it. Thus, a line like set_keywords(['Porsche', '-Fiat']) will give you all tweets containing the word Porsche but only if there is no Fiat in it.

OR concatenating of keywords

It is possible to search for foo OR bar with TwitterSearch. Instead of simply adding those keywords like we did in the basic usage chapter, you can use the or_operator=True parameter. It’s also possible to concatenate different keyword:

from TwitterSearch import *

try:
    tso = TwitterSearchOrder()
    tso.set_keywords(['Goofy', 'Nyancat'], or_operator = True)
    tso.add_keyword('BMW')

    ts = TwitterSearch(
            consumer_key = 'aaabbb',
            consumer_secret = 'cccddd',
            access_token = '111222',
            access_token_secret = '333444'
        )

    for tweet in ts.search_tweets_iterable(tso):
        print('@%s tweeted: %s' % (tweet['user']['screen_name'], tweet['text']))

    except TwitterSearchException as e:
        print(e)

Concatenating several keywords can be tricky as the syntax of the Twitter Search API is pretty undocumented and only roughly defined. In my tests it turned out at a query like Goofy OR Nycancat BMW seemed to be the very same as (Goofy OR Nycancat) AND BMW although there is nothing mentioned in the documentation about concatenations of keywords. If you’d like to make sure your combination works, better use the official Twitter Search to perform some tests and see whether Twitter handles your query correctly.

Tweets of/from/mentioning a certain user

In this example we’ll use the twitter account of Eric Jarosinski and his twitter user Nein Quarterly.

It’s also possible to search for tweets of a certain user. You’d better use TwitterUserOrder for this as this actually queries the timeline of the user instead of using the Twitter Search API. Nonetheless, it’s also possible to do through TwitterSearchOrder. Just add the prefix of from: to the username. Using standard TwitterSearch methods this would look like add_keyword("from:neinquarterly").

Tweets directly to a user can be collected using the to: prefix in front of the username. Due to this tweets to neinquarterly can be collected using add_keyword("to:neinquarterly").

If you’d like to receive tweets referencing a certain user you are able to gather them by using a @ prefix in front of the username. Thus, the corresponding code snipped is add_keyword("@neinquarterly").

Tweets containing a question

It’s also possible to receive only tweets that are asking a question. You can do so by setting the filter via TwitterSearchOrder.set_question_filter(). A removal of this filter can be done with TwitterSearchOrder.remove_question_filter(). Be aware that this filtering is done by Twitter and it doesn’t necessary work well as it might miss questions in certain languages.

Attitude filtering

Twitter also offers an attitude-based filtering mechanism. You can search for positive tweets by using TwitterSearchOrder.set_positive_attitude_filter() and for negative ones by using TwitterSearchOrder.set_negative_attitude_filter(). The attitude filtering can be removed using TwitterSearchOrder.remove_attitude_filter(). Note that this filter mechanism is performed by Twitter directly and you may miss tweets not detected by those. This especially holds true for tweets not authored in English.

Source filtering

If you’re interested in tweets only submitted using a specific software you can do so using the method TwitterSearchOrder.set_source_filter(<string>). Calling set_source_filter("twitterfeed") gives you only tweets submitted using TwitterFeed. The removal of this filter can be performed through TwitterSearchOrder.remove_source_filter().

Time-based filtering

TwitterSearch tries to concentrate on simple query and does prefer to submit arguments as parameters instead of merging them into the query string. Thus TwitterSearch will generate raw query strings like ?q=foobar&until=2010-12-27 instead of ?q=foobar+since:2010-12-27. Both versions will return the very same tweets but while the first one separates the values in different parameters, the second one just merges everything together. Doing so is likely to lead to long and possibly wrong query strings. Remember that you’re perfectly able to submit stuff like ?q=foobar+since:2010-12-27+until:2010-12-26 which is obviously non-sense. If you would still like to dump everything into the q parameter you can do so manually by using set_keywords(['since:2010-12-27','until:2010-12-26']) for example.

If you have no specific reason to actually include those time-based filters into the search query parameter directly, you should use the default methods of set_since_id(), set_since() and/or set_until().

Advanced usage examples

You may want to use TwitterSearchOrder for just generating a valid Twitter Search API query string containing all your arguments without knowing too much details about the Twitter API? No problem at all as there is the method TwitterSearchOrder.createSearchURL(). It creates and returns an valid Twitter Search API query string. Afterwards the last created string is also available through TwitterSearchOrder.url.

from TwitterSearch import TwitterSearchOrder, TwitterSearchException

try:
    tso = TwitterSearchOrder()
    tso.set_language('nl')
    tso.set_locale('ja')
    tso.set_keywords(['One','Two'])
    tso.add_keyword('myKeyword')

    print(tso.create_search_url())

except TwitterSearchException as e:
      print(e)

You’ll receive ?q=One+Two+myKeyword&count=100&lang=nl&locale=ja as result. Now you are free to use this string for manually querying Twitter (or any other API using the same parameter as Twitter does).

Maybe you would like to create another TwitterSearchOrder instance with a slightly different URL.

from TwitterSearch import TwitterSearchOrder, TwitterSearchException

try:
    tso = TwitterSearchOrder()
    tso.set_language('nl')
    tso.set_locale('ja')
    tso.set_keywords(['One','Two'])
    tso.add_keyword('myKeyword')

    querystr = tso.create_search_url()

    # create a new TwitterSearchOrder based on the old query string and work with it
    tso2 = TwitterSearchOrder()
    tso2.set_search_url(querystr + '&result_type=mixed&include_entities=true')
    tso2.set_locale('en')
    print(tso2.create_search_url())

except TwitterSearchException as e:
   print(e)

This piece of code will finally result in an output of ?q=One+Two+myKeyword&count=100&lang=nl&locale=en&result_type=mixed&include_entities=true.

Please be aware that the sense of arguments given by set_search_url() is not checked. Due to this it is perfectly valid to to stuff like set_search_url('q=Not+my+department&count=1731&locale=Canada&foo=bar'). When manually setting the string, the leading ? sign is optional.

Such stuff doesn’t make much sense when querying Twitter. However, there may be cases when you’re using TwitterSearch is some exotic context where this behavior is needed to avoid the regular checks of the TwitterSearchOrder methods.

Be aware that if you’re using set_search_url() all previous configured parameters are lost.