How to handle ProvisionedThroughputExceededException in DynamoDB

When using boto with DynamoDB, you would normally consume the results of a scan or query with a for loop:


from boto.dynamodb2.table import Table

dataTable = Table(...)

# scan or query_2
data = dataTable.scan(....)

# consume the data
for item in data:
    # do something here
    pass

Done this way, the loop can easily exceed the table's provisioned read throughput if the table is relatively large, and ProvisionedThroughputExceededException is then raised from inside the loop. You can simply increase the provisioned throughput (and pay more), or slow the code down so it doesn't send too many requests in a short period of time.
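A common way to slow the code down is exponential backoff: after each consecutive throttling error, wait about twice as long as the previous time, up to a cap. The sketch below uses only the standard library; the function name `backoff_delay` and its defaults are illustrative assumptions, chosen to mirror the 0.1 s starting delay and 60 s cap used in the loop below. It can optionally add random "full jitter" so that several clients backing off at the same time do not all retry in lockstep.

```python
import random


def backoff_delay(retries, base=0.1, cap=60.0, max_exponent=10, jitter=False):
    """Seconds to wait after `retries` consecutive throttling errors.

    Illustrative helper, not part of boto. The delay doubles with each retry
    (base * 2**retries), the exponent stops growing at max_exponent and the
    result never exceeds cap. With jitter=True a uniformly random value between
    0 and that delay is returned instead ("full jitter").
    """
    delay = min(cap, base * 2 ** min(retries, max_exponent))
    if jitter:
        return random.uniform(0, delay)
    return delay
```

With the defaults, the waits run 0.1 s, 0.2 s, 0.4 s, ..., 51.2 s for retries 0 to 9 and stay at 60 s for retries 10 and above, the same schedule as `min(60, (2.**retries)/10.)` in the loop below.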

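Slowing the query pace down is a little tricky, because the exception is raised by the iteration itself (when the ResultSet fetches its next page of results), so a try/except wrapped around the for statement only catches it after the loop has already been abandoned. Here is another way to do it: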


from boto.dynamodb2.table import Table
from boto import dynamodb2
import time

dataTable = Table(...)

# scan and query_2 return a ResultSet which is iterable...
data = dataTable.scan(....)

# consume the data
retries = 0

while 1:
    try:
        # use next() to iterate over the ResultSet
        item = next(data)
    except dynamodb2.exceptions.ProvisionedThroughputExceededException:
        sleepTime = min(60, (2.**retries)/10.)
        print('Sleeping for %.02f secs' % sleepTime)
        time.sleep(sleepTime)
        item = None
        # grow the exponent up to 10; min() above caps the wait at 60 s
        retries += 1 if retries < 10 else 0
    except StopIteration:
        # run out of elements
        break

    if item is None:
        continue

    # do something with your item
    pass

The idea is to call the built-in function next() on the ResultSet returned by the DynamoDB API and wrap that single call in a try/except statement: when a page fetch is throttled, the code sleeps with an exponentially growing delay (capped at 60 seconds) and then calls next() again to retry. Things get more involved if you want to fetch the data from multiple threads, but the code above should be enough for interactive or exploratory work.
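The same next()-inside-try/except pattern can be tried out without AWS. The sketch below is a stand-in, not boto code: `ThrottledError` plays the role of ProvisionedThroughputExceededException, and the hypothetical `FakeResultSet` is a class-based iterator whose every page fails once before loading, keeping its position so that the next call to next() retries the same page. The illustrative `retrying_iter` wraps any such iterator, with the sleep function injectable so the example runs instantly. The approach relies on the iterator surviving the exception, which the loop above assumes of boto's ResultSet; a plain Python generator would not, because once an exception escapes a generator the generator is finished.

```python
import time


class ThrottledError(Exception):
    """Stand-in for boto's ProvisionedThroughputExceededException."""


class FakeResultSet:
    """Hypothetical paginated iterator: each page fails once before it loads.

    Because it is a class-based iterator, it keeps its position when a fetch
    fails, so calling next() again simply retries the failed page.
    """

    def __init__(self, pages):
        self._pages = list(pages)
        self._page = 0          # index of the next page to fetch
        self._buffer = []       # items of the current page not yet returned
        self._failed = set()    # pages that have already failed once

    def __iter__(self):
        return self

    def __next__(self):
        while not self._buffer:
            if self._page >= len(self._pages):
                raise StopIteration
            if self._page not in self._failed:
                self._failed.add(self._page)
                raise ThrottledError("throughput exceeded")
            self._buffer = list(self._pages[self._page])
            self._page += 1
        return self._buffer.pop(0)


def retrying_iter(data, retryable=(ThrottledError,), max_retries=20,
                  sleep=time.sleep):
    """Yield items from `data`, backing off whenever next() raises `retryable`."""
    retries = 0
    while True:
        try:
            item = next(data)
        except retryable:
            if retries >= max_retries:
                raise  # give up and re-raise the original exception
            # 0.1 s, 0.2 s, 0.4 s, ... with the wait capped at 60 s
            sleep(min(60.0, (2.0 ** min(10, retries)) / 10))
            retries += 1
            continue
        except StopIteration:
            return  # the underlying iterator is exhausted
        retries = 0  # a successful fetch resets the backoff
        yield item
```

With `sleep=delays.append` the waits are recorded instead of slept, so `list(retrying_iter(FakeResultSet([[1, 2], [3], [4, 5]]), sleep=delays.append))` returns `[1, 2, 3, 4, 5]` and leaves three 0.1 s entries in `delays`. Unlike the loop above, this sketch resets the retry counter after each successful item, so an early burst of throttling does not lengthen every later wait; whether that is preferable is a judgment call.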

Update: the scan loop above can be rewritten as a reusable generator function (here wrapping query_2 instead of scan) like this:

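from boto.dynamodb2.table import Table
from boto import dynamodb2
import time

def getQueryResults(sTableName, sRegion='ap-southeast-1', maxRetries=20, **kwargs):
    '''
    Run a query (query_2 function) on a dynamodb2 table and yield the results "safely",
    backing off whenever the provisioned throughput is exceeded.

    kwargs are passed straight to Table.query_2 (e.g. username__eq='johndoe').
    Re-raises ProvisionedThroughputExceededException after maxRetries retries.
    '''

    assert len(kwargs) > 0, 'Please specify some filter for querying'
    table = Table(sTableName, connection=dynamodb2.connect_to_region(sRegion))
    data = table.query_2(**kwargs)
    retries = 0

    while 1:
        try:
            yield next(data)
        except dynamodb2.exceptions.ProvisionedThroughputExceededException:
            if retries >= maxRetries:
                # give up: re-raise the original exception with its traceback
                raise
            # exponential backoff: 0.1 s, 0.2 s, ... capped at 2**10 / 10 = 102.4 s
            sleepTime = (2.**min(10, retries))/10.
            time.sleep(sleepTime)
            retries += 1
        except StopIteration:
            # the ResultSet is exhausted
            break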
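A note for Python 3 users: since Python 3.7 (PEP 479), a StopIteration that escapes from inside a generator body is turned into a RuntimeError instead of quietly ending the iteration. The wrapper above already handles this correctly, because it catches StopIteration and breaks. The standalone sketch below (standard library only; both generator names are illustrative) shows the difference.

```python
def unsafe_passthrough(it):
    """Relies on next() raising StopIteration to end the generator."""
    while True:
        yield next(it)  # on Python 3.7+ this surfaces as RuntimeError at the end


def safe_passthrough(it):
    """Catches StopIteration and returns, as getQueryResults does with break."""
    while True:
        try:
            item = next(it)
        except StopIteration:
            return
        yield item


print(list(safe_passthrough(iter([1, 2, 3]))))  # [1, 2, 3]
try:
    list(unsafe_passthrough(iter([1, 2, 3])))
except RuntimeError as exc:
    print("RuntimeError:", exc)
```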