When using boto with DynamoDB, you would normally consume the data with a for loop:
    from boto.dynamodb2.table import Table

    dataTable = Table(...)

    # scan or query_2
    data = dataTable.scan(....)

    # consume the data
    for item in data:
        # do something here
        pass
Doing it this way, you might easily exceed the provisioned throughput if the table is relatively large. You can either increase the throughput (and pay more) or slow the code down so that it doesn't send too many requests in a short period of time.
Slowing the query down is a little tricky because the for statement fetches the next item itself, so there is no convenient place to put a try/except around just that step. Here is another way to do it:
    from boto.dynamodb2.table import Table
    from boto import dynamodb2  # needed for dynamodb2.exceptions below
    import time

    dataTable = Table(...)

    # scan and query_2 return a ResultSet which is iterable...
    data = dataTable.scan(....)

    # consume the data
    retries = 0
    while True:
        try:
            # use next() to iterate over the ResultSet
            item = next(data)
        except dynamodb2.exceptions.ProvisionedThroughputExceededException:
            # exponential backoff: 0.1s, 0.2s, 0.4s, ... capped at 60s
            sleepTime = min(60, (2.**retries)/10.)
            print('Sleeping for %.02f secs' % sleepTime)
            time.sleep(sleepTime)
            item = None
            retries += 1 if retries < 10 else 0
        except StopIteration:
            # run out of elements
            break

        if item is None:
            # the fetch was throttled; try again
            continue

        # do something with your item
        pass
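To see what the `min(60, (2.**retries)/10.)` expression above amounts to in practice, here is a small standalone sketch that lists the delays it can produce. The `backoff_delay` helper name is my own and is not part of boto.

```python
def backoff_delay(retries, cap=60.0):
    """Delay in seconds for a given retry count: 0.1 * 2**retries, capped at `cap`."""
    return min(cap, (2.0 ** retries) / 10.0)


# The loop above stops incrementing `retries` once it reaches 10,
# so retry counts 0..10 cover every delay it can produce.
delays = [backoff_delay(r) for r in range(11)]
print(delays)
```

The delay doubles from 0.1 seconds up to 51.2 seconds, and every throttled call after that sleeps for the 60 second cap.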
The idea is that we apply the built-in function next() to the ResultSet returned by the DynamoDB API, and wrap that call inside a try/except statement. It might become a bit more involved if you want to write multi-threaded code to fetch the data, but the above code should be enough for interactive or exploratory work.
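The same next()-inside-try/except pattern can be tried out without a DynamoDB table. The sketch below is only an illustration: `ThrottledError`, `FlakySource` and `consume_with_backoff` are made-up names standing in for the boto exception, the ResultSet and the retry loop, and the sleep times are shortened so it runs instantly.

```python
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceededException."""


class FlakySource(object):
    """Iterator that fails once before handing out each item, to mimic throttling."""

    def __init__(self, items):
        self._items = list(items)
        self._fail_next = True

    def __iter__(self):
        return self

    def __next__(self):
        if not self._items:
            raise StopIteration
        if self._fail_next:
            self._fail_next = False
            raise ThrottledError()
        self._fail_next = True
        return self._items.pop(0)

    next = __next__  # Python 2 spelling of the iterator protocol


def consume_with_backoff(source, retry_on, base=0.001, cap=0.01, sleep=time.sleep):
    """Yield items from `source`, sleeping and retrying whenever `retry_on` is raised."""
    retries = 0
    while True:
        try:
            item = next(source)
        except retry_on:
            sleep(min(cap, base * 2 ** retries))
            retries += 1
            continue
        except StopIteration:
            return
        retries = 0  # assumption: a successful fetch resets the backoff
        yield item


results = list(consume_with_backoff(FlakySource([1, 2, 3]), ThrottledError))
print(results)
```

Resetting the retry counter after a successful fetch is a choice made for this sketch; the loop in the post keeps the counter as it is.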
Update: the above piece of code can be re-written as a wrapper function like this:
    from boto.dynamodb2.table import Table
    from boto import dynamodb2
    import time

    def getQueryResults(sTableName, sRegion = 'ap-southeast-1', maxRetries = 20, **kwargs):
        '''
        Run a query (query_2 function) on a dynamodb2 table and
        return the results "safely"
        '''
        assert len(kwargs) > 0, 'Please specify some filter for querying'
        table = Table(sTableName, connection=dynamodb2.connect_to_region(sRegion))
        data = table.query_2(**kwargs)
        retries = 0
        while 1:
            try:
                yield next(data)
            except dynamodb2.exceptions.ProvisionedThroughputExceededException as e:
                sleepTime = (2.**min(10, retries))/10.
                time.sleep(sleepTime)
                retries += 1
                if retries > maxRetries:
                    raise e
            except StopIteration:
                break
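One thing worth knowing before picking maxRetries is how long the wrapper can end up sleeping in total. The sketch below replays its backoff formula without touching DynamoDB; `worst_case_sleep` is a hypothetical helper, and it assumes every single attempt is throttled until the wrapper re-raises.

```python
def worst_case_sleep(max_retries=20, exponent_cap=10):
    """Total seconds slept if every attempt is throttled until the exception is re-raised."""
    total = 0.0
    retries = 0
    while True:
        # same formula as the wrapper: 0.1 * 2**retries, exponent capped at 10
        total += (2.0 ** min(exponent_cap, retries)) / 10.0
        retries += 1
        if retries > max_retries:
            return total


total = worst_case_sleep()
print('%.1f seconds (about %.1f minutes)' % (total, total / 60))
```

With the defaults this comes to roughly 20 minutes. Note that the wrapper never resets `retries` after a successful fetch, so that budget is shared across the whole query rather than applied per item.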