Categories
Planet Python PyMT

Python: is X is better than Y ? Round 3, pure python VS cython

Round 3: pure python drawing VS cython

Benchmark code :

s = time.time()
for x in xrange(10000):
   drawRectangle(0, 0, 50, 50)
print 'result=', time.time() - s

Let’s take the Python version of simplified drawRectangle():

def drawRectangle(x, y, w, h):
   glBegin(GL_QUADS)
   glVertex2f(x, y)
   glVertex2f(x + w, y)
   glVertex2f(x + w, y + h)
   glVertex2f(x, y + h)
   glEnd()

I got the result on poor graphics card: average ~1.1268s

Let’s rewrite in Cython:

cdef extern from "GL/gl.h":
   ctypedef float         GLfloat
   ctypedef unsigned int  GLenum
   int GL_QUADS
   cdef void glBegin(GLenum mode)
   cdef void glEnd()
   cdef void glVertex2f(GLfloat x, GLfloat y)
def drawRectangle(float x, float y, float w, float h):
   glBegin(GL_QUADS)
   glVertex2f(x, y)
   glVertex2f(x + w, y)
   glVertex2f(x + w, y + h)
   glVertex2f(x, y + h)
   glEnd()

And the result : average ~0.0325s

PyMT Impact: rewriting graphx package 🙂

EDIT:
On a NVIDIA 9800 GT: with 1000000 (instead of 10000):

  • Python: 16.1
  • Python -O: 17.2
  • Python -OO: 16.9
  • Cython: 0.29

Very weird about -O / -OO…

Categories
Planet Python PyMT

Python: is X is better than Y ? Round 2, remove exception VS test in + remove

Round 2: catching remove exception VS test in + remove

Benchmark code :

from time import time

# exceptions vs list in
def bench_in_remove(count):
    for x in xrange(count):
        q = []
        for z in xrange(1000):
            q.append(z)
        for z in xrange(0, 500):
            if z in q:
                q.remove(z)
        
def bench_exception(count):
    for x in xrange(count):
        q = []
        for z in xrange(1000):
            q.append(z)
        for z in xrange(0, 500):
            try:
                q.remove(z)
            except ValueError:
                continue

def bench_remove(count):
    for x in xrange(count):
        q = []
        for z in xrange(1000):
            q.append(z)
        for z in xrange(0, 500):
            q.remove(z)

def run(f, c=10000):
    t = time()
    f(c)
    print '%-15s time=%.8f count=%d' % (f.func_name, time() - t, c)

run(bench_remove)
run(bench_in_remove)
run(bench_exception)

And the result :

bench_remove    time=4.20256305 count=10000
bench_in_remove time=4.49746203 count=10000
bench_exception time=4.33078504 count=10000

We can see a little improvement with testing exception instead of testing in.
We can also see the overhead due to the test before removing in list.

I’ve also tested with xrange(-500, 500) instead of (0, 500), to trigger invalid removal. Here is the result :

bench_in_remove time=12.42448401 count=1000
bench_exception time=14.26106405 count=1000

Triggering an exception cost much time than testing if value is in a list…

PyMT Impact: must check.

Categories
Planet Python PyMT

Python: is X is better than Y ? Round 1, deque vs list.

This week-end, i’ve spend some time about searching how to optimize PyMT code. And done some interesting benchmark. Next days, i’ll post some of them, in order to remember which is the better solution.

For the first round: let’s test Deque from collections package VS Python List !

Code:

from time import time

# collections VS list
def bench_deque(count):
    from collections import deque
    q = deque()
    for x in xrange(count):
        for z in xrange(1000):
            q.append(z)
        while True:
            try:
                q.pop()
            except:
                break

def bench_list(count):
    q = []
    for x in xrange(count):
        for z in xrange(1000):
            q.append(z)
        for y in xrange(len(q)):
            q.pop(0)

def run(f, c=100000):
    t = time()
    f(c)
    print '%-15s time=%.8f count=%d' % (f.func_name, time() - t, c)

run(bench_deque)
run(bench_list)

And the result :

bench_deque     time=35.61677003 count=100000
bench_list      time=78.20481586 count=100000

Impact for PyMT: wm_pen, wm_touch.

Categories
Planet Python

Python and HTTP Pipelining

Attention, this method is NOT pipelining as described in comments, and even might break if the http connection is too fast with httplib.ResponseNotReady. I’ll update this post when i’ll found a real and simple way to achieve pipelining, because one possible way to do it with httplib is really ugly

I wanted to do HTTP Pipelining using urllib2. But, first of all, what is pipelining ?
HTTP pipelining is a technique in which multiple HTTP requests are written out to a single socket without waiting for the corresponding responses.

What is the benefit of pipelining ? Less network load, speedup processing !
I was searching a way to do it with urllib2… But solution are complicated, and not fit well to my needs.
But way, why stay on urllib2 ? Use httplib !

Reusing the connection :

First, reusing the same connection

import httplib
server = httplib.HTTPConnection('yourserver.com')
server.request('GET', '/index.html')
print 'RESPONSE1:', server.getresponse().read()

server.request('GET', '/index2.html')
print 'RESPONSE2:', server.getresponse().read()

server.request('GET', '/index3.html')
print 'RESPONSE3:', server.getresponse().read()

Second, try pipelining !

import httplib
server = httplib.HTTPConnection('yourserver.com')
server.request('GET', '/index.html')
res1 = server.getresponse()
server.request('GET', '/index2.html')
res2 = server.getresponse()
server.request('GET', '/index3.html')
res3 = server.getresponse()

print 'RESPONSE1:', res1.read()
print 'RESPONSE2:', res2.read()
print 'RESPONSE3:', res3.read()
Categories
Planet Python PyMT

Color swapping and Python

For PyMT, Sharath need BGR support into PyMT, while his graphic card don’t support GL_BGR.
Well, after adapting swap code from Pyglet sourcecode, he say: “< patali> it works but very slow”

The goal is to swap a string from ‘bgrbgrbgr’ to ‘rgbrgbrgb’. Just let’s do a rapid benchmark.

import sys
import re
import time

swap1_pattern = re.compile('(.)(.)(.)', re.DOTALL)
swap1_repl = r'\3\2\1'
def swap1(bytes):
    return swap1_pattern.sub(swap1_repl, bytes)

def swap2(bytes):
    out = ''
    for i in xrange(0, len(bytes), 3):
        b, g, r = bytes[i:i+3]
        out += r
        out += g
        out += b
    return out

def swap3(bytes):
    bytes = list(bytes)
    for i in xrange(0, len(bytes), 3):
        b, g, r = bytes[i:i+3]
        bytes[i:i+3] = r, g, b
    return ''.join(bytes)

def swap4(bytes):
    blues = bytes[0::3]
    greens = bytes[1::3]
    reds = bytes[2::3]
    return ''.join(''.join(x) for x in zip(reds, greens, blues))

def swap5(bytes):
    from array import array
    a = array('b', bytes)
    a[0::3], a[2::3] = a[2::3], a[0::3]
    return a.tostring()

def swap6(bytes):
    import numpy
    a = numpy.array(bytes, 'c')
    b = a[...,::-1]
    return b.tostring()

def swap7(bytes):
    a = list(bytes)
    a[0::3], a[2::3] = a[2::3], a[0::3]
    return ''.join(a)


def bench(func, bytes):
    sys.stderr.write('Bench %s: ' % str(func))

    start = time.time()
    for i in xrange(20):
        sys.stderr.write('.')
        ret = func(bytes)
        if ret[:3] != 'rgb' or ret[-3:] != 'rgb':
            sys.stderr.write('INVALID DATA, start=%s, end=%s' %
                            (ret[:3], ret[-3:]))
            break

    end = time.time() - start
    sys.stderr.write('| Finished in %.4f
' % end)
    return ret


bytes = 'bgr' * 256 * 256
bench(swap1, bytes)
bench(swap2, bytes)
bench(swap3, bytes)
bench(swap4, bytes)
bench(swap5, bytes)
bench(swap6, bytes)
bench(swap7, bytes)

Swap1() is actually the way of pyglet. swap2/swap3 is from me, and others is found on some threads in pygame mailing list.

So, who win ?

21:09 tito@ashaka ~ $ python benchmark.py 
Bench <function swap1 at 0x7f1d154632a8>: ....................| Finished in 3.2713
Bench <function swap2 at 0x7f1d15463230>: ....................| Finished in 1.2860
Bench <function swap3 at 0x7f1d15470cf8>: ....................| Finished in 0.6535
Bench <function swap4 at 0x7f1d15470e60>: ....................| Finished in 0.6475
Bench <function swap5 at 0x7f1d15470ed8>: ....................| Finished in 0.0367
Bench <function swap6 at 0x7f1d15470f50>: ....................| Finished in 0.1959
Bench <function swap7 at 0x7f1d15473050>: ....................| Finished in 0.1482

The array solution is a lot faster than any other implementation ! And it’s python standard 🙂 Let’s take this…

Edit: thanks for Milan to come with another awesome solution !

def swap0(bytes):
  return bytes[::-1]

Result is: 0.0059 !