[cffi-devel] Strange Performance Issue

Luís Oliveira luismbo at gmail.com
Mon Sep 3 16:16:12 UTC 2007


On 03/09/07, Mikael Lax <mikael.lax at bredband.net> wrote:
> I was working on a cl-opengl app recently and I noticed that my textures
> were taking much longer to load than I could remember.

> (defun test-1 ()
[...]
>            (setf (cffi:mem-aref data 'cl-opengl-bindings:ubyte i) (aref array i))))))
>
> (defun test-2 ()
[...]
>            (setf (cffi:mem-aref data type i) (aref array i))))))

So the problem is that in TEST-2 the type argument to MEM-AREF is not
known at compile-time. In TEST-1 that call to mem-aref is compiled
down to a fewer assembler instructions.

In TEST-2 however, on each iteration you have at least 3 generic
function calls, 4 regular function calls (that probably call more of
their own) including one to PARSE-TYPE that needs to access an
hash-table to retreive the parser closure. UBYTE is defined through
DEFCTYPE, which memoizes the type instance, at least we're not
instantiating a new type object every time.


> * (time (test-1))
>   0.027 seconds of real time
[...]
>   3,145,736 bytes consed.
[..]
> * (time (test-2))
> Evaluation took:
>   9.565 seconds of real time
[...]
>   688,918,576 bytes consed.

I didn't do any profiling but I guess that extra overhead I described
explains the time difference. I'm not sure what's consing 700M though.

So, if you can make your type constant, do it. If not, you can still
move the majority of that overhead out of the inner loop:

(defun test-3 ()
  (let* ((array (make-array (* 512 512 3) :initial-element 0))
         (count (length array))
         (type :unsigned-char))
    (with-foreign-object (data type count)
      (let ((ctype (cffi::canonicalize (cffi::parse-type type)))
            (size (foreign-type-size type)))
        (loop for i below count do
              (cffi::%mem-set (aref array i) data ctype (* i size)))))))

CFFI> (time (test-3))
Evaluation took:
  0.356 seconds of real time
  0.121568 seconds of user run time
  0.065652 seconds of system run time
  0 calls to %EVAL
  0 page faults and
  9,437,320 bytes consed.

For comparision, these are the other timings in my laptop:

CFFI> (time (test-1))
Evaluation took:
  0.162 seconds of real time
  0.068727 seconds of user run time
  0.008378 seconds of system run time
  0 calls to %EVAL
  0 page faults and
  3,145,736 bytes consed.

CFFI> (time (test-2))
Evaluation took:
  34.538 seconds of real time
  12.495962 seconds of user run time
  3.882277 seconds of system run time
  [Run times include 1.007 seconds GC run time.]
  0 calls to %EVAL
  0 page faults and
  701,490,664 bytes consed.

So, if there's a need for this, we might want to export things like
PARSE-TYPE and CANONICALIZE (with better names) and have the various
operators accept parsed types.

HTH,

-- 
Luís Oliveira
http://student.dei.uc.pt/~lmoliv/



More information about the cffi-devel mailing list