From ssbm2 at o2.pl Tue May 2 16:29:29 2006
From: ssbm2 at o2.pl (Szymon)
Date: Tue, 02 May 2006 18:29:29 +0200
Subject: [cl-utilities-devel] split-sequence performance problem.
Message-ID: <445788E9.5070802@o2.pl>

Hi. I discovered that split-sequence* functions are *SLOW*
when input is a list.

example:

(progn (test-split-list-if) (test-nsplit-list-if) (test-split-sequence-if))

SPLIT-LIST-IF,
Evaluation took:
  0.012 seconds of real time
  1,445,888 bytes consed.

NSPLIT-LIST-IF,
Evaluation took:
  0.008 seconds of real time
  0 bytes consed.

SPLIT-SEQUENCE-IF,
Evaluation took:
  124.586 seconds of real time
  1,527,328 bytes consed.

example's code:

(defun split-list-if (test list &aux (start list) (end list))
  (loop while (and end (setq start (member-if-not test end)))
        collect (ldiff start (setq end (member-if test start)))))

(defun nsplit-list-if (test list &aux (list (member-if-not test list)) result tail)
  (flet ((helper (list)
           (loop for i on list
                 for j = (cdr i)
                 when (funcall test (car j))
                   do (if (cdr j)
                          (return-from helper (values i j))
                          (progn (rplacd i nil)
                                 (return-from helper (values list nil)))))
           (values list nil)))
    (multiple-value-bind (a b) (helper list)
      (unless b (return-from nsplit-list-if list))
      (rplacd a nil)
      (setq result (setq list (rplaca b list)))
      (setq list (cdr result)))
    (rplacd result nil)
    (setq tail result)
    (loop
      (setq list (member-if-not test list))
      (multiple-value-bind (a b) (helper list)
        (cond ((null b)
               (when a (rplacd tail (list a)))
               (return-from nsplit-list-if result))
              (t
               (rplacd a nil)
               (rplacd tail (setq list (rplaca b list)))
               (setq list (cdr list))
               (rplacd (setq tail (cdr tail)) nil)))))))

(defun white-space-p (char)
  (member char '(#\Space #\Newline #\Return #\Tab #\Page)))

(defvar *test-result*)
(defvar *test-data*)

(defun load-test-data ()
  (setq *test-data*
        (with-open-file
            ;; http://www.gutenberg.org/dirs/etext91/alice30.txt
            (stream #P"alice30.txt")
          (loop with char
                while (setq char (read-char stream nil nil))
                collect char))))

(defun
test-nsplit-list-if ()
  (load-test-data)
  (gc)
  (time (setq *test-result* (nsplit-list-if #'white-space-p *test-data*)))
  (values))

(defun test-split-list-if ()
  (load-test-data)
  (gc)
  (time (setq *test-result* (split-list-if #'white-space-p *test-data*)))
  (values))

(defun test-split-sequence-if ()
  (load-test-data)
  (gc)
  (time (setq *test-result*
              (cl-utilities:split-sequence-if #'white-space-p *test-data*
                                              :remove-empty-subseqs t)))
  (values))

Regards, Szymon.

From sketerpot at gmail.com Tue May 2 20:48:55 2006
From: sketerpot at gmail.com (Peter Scott)
Date: Tue, 2 May 2006 15:48:55 -0500
Subject: [cl-utilities-devel] split-sequence performance problem.
In-Reply-To: <445788E9.5070802@o2.pl>
References: <445788E9.5070802@o2.pl>
Message-ID: <7e267a920605021348u56cc7378j5b1e03057ab3c5ac@mail.gmail.com>

On 5/2/06, Szymon wrote:
> Hi. I discovered that split-sequence* functions are *SLOW*
> when input is a list.
> [snip example]

Ouch, you're right. I looked at the code, and it was pretty obviously
designed for vectors, with lists handled as an afterthought.

If this is an issue for you, it might be worth the trouble to add more
efficient functions for lists. If it's not an issue but just a glaring
performance wart, it might be better to leave the code as it is,
simply because it's well tested and debugged in its current state and
stability is very important to me.

In any case, looking at this some more has now been added to my to-do
list. Thanks for pointing this out.

-Peter

From ssbm2 at o2.pl Fri May 5 21:51:59 2006
From: ssbm2 at o2.pl (Szymon)
Date: Fri, 05 May 2006 23:51:59 +0200
Subject: [cl-utilities-devel] split-sequence performance problem.
In-Reply-To: <7e267a920605021348u56cc7378j5b1e03057ab3c5ac@mail.gmail.com>
References: <445788E9.5070802@o2.pl>
 <7e267a920605021348u56cc7378j5b1e03057ab3c5ac@mail.gmail.com>
Message-ID: <445BC8FF.7010109@o2.pl>

Peter Scott wrote:

> [.....]
> If it's not an issue but just a glaring
> performance wart, it might be better to leave the code as it is,
> simply because it's well tested and debugged in its current state and
> stability is very important to me. [.....]

Leave the code as it is. I just wrote a utility for splitting lists and
it's ok for me.

Regards, Szymon.

ps. utility works like this:

#|
CL-USER> (split-list-if #'zerop '(0 0))
NIL

CL-USER> (split-list-if #'zerop '(0 0) :preserve-delimiters t)
((0 0))

(split-list-if #'zerop '(0 0 0 x) :preserve-delimiters t)

CL-USER> (split-list-if #'null '(a nil b))
((A) (B))

CL-USER> (split-list-if #'null '(nil a nil nil b nil))
((A) (B))

CL-USER> (split-list-if #'null '(nil a nil nil b nil) :preserve-delimiters t)
((NIL) (A) (NIL NIL) (B) (NIL))

CL-USER> (split-list-if #'null '(nil a nil nil b nil) :count 2)
((A) (B))

CL-USER> (split-list-if #'null '(nil a nil nil b nil) :preserve-delimiters t :count 2)
((NIL) (A))

CL-USER> (split-list-if #'numberp '(0 a 1 2 b 3 4 c d))
((A) (B) (C D))

CL-USER> (split-list-if #'numberp '(0 a 1 2 b 3 4 c d) :preserve-delimiters t)
((0) (A) (1 2) (B) (3 4) (C D))

CL-USER> (split-list-if #'symbolp '(0 a 1 2 b 3 4 c d) :preserve-delimiters t)
((0) (A) (1 2) (B) (3 4) (C D))

CL-USER> (split-list-if #'numberp '(foo (0 1) bar (2 3) baz 4 mug 5)
                        :key (lambda (x) (if (consp x) (car x) x)))
((FOO) (BAR) (BAZ) (MUG))

CL-USER> (split-list-if #'numberp '(foo (0 1) bar (2 3) baz 4 mug 5)
                        :key (lambda (x) (if (consp x) (car x) x))
                        :preserve-delimiters t)
((FOO) ((0 1)) (BAR) ((2 3)) (BAZ) (4) (MUG) (5))

CL-USER> (split-list-if #'numberp '(0 a 1 2 b 3 4 c d 0 0 x)
                        :preserve-delimiters t :count 3 :from-end t)
((C D) (0 0) (X))

CL-USER> (split-list-if #'numberp '(0 a 1 2 b 3 4 c d 0 0 x) :count 3 :from-end t)
((B) (C D) (X))
|#

(defun split-list-if (test list
                      &key preserve-delimiters key count from-end
                      &aux (ldiff/cons (if (and from-end count) #'cons #'ldiff)))
  (when (or (null list) (and count (zerop count)))
    (return-from split-list-if))
  (when
      (and from-end (not count))
    (setq from-end nil))
  (multiple-value-bind (member member-not)
      (values (lambda (list) (member-if test list :key key))
              (let ((test-not (complement test)))
                (lambda (list) (member-if test-not list :key key))))
    (let ((get-next
            (if preserve-delimiters
                (let ((f member))
                  (lambda ()
                    (let ((result-begin list)
                          (result-end (funcall f list)))
                      (setq f (if (eq f member) member-not member))
                      (setq list result-end)
                      (when result-begin
                        (funcall ldiff/cons result-begin result-end)))))
                (lambda (&aux (start (funcall member-not list))
                              (tail (funcall member start)))
                  (when start
                    (funcall ldiff/cons start (setq list tail)))))))
      (let (result pointer next init-delims)
        (setq init-delims
              (let ((tail (funcall member-not list)))
                (cond ((and (null tail) (cdr list))
                       (prog1 (copy-list list) (setq list nil)))
                      (t
                       (prog1 (ldiff list tail) (setq list tail))))))
        (if preserve-delimiters
            (when (and init-delims
                       (or (and (not from-end) count (= count 1))
                           (null list)))
              (return-from split-list-if (list init-delims)))
            (unless list (return-from split-list-if)))
        (setq result (list (funcall get-next))
              pointer result)
        (when (and init-delims preserve-delimiters)
          (setq result (nconc (list init-delims) result))
          (when count (decf count)))
        (if count
            (loop repeat (1- count)
                  while (setq next (funcall get-next))
                  do (setq pointer (cdr (rplacd pointer (list next)))))
            (loop while (setq next (funcall get-next))
                  do (setq pointer (cdr (rplacd pointer (list next))))))
        (when (and count from-end)
          (when list
            (cond ((= count 1)
                   (loop for x = (funcall get-next)
                         do (if x (setq next x) (return)))
                   (setq result (rplaca result next)))
                  (t
                   (loop while (setq next (funcall get-next))
                         for cell = (prog1 result (setq result (cdr result)))
                         do (rplaca (setq pointer (cdr (rplacd pointer cell))) next))
                   (rplacd pointer nil))))
          (map-into result (lambda (cons) (ldiff (car cons) (cdr cons))) result))
        result))))

From ssbm2 at o2.pl Sat May 6 10:22:47 2006
From: ssbm2 at o2.pl (Szymon)
Date: Sat, 06 May 2006 12:22:47 +0200
Subject: [cl-utilities-devel] split-sequence performance problem.
In-Reply-To: <445BC8FF.7010109@o2.pl>
References: <445788E9.5070802@o2.pl>
 <7e267a920605021348u56cc7378j5b1e03057ab3c5ac@mail.gmail.com>
 <445BC8FF.7010109@o2.pl>
Message-ID: <445C78F7.8060201@o2.pl>

Yesterday I posted buggy code; below is a new version. I hope it's both
fast and memory economical (it doesn't do unnecessary consing with
:FROM-END & :COUNT).

(defun split-list-if (test list
                      &key preserve-delimiters key count from-end
                      &aux (test-not (complement test)))
  (when (or (null list) (and count (zerop count)))
    (return-from split-list-if))
  (when (and from-end (not count))
    (setq from-end nil))
  (let* ((member
          (lambda (list test copy?)
            (do ((i list (cdr i))
                 (r '() (when copy? (cons (car i) r))))
                ((or (endp i)
                     (funcall test (if key (funcall key (car i)) (car i))))
                 (values i
                         (when copy?
                           (if from-end
                               (when list (cons list i))
                               (nreverse r))))))))
         (get-next
          (if preserve-delimiters
              (let ((%test test))
                (lambda ()
                  (multiple-value-bind (rest result)
                      (funcall member list %test t)
                    (setq %test (if (eq %test test) test-not test))
                    (setq list rest)
                    result)))
              (lambda ()
                (multiple-value-bind (rest result)
                    (funcall member list test t)
                  (when (setq list rest)
                    (setq list (funcall member list test-not nil)))
                  result)))))
    (let (result pointer next init-delims)
      (setq init-delims
            (let ((tail (member-if-not test list :key key)))
              (cond ((and (null tail) (cdr list))
                     (prog1 (copy-list list) (setq list nil)))
                    (t
                     (prog1 (ldiff list tail) (setq list tail))))))
      (if preserve-delimiters
          (when (and init-delims
                     (or (and (not from-end) count (= count 1))
                         (null list)))
            (return-from split-list-if (list init-delims)))
          (unless list (return-from split-list-if)))
      (setq result (list (if (and preserve-delimiters init-delims)
                             (if from-end
                                 (cons init-delims list)
                                 init-delims)
                             (funcall get-next)))
            pointer result)
      (if count
          (loop repeat (1- count)
                while (setq next (funcall get-next))
                do (setq pointer (cdr (rplacd pointer (list next)))))
          (loop while
                (setq next (funcall get-next))
                do (setq pointer (cdr (rplacd pointer (list next))))))
      (when (and count from-end)
        (when list
          (cond ((= count 1)
                 (loop for x = (funcall get-next)
                       do (if x (setq next x) (return)))
                 (setq result (rplaca result next)))
                (t
                 (loop while (setq next (funcall get-next))
                       for cell = (prog1 result (setq result (cdr result)))
                       do (rplaca (setq pointer (cdr (rplacd pointer cell))) next))
                 (rplacd pointer nil))))
        (map-into result (lambda (cons) (ldiff (car cons) (cdr cons))) result))
      result)))

Regards, Szymon.

From ssbm2 at o2.pl Sat May 6 10:37:48 2006
From: ssbm2 at o2.pl (Szymon)
Date: Sat, 06 May 2006 12:37:48 +0200
Subject: [cl-utilities-devel] split-sequence performance problem.
In-Reply-To: <445C78F7.8060201@o2.pl>
References: <445788E9.5070802@o2.pl>
 <7e267a920605021348u56cc7378j5b1e03057ab3c5ac@mail.gmail.com>
 <445BC8FF.7010109@o2.pl>
 <445C78F7.8060201@o2.pl>
Message-ID: <445C7C7C.2020606@o2.pl>

Fix (I hope the last one):

line 11 should be:

(r '() (and (not from-end) copy? (cons (car i) r))))

Regards, Szymon.
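
For anyone who wants to try the fix at the REPL, here is a minimal sketch of the scanning helper with that one-line change applied, pulled out as a standalone function. The name scan-prefix and the keyword parameters :copy, :from-end and :key are assumptions made for illustration only; in the posted code the helper is an anonymous lambda that closes over FROM-END and KEY. It also assumes "line 11" means the step form for R in the DO loop.

```lisp
;; Hypothetical standalone version of the scanning helper from the
;; message above. SCAN-PREFIX and its keyword arguments are illustrative
;; names, not part of the posted code or of cl-utilities.
(defun scan-prefix (list test &key copy from-end key)
  "Walk LIST until TEST holds for an element or the list ends.
Return two values: the remaining tail and, when COPY is true, either a
fresh copy of the skipped prefix or, with FROM-END, a (START . END)
pair that can be turned into a list later with LDIFF."
  (do ((i list (cdr i))
       ;; The fix: skip the per-element CONS when FROM-END is set,
       ;; because in that case only the (START . END) pair is returned.
       (r '() (and (not from-end) copy (cons (car i) r))))
      ((or (endp i)
           (funcall test (if key (funcall key (car i)) (car i))))
       (values i
               (when copy
                 (if from-end
                     (when list (cons list i))
                     (nreverse r)))))))

;; Copying mode: the prefix before the first number is consed afresh.
(scan-prefix '(a b 1 c) #'numberp :copy t)
;; => (1 C), (A B)

;; FROM-END mode: nothing is copied during the scan; the prefix is
;; materialised only when LDIFF is applied to the returned pair.
(let ((data (list 'a 'b 1 'c)))
  (multiple-value-bind (rest pair)
      (scan-prefix data #'numberp :copy t :from-end t)
    (list rest (ldiff (car pair) (cdr pair)))))
;; => ((1 C) (A B))
```

The deferred LDIFF is what lets the :FROM-END and :COUNT path avoid consing chunks that end up being discarded: only the chunks that survive are copied, by the final MAP-INTO in the posted function.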