[elephant-devel] Collection paging

lists at infoway.net lists at infoway.net
Thu Oct 4 03:07:17 UTC 2007


I have to say that Mariano is hitting some of the issues we will be  
facing soon as our quest to learn Lisp and Elephant continues and we  
continue working on migrating some of our SQL-based applications  
over. This particular need of his is also a real need we have since  
it's something we offer to our application users. For example, in the  
application we are working on migrating, we have a table with over 7  
million rows. This obviously has many thousands of 50-row pages to  
navigate through. Our user interface offers the "usual" search, sort  
by any column(s), and page navigation (First, Last, Next, Previous,  
or manually input a page number).

The way we handle this in our code is to first run something like
SELECT COUNT(*) FROM table_name and work out the number of pages from
the count. We then compute an offset based on the page the user wants
to view (e.g. assuming 50 rows per page and wanting to view page 90,
counting pages from 1, the offset would be 50 * (90 - 1) = 4450) and
formulate the SQL query as something like SELECT * FROM table_name
ORDER BY {sort_order} LIMIT 50 OFFSET {computed_offset}. (All of this
assumes a 50-row page size; the user can also change the page size via
the web interface.)
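
The arithmetic above can be sketched in a few lines of Lisp. This is
only an illustration: the function names are made up for this message,
pages are assumed to be counted from 1, and the 7,000,000 figure is a
round stand-in for our "over 7 million" rows.

```lisp
;; Hypothetical helpers for illustration; not part of Elephant or any library.

(defun page-count (total-rows page-size)
  "Number of pages needed to show TOTAL-ROWS rows at PAGE-SIZE rows per page."
  ;; CEILING rounds up so a partially filled last page is still counted.
  (values (ceiling total-rows page-size)))

(defun page-offset (page-number page-size)
  "Number of rows to skip before PAGE-NUMBER, counting pages from 1."
  (* (1- page-number) page-size))

;; Example calls (assumed values):
;; (page-count 7000000 50)  ; => 140000
;; (page-offset 90 50)      ; => 4450
```

With pages counted from 1, the first page has offset 0, so page 90
skips 89 full pages.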

From a SQL data manipulation language perspective, it's pretty
straightforward. As for SQL's internal execution path, I really have
no idea how it's implemented and don't know whether it does any linear
scanning to return the results. The fact is that our application
lets you navigate through the 7+ million row table in under 2
seconds per page, no matter which page you wish to view or which sort
order you choose. From a user perspective, 2 seconds for a
browser-based screen refresh is more than acceptable. Will Elephant be
able to "refresh" as quickly if, in the current model, it needs to do a
linear scan? We haven't gotten there yet, but maybe someone can comment
on that.
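
To make the concern concrete, here is a rough sketch of what I
understand a linear-scan page fetch to look like. It is plain Common
Lisp and does not call Elephant at all: MAP-FN is a hypothetical
stand-in for whatever sorted traversal the store provides (I assume a
wrapper around map-btree or a cursor loop would play that role), and
collect-page is a made-up name.

```lisp
;; A sketch only; MAP-FN and COLLECT-PAGE are hypothetical names.
(defun collect-page (map-fn page-number page-size)
  "Return the items of PAGE-NUMBER (counted from 1) as a list.
MAP-FN is called with a one-argument function and must call it on
every item in sorted order, starting from the first item."
  (let ((skip (* (1- page-number) page-size)) ; items before the page
        (seen 0)                              ; items visited so far
        (page '()))
    (block scan
      (funcall map-fn
               (lambda (item)
                 ;; Stop as soon as the page is full.
                 (when (>= seen (+ skip page-size))
                   (return-from scan))
                 ;; Keep only the items that fall inside the page.
                 (when (>= seen skip)
                   (push item page))
                 (incf seen))))
    (nreverse page)))

;; Example with a list standing in for the sorted collection:
;; (collect-page (lambda (fn) (mapc fn '(a b c d e f g))) 2 3)
;; => (D E F)
```

The traversal visits every item before the requested page, so the
cost grows with the page number: page 1 is cheap, while a page near
the end of a multi-million row collection walks almost all of it.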

Thanks

On Oct 3, 2007, at 9:13 PM, Ian S Eslick wrote:

> When you say indexes are not sequential, do you mean UIDs are not  
> sequentially allocated?  I think there is a BDB sequence issue that  
> I've never worried about that jumps to the nearest 100 when you  
> reconnect.  However, if you create anything other than a user  
> object, you will also have gaps in the UID sequence so that's a  
> fundamental issue.  Don't assume anything about UIDs other than the  
> fact that they are unique.
>
> You could create and index your own field which is a sequential ID  
> for creation ordering, but it sounds like you probably want to  
> return a sublist based on some sort order like alphabetical by name  
> or by date.  In this case, at least doing the last page is easy,  
> map from end and count the # of users you want before you  
> terminate, but to find an element that is N elements away from the  
> first or last element in less than O(n) time isn't possible with  
> the underlying B-Trees we're using.
>
> The first question is whether your database is guaranteed to be so
> big that you can't just do this in linear time.  When you start to
> face performance issues, then you can look at building that
> additional data structure.
>
> Otherwise, you will have to implement a data structure that  
> maintains this information on top of the Elephant infrastructure.
>
> The first idea that occurs to me is to drop the idea of using an  
> indexed class or standalone btrees and just build a red-black tree  
> using object slots (you can inherit from a base class that  
> implements the RB tree functionality).  This simultaneously solves  
> the count problem and the access element # N problem.  The O(log2 N)
> lookup time will have a higher fixed cost per level traversal, but if
> you start getting really large dbs (1000's to 10k's?) then it will
> certainly beat a linear map-index approach.  See:
>
> http://en.wikipedia.org/wiki/Red-black_tree
>
> There is a lisp example of this data structure here:
>
> http://www.aviduratas.de/lisp/progs/rb-trees.lisp
>
> The catch is that you'll need one of these for each sort order,
> which is a problem for a list that can be sorted many different
> ways.  Anyone know how SQL query systems implement this?
>
> Just remember that premature optimization is one of the four
> horsemen of the apocalypse for the effective programmer.
>
> Ian
> ----- Original Message -----
> From: Mariano Montone
> To: Elephant bugs and development
> Sent: Wednesday, October 03, 2007 6:57 PM
> Subject: [elephant-devel] Collection paging
>
> Hello, it's me again :S.
>
> I would like to know how I can access persistent collection pages  
> efficiently.
>
> What I'm trying to do is make a web list component work with
> Elephant. The list component is supposed to support well-known
> navigation commands: viewing the collection in pages, first, last,
> next, and previous buttons, and display of the collection size.
>
> The collection size problem was treated here:
> http://common-lisp.net/pipermail/elephant-devel/2007-October/001162.html
>
> But now I have a problem with building the pages.
>
> My first try was:
>   (let* ((start (* (current-page self) (page-size self)))
>          (end (+ start (page-size self))))
>     (<:ul
>      (elephant:map-btree
>       #'(lambda (key elem)
>           (declare (ignore key))
>           (let ((elem-text (make-elem-text self elem)))
>             (<:li
>              (if (slot-value self 'selectable)
>                  (<ucw:a :action (answer elem) (<:as-html elem-text))
>                  (<:a (<:as-html elem-text))))))
>       (model self) :start start :end end)))
>
> with start and end previously fixed based on the current page
> number and size.
>
> But I realized indexes were not sequential when I created new  
> objects, as this shows:
>
> ASKIT> (with-btree-cursor (cursor (find-class-index 'user))
>   (iter
>     (for (values exists? k v) = (cursor-next cursor))
>     (while exists?)
>     (format *standard-output* "~A -> ~A ~%" k v)))
> 2 -> #<USER name: dssdf {B043379}>
> 3 -> #<USER name: ttttt {B045C69}>
> 5 -> #<USER name: ff {B048179}>
> 6 -> #<USER name: other {B04A451}>
> 7 -> #<USER name: guest {AD61271}>
> 100 -> #<USER name: qqq {B053001}>
> 101 -> #<USER name:  {B055721}>
> 102 -> #<USER name:  {B057E01}>
> 103 -> #<USER name:  {B05A529}>
> 104 -> #<USER name:  {B05CCF1}>
> 105 -> #<USER name:  {B05F579}>
> 106 -> #<USER name:  {B063E91}>
> 107 -> #<USER name: qqq {B066851}>
> 200 -> #<USER name:  {B069519}>
> 201 -> #<USER name:  {B06C009}>
> 300 -> #<USER name:  {B06EBA1}>
> 301 -> #<USER name: aaa {B0717D1}>
> NIL
>
> I don't think this is a bug; it must have to do with how Elephant
> manages btrees. But then how am I supposed to access the collection
> by pages? I would not like to have to access all the objects from the
> beginning just to discard them instantly (imagine a large collection
> and the user wanting to see the last page).
>
> Thank you again :)
>
> Mariano
>
>
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel