[elephant-devel] Collection paging

Ian Eslick eslick at csail.mit.edu
Thu Oct 4 16:20:11 UTC 2007


When you get that 2 second result, have you declared an index on the  
fields you are sorting on or is this a full sort + count + offset query?

At least from what I can tell of the PostgreSQL implementation of  
SQL, the optimizer doesn't keep accurate counts for the underlying  
table so always has to do a table scan to compute counts.  So what  
seems to be happening is a very efficient full index scan and then  
perhaps some way of combining the sort with an accumulating count  
with a terminate at offset to bypass doing the full merge (although I  
doubt it's that efficient:).

One thing we can do in elephant is to avoid deserializing the value  
part of key-value pairs and just operate on the keys (for index scans  
or linear count scans).  This will help keep the lisp GC memory from  
thrashing too much.

Ian

On Oct 3, 2007, at 11:07 PM, lists at infoway.net wrote:

> I have to say that Mariano is hitting some of the issues we will be  
> facing soon as our quest to learn Lisp and Elephant continues and  
> we continue working on migrating some of our SQL-based applications  
> over. This particular need of his is also a real need we have since  
> it's something we offer to our application users. For example, in  
> the application we are working on migrating, we have a table with  
> over 7 million rows. This obviously has many thousands of 50-row  
> pages to navigate through. Our user interface offers the "usual"  
> search, sort by any column(s), and page navigation (First, Last,  
> Next, Previous, or manually input a page number).
>
> The way we handle this in our code is something like SELECT COUNT 
> (*) FROM table_name. From the count, figure out the number of  
> pages. Then compute an offset based on the page the user wants to  
> view (e.g. assuming 50 rows per page and wanting to view page 90,  
> the offset would be (50 * 90) - 1 = 4499) and formulate the SQL  
> query as something like SELECT * FROM table_name ORDER BY  
> {sort_order} OFFSET {computed_offset} LIMIT 50 (note that all this  
> assuming a 50-row page size, and the user also has the ability to  
> change the page size via the web interface)
>
> From a SQL data manipulation language perspective, it's pretty  
> straight forward. From a SQL internal execution path, I really have  
> no idea how it's implemented and don't know if it does any linear  
> scanning to return the results. The fact is that our application  
> allows you to navigate through the 7+ million row table in under 2  
> seconds per page no matter which page you wish to view or sort  
> order. From a user perspective, 2 seconds for a browser-based  
> screen refresh is more than acceptable. Will Elephant allow to  
> "refresh" as quickly if in the current model it needs to do a  
> linear scan? We haven't gotten there yet, but maybe someone can  
> comment on that.
>
> Thanks
>
> On Oct 3, 2007, at 9:13 PM, Ian S Eslick wrote:
>
>> When you say indexes are not sequential, do you mean UIDs are not  
>> sequentially allocated?  I think there is a BDB sequence issue  
>> that I've never worried about that jumps to the nearest 100 when  
>> you reconnect.  However, if you create anything other than a user  
>> object, you will also have gaps in the UID sequence so that's a  
>> fundamental issue.  Don't assume anything about UIDs other than  
>> the fact that they are unique.
>>
>> You could create and index your own field which is a sequential ID  
>> for creation ordering, but it sounds like you probably want to  
>> return a sublist based on some sort order like alphabetical by  
>> name or by date.  In this case, at least doing the last page is  
>> easy, map from end and count the # of users you want before you  
>> terminate, but to find an element that is N elements away from the  
>> first or last element in less than O(n) time isn't possible with  
>> the underlying B-Trees we're using.
>>
>> The first question is whether you database is guaranteed to be so  
>> big that you can't just do this linear time.  When you start to  
>> face performance issues, then you can look at building that  
>> additional data structure.
>>
>> Otherwise, you will have to implement a data structure that  
>> maintains this information on top of the Elephant infrastructure.
>>
>> The first idea that occurs to me is to drop the idea of using an  
>> indexed class or standalone btrees and just build a red-black tree  
>> using object slots (you can inherit from a base class that  
>> implements the RB tree functionality).  This simultaneously solves  
>> the count problem and the access element # N problem.  The O(log  
>> (base 2) N) lookup time will have a higher fixed cost per level  
>> traversal, but if you start getting really large dbs (1000's to  
>> 10k's?) then it will certainly beat a linear map-index approach.   
>> i.e.
>>
>> http://en.wikipedia.org/wiki/Red-black_tree
>>
>> There is a lisp example of this data structure here:
>>
>> http://www.aviduratas.de/lisp/progs/rb-trees.lisp
>>
>> Now there is a problem that you'll need one of these for each  
>> sorted order which for a list sorted many different ways is a  
>> problem.  Anyone know how SQL query systems implement this?
>>
>> Just remember that premature optimization is one of the four  
>> horseman of the apocalypse for the effective programmer.
>>
>> Ian
>> ----- Original Message -----
>> From: Mariano Montone
>> To: Elephant bugs and development
>> Sent: Wednesday, October 03, 2007 6:57 PM
>> Subject: [elephant-devel] Collection paging
>>
>> Hello, it's me again :S.
>>
>> I would like to know how I can access persistent collection pages  
>> efficiently.
>>
>> What I'm trying to do is making work a web list component with  
>> elephant. The list component is supposed to support well known  
>> navigation commands, like look at the collection in pages, support  
>> for first, last, next, previous buttons, and display of collection  
>> size.
>>
>> The collection size problem was treated here:  http://common- 
>> lisp.net/pipermail/elephant-devel/2007-October/001162.html.
>>
>> But now I have a problem with building the pages.
>>
>> My first try was:
>>   (let*
>>       ((start (* (current-page self) (page-size self)))
>>        (end (+ start (page-size self)))
>>        )
>>         (<:ul
>>          (elephant:map-btree #'(lambda (key elem) (declare (ignore  
>> key))
>>                        (let ((elem-text (make-elem-text self elem)))
>>                          (<:li
>>                           (if (slot-value self 'selectable)
>>                           (<ucw:a :action (answer elem)  (<:as- 
>> html elem-text))
>>                           (<:a (<:as-html elem-text))))))
>>                  (model self) :start start :end end)
>>          )
>>
>> with start and end previously fixed based in the current page  
>> number and size.
>>
>> But I realized indexes were not sequential when I created new  
>> objects, as this shows:
>>
>> ASKIT> (with-btree-cursor (cursor (find-class-index 'user))
>>   (iter
>>     (for (values exists? k v) = (cursor-next cursor))
>>     (while exists?)
>>     (format *standard-output* "~A -> ~A ~%" k v)))
>> 2 -> #<USER name: dssdf {B043379}>
>> 3 -> #<USER name: ttttt {B045C69}>
>> 5 -> #<USER name: ff {B048179}>
>> 6 -> #<USER name: other {B04A451}>
>> 7 -> #<USER name: guest {AD61271}>
>> 100 -> #<USER name: qqq {B053001}>
>> 101 -> #<USER name:  {B055721}>
>> 102 -> #<USER name:  {B057E01}>
>> 103 -> #<USER name:  {B05A529}>
>> 104 -> #<USER name:  {B05CCF1}>
>> 105 -> #<USER name:  {B05F579}>
>> 106 -> #<USER name:  {B063E91}>
>> 107 -> #<USER name: qqq {B066851}>
>> 200 -> #<USER name:  {B069519}>
>> 201 -> #<USER name:  {B06C009}>
>> 300 -> #<USER name:  {B06EBA1}>
>> 301 -> #<USER name: aaa {B0717D1}>
>> NIL
>>
>> I don't think this is a bug, it must have to do with how Elephant  
>> manages btrees; but then how am I supposed to access through pages?
>> I would like to have to access all the objects from the beggining  
>> just to discard them instantly (imagine a large collection and the  
>> user wanting to see the last page).
>>
>> Thank you again :)
>>
>> Mariano
>>
>>
>> _______________________________________________
>> elephant-devel site list
>> elephant-devel at common-lisp.net
>> http://common-lisp.net/mailman/listinfo/elephant-devel
>> _______________________________________________
>> elephant-devel site list
>> elephant-devel at common-lisp.net
>> http://common-lisp.net/mailman/listinfo/elephant-devel
>
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel




More information about the elephant-devel mailing list