From pete-cl-ppcre at kazmier.com Mon Jul 18 00:02:05 2005
From: pete-cl-ppcre at kazmier.com (pete-cl-ppcre at kazmier.com)
Date: Sun, 17 Jul 2005 20:02:05 -0400
Subject: [cl-ppcre-devel] Byte vectors instead of strings
Message-ID: <20050718000205.GA17272@kazmier.com>

Hi Edi,

How hard would it be to modify cl-ppcre to work on byte vectors
instead of strings? I'm trying to obtain faster performance when
parsing large log files. Most of the time spent processing the logs
is wasted on the creation of strings. I want to use read-sequence
with unsigned-byte as the external format to avoid that processing.
Of course, this means I need a regexp library that can handle byte
vectors.

As a newbie, is it even worth hacking cl-ppcre to use byte vectors
or is the difficulty level too high? I am also considering learning
FFI and just making an interface to a standard C regexp library
which will work with bytes. However, if I can use cl-ppcre, I'd
prefer it, as it's written in CL.

Thanks,
Pete

From edi at agharta.de Mon Jul 18 00:09:07 2005
From: edi at agharta.de (Edi Weitz)
Date: Mon, 18 Jul 2005 02:09:07 +0200
Subject: [cl-ppcre-devel] Byte vectors instead of strings
In-Reply-To: <20050718000205.GA17272@kazmier.com> (pete-cl-ppcre@kazmier.com's message of "Sun, 17 Jul 2005 20:02:05 -0400")
References: <20050718000205.GA17272@kazmier.com>
Message-ID:

On Sun, 17 Jul 2005 20:02:05 -0400, pete-cl-ppcre at kazmier.com wrote:

> How hard would it be to modify cl-ppcre to work on byte vectors
> instead of strings? I'm trying to obtain faster performance when
> parsing large log files. Most of the time spent processing the logs
> is wasted on the creation of strings. I want to use read-sequence
> with unsigned-byte as the external format to avoid that processing.
> Of course, this means I need a regexp library that can handle byte
> vectors.
>
> As a newbie, is it even worth hacking cl-ppcre to use byte vectors
> or is the difficulty level too high?
> I am also considering learning FFI and just making an interface to a
> standard C regexp library which will work with bytes. However, if I
> can use cl-ppcre, I'd prefer it, as it's written in CL.

Hi Pete!

If I'm not mistaken, this has already been done. I seem to remember
someone patched CL-PPCRE to work on arbitrary sequences, and this was
done for the CLIMACS project. If you can't find it in the CLIMACS
sources, which should be online somewhere, you could ask Robert
Strandh; he should know about it. Google will find his homepage. Maybe
there's also an initial conversation about this topic in the archives
of this mailing list.

Sorry that I can't be more helpful at the moment, but I'm in a hurry.

Cheers,
Edi.

PS: And in case you have to do it yourself: It shouldn't be /too/ hard
but maybe a bit tedious.

From edi at agharta.de Mon Jul 18 00:12:04 2005
From: edi at agharta.de (Edi Weitz)
Date: Mon, 18 Jul 2005 02:12:04 +0200
Subject: [cl-ppcre-devel] Byte vectors instead of strings
In-Reply-To: (Edi Weitz's message of "Mon, 18 Jul 2005 02:09:07 +0200")
References: <20050718000205.GA17272@kazmier.com>
Message-ID:

On Mon, 18 Jul 2005 02:09:07 +0200, Edi Weitz wrote:

> If I'm not mistaken, this has already been done. I seem to remember
> someone patched CL-PPCRE to work on arbitrary sequences, and this was
> done for the CLIMACS project.

Googling for "CLIMACS CL-PPCRE" revealed this one:

From pete-cl-ppcre at kazmier.com Mon Jul 18 00:20:11 2005
From: pete-cl-ppcre at kazmier.com (pete-cl-ppcre at kazmier.com)
Date: Sun, 17 Jul 2005 20:20:11 -0400
Subject: [cl-ppcre-devel] Byte vectors instead of strings
In-Reply-To:
References: <20050718000205.GA17272@kazmier.com>
Message-ID: <20050718002011.GA17403@kazmier.com>

On Mon, Jul 18, 2005 at 02:09:07AM +0200, Edi Weitz wrote:
> If I'm not mistaken, this has already been done. I seem to remember
> someone patched CL-PPCRE to work on arbitrary sequences, and this was
> done for the CLIMACS project.
> If you can't find it in the CLIMACS sources, which should be online
> somewhere, you could ask Robert Strandh; he should know about it.
> Google will find his homepage. Maybe there's also an initial
> conversation about this topic in the archives of this mailing list.
>
> Sorry that I can't be more helpful at the moment, but I'm in a hurry.

Great! I should have read the archives before posting (sorry). I'll
investigate further. Thanks for the suggestions!

Pete

From pete-cl-ppcre at kazmier.com Mon Jul 18 01:41:35 2005
From: pete-cl-ppcre at kazmier.com (pete-cl-ppcre at kazmier.com)
Date: Sun, 17 Jul 2005 21:41:35 -0400
Subject: [cl-ppcre-devel] Byte vectors instead of strings
In-Reply-To:
References: <20050718000205.GA17272@kazmier.com>
Message-ID: <20050718014135.GA17618@kazmier.com>

On Sun, Jul 17, 2005 at 06:36:48PM -0600, Jim Prewett wrote:
>
> I'm just curious, what sort of log processing are you doing?

I'm responsible for all of the network management systems for a VoIP
telecom company. Part of our architecture is the real-time monitoring
of various logs, such as the syslog messages generated by 1000+ Cisco
devices as well as various application log files. Currently, I use my
own Python software called LogWrap[1] for this purpose. Another part of
our architecture is the post-processing of log files for trend
analysis, intrusion detection analysis, etc. This analysis is done
with a whole bunch of Python scripts.

Over the past year, I've been learning CL in my free time and have been
trying to slowly introduce CL at work in both of the above areas. My
first attempt was to write some of the post-processing tools in CL
because I thought that CL coupled with cl-ppcre would be much faster
than my existing Python tools. This was not the case because the
open-source CL implementations were slow due to the IO processing. Now
I am trying to use byte vectors with cl-ppcre to see if this will
significantly speed up the processing.
> I've been working on a "generic log analysis" application for a couple of
> years now that (very) strongly suggests CL-PPCRE called LoGS.

I'm reading about it now and it sounds very interesting and familiar, as
my Python LogWrap does some of the same things: rules, actions,
suppression, generic handlers, etc. It's time for me to go to bed now,
but I will read more about this tomorrow as it may help me with the
real-time part of my architecture. I was going to write LogWrap in
Lisp, but it sounds like you've saved me the trouble.

> Is there a collaboration here?

Perhaps, but it seems at first glance that LoGS has everything I need.
After I go through the documentation more thoroughly, I'll be able to
determine if there are any missing pieces of functionality I might want
to add and contribute if wanted.

Thanks,
Pete

[1] http://www.kazmier.com/computer/logwrap

From download at hpc.unm.edu Mon Jul 18 04:37:29 2005
From: download at hpc.unm.edu (Jim Prewett)
Date: Sun, 17 Jul 2005 22:37:29 -0600 (MDT)
Subject: [cl-ppcre-devel] Byte vectors instead of strings
In-Reply-To: <20050718014135.GA17618@kazmier.com>
References: <20050718000205.GA17272@kazmier.com> <20050718014135.GA17618@kazmier.com>
Message-ID:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> I'm responsible for all of the network management systems for a VoIP
> telecom company. Part of our architecture is the real-time monitoring
> of various logs, such as the syslog messages generated by 1000+ Cisco
> devices as well as various application log files. Currently, I use my
> own Python software called LogWrap[1] for this purpose. Another part of
> our architecture is the post-processing of log files for trend
> analysis, intrusion detection analysis, etc. This analysis is done
> with a whole bunch of Python scripts.

sounds familiar ;) I'm a sys-admin for several HPC (cluster) systems...
> Over the past year, I've been learning CL in my free time and have been
> trying to slowly introduce CL at work in both of the above areas. My
> first attempt was to write some of the post-processing tools in CL
> because I thought that CL coupled with cl-ppcre would be much faster
> than my existing Python tools. This was not the case because the
> open-source CL implementations were slow due to the IO processing. Now
> I am trying to use byte vectors with cl-ppcre to see if this will
> significantly speed up the processing.

hmmm... I thought the IO was a little on the slow side too :) However,
I found that most of my competition (SEC, Logsurfer, SWATCH, etc.) have
some pretty silly (and inefficient) notions built in, like "flat"
rulesets... Lisp is, IMO, part of the reason I was able to take
advantage of better ideas to get more speed (not that a tree-shaped
structure is profound :)

> I'm reading about it now and it sounds very interesting and familiar, as
> my Python LogWrap does some of the same things: rules, actions,
> suppression, generic handlers, etc. It's time for me to go to bed now,
> but I will read more about this tomorrow as it may help me with the
> real-time part of my architecture. I was going to write LogWrap in
> Lisp, but it sounds like you've saved me the trouble.

Well, I've done at least some of the work; I'm always interested in
collaborators too :) LoGS is my project to teach myself some Common
Lisp. We chose Lisp because we felt s-expressions would be a good way
to express rules that write rules ... that write rules; that was my
original turn-off from Logsurfer (yes, you can do it, but after about
4 levels deep, the escaping is too much of a nightmare! :)
s-expressions are free with Lisp :)

> Perhaps, but it seems at first glance that LoGS has everything I need.

Wow! Really?
;)

> After I go through the documentation more thoroughly, I'll be able to

Oh, my apologies for the current state of the documentation :) I think
it's /mostly/ accurate, although a little lacking :) I'm hoping to do
some serious documentation fix-ups for 0.1.0 (which should be coming
out shortly after 0.0.4, which I'm hoping to release very, very soon!
Maybe late August for 0.1.0?) Anyway, please feel free to shoot me any
questions! There's also a mailing list, but the membership is quite
small.

> determine if there are any missing pieces of functionality I might want
> to add and contribute if wanted.

Oh yeah, definitely! I'm very happy to accept patches (as long as I can
see them being remotely useful). If you dig LoGS, maybe you could share
some of your rulesets with me? I think CL & log analysis are a
(surprisingly?) good match! I'm also very interested in exploring what
is possible with log analysis; I'm not very happy with any of the
available tools, including LoGS; LoGS is just the best thing going
(IMO).

Anyway, I'd love to add you to my (mental) short list of users... I
think I've got 7-ish right now (including myself :)

> [1] http://www.kazmier.com/computer/logwrap

I'll check that out! Thanks!

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Made with pgp4pine 1.76

iD8DBQFC2zIOv/zdxjGBbZMRAsiBAKCX80mksWIYijx3zgykCXrlN2O76gCffQDc
CPwFWSpI3lc43MOfb9d0ZrA=
=pQEa
-----END PGP SIGNATURE-----

From edi at agharta.de Tue Jul 19 23:24:19 2005
From: edi at agharta.de (Edi Weitz)
Date: Wed, 20 Jul 2005 01:24:19 +0200
Subject: [cl-ppcre-devel] New release 1.2.10
Message-ID:

Changelog:

Version 1.2.10
2005-07-20
Fixed bug in CHAR-SEARCHER-AUX (caught by Peter Schuller)
Don't redefine what's already there (for LispWorks)

Download:

Cheers,
Edi.