perl.ai http://www.nntp.perl.org/group/perl.ai/ ... Copyright 1998-2008 perl.org Wed, 08 Oct 2008 01:03:11 +0000 ask@perl.org

Categorizer::Learner::SVM, scores of categories? by Jiuan-Ru Jennifer Lai

Hi,

I used the Categorizer::SVM library for large data classification (great
tool); however, I'm having trouble analyzing the result from the SVM
learner.

- Categories have scores of either 0 or 1, with 1 meaning that this document
belongs to this category, and 0 otherwise. Are there any scores representing
probabilities or confidence level of belonging to a certain category other than
these 0, 1 values?
- Suppose this document could belong to 3 possible categories: cat1, cat2,
and cat3. The best_category method simply picks the first category as the
classification decision. If you call $hypothesis->categories, the
categories output don't seem to be in the order of probabilities or
confidence level. They seem to be in a fixed order... and whatever is listed
first is favored.

I hope someone can clear up my confusion about the scores of categories in the
SVM module.

Thank you very much in advance,
Jennifer

http://www.nntp.perl.org/group/perl.ai/2008/03/msg576.html Tue, 18 Mar 2008 11:53:43 +0000

yawn cornerstone by Hester Newton

Big News For SZSN! Shares Rocket! UP 37.5%

Shandong Zhouyuan Seed and Nursery Co., Ltd (SZSN)
$0.33 UP 37.5%

SZSN new releases show huge expansion and Multi-Million dollar projects.
Share prices rocket! Friday's trading was strong. Get On SZSN first
thing Monday!

Yes, it is that good. If they have work and family commitments, then
perhaps it is easier to set aside a weekend to see a large number of
bands in one go than go to regular gigs.
He was detained by police in
Middlesex on Saturday in connection with failing to attend a court
hearing over alleged drugs offences in Glasgow.

Closest to it in spirit is the Outsider Festival, held at Rothiemuchus,
near Aviemore, with an emphasis on outdoor activities as well as music,
and directly targeted at the older fan. With the likes of the Beastie
Boys, Primal Scream and Bjork headlining, music is still to the fore,
but more established groups dominate the line-up.
Half Day FishingMaybe you guys can take it up on the northern Atlantic
and Pacific coasts, but in the Southeast, a whole day of fishing in this
unbearable summer heat can really wear a fellow out. She's very active
in an Women help association, fighting for women rights, against
excision.

http://www.nntp.perl.org/group/perl.ai/2007/07/msg575.html Sun, 15 Jul 2007 10:04:53 +0000

Add documents to a learner? by Ignacio J. Ortega Lopera

Is it possible to add training to a learner? How?

What I'm trying to do is reopen a state file and add new documents to the
training set without reading the entire corpus again.

It seems that Algorithm::NaiveBayes has a "purge" parameter that seems to
help with that; it permits adding new instances after training.

Saludos, Ignacio J. Ortega

http://www.nntp.perl.org/group/perl.ai/2007/06/msg574.html Thu, 07 Jun 2007 00:46:14 +0000

Re: Problems trying to predict by Ignacio J. Ortega Lopera

Hola Ken:

Many thanks, your advice did the trick..
and yes, it was when reloading
state from file..

2007/6/7, Ken Williams <ken@mathforum.org>:
>
> Hi Ignacio,
>
> Is this when loading a pre-trained categorizer from a saved file?
> This is a known problem, but I haven't settled on a good solution.
>
> A simple workaround is to just put:
>
>     use Algorithm::NaiveBayes::Model::Frequency;
>
> in the script that's currently failing.
>
> -Ken
>
>
> On May 30, 2007, at 1:47 PM, Ignacio J. Ortega Lopera wrote:
>
> > i'm getting this:
> >
> > Can't locate object method "predict" via package
> > "Algorithm::NaiveBayes::Model::Frequency" at
> > /usr/lib/perl5/site_perl/5.8.0/AI/Categorizer/Learner/NaiveBayes.pm
> > line 28.
> >
> > when trying to get hypotheses.. for a new doc....
> >
> > anyone know if this is a silly one?
>
>

http://www.nntp.perl.org/group/perl.ai/2007/06/msg573.html Thu, 07 Jun 2007 00:41:42 +0000

Re: Problems trying to predict by Ken Williams

Hi Ignacio,

Is this when loading a pre-trained categorizer from a saved file?
This is a known problem, but I haven't settled on a good solution.

A simple workaround is to just put:

    use Algorithm::NaiveBayes::Model::Frequency;

in the script that's currently failing.

 -Ken


On May 30, 2007, at 1:47 PM, Ignacio J. Ortega Lopera wrote:

> i'm getting this:
>
> Can't locate object method "predict" via package
> "Algorithm::NaiveBayes::Model::Frequency" at
> /usr/lib/perl5/site_perl/5.8.0/AI/Categorizer/Learner/NaiveBayes.pm
> line 28.
>
> when trying to get hypotheses..
> for a new doc....
>
> anyone know if this is a silly one?

http://www.nntp.perl.org/group/perl.ai/2007/06/msg572.html Wed, 06 Jun 2007 23:41:20 +0000

AI::Categorizer and Umlauts? by Robert Barta

Hi,

I seem to have problems with umlauts, such as in words

    Präsentation

When a document is added with

    return new AI::Categorizer::Document(name    => $filename,
                                         content => $content);

to the collection, after loading and finish, the feature vector
contains only fragments of these words, such as

    pr => 1
    sentation => 1

Setting the locale on the shell or in Perl does not have any effect

    use locale;

not even with turning on de_AT explicitly.

--

Aaaaaah, lib/AI/Categorizer/Document.pm is NOT using locale and use locale
is very, uhm, local %-)

Patching the file does not seem to break the test cases.

\rho

http://www.nntp.perl.org/group/perl.ai/2007/06/msg571.html Mon, 04 Jun 2007 19:25:40 +0000

AI::Categorizer suggestion for repackaging by Robert Barta

Hi,

This is probably more relevant to the maintainer of AI::Categorizer:

It would be a bit simpler to debianize the package if the dependency
on the Weka system were factored out into a separate Perl package.

Otherwise I have not found a problem in making it a Debian package.

\rho

http://www.nntp.perl.org/group/perl.ai/2007/06/msg570.html Mon, 04 Jun 2007 19:25:25 +0000

Problems trying to predict by Ignacio J. Ortega Lopera

i'm getting this:

Can't locate object method "predict" via package
"Algorithm::NaiveBayes::Model::Frequency" at
/usr/lib/perl5/site_perl/5.8.0/AI/Categorizer/Learner/NaiveBayes.pm
line 28.

when trying to get hypotheses..
for a new doc....

anyone know if this is a silly one?

Thanks in advance

Saludos, Ignacio J. Ortega
----------------------------------------------------------------
Technical manager
http://www.derecho.com/

http://www.nntp.perl.org/group/perl.ai/2007/05/msg569.html Wed, 30 May 2007 11:47:45 +0000

package AI::Categorizer::Collection::DBI; by Ignacio J. Ortega Lopera

http://www.nntp.perl.org/group/perl.ai/2007/05/msg568.html Wed, 30 May 2007 09:49:39 +0000

how to use the function of "feature selection" under AI::Categorizer by jhoon

Hello,

I'd like to select more important features using AI::Categorizer, and so
modified demo.pl as follows
=== FROM ===
my $k = AI::Categorizer::KnowledgeSet->new( verbose => 1 );
=== TO ===
my $k = AI::Categorizer::KnowledgeSet->new( verbose => 1,
    feature_selector => new AI::Categorizer::FeatureSelector::DocFrequency(
        verbose => 1,
        features_kept => 1000
    )
);
=== END ===
I observed the performance while changing the value of features_kept,
but the performance is always the same. I'd appreciate it if you could tell
me how to do feature selection using AI::Categorizer.

Thank you very much in advance.

Jae-Hoon.
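
For readers following this thread, here is a minimal plain-Perl sketch of what selecting features by document frequency means. It is only an illustration and not the AI::Categorizer::FeatureSelector::DocFrequency implementation: the helper name select_by_doc_frequency and the toy documents are assumptions made up for this example, and each document is assumed to be a hash of feature => count.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of AI::Categorizer): keep the $keep features
# that occur in the largest number of documents. Ties are broken
# alphabetically so the result is deterministic.
sub select_by_doc_frequency {
    my ($docs, $keep) = @_;

    # Document frequency: in how many documents does each feature appear?
    my %df;
    for my $doc (@$docs) {
        $df{$_}++ for keys %$doc;
    }

    my @ranked = sort { $df{$b} <=> $df{$a} or $a cmp $b } keys %df;
    $keep = @ranked if $keep > @ranked;    # asking for more than exist drops nothing
    my %kept = map { ($_ => 1) } @ranked[0 .. $keep - 1];

    # Build new feature vectors restricted to the kept features.
    my @pruned = map {
        my $doc = $_;
        +{ map { ($_ => $doc->{$_}) } grep { $kept{$_} } keys %$doc };
    } @$docs;

    return (\@pruned, [ sort keys %kept ]);
}

# Toy corpus, invented for this example.
my @docs = (
    { wheat => 2, price => 1, export => 1 },
    { wheat => 1, price => 3 },
    { oil   => 4, price => 1 },
);

my ($pruned, $kept) = select_by_doc_frequency(\@docs, 2);
print "kept: @$kept\n";
```

One thing this sketch makes visible: if the requested number of features is at least as large as the number of distinct features in the corpus, nothing is dropped, which would be one possible reason for seeing no change in performance.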

http://www.nntp.perl.org/group/perl.ai/2007/05/msg567.html Fri, 25 May 2007 04:09:13 +0000

Re: how to do feature selection by Alan Gibson

im not sure if this pertains to your problem exactly, but you probably
want to specify the weighting method like

my $k = AI::Categorizer::KnowledgeSet->new( verbose => 1,
    features_kept => 5000,
    tfidf_weighting => 'nfc'
);

the default weighting is 'xxx' which if i understand correctly doesn't
actually do anything.

alan

http://www.nntp.perl.org/group/perl.ai/2007/05/msg566.html Wed, 23 May 2007 19:24:41 +0000

how to do feature selection by Jianmin WU

hi, buddies,

I am not sure if i am in the right place. :-)

I am new to Perl and the Perl AI modules.

I am trying to do the NaiveBayes experiments with the help of the demo.pl
code in the examples of the AI::Categorizer module.
Now I am confused about how to do the feature selection.

The documents say that KnowledgeSet::load( ) will do feature selection and
read the corpus at the same time. So, I changed the construction of
KnowledgeSet in
demo.pl from
my $k = AI::Categorizer::KnowledgeSet->new( verbose => 1 );
$k->load( collection => $training )
to
my $k = AI::Categorizer::KnowledgeSet->new( verbose => 1, features_kept => 5000 );
$k->load( collection => $training )

Then I re-ran the code, expecting it to keep the top 5000 features with
high Document Frequency.
But there seems to be no difference from before.
do i misunderstand any
point ?

And also, is there any smoothing method implemented in
AI::Categorizer::Learner::NaiveBayes ?

Thanks for your attention

Jianmin

http://www.nntp.perl.org/group/perl.ai/2007/05/msg565.html Sat, 19 May 2007 05:43:12 +0000

Re: [ANNOUNCE] AI-Categorizer 0.08 -> CPAN by mentifex

> Hi,
>
> After almost 4 years since the previous release,
> I've uploaded a new AI::Categorizer to CPAN.
> It's a minor set of changes with only a
> couple bug fixes and additions:
> [...]
> -Ken
Glad to hear the news of progress.

Arthur
--
http://mind.sourceforge.net/perl.html
http://mind.sourceforge.net/Mind.html

http://www.nntp.perl.org/group/perl.ai/2007/03/msg564.html Wed, 21 Mar 2007 04:34:39 +0000

[ANNOUNCE] AI-Categorizer 0.08 -> CPAN by Ken Williams

Hi,

After almost 4 years since the previous release, I've uploaded a new
AI::Categorizer to CPAN. It's a minor set of changes with only a
couple bug fixes and additions:


0.08 - Tue Mar 20 19:39:41 2007

 - Added a ChiSquared feature selection class. [Francois Paradis]

 - Changed the web locations of the reuters-21578 corpus that
   eg/demo.pl uses, since the location it referenced previously has
   gone away.

 - The building & installing process now uses Module::Build rather
   than ExtUtils::MakeMaker.

 - When the features_kept mechanism was used to explicitly state the
   features to use, and the scan_first parameter was left as its
   default value, the features_kept mechanism would silently fail to
   do anything. This has now been fixed. [Spotted by Arnaud Gaudinat]

 - Recent versions of Weka have changed the name of the SVM class, so
   I've updated it in our test (t/03-weka.t) of the Weka wrapper
   too.
   [Sebastien Aperghis-Tramoni]


 -Ken

http://www.nntp.perl.org/group/perl.ai/2007/03/msg563.html Wed, 21 Mar 2007 01:07:05 +0000

Probabilities with SVM by Alan Gibson

im trying to use expectation-maximization to bootstrap an svm
classifier. for this to work, the classifier needs to return better
than random probabilities for its classification decisions. so not
being one to repeat work, i thought i would see if anyone is sitting
on an implementation of AI::Categorizer::Learner::SVM that returns the
probabilities produced by libsvm.

any code or criticisms would be greatly appreciated.

thanks,
alan gibson

http://www.nntp.perl.org/group/perl.ai/2007/03/msg562.html Sun, 11 Mar 2007 17:51:06 +0000

Re: text categorization with SVM and NaiveBayes by Ken Williams

On Jan 8, 2007, at 10:51 AM, Tom Fawcett wrote:

> Just to add a note here: Ken is correct -- both NB and SVMs are
> known to be rather poor at providing accurate probabilities. Their
> scores tend to be too extreme. Producing good probabilities from
> these scores is called calibrating the classifier, and it's more
> complex than just taking a root of the score. There are several
> methods for calibrating scores. The good news is that there's an
> effective one called isotonic regression (or Pool Adjacent
> Violators) which is pretty easy and fast.
> The bad news is that
> there's no plug-in (ie, CPAN-ready) perl implementation of it (I've
> got a simple implementation which I should convert and contribute
> someday).
>
> If you want to read about classifier calibration, google one of
> these titles:
>
> "Transforming classifier scores into accurate multiclass
> probability estimates"
> by Bianca Zadrozny and Charles Elkan
>
> "Predicting Good Probabilities With Supervised Learning"
> by A. Niculescu-Mizil and R. Caruana


Cool, thanks for the references. It might be nice to add some such
scheme to Algorithm::NaiveBayes (and friends), so that the user has a
choice of several normalization schemes, including "none". If I get
a surplus of tuits I'll add it, or if you feel like contributing your
stuff that would be great too.

 -Ken

http://www.nntp.perl.org/group/perl.ai/2007/01/msg561.html Tue, 09 Jan 2007 04:52:25 +0000

Re: text categorization with SVM and NaiveBayes by Tom Fawcett

On Jan 7, 2007, at 9:23 PM, Ken Williams wrote:
>> I would happily ignore all this and use NB, but it has one major flaw.
>> "The winner takes it all", the first result returned is way too far
>> (as in distance :)) from the others, which isn't exactly accurate if
>> one cares about a balanced results pool. I don't know whether this is an
>> implementation problem - I poked around the rescale() function in
>> Util.pm with no real success - or a general algorithm problem. My goal
>> is to have an implementation that can say: this text is 60% cat X, 20%
>> cat Y, 18% cat Z and 2% other cats. Is this feasible ?
>> If so, what
>> approach would you recommend (which algorithm, which implementation or
>> what path for implementing it ) ?
>
> Unfortunately, neither NB nor SVMs can really tell you that. SVMs
> are purely discriminative, so all they can tell you is "I think
> this new example is more like class A than class B in my training
> data". There's no probability involved at all. That said, I
> believe there has been some research into how to translate SVM
> output scores into probabilities or confidence scores, but I'm not
> really familiar with it.
>
> NB on the surface would seem to be a better option since it's
> directly based on probabilities, but again the algorithm was
> designed only to discriminate, so all those denominators that are
> thrown away (the "P(words)" terms in the A::NB documentation) mean
> that the notion of probabilities is lost. The rescale() function
> is basically just a hack to return scores that are a little more
> convenient to work with than the raw output of the algorithm. As
> you've seen, it tends to be a little arrogant, greatly exaggerating
> the score for the first category and giving tiny scores to the
> rest. I'm sure there are better algorithms that could be used
> there, but in many cases either one doesn't really care about the
> actual scores, or one (*ahem*) does something ad hoc like taking
> the square root of all the scores, or the fifth root, or whatever,
> just to get some numbers that look better to end users.

Just to add a note here: Ken is correct -- both NB and SVMs are known
to be rather poor at providing accurate probabilities. Their scores
tend to be too extreme.
Producing good probabilities from these
scores is called calibrating the classifier, and it's more complex
than just taking a root of the score. There are several methods for
calibrating scores. The good news is that there's an effective one
called isotonic regression (or Pool Adjacent Violators) which is
pretty easy and fast. The bad news is that there's no plug-in (ie,
CPAN-ready) perl implementation of it (I've got a simple
implementation which I should convert and contribute someday).

If you want to read about classifier calibration, google one of these
titles:

"Transforming classifier scores into accurate multiclass probability
estimates"
by Bianca Zadrozny and Charles Elkan

"Predicting Good Probabilities With Supervised Learning"
by A. Niculescu-Mizil and R. Caruana

Regards,
-Tom

http://www.nntp.perl.org/group/perl.ai/2007/01/msg560.html Mon, 08 Jan 2007 12:23:57 +0000

Re: text categorization with SVM and NaiveBayes by Ken Williams

On Jan 5, 2007, at 7:10 AM, zgrim wrote:

> So, back to my dilemmas. :) The results are puzzling, as many of the
> research papers on the subject I've consulted say that SVM is
> supposedly the best algorithm for this task. The radial kernel should
> give the best results, for empirically found values of gamma and C.

This may be an issue with your corpus - I quite often find that when
I don't have enough training data for the SVM to pick up on the
"truth" patterns, or (somewhat equivalently) when there's a lot of
noise in the data, a linear kernel will outperform a radial (RBF).
I
tend to think that's because the RBF is more expressive, and it's
overfitting the noise in the training set.


> Ignoring the fact that SVM is much, much slower to train than NB, it
> still has worse accuracy. What am I doing wrong ?

That may be an accident of your corpus too. Are you using
cross-validation for these experiments? If so, you should be able to get
some error bars to tell whether the difference is statistically
significant or not. I'm guessing a 2% advantage may not be, in this
case.

> I would happily ignore all this and use NB, but it has one major flaw.
> "The winner takes it all", the first result returned is way too far
> (as in distance :)) from the others, which isn't exactly accurate if
> one cares about a balanced results pool. I don't know whether this is an
> implementation problem - I poked around the rescale() function in
> Util.pm with no real success - or a general algorithm problem. My goal
> is to have an implementation that can say: this text is 60% cat X, 20%
> cat Y, 18% cat Z and 2% other cats. Is this feasible ? If so, what
> approach would you recommend (which algorithm, which implementation or
> what path for implementing it ) ?

Unfortunately, neither NB nor SVMs can really tell you that. SVMs
are purely discriminative, so all they can tell you is "I think this
new example is more like class A than class B in my training data".
There's no probability involved at all.
That said, I believe there
has been some research into how to translate SVM output scores into
probabilities or confidence scores, but I'm not really familiar with it.

NB on the surface would seem to be a better option since it's
directly based on probabilities, but again the algorithm was designed
only to discriminate, so all those denominators that are thrown away
(the "P(words)" terms in the A::NB documentation) mean that the
notion of probabilities is lost. The rescale() function is basically
just a hack to return scores that are a little more convenient to
work with than the raw output of the algorithm. As you've seen, it
tends to be a little arrogant, greatly exaggerating the score for the
first category and giving tiny scores to the rest. I'm sure there
are better algorithms that could be used there, but in many cases
either one doesn't really care about the actual scores, or one
(*ahem*) does something ad hoc like taking the square root of all the
scores, or the fifth root, or whatever, just to get some numbers that
look better to end users.

As for a better alternative, I'm not familiar with any that will be
as accessible from a perl world, but you might want to look at some
language modeling papers - I really like the LDA papers from Michael
Jordan (no, not that Michael Jordan, this one:
http://citeseer.ist.psu.edu/541352.html), which are by no means
straightforward, but they will indeed let you describe each document
as generated by a mixture of categories.

 -Ken

http://www.nntp.perl.org/group/perl.ai/2007/01/msg559.html Mon, 08 Jan 2007 04:20:12 +0000

text categorization with SVM and NaiveBayes by zgrim

Hello,
 I am "playing" with the task of automated text categorization and
inevitably hit a few dilemmas.
I have tried different combinations of
SVM and NaiveBayes, here are some results:
- algorithm::svm (single world, through AI::Categorizer) ~ 92%
accuracy (with the linear kernel, the radial one has below 10% with
all sorts of values tried for gamma and c)
- algorithm::svmlight ( nr. of categories worlds - each trained
against the others ) ~ 62% in ranking mode
- algorithm::naivebayes (one world, through AI::Categorizer) ~ 94%
- algorithm::naivebayes (each against all other) ~ 73%

These are on the same corpus ( which isn't perfect at all, but that's
negligible information for now :) ).
By accuracy I mean tested accuracy on a single category, which is, if
the first category returned (highest score) is the supposed one, it's
a hit, else, a miss.
By single world I mean all categories build a single model, against
which tests are run. By multiple worlds (each against all other) I mean each
category builds a model in which the tokens from that category are
positive and the tokens from all other categories are negative.

So, back to my dilemmas. :) The results are puzzling, as many of the
research papers on the subject I've consulted say that SVM is
supposedly the best algorithm for this task. The radial kernel should
give the best results, for empirically found values of gamma and C.
Ignoring the fact that SVM is much, much slower to train than NB, it
still has worse accuracy. What am I doing wrong ?
I would happily ignore all this and use NB, but it has one major flaw.
"The winner takes it all", the first result returned is way too far
(as in distance :)) from the others, which isn't exactly accurate if
one cares about a balanced results pool. I don't know whether this is an
implementation problem - I poked around the rescale() function in
Util.pm with no real success - or a general algorithm problem.
My goal
is to have an implementation that can say: this text is 60% cat X, 20%
cat Y, 18% cat Z and 2% other cats. Is this feasible ? If so, what
approach would you recommend (which algorithm, which implementation or
what path for implementing it ) ?
TIA

--
perl -MLWP::Simple -e'print$_[rand(split(q|%%\n|,
get(q=http://cpan.org/misc/japh=)))]'

http://www.nntp.perl.org/group/perl.ai/2007/01/msg558.html Fri, 05 Jan 2007 05:10:40 +0000

Creating Collection of uncategorized data by Alan Gibson

Hello,

First post to this list. I'm beginning a project that will use
automated text classification to classify congressional bills and
AI::Categorizer looks like the best framework to use. However, I'm
hitting a snag on what should be a simple operation.

I train an svm classifier on 1000 documents; this operation goes fine.
I then try to create an instance of AI::Categorizer::Collection::Files
containing 5 unclassified documents.
I supply only the path because
the 5 documents are not yet categorized:

    my $c = new AI::Categorizer::Collection::Files(
        path => "$path");
    while (my $document = $c->next) {
        my $hypothesis = $nb->categorize($document);
        print "Best assigned category: ", $hypothesis->best_category, "\n";
        print "All assigned categories: ", join(', ',
            $hypothesis->categories), "\n";
    }

This produces the error

No category information about '5-508' at
/usr/local/share/perl/5.8.7/AI/Categorizer/Collection/Files.pm line 44.
Mandatory parameter 'all_categories' missing in call to
AI::Categorizer::Hypothesis->new()

To get around this error I could just supply the categories of the 5
unknown test documents, but in our real world application we will have
a constant stream of unclassified documents coming in that will
receive human attention only long after they have been automatically
classified.

Is the design intent to only allow test documents that already are
categorized (eg for creating confidence statistics)? If so, does
anyone have any suggestions on the preferred way to classify unknown
documents with AI::Categorizer?

Thanks,
Alan

http://www.nntp.perl.org/group/perl.ai/2007/01/msg557.html Thu, 04 Jan 2007 19:25:49 +0000

Re: ai::categorize samples by Russell Foltz-Smith

Thanks, that's helpful. It's very difficult to find this corpus,
especially in most of the existing documentation on AI::Categorizer.

Thanks, Dr.
Math.

Russ

Ken Williams wrote:
> On Jan 2, 2007, at 8:53 PM, Russell Foltz-Smith wrote:
>
>> Does someone have an examples category text file that works with the
>> demo.pl?
>
> Yup, you can download it from
> http://campstaff.com/~ken/reuters-21578.tar.gz .
>
>
>> Also, does anyone know of an online/web service implementation for web
>> pages/urls to categorize them into dmoz categories?
>
> Not I.
>
> -Ken

http://www.nntp.perl.org/group/perl.ai/2007/01/msg556.html Thu, 04 Jan 2007 06:39:30 +0000

Re: ai::categorize samples by Ken Williams

On Jan 2, 2007, at 8:53 PM, Russell Foltz-Smith wrote:

> Does someone have an examples category text file that works with the
> demo.pl?

Yup, you can download it from
http://campstaff.com/~ken/reuters-21578.tar.gz .


> Also, does anyone know of an online/web service implementation for web
> pages/urls to categorize them into dmoz categories?

Not I.

 -Ken

http://www.nntp.perl.org/group/perl.ai/2007/01/msg555.html Thu, 04 Jan 2007 04:16:05 +0000

ai::categorize samples by Russell Foltz-Smith

Does someone have an examples category text file that works with the
demo.pl?

Also, does anyone know of an online/web service implementation for web
pages/urls to categorize them into dmoz categories?

Russ Smith

http://www.nntp.perl.org/group/perl.ai/2007/01/msg554.html Wed, 03 Jan 2007 03:23:38 +0000

ai::categorize samples by Russell Foltz-Smith

Does someone have an examples category text file that works with the
demo.pl?

Also, does anyone know of an online/web service implementation for web
pages/urls to categorize them into dmoz categories?

Russ Smith

http://www.nntp.perl.org/group/perl.ai/2007/01/msg553.html Tue, 02 Jan 2007 18:53:20
+0000

Re: AI::Genetic by Gregg Allen

Awesome! That helped a lot. I'm looking forward to your new and
improved module.

Thanks for creating that module, in the first place, also. When I
first discovered the module about a year ago, I ran about a dozen
random optimization problems from my graduate level operations
research textbook from 25 years ago.

I didn't find a single one it couldn't solve in less than a few
minutes. (Mere seconds in most cases.)

Thanks!

Gregg Allen
Cerebra, Inc.


On Dec 8, 2006, at 8:56 AM, Ala Qumsieh wrote:

>
> --- Benjamin Tucker <ben@greenriver.org> wrote:
>
>> I don't actually have any experience with
>> AI::Genetic, but
>> Storable.pm is probably your best bet. Take a look
>> at how
>> AI::Categorizer interfaces with it:
>>
>> http://search.cpan.org/src/KWILLIAMS/AI-Categorizer-0.07/lib/AI/Categorizer/Storable.pm
>>
>> If you throw something like this into the bottom of
>> one of your perl
>> files, you should be able just to call
>> $gen->save_state('filename') and then
>> $gen->restore_state('filename') (where $gen is an instance of
>> AI::Genetic)
>
> [snip code]
>
> Thanks. That's an excellent suggestion. I'll add that
> to AI::Genetic and upload a new version soon.
>
> Thanks,
> --Ala
>
>
> ____________________________________________________________________________________
> Do you Yahoo!?
> Everyone is raving about the all-new Yahoo!
> Mail beta.
> http://new.mail.yahoo.com

http://www.nntp.perl.org/group/perl.ai/2006/12/msg552.html Sat, 09 Dec 2006 03:16:17 +0000

Re: AI::Genetic by Ala Qumsieh

--- Benjamin Tucker <ben@greenriver.org> wrote:

> I don't actually have any experience with
> AI::Genetic, but
> Storable.pm is probably your best bet. Take a look
> at how
> AI::Categorizer interfaces with it:
>
> http://search.cpan.org/src/KWILLIAMS/AI-Categorizer-0.07/lib/AI/Categorizer/Storable.pm
>
> If you throw something like this into the bottom of
> one of your perl
> files, you should be able just to call
> $gen->save_state('filename') and then
> $gen->restore_state('filename') (where $gen is an instance of
> AI::Genetic)

[snip code]

Thanks. That's an excellent suggestion. I'll add that
to AI::Genetic and upload a new version soon.

Thanks,
--Ala


____________________________________________________________________________________
Do you Yahoo!?
Everyone is raving about the all-new Yahoo! Mail beta.
http://new.mail.yahoo.com

http://www.nntp.perl.org/group/perl.ai/2006/12/msg551.html Fri, 08 Dec 2006 07:56:29 +0000

Re: AI::Genetic by Benjamin Tucker

I don't actually have any experience with AI::Genetic, but
Storable.pm is probably your best bet.
Take a look at how <br/>AI::Categorizer interfaces with it:<br/>http://search.cpan.org/src/KWILLIAMS/AI-Categorizer-0.07/lib/AI/Categorizer/Storable.pm<br/><br/>If you throw something like this into the bottom of one of your perl <br/>files, you should be able just to call<br/>$gen-&gt;save_state(&#39;filename&#39;) and then $gen-&gt;restore_state(&#39;filename&#39;) <br/>(where $gen is an instance of AI::Genetic)<br/><br/>package AI::Genetic;<br/><br/>use strict;<br/>use Storable;<br/>use File::Spec ();<br/>use File::Path ();<br/><br/># Serialize the object into the file &#39;self&#39; inside the directory $path,<br/># replacing any earlier snapshot stored there.<br/>sub save_state {<br/> my ($self, $path) = @_;<br/> if (-e $path) {<br/> File::Path::rmtree($path) or die &quot;Couldn&#39;t overwrite $path: $!&quot;;<br/> }<br/> mkdir($path, 0777) or die &quot;Can&#39;t create $path: $!&quot;;<br/> Storable::nstore($self, File::Spec-&gt;catfile($path, &#39;self&#39;));<br/>}<br/><br/># Read the snapshot back from $path and return the stored object.<br/>sub restore_state {<br/> my ($package, $path) = @_;<br/> return Storable::retrieve(File::Spec-&gt;catfile($path, &#39;self&#39;));<br/>}<br/><br/>1;<br/><br/>Ben<br/><br/>On Dec 8, 2006, at 9:37 AM, Brad Larsen wrote:<br/><br/>&gt; One (possibly stupid) suggestion is to look at Data::Dumper. It <br/>&gt; should work, but may be very slow if the object in question is <br/>&gt; large. 
Let us know if you find anything better.<br/>&gt;<br/>&gt; Cheers,<br/>&gt; Brad Larsen<br/>&gt;<br/>&gt; greggallen@gmail.com wrote:<br/>&gt;&gt; I know this is going to turn out to be a stupid question, but <br/>&gt;&gt; could someone tell me the easiest way to store and retrieve the <br/>&gt;&gt; state of the entire AI::Genetic colony, and parameters, to a disk <br/>&gt;&gt; file so it can be read in and out at will?<br/>&gt;&gt; I&#39;m doing some constrained optimization experiments that can take <br/>&gt;&gt; several days, even a week, to run in the background, but I have a <br/>&gt;&gt; computer (Mac OS X 10.4.8) that is shared, and I need to install <br/>&gt;&gt; software and restart it almost daily.<br/>&gt;&gt; I would like to save the entire thing about every hour, but I can <br/>&gt;&gt; handle the timing part myself.<br/>&gt;&gt; Sincerely,<br/>&gt;&gt; Gregg Allen<br/>&gt;&gt; Cerebra, Inc.<br/><br/> http://www.nntp.perl.org/group/perl.ai/2006/12/msg550.html Fri, 08 Dec 2006 07:13:17 +0000 Re: AI::Genetic by Brad Larsen One (possibly stupid) suggestion is to look at Data::Dumper. It should <br/>work, but may be very slow if the object in question is large. 
Let us <br/>know if you find anything better.<br/><br/>Cheers,<br/>Brad Larsen<br/><br/>greggallen@gmail.com wrote:<br/>&gt; <br/>&gt; I know this is going to turn out to be a stupid question, but could <br/>&gt; someone tell me the easiest way to store and retrieve the state of the <br/>&gt; entire AI::Genetic colony, and parameters, to a disk file so it can be <br/>&gt; read in and out at will?<br/>&gt; <br/>&gt; I&#39;m doing some constrained optimization experiments that can take <br/>&gt; several days, even a week, to run in the background, but I have a <br/>&gt; computer (Mac OS X 10.4.8) that is shared, and I need to install <br/>&gt; software and restart it almost daily.<br/>&gt; <br/>&gt; I would like to save the entire thing about every hour, but I can <br/>&gt; handle the timing part myself.<br/>&gt; <br/>&gt; <br/>&gt; Sincerely,<br/>&gt; <br/>&gt; Gregg Allen<br/>&gt; Cerebra, Inc.<br/>&gt; <br/>&gt; <br/>&gt; <br/> http://www.nntp.perl.org/group/perl.ai/2006/12/msg549.html Fri, 08 Dec 2006 06:38:17 +0000 AI::Genetic by greggallen <br/>I know this is going to turn out to be a stupid question, but could <br/>someone tell me the easiest way to store and retrieve the state of <br/>the entire AI::Genetic colony, and parameters, to a disk file so it <br/>can be read in and out at will?<br/><br/>I&#39;m doing some constrained optimization experiments that can take <br/>several days, even a week, to run in the background, but I have a <br/>computer (Mac OS X 10.4.8) that is shared, and I need to install <br/>software and restart it almost daily.<br/><br/>I would like to save the entire thing about every hour, but I can <br/>handle the timing part myself.<br/><br/><br/>Sincerely,<br/><br/>Gregg Allen<br/>Cerebra, Inc.<br/><br/><br/> http://www.nntp.perl.org/group/perl.ai/2006/12/msg548.html Fri, 08 Dec 2006 00:10:45 +0000 Re: [ANNOUNCE] AI::FANN by Salvador "Fandiño" <br/>--- Ovid &lt;publiustemp-perlai@yahoo.com&gt; wrote:<br/><br/>&gt; I think these are a 
typos:<br/>&gt; <br/>&gt; FANN::AI-&gt;new_standard(@layer_sizes)<br/>&gt; FANN::AI-&gt;new_sparse($connection_rate, @layer_sizes)<br/>&gt; FANN::AI-&gt;new_shortcut(@layer_sizes)<br/>&gt; FANN::AI-&gt;new_from_file($filename)<br/><br/>Oh, yes, thank you for pointing them out.<br/><br/>Cheers,<br/><br/> - Salva<br/><br/> http://www.nntp.perl.org/group/perl.ai/2006/04/msg547.html Fri, 14 Apr 2006 10:51:21 +0000 Re: [ANNOUNCE] AI::FANN by Ovid --- Salvador Fandiño &lt;sfandino@yahoo.com&gt; wrote:<br/><br/>&gt; Hi,<br/>&gt; <br/>&gt; I have uploaded the new AI::FANN module to CPAN:<br/>&gt; <br/>&gt; http://search.cpan.org/~salva/AI-FANN/<br/><br/>Looks great (from the docs, haven&#39;t tested it).<br/><br/>I think these are typos:<br/><br/> FANN::AI-&gt;new_standard(@layer_sizes)<br/> FANN::AI-&gt;new_sparse($connection_rate, @layer_sizes)<br/> FANN::AI-&gt;new_shortcut(@layer_sizes)<br/> FANN::AI-&gt;new_from_file($filename)<br/><br/>Cheers,<br/>Ovid<br/><br/>-- <br/>If this message is a response to a question on a mailing list, please send follow up questions to the list.<br/><br/>Web Programming with Perl -- http://users.easystreet.com/ovid/cgi_course/<br/> http://www.nntp.perl.org/group/perl.ai/2006/04/msg546.html Fri, 14 Apr 2006 09:26:47 +0000 [ANNOUNCE] AI::FANN by Salvador Fandiño Hi,<br/><br/>I have uploaded the new AI::FANN module to CPAN:<br/><br/> http://search.cpan.org/~salva/AI-FANN/<br/><br/>It is a wrapper for the Fast Artificial Neural Network library<br/>(http://fann.sf.net):<br/><br/> Fast Artificial Neural Network Library is a free open source<br/> neural network library, which implements multilayer artificial<br/> neural networks in C with support for both fully connected and<br/> sparsely connected networks. 
Cross-platform execution in both<br/> fixed and floating point are supported. It includes a framework<br/> for easy handling of training data sets. It is easy to use,<br/> versatile, well documented, and fast. PHP, C++, .NET, Python,<br/> Delphi, Octave, Ruby, Pure Data and Mathematica bindings are<br/> available. A reference manual accompanies the library with<br/> examples and recommendations on how to use the library. A<br/> graphical user interface is also available for the library.<br/><br/>This is an early release that may contain critical bugs, though<br/>most things seem to be working properly.<br/><br/>The documentation focuses on the differences from the C library, so<br/>both sets of documentation should be consulted in order to use the module.<br/><br/>Training an ANN to emulate an XOR gate with AI::FANN looks like<br/>this:<br/><br/> use AI::FANN qw(:all);<br/><br/> # create an ANN with 2 inputs, a hidden layer with 3 neurons<br/> # and an output layer with 1 neuron:<br/> my $ann = AI::FANN-&gt;new_standard(2, 3, 1);<br/><br/> $ann-&gt;hidden_activation_function(FANN_SIGMOID_SYMMETRIC);<br/> $ann-&gt;output_activation_function(FANN_SIGMOID_SYMMETRIC);<br/><br/> # create the training data for a XOR operator:<br/> my $xor_train = AI::FANN::TrainData-&gt;new( [-1, -1], [-1],<br/> [-1, 1], [1],<br/> [1, -1], [1],<br/> [1, 1], [-1] );<br/><br/> # train for at most 500000 epochs, reporting every 1000 epochs,<br/> # until the error drops below 0.001:<br/> $ann-&gt;train_on_data($xor_train, 500000, 1000, 0.001);<br/><br/> $ann-&gt;save(&quot;xor.ann&quot;);<br/><br/><br/>And using the trained ANN:<br/><br/> use AI::FANN;<br/><br/> my $ann = AI::FANN-&gt;new_from_file(&quot;xor.ann&quot;);<br/><br/> for my $a (-1, 1) {<br/> for my $b (-1, 1) {<br/> my $out = $ann-&gt;run([$a, $b]);<br/> printf &quot;xor(%f, %f) = %f\n&quot;, $a, $b, $out-&gt;[0];<br/> }<br/> }<br/><br/><br/><br/>Comments and feedback are very welcome!<br/><br/>Cheers,<br/><br/> - Salva.<br/><br/> http://www.nntp.perl.org/group/perl.ai/2006/04/msg545.html Fri, 14 Apr 2006 03:51:59 +0000 Author of 
Language::Prolog::Yaswi ? by Steffen Schwigon Hi perl-ai people!<br/><br/>I&#39;m trying to contact the author of Language::Prolog::Yaswi, Salvador<br/>&quot;Fandi&ntilde;o&quot; Garc&iacute;a, but I haven&#39;t heard anything from him since September<br/>2005. Does anyone know anything about him? <br/><br/>(Greeti+Tha)nX<br/>Steffen <br/>-- <br/>Steffen Schwigon &lt;schwigon@webit.de&gt;<br/>Dresden Perl Mongers &lt;http://dresden-pm.org/&gt;<br/> http://www.nntp.perl.org/group/perl.ai/2006/02/msg544.html Wed, 01 Feb 2006 07:30:12 +0000 New AI::NeuralNet::Simple - feedback welcome by Ovid Hi all,<br/><br/>For those interested in neural nets, there&#39;s a new version of<br/>AI::NeuralNet::Simple out<br/>(http://search.cpan.org/dist/AI-NeuralNet-Simple/). This version<br/>incorporates a good-sized patch from Raphael Manfredi, the author of<br/>Storable.<br/><br/>New features:<br/><br/> Added tanh activation function.<br/> train_set() now accepts a maximum error rate target.<br/> Multiple network support.<br/> Persistence via Storable.<br/><br/>AI::NeuralNet::Simple is an easy-to-use &quot;feed forward, back propagation<br/>neural network.&quot; Since the core of the module is written in C, it&#39;s<br/>very fast. The only significant (known) limitation of the module at<br/>this point is that the number of layers (3) is fixed. 
However, this is<br/>a very common number of layers for this type of network, so it&#39;s not<br/>too bad.<br/><br/>The docs make loud warnings about the code being alpha and it claims to<br/>be a &quot;simple learning module&quot; but I think it&#39;s solid enough at this<br/>point that it might actually be useful.<br/><br/>Cheers,<br/>Ovid<br/><br/>-- <br/>If this message is a response to a question on a mailing list, please send follow up questions to the list.<br/><br/>Web Programming with Perl -- http://users.easystreet.com/ovid/cgi_course/<br/> http://www.nntp.perl.org/group/perl.ai/2006/01/msg543.html Mon, 09 Jan 2006 09:01:56 +0000 Yaswi with modules by Steffen Schwigon #! /usr/bin/perl<br/><br/>use strict;<br/>use warnings;<br/><br/>use Language::Prolog::Types::overload;<br/>use Language::Prolog::Yaswi qw(:query :load :context);<br/>use Language::Prolog::Sugar<br/> functors =&gt; { give_me_sth =&gt; &#39;give_me_sth&#39; },<br/> vars =&gt; [qw( Answer )] ;<br/><br/>swi_use_modules ( &quot;./a.swipl&quot;, &quot;./b.swipl&quot; );<br/><br/>sub yaswi_give_me_sth {<br/> local $swi_module = &#39;a&#39;; ### or &#39;b&#39;<br/><br/> # Variant 1<br/> swi_set_query( give_me_sth(2, Answer) );<br/> # declare first: &quot;my ... if ...&quot; has undefined behaviour in Perl<br/> my $answer;<br/> $answer = swi_var(Answer) if swi_next;<br/> swi_cut if swi_next;<br/> print &quot;Answer: $answer \n&quot;;<br/>}<br/><br/>yaswi_give_me_sth();<br/><br/> http://www.nntp.perl.org/group/perl.ai/2006/01/msg542.html Mon, 09 Jan 2006 07:30:14 +0000 [Job Posting] Research Scientist, Eagan MN by Ken Williams Hi all,<br/><br/>My company, Thomson Legal and Regulatory (the parent company for West <br/>Publishing, FindLaw, and other legal information services) is looking <br/>for a good Natural Language Processing person. Our R&amp;D group is about <br/>20 or so people, about 10 of whom are Research Scientists (including <br/>me). 
Since so much of our business is in text data, our entire group <br/>specializes in NLP.<br/><br/>This position is certainly not perl-specific, but as researchers we can <br/>generally choose the tools that we want to use. Personally I tend to <br/>choose perl a lot, but as you&#39;ll note below, we also use Java, C, or <br/>whatever is appropriate to our tasks - sometimes prolog or python or <br/>smaller niche languages. Thus I thought it would be appropriate to <br/>post in this forum.<br/><br/>This position has also been posted on various public job boards, <br/>including Monster.com and our company&#39;s career web site:<br/><br/> http://jobsearch.monster.com/getjob.asp?JobID=37099311<br/> <br/>http://www.thomsoncareercenter.com/search/view_job_xml.asp? <br/>src=rs&amp;jobID=154545&amp;loc=Ext<br/><br/>Note that I am not the hiring manager or an HR person, I&#39;m a fellow <br/>Research Scientist.<br/><br/>Eagan is a suburb of Minneapolis/St. Paul.<br/><br/>Thanks,<br/><br/> -Ken<br/><br/>***********************************************************************<br/>Research Scientist &ndash; Opportunities at Thomson Legal and Regulatory<br/><br/>The Research &amp; Development department for Thomson Legal and Regulatory <br/>would like to invite qualified applicants to apply for an open Research <br/>Scientist position in their Eagan, MN offices. The Research &amp; <br/>Development department performs applied research in natural language <br/>processing, document retrieval, information extraction, text <br/>classification, summarization, and named entity recognition. 
The ideal <br/>candidate would have significant expertise in one or more of these <br/>research areas.<br/><br/>Preference will be given to candidates with research and work related <br/>experience in the area of named entity recognition and resolution.<br/><br/>Principal duties:<br/>&bull; Conducting applied research in information retrieval, information <br/>extraction, text categorization, text mining, or related areas in the <br/>context of large online delivery environments, such as Westlaw.<br/>&bull; Execution of such projects, including<br/>&bull; implementation of prototypes and the design of experiments to <br/>evaluate them<br/>&bull; performing of experiments to validate key algorithms and <br/>architectures associated with such prototypes, followed by written <br/>reports<br/>&bull; liaison with other departments concerning transition of prototypes <br/>into production<br/>&bull; collaboration with software engineers engaged in the construction of <br/>prototypes.<br/>&bull; Custom development for other TLR departments on key projects with a <br/>research component.<br/>&bull; Monitoring of research literature through reading.<br/><br/>Prerequisites:<br/>&bull; Applicants should have a graduate degree in Computer Science or a <br/>related discipline.<br/>&bull; Relevant experience in one or more of the following areas: natural <br/>language processing, machine learning, information retrieval, <br/>information extraction, document classification, summarization, and <br/>named entity extraction and resolution (Applicants with Masters degree <br/>must have some additional work related experience).<br/>&bull; Substantial experience with UNIX or Windows environment.<br/>&bull; Proficiency in a programming language such as Java or C++.<br/>&bull; Good oral and written communication skills (as demonstrated through <br/>prior technical publications).<br/>&bull; Relevant publications in Journals and (refereed) conferences a plus.<br/><br/>Reports to: Director 
of Research.<br/>Contact: Khalid Al-Kofahi &lt;Khalid.Al-Kofahi@thomson.com&gt;<br/><br/> http://www.nntp.perl.org/group/perl.ai/2005/12/msg541.html Thu, 15 Dec 2005 10:29:22 +0000 Re: Yaswi question by Steffen Schwigon Hi!<br/><br/>Thanks for all your answers. And of course you are right that<br/>I just have to benchmark it.<br/><br/>After struggling with some minor problems with Yaswi from mod_perl, I<br/>can now say that the performance impact is indeed quite small when<br/>using Language::Prolog::Yaswi.<br/><br/>I compared<br/> - Language::Prolog::Yaswi from within a mod_perl application,<br/> - the same mod_perl framework without calls to L::P::Yaswi,<br/> - the pure prolog httpd that is delivered with SWI Prolog.<br/><br/>I used the threaded stress tester that comes with the swi prolog<br/>httpd and measured the time needed for 10000 requests from 8 threads<br/>(Apache 1.3, mod_perl 1, Athlon 1800+).<br/><br/> - mod_perl+Yaswi takes about 29 sec.<br/> - plain mod_perl takes about 26 sec.<br/> - httpd from swi takes about 20 sec.<br/><br/>As you can see, using Yaswi doesn&#39;t make my application much slower<br/>than it already is (about 3 seconds over 10000 requests, i.e. roughly<br/>0.3 ms per request).<br/><br/>GreetinX<br/>Steffen <br/>-- <br/>Steffen Schwigon &lt;schwigon@webit.de&gt;<br/>Dresden Perl Mongers &lt;http://dresden-pm.org/&gt;<br/> http://www.nntp.perl.org/group/perl.ai/2005/10/msg540.html Tue, 04 Oct 2005 08:41:44 +0000 Re: Yaswi question by Salvador "Fandiño" <br/><br/>--- Steffen Schwigon &lt;schwigon@webit.de&gt; wrote:<br/><br/>&gt; Hi!<br/>&gt; <br/>&gt; I&#39;m about to intermix a Perl web application with SWI-Prolog.<br/>&gt; Currently Language::Prolog::Yaswi seems to be useful.<br/>&gt; <br/>&gt; The prolog part ought to solve only one particular problem,<br/>&gt; everything else is a mod_perl driven web app.<br/>&gt; <br/>&gt; Now I&#39;m not sure about its performance. 
I expect about 1 to 3 requests<br/>&gt; per second at peak times and I don&#39;t know yet how long my prolog<br/>&gt; program will take.<br/>&gt; <br/>&gt; Does Language::Prolog::Yaswi start a new &quot;pl&quot; process for every<br/>&gt; query? <br/><br/>No. It uses SWI-Prolog as a library, and the queries run inside the<br/>perl process. L::P::Y just converts data structures between perl and<br/>prolog. The overhead is relatively small, though it depends on how<br/>big the data structures passing through the interface are.<br/><br/>Cheers,<br/><br/> - Salvador<br/><br/> http://www.nntp.perl.org/group/perl.ai/2005/09/msg539.html Thu, 29 Sep 2005 08:26:32 +0000 Re: Yaswi question by Ovid --- Steffen Schwigon &lt;schwigon@webit.de&gt; wrote:<br/><br/>&gt; Now I&#39;m not sure about its performance. I expect about 1 to 3 requests<br/>&gt; per second at peak times and I don&#39;t know yet how long my prolog<br/>&gt; program will take.<br/>&gt; <br/>&gt; Does Language::Prolog::Yaswi start a new &quot;pl&quot; process for every<br/>&gt; query? <br/>&gt; The README talks about threads, so maybe it already does something<br/>&gt; clever about this.<br/><br/>The reason threads are mentioned is that Perl and SWI-Prolog must<br/>both have threads enabled or both NOT have threads enabled. Threads<br/>are not a requirement. I&#39;m not aware that SWI-Prolog starts<br/>another process but I certainly wouldn&#39;t place a bet on that.<br/><br/>I would strongly recommend running Devel::Profiler or something similar on your<br/>code to find out where your true bottlenecks are. If you do find out<br/>your Prolog calls are slow, then you can spend the time tuning those<br/>queries (of course, if it&#39;s just too many Prolog queries in too short<br/>a time, tuning might not help). 
On the other hand, if profiling<br/>can get you significant gains in the Perl portion, then you may get a<br/>win just by tuning Perl. If you can&#39;t speed up the Prolog, trying to<br/>replicate its functionality in Perl may be difficult. Of course,<br/>there are tricks you can use to push much of the logic into a database<br/>and make multiple calls from Perl. That&#39;s probably slower but a good<br/>RDBMS has optimizations that Prolog often doesn&#39;t so it&#39;s worth<br/>exploring if you&#39;re stuck.<br/><br/>If you really have problems with Prolog, I would suggest issuing a<br/>bunch of short Prolog queries and timing them and then issuing some<br/>long-running queries. Time them both from Perl and directly in<br/>SWI-Prolog as if you were running it there. That can give you some<br/>idea as to the performance impact from Perl.<br/><br/>Cheers,<br/>Ovid<br/><br/>-- <br/>If this message is a response to a question on a mailing list, please send<br/>follow up questions to the list.<br/><br/>Web Programming with Perl -- http://users.easystreet.com/ovid/cgi_course/<br/> http://www.nntp.perl.org/group/perl.ai/2005/09/msg538.html Thu, 29 Sep 2005 08:15:47 +0000 RE: Yaswi question by BUDNEY, DANIEL L If you are worried about the Language::Prolog::Yaswi module being a bottleneck, the proper action is to make up a group of test requests and write a script that calls the L::P::Y module repeatedly for these cases. You can then measure the number of &quot;typical&quot; requests that can be handled each minute (and you can even compare the results of a &quot;short&quot; request vs. a &quot;long&quot; request). 
Run each case 100 or 1000 times in a loop.<br/><br/>For the most meaningful results, you should run the tests on the actual server you are using for the website.<br/><br/>-----Original Message-----<br/>From: Steffen Schwigon [mailto:schwigon@webit.de] <br/>Sent: Thursday, September 29, 2005 4:02 AM<br/>To: perl-ai@perl.org<br/>Subject: Yaswi question<br/><br/>Hi!<br/><br/>I&#39;m about to intermix a Perl web application with SWI-Prolog.<br/>Currently Language::Prolog::Yaswi seems to be useful.<br/><br/>The prolog part ought to solve only one particular problem,<br/>everything else is a mod_perl driven web app.<br/><br/>Now I&#39;m not sure about its performance. I expect about 1 to 3 requests<br/>per second at peak times and I don&#39;t know yet how long my prolog<br/>program will take.<br/><br/>Does Language::Prolog::Yaswi start a new &quot;pl&quot; process for every query? <br/>The README talks about threads, so maybe it already does something<br/>clever about this.<br/><br/>Or is there another recommended way to set up a kind of &quot;swi prolog<br/>application server&quot; (a process that always runs and answers queries,<br/>e.g. via .*-RPC), that&#39;s accessible from Perl?<br/><br/><br/>(Greeti+Tha)nX<br/>Steffen <br/>-- <br/>Steffen Schwigon &lt;schwigon@webit.de&gt;<br/>Dresden Perl Mongers &lt;http://dresden-pm.org/&gt;<br/> http://www.nntp.perl.org/group/perl.ai/2005/09/msg537.html Thu, 29 Sep 2005 07:08:23 +0000
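The benchmarking advice in this thread (make up a set of test requests, run each one 100 or 1000 times in a loop, and compare a "short" request with a "long" one) could be sketched roughly as follows. This is only an illustration under stated assumptions: `run_query` is a hypothetical stand-in that does plain arithmetic so the script runs without SWI-Prolog installed, and the two case sizes are made up. In a real test its body would be replaced by the actual Language::Prolog::Yaswi calls.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);    # sub-second wall-clock timing (core module)

# Hypothetical stand-in for one request. Replace the body with the real
# Prolog query; here it only sums integers so the sketch is self-contained.
sub run_query {
    my ($size) = @_;
    my $sum = 0;
    $sum += $_ for 1 .. $size;
    return $sum;
}

# Assumed test cases: one "short" and one "long" request.
my %cases = (
    short => sub { run_query(100) },
    long  => sub { run_query(10_000) },
);

my $iterations = 1000;       # run each case 1000 times in a loop

for my $name (sort keys %cases) {
    my $start = time;
    $cases{$name}->() for 1 .. $iterations;
    my $elapsed = time - $start;
    printf "%-5s %d runs in %.3f s (%.4f ms per request)\n",
        $name, $iterations, $elapsed, 1000 * $elapsed / $iterations;
}
```

As the thread notes, the numbers are most meaningful when the script is run on the actual web server, and comparing the same queries timed directly in SWI-Prolog gives a rough idea of the overhead added by the Perl interface.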