Unique Links Extractor 2: A Python Exercise « The JB Journals

The reason I wanted to learn Perl was that a couple of months back, I read an article listing 10 of the most used programming languages, and Perl was on that list. So these days, I know enough Perl to use it for fairly simple tasks, and I know that I really don't enjoy doing so, so I basically scratched Perl off my list of languages that I want to learn as well as I know C. But that's not the point of this post.
The same article that I mentioned earlier also listed Python. So after having been stumped by Perl, I went to that. It's easy to use, it's simple, its syntax is really clean, and it forces the programmer to actually use whitespace rather than braces ({}) to delimit blocks of code, which makes sure that any code written in Python is always formatted in approximately the same way. Now, after about 3 weeks, I can say that I know the language, and more amazingly, I really like it.
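To make the whitespace point concrete, here is a tiny sketch. The function name `classify` is made up for illustration; the only thing it shows is that indentation, not braces, decides which lines belong to which block.

```python
def classify(n):
    # Everything indented under `def` is the body of the function.
    if n % 2 == 0:
        # This line belongs to the `if` because it is indented under it.
        return "even"
    else:
        # Same indentation depth as the branch above, so it reads the same way.
        return "odd"


if __name__ == "__main__":
    print(classify(4))
    print(classify(7))
```

Removing the indentation from either `return` line is a syntax error rather than a style problem, which is why Python code tends to look alike no matter who wrote it.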

So, having learned the basics of Python, I decided it was time to rewrite the links extractor program. The links extractor was the best candidate, since I have been wanting to write the FunMaza downloader program again for some time, because the last one I made in Perl was horrifying. It was clumsy, it used to output all the links to a text file (oh the horror of reading back that text file to actually download stuff), and I really didn't enjoy debugging it since I couldn't really understand Perl that well.
So, having decided that I wanted to write the FunMaza downloader again in Python, I started with the most sensible step: writing a Unique Links Extractor program in Python. The last one was bad mostly because I wrote HORRIBLE Perl code. To do so, I needed a library for parsing HTML. In Perl, I used the HTML::LinkExtor module; in Python, I had a couple of options.

I really liked the BeautifulSoup module due to its ease of use (and we all know I really enjoy doing as little as possible in life) and simple interface, so I started off with it. Having spent two days making the extractor work with just the Google home page, I finally tried it out on the FunMaza music page, only to learn to my UTTER HORROR that BeautifulSoup wasn't as good at parsing bad HTML as I thought. (I found out why. It seems that the packages that BeautifulSoup used for parsing had been removed from Python 3.0, and so BeautifulSoup had to use another parser that came with the standard Python installation, but it wasn't as good at handling bad HTML as the old one.)
So, once again, I was back where I started, looking for a different parser that could actually deal with the HTML that FunMaza has (bad FunMaza, bad bad bad). That's when I found lxml, and boy oh boy is it a charm. Within a day, I had what I needed: a Unique Links Extractor in Python that actually worked with the bad HTML of FunMaza. On top of that, I think that it's even easier to use and more powerful than BeautifulSoup, since it uses the libxml2 C library, which I understand is VERY good.
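For anyone who wants the general idea without the full program, here is a minimal sketch of how a unique links extractor can be put together with lxml. It is not the program from this post, just an illustration: the function name `unique_links`, the `SAMPLE` markup, and the `example.com` base URL are all made up, and it assumes lxml is installed.

```python
from urllib.parse import urljoin

import lxml.html


def unique_links(html_text, base_url=""):
    """Return the unique href values of <a> tags, in first-seen order."""
    # lxml.html uses libxml2's forgiving HTML parser, so sloppy markup is fine.
    doc = lxml.html.fromstring(html_text)
    seen = set()
    links = []
    for href in doc.xpath("//a/@href"):
        # Resolve relative links against the (assumed) page URL.
        url = urljoin(base_url, href.strip())
        if url and url not in seen:
            seen.add(url)
            links.append(url)
    return links


# Deliberately sloppy markup: an unclosed <a>, an unclosed <p>, a duplicate link.
SAMPLE = """
<html><body>
<p><a href="/songs/a.mp3">A
<a href="/songs/b.mp3">B</a>
<a href="/songs/a.mp3">A again</a>
</body>
"""

if __name__ == "__main__":
    for link in unique_links(SAMPLE, "http://example.com"):
        print(link)
```

The set is only there to remember what has already been seen; the list keeps the links in the order they appear on the page, which is handy when the output feeds a downloader.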

So, having made the program and tested it on the actual FunMaza site, I am happy to put it online here. And if anyone ever reads this post (I don't think a lot of people do, judging from the page stats that I get from WordPress), please do leave a comment.
So anyways, here's the code (the code is also freely available on PasteBin HERE).
