HTML Engine

The HTML Engine is a simple yet capable class for parsing HTML documents. The HTML parser collects all the links, e-mail addresses, and images found in a page and stores them in lists. It also dissects HTML forms.

The HTML parser recognizes links in several kinds of tags, such as A, frame, and link. This is especially useful when writing a crawler.
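
As an illustration of collecting links from several tag types, here is a minimal sketch built on Python 3's standard html.parser module. It is not the WalParser implementation; the LinkCollector class, the URL_ATTRS table, and the sample page are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser

# Attribute that carries the URL for each tag (an illustrative subset,
# not the list of tags WalParser actually handles).
URL_ATTRS = {"a": "href", "link": "href", "frame": "src", "iframe": "src", "img": "src"}


class LinkCollector(HTMLParser):
    """Hypothetical stand-in that gathers link and image URLs into lists."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        wanted = URL_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Images go to their own list; everything else is a link.
                (self.images if tag == "img" else self.links).append(value)


page = ('<a href="/a.html">A</a><frame src="menu.html">'
        '<link href="style.css"><img src="logo.png">')
collector = LinkCollector()
collector.feed(page)
collector.close()
print(collector.links)
print(collector.images)
```

A crawler could take the collected links, resolve them against the page URL, and queue them for the next request.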

A simple example of using the HTML parser:

ireq = {}                     # Create the input dict
oreq = {}                     # Create the output dict
http_init(ireq)               # Initialize HTTP
http_do_request(ireq, oreq)   # Perform the HTTP request
h = WalParser()               # Create an instance of the HTML parser class
h.feed(oreq['wal']['data'])   # Parse the HTML page
print(h.GetMails())           # Print all the e-mail addresses found
h.close()                     # Close the HTML parser


This example prints all the e-mail addresses found in a web page.

Functions List


GetMails

Function Name: def GetMails()
Parameters: Nothing
Return: List
Abstract: Returns a list of all the e-mail addresses found.
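
How GetMails finds addresses is not documented here. One common approach is a regular-expression scan of the page text, sketched below in Python 3. The find_mails function and EMAIL_RE pattern are hypothetical, and the pattern is a deliberately simple approximation rather than a full address grammar.

```python
import re

# Simplified address pattern, assumed for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_mails(html):
    """Return the unique addresses in html, in order of first appearance."""
    found = []
    for address in EMAIL_RE.findall(html):
        if address not in found:
            found.append(address)
    return found


sample = '<a href="mailto:info@example.com">info@example.com</a> or sales@example.org'
print(find_mails(sample))
```

Duplicates are dropped here because the same address often appears both in a mailto link and in the visible text.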


GetTitle

Function Name: def GetTitle()
Parameters: Nothing
Return: String
Abstract: Returns HTML page title.


GetMeta

Function Name: def GetMeta()
Parameters: Nothing
Return: List
Abstract: Returns a list of the meta tags found.
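
To show what a title string and a list of meta tags might look like, here is a small Python 3 sketch using the standard html.parser module. The HeadCollector class is hypothetical, and representing each meta tag as a dict of its attributes is an assumption; the actual element type returned by GetMeta is not specified above.

```python
from html.parser import HTMLParser


class HeadCollector(HTMLParser):
    """Hypothetical stand-in that records the page title and meta tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Assumed representation: one dict of attributes per meta tag.
            self.meta.append(dict(attrs))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


head = HeadCollector()
head.feed('<html><head><title>Demo page</title>'
          '<meta name="description" content="A test"></head></html>')
head.close()
print(head.title)
print(head.meta)
```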


GetLinks

Function Name: def GetLinks()
Parameters: Nothing
Return: List
Abstract: Returns a list of links.


GetImg

Function Name: def GetImg()
Parameters: Nothing
Return: List
Abstract: Returns a list of images.


GetForm

Function Name: def GetForm()
Parameters: Nothing
Return: Dict
Abstract: Returns a dict describing the parsed form.


GetFormInput

Function Name: def GetFormInput()
Parameters: Nothing
Return: List
Abstract: Returns a list of input tags of the form.


GetFormSelect

Function Name: def GetFormSelect()
Parameters: Nothing
Return: List
Abstract: Returns a list of select tags of the form.


GetFormOption

Function Name: def GetFormOption()
Parameters: Nothing
Return: List
Abstract: Returns a list of option tags of the form.
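
The form functions above (GetForm, GetFormInput, GetFormSelect, GetFormOption) split a form into a dict and three lists. The Python 3 sketch below shows one way such a dissection could work with the standard html.parser module. The FormCollector class and the shape of the stored values (attribute dicts) are assumptions made for illustration, not the documented WalParser behavior.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Hypothetical stand-in that dissects a single HTML form."""

    def __init__(self):
        super().__init__()
        self.form = {}      # attributes of the <form> tag
        self.inputs = []    # one attribute dict per <input>
        self.selects = []   # one attribute dict per <select>
        self.options = []   # one attribute dict per <option>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.form = attrs
        elif tag == "input":
            self.inputs.append(attrs)
        elif tag == "select":
            self.selects.append(attrs)
        elif tag == "option":
            self.options.append(attrs)


html = ('<form action="/search" method="get">'
        '<input type="text" name="q">'
        '<select name="lang"><option value="en">English</option>'
        '<option value="es">Spanish</option></select>'
        '</form>')
forms = FormCollector()
forms.feed(html)
forms.close()
print(forms.form)
print(forms.inputs, forms.selects, forms.options)
```

With the action, method, and field names in hand, a crawler can build the request needed to submit the form.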


feed

Function Name: def feed(str)
Parameters: String
Return: Nothing
Abstract: Passes a string of HTML data to the parser and begins parsing it.


close

Function Name: def close()
Parameters: Nothing
Return: Nothing
Abstract: Closes the parser.


wreset

Function Name: def wreset()
Parameters: Nothing
Return: Nothing
Abstract: Resets the values stored inside the HTML parser.
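
A reset function is presumably useful when one parser instance handles several pages in a row, so that results from one page do not leak into the next. The Python 3 sketch below illustrates that pattern with the standard html.parser module. The ReusableCollector class and its wreset method are hypothetical and only mimic the name documented above.

```python
from html.parser import HTMLParser


class ReusableCollector(HTMLParser):
    """Hypothetical stand-in that can be reused across pages."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def wreset(self):
        # Drop collected values, then clear the parser's own buffer.
        self.links = []
        self.reset()


pages = ['<a href="one.html">1</a>', '<a href="two.html">2</a>']
collector = ReusableCollector()
per_page = []
for page in pages:
    collector.wreset()
    collector.feed(page)
    collector.close()
    per_page.append(list(collector.links))
print(per_page)
```

Without the reset call, the second page's list would also contain the first page's link.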


Roses Labs Innovations (RL+I)
Roses Labs © 2004