The HTML Engine is an easy-to-use class for parsing HTML documents. The HTML parser collects all the links, email addresses, and images in a page and puts them into lists. It also dissects HTML forms.
The HTML parser recognizes links in many kinds of tags, such as A, frame, and link. This is especially useful when working with a crawler.
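To illustrate the idea of gathering links from several tag types, here is a minimal sketch built on Python 3's standard html.parser module. It is not the library's own code: the LinkCollector class, the collect_links helper, and the LINK_ATTRS table are hypothetical names chosen for this example, and the set of tags it covers is an assumption.

```python
from html.parser import HTMLParser

# Hypothetical table: tag name -> attribute that carries the URL.
LINK_ATTRS = {
    "a": "href",
    "link": "href",
    "area": "href",
    "frame": "src",
    "iframe": "src",
}


class LinkCollector(HTMLParser):
    """Illustrative stand-in, not the library's WalParser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names in lower case.
        wanted = LINK_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value)


def collect_links(html):
    """Return the URLs found in html, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

A crawler could pass each downloaded page to collect_links and queue the returned URLs for later visits.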
A simple example of using the HTML parser:
ireq = {}  # Create input dict
oreq = {}  # Create output dict
http_init(ireq)  # Init HTTP
http_do_request(ireq, oreq)  # Do HTTP request
h = WalParser()  # Create the HTML parser
h.feed(oreq['wal']['data'])  # Parse the HTML page
print(h.GetMails())  # Print all the email addresses found
h.close()  # Close HTML parser
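The same feed, query, close flow can be tried without the library by using Python 3's standard html.parser module. The sketch below is only an approximation of what a mail-collecting parser might do: MailCollector, get_mails, and the EMAIL_RE pattern are hypothetical names for this example, and the pattern is deliberately simple.

```python
import re
from html.parser import HTMLParser

# Deliberately simple pattern; an assumption, not the library's own rule.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class MailCollector(HTMLParser):
    """Illustrative stand-in, not the library's WalParser."""

    def __init__(self):
        super().__init__()
        self.mails = []

    def _add(self, text):
        # Record each address once, in the order it first appears.
        for address in EMAIL_RE.findall(text):
            if address not in self.mails:
                self.mails.append(address)

    def handle_starttag(self, tag, attrs):
        # Pick up addresses from mailto: links.
        for name, value in attrs:
            if name == "href" and value and value.lower().startswith("mailto:"):
                self._add(value)

    def handle_data(self, data):
        # Pick up addresses written in the page text.
        self._add(data)

    def get_mails(self):
        return list(self.mails)


page = '<p>Write to admin@example.com or <a href="mailto:sales@example.org">sales</a>.</p>'
h = MailCollector()
h.feed(page)
h.close()
print(h.get_mails())
```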
Functions List
Function Name: def GetMails()
Parameters: Nothing
Return: List
Abstract: Returns all the email addresses found.
Function Name: def GetTitle()
Parameters: Nothing
Return: String
Abstract: Returns the HTML page title.
Function Name: def GetMeta()
Parameters: Nothing
Return: List
Abstract: Returns a list of the meta tags found.
Function Name: def GetLinks()
Parameters: Nothing
Return: List
Abstract: Returns a list of links.
Function Name: def GetImg()
Parameters: Nothing
Return: List
Abstract: Returns a list of images.
Function Name: def GetForm()
Parameters: Nothing
Return: Dict
Abstract: Returns a dict describing the parsed form.
Function Name: def GetFormInput()
Parameters: Nothing
Return: List
Abstract: Returns a list of input tags of the form.
Function Name: def GetFormSelect()
Parameters: Nothing
Return: List
Abstract: Returns a list of select tags of the form.
Function Name: def GetFormOption()
Parameters: Nothing
Return: List
Abstract: Returns a list of option tags of the form.
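The four form functions above can be pictured with a small sketch based on Python 3's standard html.parser module. The exact shape of the values the library returns is not documented here, so the layout below (one dict of attributes per tag) is an assumption, and FormCollector is a hypothetical class, not the library's parser.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Illustrative stand-in, not the library's WalParser."""

    def __init__(self):
        super().__init__()
        self.form = {}      # attributes of the most recent <form> tag
        self.inputs = []    # one dict of attributes per <input>
        self.selects = []   # one dict of attributes per <select>
        self.options = []   # one dict of attributes per <option>

    def handle_starttag(self, tag, attrs):
        # Attributes without a value (such as "selected") map to None.
        attributes = dict(attrs)
        if tag == "form":
            self.form = attributes
        elif tag == "input":
            self.inputs.append(attributes)
        elif tag == "select":
            self.selects.append(attributes)
        elif tag == "option":
            self.options.append(attributes)


page = (
    '<form action="/search" method="get">'
    '<input type="text" name="q">'
    '<select name="lang"><option value="en">English</option></select>'
    '</form>'
)
f = FormCollector()
f.feed(page)
f.close()
print(f.form, f.inputs, f.selects, f.options)
```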
Function Name: def feed(str)
Parameters: String
Return: Nothing
Abstract: Begins parsing data.
Function Name: def close()
Parameters: Nothing
Return: Nothing
Abstract: Closes the parser.
Function Name: def wreset()
Parameters: Nothing
Return: Nothing
Abstract: Resets the values stored inside the HTML parser.
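One plausible use of a reset function is to parse several pages with a single parser object. The sketch below shows that pattern with Python 3's standard html.parser module. TitleCollector and the titles helper are hypothetical, and the wreset method here only imitates the name: how the library's own wreset() behaves internally is an assumption.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Illustrative stand-in, not the library's WalParser."""

    def __init__(self):
        super().__init__()
        self.wreset()

    def wreset(self):
        # Clear the collected values and the underlying parser state.
        self.title = ""
        self._in_title = False
        self.reset()

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def titles(pages):
    """Return the title of each page, reusing one parser object."""
    parser = TitleCollector()
    found = []
    for page in pages:
        parser.feed(page)
        parser.close()
        found.append(parser.title.strip())
        parser.wreset()  # start the next page from a clean state
    return found
```

Without the reset between pages, the title collected from one page would run into the next.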