How to make webscraping great again with Automagica Open Source Smart RPA


Hello and welcome. My name is Thomas and I am one of the co-founders of Automagica. In this video we will build a small project where we systematically count how many times Donald Trump is mentioned on the New York Times front page. You could adapt this to your own needs for a variety of applications, for example monitoring whether your company appears on a certain website, or doing marketing analysis of public web pages.

So to start off, let's open the New York Times web page. The easiest way to extract the text is to find its XPath. An XPath is a path expression that identifies an element on a web page.
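
To make the idea of an XPath concrete, here is a minimal sketch using only Python's standard library. The page markup and the `site-content` id are made up for illustration and are not the real New York Times structure. Note that `xml.etree.ElementTree` supports only a subset of XPath and needs well-formed markup, so this shows the concept rather than how a browser evaluates the expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified page; a real front page is far larger.
PAGE = """
<html>
  <body>
    <div id="nav">Sections</div>
    <div id="site-content">
      <h2>Headline one</h2>
      <p>Story text</p>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# Paths are relative to the root element (<html> here), hence the "./".
by_position = root.find("./body/div[2]")                    # second <div> in <body>
by_attribute = root.find(".//div[@id='site-content']")      # <div> with this id

# Two different expressions can point at the same element.
same_element = by_position is by_attribute

# itertext() walks the element and its children and yields their text.
content_text = " ".join(t.strip() for t in by_attribute.itertext() if t.strip())
print(content_text)
```

The position-based form is closer to what a browser's copy function tends to produce, and it breaks easily when the page layout changes; an attribute-based expression is usually more robust.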
So in order to find the XPath we'll use Google Chrome: right-click on a random element and click Inspect. This will open up all the elements that are available on the web page. From here it's a bit of trial and error; we're looking for the element that contains all, or most, of the text. When we hover over the elements, the different areas turn blue, which means they are selected. So for example, this could be a good fit. So we right-click, choose Copy, and then Copy XPath.

Okay, so it's time to start the automation. For this video we assume you are already up and running with Automagica and you have your robot installed, in this case on a Windows machine. If you haven't done that, I would recommend checking out our installation video first. On this machine I have already installed my robot, which is called C-3PO and is set up on this very machine. So let's go ahead and create our first Automagica script. Let's give it the name Trump counter and assign it to a robot; in this case I only have one, C-3PO.
Then we click Add. The first step is to open the browser, so on the left-hand side we click Browser and we choose Open Chrome browser. The next step is to surf to the New York Times website. Next we need to extract the text from the website with the XPath that we just found. We say browser, and within the browser we're going to look for the element by its XPath. I still have it in my clipboard, so I can just paste it here. There we go. And one last step: we don't need the element, we need the text inside the element, so we extract it by typing .text.

So right now we have the body, which contains most of the text from the front page. But we don't need the text itself; we want to count how many times Trump is mentioned within this body.
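
The on-screen code is not captured in this transcript, so the following is only a sketch of the three browser steps (open, navigate, read the element's text). It assumes a Selenium-3-style browser object with `get` and `find_element_by_xpath`; newer Selenium releases use `find_element(By.XPATH, ...)` instead. `FakeBrowser`, `FakeElement`, `extract_text`, and the XPath value are hypothetical names introduced here so the snippet runs without Chrome.

```python
class FakeElement:
    """Hypothetical stand-in for a located page element."""

    def __init__(self, text):
        self.text = text


class FakeBrowser:
    """Hypothetical stand-in for a real browser, so no Chrome is needed."""

    def __init__(self, pages):
        self.pages = pages    # maps url -> {xpath: visible text}
        self.current = {}

    def get(self, url):
        # A real browser would load the page here.
        self.current = self.pages[url]

    def find_element_by_xpath(self, xpath):
        # A real browser would evaluate the XPath against the live page.
        return FakeElement(self.current[xpath])


def extract_text(browser, url, xpath):
    """Navigate to url and return the text inside the element at xpath."""
    browser.get(url)
    element = browser.find_element_by_xpath(xpath)
    return element.text  # the text inside the element, not the element


NYT_URL = "https://www.nytimes.com"
XPATH = '//*[@id="site-content"]'  # placeholder; paste the XPath copied from Chrome

demo_browser = FakeBrowser({NYT_URL: {XPATH: "Front page text goes here"}})
body = extract_text(demo_browser, NYT_URL, XPATH)
print(body)
```

With a real browser object in place of `FakeBrowser`, `extract_text` would stay the same as long as that object offers the two assumed methods.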
How do we do this? Let's start out by defining a variable, the Trump counter. Let's take the body and count. What are we going to count? We're going to count the word Trump.

So, one last step: we want to show the count, to see how many times Trump was mentioned. We could do this in several ways: we could write it to a text file, we could send an email, we could write it to an Excel file. In this case I'm just going to keep it simple; we're going to display it in a pop-up message box. On the left-hand side we click Message box and select Display info box. The title I'm going to change to Trump counter, and the body can just contain the number of Trump mentions. That should be it.
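
The counting step can be sketched in plain Python as below. The sample `body` string and the helper names are made up for illustration, and `print` stands in for the pop-up message box. One thing worth knowing: `str.count` matches substrings and is case-sensitive, so a word such as "Trumpet" would be counted too; the regex variant is one possible way to count whole words only.

```python
import re


def count_substring(body, word):
    """Count every occurrence of word, including inside longer words."""
    return body.count(word)


def count_whole_word(body, word):
    """Count occurrences of word that stand on word boundaries."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, body))


# Hypothetical sample text standing in for the extracted front page.
body = "Trump meets senators. Critics of Trump respond. Trumpet sales rise."

trump_counter = count_substring(body, "Trump")
whole_words = count_whole_word(body, "Trump")

# The video shows the result in a message box; print stands in for it here.
message = "Trump appears {} times".format(trump_counter)
print(message)
```

Which of the two counts is the right one depends on what you want to measure; for a quick demo the simple substring count is usually close enough.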
So let's save the script and let's run it. The New York Times site opens, the text is extracted, and there we have it: we have a pop-up. Today Trump appears ten times on the New York Times front page. So I hope this helps, and good luck with your automations.
