How to scrape Wikipedia with Automagica Open Source Smart RPA


Hello and welcome. My name is Thomas and I am one of the co-founders of
Automagica.com. In this video I will show you the basics of browser
automation. More specifically, we’ll be looking up a specific subject on
Wikipedia and transferring the content to a text file on our desktop.

Browser automation is very extensive and can be done in multiple ways. One
way, for example, is to use the mouse buttons and keyboard commands to
navigate through web pages. This is an easy solution and can be great for
fast prototyping, but we prefer to use more robust methods. One of these
methods is to find the XPath of the elements that you want to manipulate. An
XPath is an expression that points to a specific element on the page, and by
using these identifiers your automation becomes way more robust to subtle
changes, such as resolution changes, minor changes to the website itself, etc. So
first, let me show you how to actually find the XPath of an element, so we
can use this to our advantage. I’m on wikipedia.org, the website that we’re
going to extract the data from, and right away you can see the search bar in
the middle, which is the first element that we want to identify. I also
prepared a small notepad which contains the elements that we are looking for,
namely the search bar and, later on, the XPath to the contents, or the text,
on the actual website.

So first of all, we’re looking for the search bar. How do I find the XPath
for the element? I simply right-click and press Inspect. This might be
different if you’re using a different browser, but the essentials stay the
same. So I click it, and on the right-hand side we can see all the technical
identifiers of all the elements. When we hover over them, you can see that
the different elements light up. We’re looking for the search bar; the
element that turns blue on the left should be this one. How do we copy the
XPath? Very simple: right-click, Copy, and Copy XPath. I’ll paste it in the
notepad for later reference, and it actually looks like this. It’s a pretty
short identifier. Okay, so we go on with the flow, and manually we’ll be
looking for a subject, for example robots. I type it in and press Enter, and
we go to the web page of Robot on Wikipedia. Same as before, we’re looking
for the text to extract, so we’re looking for the XPath of this content. I
right-click it, I inspect, and let me look for it. This should be the one:
the text is lighting up, it’s entirely blue. So I’ll be copying this one:
Copy, Copy XPath. Let’s paste it again for later reference, and that should
be the XPath to the text. So we should have everything we need, and let’s
move on to the actual
automation. Okay, let me go to the portal in the other tab over here. For
this video we assume you already got up and running with Automagica and you
have your robot installed, in this case on a Windows machine. If you haven’t
done that yet, I would recommend checking out our installation video first.
So, we’re already up and running, so let me just create my first Automagica
script over there, and this way we can start.
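
Before building the script, here is a minimal sketch of what an XPath like the ones copied above actually does. It uses Python’s built-in xml.etree.ElementTree on a small, made-up page rather than the real Wikipedia portal, and the id `searchInput` is an assumption for illustration; the XPath you copy from your own browser is the one to trust. ElementTree only supports a subset of XPath, so the expression is written relative to the root (`.//`).

```python
import xml.etree.ElementTree as ET

# A tiny, made-up page standing in for the real Wikipedia portal (assumption).
PAGE = """
<html>
  <body>
    <div id="header"><input id="langSelect" name="language"/></div>
    <form id="search-form">
      <input id="searchInput" name="search"/>
      <button type="submit">Search</button>
    </form>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# An id-based XPath, similar in spirit to what "Copy XPath" gives you.
# It keeps working even if the element moves elsewhere on the page.
matches = root.findall(".//*[@id='searchInput']")

# A purely positional path reaches the same element here, but it would
# break as soon as the layout of the page changes.
positional = root.find("./body/form/input")

print(len(matches), matches[0].get("name"))
print(positional is matches[0])
```

The id-based expression is the kind of identifier that makes the automation robust to the layout and resolution changes mentioned earlier.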
So first, give our script a name. I will call it “Wikipedia lookup” and
assign it to a bot, in this case my bot, which is called C-3PO, and we’re
ready to start. The first thing that needs to happen is that we need to open
Chrome. On the left-hand side you can see an activity panel; under browser
automation it says “Open Chrome browser”. The commands automatically appear,
and this is all we need to do to open up a browser. Next we want to surf to
Wikipedia. How do we do this? By entering the URL. We click on “Browse to
URL”. It’s preset to Google; we don’t want to go to Google, we want to go to
wikipedia.org, so let’s rewrite it. Once we enter Wikipedia, we will land on
this page, so we want to use the
search bar. The next step is to define our search bar, so let’s call it
search_bar. Where is it located? Somewhere in the browser, so it’s in the
browser. How do we find it? By entering the following command:
find_element_by_xpath, of course, because we have been looking for the XPath.
Luckily we pasted it in our little notepad earlier on, so let’s just copy and
paste it, and this way we define our search bar. Of course, the next step is
that we need to enter some keys in the search bar: we want to type the
subject that we’re looking for. How do we do this? Once again we use the
search bar, search_bar, and we say send_keys. We’re still looking for robots,
because why not, and this should type “robot” in the search bar. And there is
one last step: if you remember correctly, we still have to submit this query,
we still have to press Enter. Once again there’s a built-in function for
this, and we simply address it by saying search_bar.submit.
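
The steps so far (browse to the URL, find the search bar, send keys, submit) can be sketched as a small Python function. This is a sketch under assumptions, not the exact script from the video: `browser` is assumed to be a Selenium-3-style driver object exposing `get` and `find_element_by_xpath`, as the commands in the video suggest, and `SEARCH_BAR_XPATH` is a placeholder for the XPath you pasted in your notepad.

```python
# Placeholder value (assumption): replace with the XPath copied from your browser.
SEARCH_BAR_XPATH = '//*[@id="searchInput"]'


def search_wikipedia(browser, subject):
    """Browse to Wikipedia and submit a search for `subject`.

    `browser` is assumed to offer Selenium-3-style `get` and
    `find_element_by_xpath` methods.
    """
    # Browse to URL: go to Wikipedia instead of the preset Google address.
    browser.get("https://wikipedia.org")

    # Define the search bar by locating it through its XPath.
    search_bar = browser.find_element_by_xpath(SEARCH_BAR_XPATH)

    # Type the subject into the search bar.
    search_bar.send_keys(subject)

    # Submit the query, the equivalent of pressing Enter.
    search_bar.submit()
    return search_bar
```

With a browser object obtained from the “Open Chrome browser” activity, the call would look like `search_wikipedia(browser, "robot")`. Note that recent Selenium 4 releases removed `find_element_by_xpath` in favour of `find_element(By.XPATH, ...)`, so the call may need adapting depending on the installed version.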
This should submit our query and open up the Robot page on Wikipedia. Let’s
just do it manually once more: “robots”, we press Enter to submit, and next
we should land on the Robot page. So let’s call this step “extract text”.
This command might be a little longer, but the principle is the same.
Let’s call the text text. Where is it located? Somewhere in the browser, of
course. Once again, we are going to find it by saying find_element_by_xpath.
If you remember correctly, we already found the XPath for the text, so let’s
copy and paste it over there. We’re not looking for the element itself in
this case; we’re looking for its text, which is built in, and we can address
it by typing .text. So the variable text should contain all the text of the
web page Robot on Wikipedia. There’s one last step that we need to do: save
the content to a text file, as follows. We use the encoding UTF-8, just to be
sure, for whenever there are silly characters in the Wikipedia text. We will
actually write it. What are we going to write? The text, of course. And then we
will close it out. So that should be it. Let’s save this one and run it. It
opens the browser, it goes to Wikipedia and types “robot” really quickly, it
performs the submit action, Wikipedia is loading the web page for Robot right
here, and it is extracting the text. Okay, this should be it. So let’s
minimize the browser, let’s close this one, we won’t be needing this anymore,
and let’s open up the text file. Let me maximize this, and as we can see, all
the text is saved in the text file. I hope this was useful and we wish you
good luck.
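
For reference, the last two steps of the script (reading the text of the page and saving it to a UTF-8 text file) can be sketched as follows. As before, this is a sketch under assumptions: `browser` is assumed to be a Selenium-3-style driver with `find_element_by_xpath`, `TEXT_XPATH` is a placeholder for the XPath copied from the Robot page, and the function name is made up for illustration.

```python
# Placeholder value (assumption): replace with the XPath copied for the article text.
TEXT_XPATH = '//*[@id="mw-content-text"]'


def save_page_text(browser, xpath, path):
    """Read the text of the element at `xpath` and write it to `path`."""
    # We are not after the element itself but its text.
    text = browser.find_element_by_xpath(xpath).text

    # UTF-8 makes sure unusual characters in the article can be written.
    # The `with` block closes the file for us when writing is done.
    with open(path, "w", encoding="utf-8") as text_file:
        text_file.write(text)
    return path
```

To land the file on the desktop as in the video, `path` could be built with `os.path.join(os.path.expanduser("~"), "Desktop", "robot.txt")`; the file name here is only an example.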
