
Scraping Javascript Data Within A Grid Of A Webpage Using Selenium And Python

My issue is that I need all the data within the grid containing subdomains from the website https://applipedia.paloaltonetworks.com (data containing NAME, CATEGORY, SUBCATEGORY,

Solution 1:

To get the list of all apps that have subdomains from https://applipedia.paloaltonetworks.com/, use WebDriverWait to wait until the desired elements are visible. The following solution does that:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
    options = Options()
    options.add_argument("start-maximized")
    options.add_argument("disable-infobars")
    options.add_argument("--disable-extensions")
    options.add_argument("--disable-gpu")
    # Selenium 4 takes the driver path through a Service object and the options
    # through options=; chrome_options= and executable_path= were removed.
    service = Service(r'C:\WebDrivers\ChromeDriver\chromedriver_win32\chromedriver.exe')
    driver = webdriver.Chrome(service=service, options=options)
    driver.get('https://applipedia.paloaltonetworks.com/')
    # Wait up to 20 seconds for the links in rows whose ottawagroup is neither 0 nor 2.
    elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@class='btmTable' and @id='dataTable']//tbody[@id='bodyScrollingTable']//tr[not(@ottawagroup='0') and not(@ottawagroup='2')]/td/a")))
    for element in elements:
        # innerHTML keeps the whitespace around each name, as the output shows.
        print(element.get_attribute("innerHTML"))
    driver.quit()
    
  • Console Output:

    DevTools listening on ws://127.0.0.1:12927/devtools/browser/d4a5d576-a4b0-4a3d-959b-9d37aff36fc6
    
                                    2ch
                                    51.com
                                    adobe-connect
                                    adobe-connectnow
                                    adobe-creative-cloud
                                    aim
                                    aim-express
                                    ali-wangwang
                                    amazon-cloud-drive
                                    amazon-music
                                    ameba-now
                                    assembla
                                    autodesk360
                                    avaya-webalive
                                    bacnet
                                    baidu-hi
                                    bebo
                                    bitbucket
                                    boxnet
                                    buddybuddy
                                    chinaren
                                    cisco-spark
                                    cloudapp
                                    cloudforge
                                    cloudinary
                                    concur
                                    confluence
                                    convo
                                    cyph
                                    daum
                                    dcinside
                                    diameter
                                    dnp3
                                    dochub
                                    docstoc
                                    docusign
                                    draw.io
                                    dropbox
                                    egnyte
                                    evernote
                                    facebook
                                    fetion
                                    filestack
                                    flickr
                                    flixwagon
                                    fuze-meeting
                                    gatherplace
                                    genesys
                                    git
                                    github
                                    gitlab
                                    glassdoor
                                    globalmeet
                                    gmail
                                    google-calendar
                                    google-cloud-storage
                                    google-docs
                                    google-hangouts
                                    google-plus
                                    google-spaces
                                    google-talk
                                    google-translate
                                    google-video
                                    gotomypc
                                    gotowebinar
                                    gtp
                                    hadoop
                                    hightail
                                    hipchat
                                    hootsuite
                                    huddle
                                    hulu
                                    hyves
                                    iccp
                                    icloud
                                    iec-60870-5-104
                                    imeet
                                    imgur
                                    instagram
                                    instan-t
                                    ip-messenger
                                    ipsec
                                    irc
                                    issuu
                                    itunes
                                    jira
                                    join-me
                                    jumpshare
                                    kaixin
                                    kaixin001
                                    kakaotalk
                                    laiwang
                                    landesk
                                    linkedin
                                    live-mesh
                                    lotus-notes
                                    lotuslive
                                    lucidpress
                                    mail.ru
                                    mail.ru-agent
                                    maytech
                                    meebo
                                    meetup
                                    mega
                                    mendeley
                                    mercurial
                                    mixi
                                    modbus
                                    ms-ds-smb
                                    ms-lync
                                    ms-office365
                                    ms-onedrive
                                    msn
                                    myspace
                                    nateon-im
                                    netease-webdisk
                                    netflix
                                    ning
                                    noteworthy
                                    now-tv
                                    odnoklassniki
                                    onehub
                                    owncloud
                                    paltalk
                                    pastebin
                                    pcanywhere
                                    pinterest
                                    pivotaltracker
                                    powow
                                    prezi
                                    proofhub
                                    qik
                                    qliksense-cloud
                                    qq
                                    quip
                                    quora
                                    rally-software
                                    readytalk
                                    reddit
                                    rediffbol
                                    renren
                                    rtp
                                    salesforce
                                    sap-jam
                                    screencast
                                    scribd
                                    second-life
                                    secure-data-space
                                    sendthisfile
                                    service-now
                                    sharefile
                                    sharepoint
                                    sharevault
                                    showmax
                                    siemens-s7
                                    signiant
                                    sina-uc
                                    sina-weibo
                                    skydrive
                                    slack
                                    slideshare
                                    smartsheet
                                    snmp
                                    softros-messenger
                                    solarwinds
                                    soundcloud
                                    sourceforge
                                    spark-im
                                    ss7-map
                                    stocktwits
                                    storify
                                    subversion
                                    surveymonkey
                                    syncplicity
                                    tableau
                                    teamdrive
                                    teamup-calendar
                                    teamviewer
                                    thwapr
                                    torch-browser
                                    trello
                                    tumblr
                                    twitter
                                    uc-yun
                                    viber
                                    vimeo
                                    vine
                                    virustotal
                                    vkontakte
                                    vnc
                                    watchdox
                                    webex
                                    wechat
                                    weiyun
                                    whatsapp
                                    windows-azure
                                    windows-defender-atp
                                    workday
                                    yahoo-im
                                    yammer
                                    youku
                                    yousendit
                                    youtube
                                    yunpan360
                                    yy-voice
                                    zalo
                                    zendesk
                                    zenefits
                                    zettahost
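
The names in the output above are padded with whitespace because innerHTML returns the raw content of each link. A minimal sketch of a cleanup step follows; clean_names is a hypothetical helper (not part of Selenium), and the sample strings are made up to resemble the padded values.

```python
def clean_names(raw_values):
    """Strip whitespace, skip blanks, and drop repeats while keeping order."""
    seen = set()
    names = []
    for value in raw_values:
        name = value.strip()
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names


# Sample values shaped like the padded innerHTML strings (illustrative only).
raw = ["\n        2ch\n    ", "\n        51.com\n    ", "   ", "\n        2ch\n    "]
print(clean_names(raw))  # ['2ch', '51.com']
```

With the driver from the code block, the same helper could be fed the live values, for example clean_names(e.get_attribute("innerHTML") for e in elements).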
    

Solution 2:

The code below collects the list of domains that have subdomains quickly and concisely. It reuses the driver and imports from Solution 1:

    WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "[ottawagroup='1'] a")))
    domains = driver.execute_script("return [...document.querySelectorAll(\"[ottawagroup='1'] a\")].map(e => e.textContent.trim())")
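
Both solutions return only the names, while the question also asks for the other grid columns. The sketch below shows one way the full rows might be read from the page HTML using only the standard library. It assumes that each matching tr holds its values in td cells in the order NAME, CATEGORY, SUBCATEGORY; that order, the GridRowParser and grid_rows names, and the sample markup are illustrative assumptions, not taken from the site.

```python
from html.parser import HTMLParser


class GridRowParser(HTMLParser):
    """Collect the cell texts of every <tr> whose ottawagroup matches."""

    def __init__(self, group="1"):
        super().__init__()
        self.group = group
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # Collect only rows in the requested group; reset any open cell.
            wanted = dict(attrs).get("ottawagroup") == self.group
            self._row = [] if wanted else None
            self._cell = None
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Collapse runs of whitespace left over from the page layout.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def grid_rows(html, columns=("NAME", "CATEGORY", "SUBCATEGORY"), group="1"):
    """Return one dict per matching row; the column order is an assumption."""
    parser = GridRowParser(group)
    parser.feed(html)
    # zip() stops at the shorter sequence, so any extra cells are ignored.
    return [dict(zip(columns, row)) for row in parser.rows]


# Made-up sample markup for illustration only.
SAMPLE = """
<table id="dataTable"><tbody id="bodyScrollingTable">
<tr ottawagroup="0"><td><a>3pc</a></td><td>networking</td><td>ip-protocol</td></tr>
<tr ottawagroup="1"><td><a>
        facebook
    </a></td><td>collaboration</td><td>social-networking</td></tr>
</tbody></table>
"""
print(grid_rows(SAMPLE))
```

With Selenium, the string passed in would be driver.page_source, read after the wait above has succeeded. The column tuple should be checked against the real grid header before the keys are trusted.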
