Parsing response to get URL and global HTML search

Hi There,

  • I am trying to parse the following HTML using find, but I am unable to get the URL https://www.listing.com/1

  • I am also trying to check whether the entire HTML contains the words “North Sydney” at least once, so I can branch with an if/else condition, but it always enters the if block (true condition).

Any help is appreciated. Thank you!

<div class="listing-results_grid_content">
      <section class="listing-results_list grid grid--3 panels_item panels__item--show" id="panel-all_listings">
    <article class="listings__item" data-link="https://www.listing.com/1">
      <h4 class="heading--semantic">listing item</h4>
      <figure class="listings_item_thumbnail">
          <a href=" https://www.listing.com/1" class="listings__fav img-icon--favourite-alt"></a>
      </figure>
      <section class="listings_item_overview">
          <a href=" https://www.listing.com/1"><h3 class="listings_item_address">1 William street</h3></a>
          <p class="listings_item_suburb">North Sydney, NSW</p>
          <p class="listings_item_status">Just Listed</p>
          <p class="listings_item_price">
              <span class="listings_item_label">$1,000,000</span>
          </p>
              </section>
    </article>
  
    <article class="listings__item" data-link="https://www.listing.com/2">
      <h4 class="heading--semantic">listing item</h4>
      <figure class="listings_item_thumbnail">
          <a href=" https://www.listing.com/2" class="listings__fav img-icon--favourite-alt"></a>
      </figure>
      <section class="listings_item_overview">
          <a href=" https://www.listing.com/2"><h3 class="listings_item_address">1 Main street </h3></a>
          <p class="listings_item_suburb">North Sydney, NSW</p>
          <p class="listings_item_status"></p>
          <p class="listings_item_price">
              <span class="listings_item_label">$1,500,000</span>
          </p>
         </section>
    </article>

Tried:

  1. const url = doc.find("article").attr("data-link"); This returns undefined.
  2. It keeps returning undefined because it is searching inside the response, which does not contain any of the data. If I try findBetween, it returns {{linkUrl}} instead of the listing.com URL.

Hi @shri, welcome to the community forum!

I tried this locally and it seems to work just as advertised:

import { parseHTML } from "k6/html";

const src = `<div class="listing-results_grid_content">
      <section class="listing-results_list grid grid--3 panels_item panels__item--show" id="panel-all_listings">
    <article class="listings__item" data-link="https://www.listing.com/1">
      <h4 class="heading--semantic">listing item</h4>
      <figure class="listings_item_thumbnail">
          <a href=" https://www.listing.com/1" class="listings__fav img-icon--favourite-alt"></a>
      </figure>
      <section class="listings_item_overview">
          <a href=" https://www.listing.com/1"><h3 class="listings_item_address">1 William street</h3></a>
          <p class="listings_item_suburb">North Sydney, NSW</p>
          <p class="listings_item_status">Just Listed</p>
          <p class="listings_item_price">
              <span class="listings_item_label">$1,000,000</span>
          </p>
              </section>
    </article>

    <article class="listings__item" data-link="https://www.listing.com/2">
      <h4 class="heading--semantic">listing item</h4>
      <figure class="listings_item_thumbnail">
          <a href=" https://www.listing.com/2" class="listings__fav img-icon--favourite-alt"></a>
      </figure>
      <section class="listings_item_overview">
          <a href=" https://www.listing.com/2"><h3 class="listings_item_address">1 Main street </h3></a>
          <p class="listings_item_suburb">North Sydney, NSW</p>
          <p class="listings_item_status"></p>
          <p class="listings_item_price">
              <span class="listings_item_label">$1,500,000</span>
          </p>
         </section>
    </article>`

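export default function() {
    // Parse the HTML string into a queryable document.
    const doc = parseHTML(src);
    // find() matches both articles; attr() reads from the first one.
    let article = doc.find("article");
    console.log(article.attr("data-link"));
    // next() returns the following sibling of each match, which leaves the second article.
    article = article.next();
    console.log(article.attr("data-link"));
    // The second article has no following sibling, so attr() returns undefined.
    article = article.next();
    console.log(article.attr("data-link"));
}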

This will get you:

INFO[0000] https://www.listing.com/1                     source=console
INFO[0000] https://www.listing.com/2                     source=console
INFO[0000] undefined                                     source=console

Are you certain this is the source you are starting with?
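On the second part of the question (checking whether the markup contains “North Sydney”), here is a minimal sketch in plain JavaScript. The original if/else code was not posted, so this is only a guess at the cause: `containsText` is a hypothetical helper, not a k6 API, and `sample` stands in for the real response body. The last two lines show two common reasons an if block always runs.

```javascript
// Hypothetical helper (not part of k6): case-insensitive substring test on raw text.
function containsText(body, needle) {
  return String(body).toLowerCase().includes(String(needle).toLowerCase());
}

// Stand-in for the response body; a real script would use the fetched HTML.
const sample = '<p class="listings_item_suburb">North Sydney, NSW</p>';

if (containsText(sample, "North sydney")) {
  console.log("suburb found");
} else {
  console.log("suburb not found");
}

// Two pitfalls that make a condition always true:
console.log(Boolean(sample.indexOf("Melbourne"))); // indexOf returns -1 when absent, and -1 is truthy
console.log(Boolean({})); // any object is truthy, including a selection object with no matches
```

If the condition is built from the result of find(), comparing against a concrete value (for example the text, or the number of matches) avoids the second pitfall.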

Thanks @mstoykov. I was expecting the same result, but apparently the HTML source doesn’t contain any of the dynamic data returned by the API. It only has JS code, so that is what gets searched, and the results come back undefined/null. I am exploring xk6 at the moment to see whether it can tackle that problem, but I came across a new issue.

From the home page, when I search for a listing and expect to navigate to the search results page, I get:

ERRO[0030] err:read tcp 127.0.0.1:50057->127.0.0.1:50056: wsarecv: An existing connection was forcibly closed by the remote host. category="Connection:handleIOError" elapsed="0 ms" goroutine=39
ERRO[0030] clicking on element: context canceled

and I am unable to perform any actions on the search results page. The home page and the search results page are on the same domain.

From what you are saying, it seems your page loads the search results through JavaScript and then updates the page. But k6 does not execute the JavaScript of a webpage it has just fetched with http.get; it only sees the raw response. You arguably have two choices:

  1. Do what the JS does to get the results and parse them from there. I would expect them to come back in a JSON format, so you can just read them.
  2. Use xk6-browser, which acts as a browser (as it is one) and executes the page source. This won’t scale very well, though, as you need to run a full browser.

It seems you went with option 2, which is fine, but again, this will use a lot of system resources, so you might need to go back to option 1 if you want to scale this up.
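For option 1, a rough sketch of the parsing step in plain JavaScript. Everything about the payload here is an assumption: the real API’s response shape and field names are unknown, so `listings`, `url`, and `suburb` are hypothetical, as are the helper `extractListingUrls` and the `sampleBody` stand-in.

```javascript
// Hypothetical response shape; the real API's field names are not known.
const sampleBody = JSON.stringify({
  listings: [
    { url: "https://www.listing.com/1", suburb: "North Sydney, NSW" },
    { url: "https://www.listing.com/2", suburb: "North Sydney, NSW" },
  ],
});

// Hypothetical helper: parse the JSON text and collect each listing's URL.
function extractListingUrls(body) {
  const data = JSON.parse(body);
  // Fall back to an empty list if the assumed "listings" array is missing.
  const listings = Array.isArray(data.listings) ? data.listings : [];
  return listings.map((item) => item.url);
}

console.log(extractListingUrls(sampleBody));
```

In a k6 script the body would come from the response of an http.get call (from k6/http) against whichever endpoint the page’s JavaScript requests; the browser’s network tab should show which one that is.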