Thursday, 23 October 2014

Fetching web content

I've been working on fetching web content recently, mostly for building authentication workflows, using the likes of Facebook and Google to act as authentication services.  I'm not going to go into detail about how I've done those, not in this post at least, but more simply how I made the requests from my Uniface service form out to those external services.

There are a number of different HTTP methods.  Generally speaking, when you are only retrieving data the method is "GET", whereas if you are sending data that may also perform an action the method is "POST".  There are others, but I'm going to stick with these two for now, as they are the most commonly used.

In fact, I'm going to start with "GET".

The first thing you need to do is consider the character set.  Your system may be set to a different character set, but you need to make sure the character set of your request matches that of the service you are calling.  In my case it was UTF-8...

  vCharSet = $sys_charset ;backup current setting for later
  $sys_charset = "UTF8"   ;set to the character set we need

We then want to create a new instance of the "UHTTP" component - this is the Uniface component that is going to do most of the hard work for us...

  newinstance "UHTTP",vHandle
  if ( $status < 0 & $procerror < 0 )
    $sys_charset = vCharSet ;restore character set
    return -101             ;error handling!
  endif

Once you've got your instance, the next step is to define how the component responds to mismatched or expired certificates, and how it calculates the content length.  By default it will error if there is a certificate error, and you have to calculate the content length yourself.  Each setting is one bit of a single flags value, so to switch all of these defaults off we pass 7...

  activate vHandle.SET_FLAGS(7)
  if ( $status < 0 & $procerror < 0 )
    $sys_charset = vCharSet ;restore character set
    return -102             ;error handling!
  endif
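
If the 7 looks like a magic number, it is simply the three switches added together.  This is only a sketch, assuming each switch is worth one bit (1, 2 and 4); vFlags is a variable name I've made up for illustration, and you should check the UHTTP documentation for which value controls which behaviour before switching them on individually...

  vFlags = 1 + 2 + 4 ;one value per switch, 1 + 2 + 4 = 7
  activate vHandle.SET_FLAGS(vFlags)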

We're now ready to make the request.  In this example, I'm just going to grab the Google homepage...

  activate vHandle.SEND("","GET","","","",vContent,vResponse)
  if ( $status < 0 & $procerror < 0 )
    $sys_charset = vCharSet ;restore character set
    return -103             ;error handling!
  endif

The parameters are:

  1. URL (string : in) - the URL you're sending the request to.
  2. Method (string : in) - the method being used (this is not checked by "UHTTP").
  3. Username (string : in) - if you're using a secure URL, you may need to populate this.
  4. Password (string : in) - again, you may need to populate this.
  5. Headers (string : inout) - the HTTP request headers in and the response headers out.
  6. Content (string : inout) - in the case of a "GET" this is out only and the page contents.
  7. Response (string : out) - the HTTP response headers.

The $status should be set to 200 for a successful response (to match the HTTP status code for success) or it may be set to 1.  If it is set to 1 then this means that the content was larger than the parameter limit (10Mb) and therefore you will need to get the rest of the content from the buffer, like this...

  while ( $status = 1 )
    activate vHandle.READ_CONTENT(vExtra)
    if ( $status < 0 & $procerror < 0 )
      $sys_charset = vCharSet ;restore character set
      return -104             ;error handling!
    endif
    vContent = "%%vContent%%vExtra%%%" ;append the extra chunk to what we already have
  endwhile

You now have the full page contents, but don't forget to restore the character set...

  $sys_charset = vCharSet ;restore character set

Doing a "POST" is very similar, apart from a couple of differences:

  1. To replicate a web browser "POST", which is usually what you are doing, you need to make sure the 5th parameter of the SEND call is populated with the following header... "Content-Type=application/x-www-form-urlencoded"
  2. The 6th parameter of the SEND call needs to be populated with the data that you are sending in your request.  If this is larger than 10Mb then you will need to have a WRITE_CONTENT loop (similar to the READ_CONTENT loop) before the SEND, in order to populate the buffer with the full contents.
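
Putting those two differences together, a small "POST" might look something like the following.  This is a sketch rather than tested code: vUrl, vHeaders and the form fields are made up for illustration, the -105 return code just continues my numbering from above, and the data is assumed to be small enough not to need WRITE_CONTENT.  Note that the headers and content are both "inout" parameters, so they are passed as variables...

  vUrl     = ""  ;hypothetical: set this to the URL you are posting to
  vHeaders = "Content-Type=application/x-www-form-urlencoded"
  vContent = "field1=value1&field2=value2" ;hypothetical form data, already URL-encoded
  activate vHandle.SEND(vUrl,"POST","","",vHeaders,vContent,vResponse)
  if ( $status < 0 & $procerror < 0 )
    $sys_charset = vCharSet ;restore character set
    return -105             ;error handling!
  endif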

This all works rather well.  However, I've found two major limitations along the way.

Firstly, it was not possible to send a request to a URL which had a colon (:) character after the protocol.  For example, when trying to get a user's LinkedIn ID you would use the following URL...


The colon (:) character after "/people/~" broke the URL.  Luckily I'm using the past tense here, because this was fixed in patch X504 (for Uniface 9.6.05).  It works fine now that we've installed the patch.

Secondly, the observant among you may have noticed that the 6th parameter of the SEND call is defined as a string.  This is great for grabbing HTML page source from a website, or even doing most webservice calls, as these tend to return either XML or JSON strings of data.  However, it makes creating something like a Dropbox interface impossible, or at least very limited.  You can grab text files, but nothing else!  No images, no Office documents, no PDFs.

Unfortunately there's no word from Uniface on when this second issue might be resolved.

Summary: It's possible to use the "UHTTP" component to request text data, either from a web page or a webservice.  Just don't expect it to work with binary files, yet!
