Selenium gotcha – selenium.GetHtmlSource() returns processed HTML

December 29, 2008 by mrdavidlaing
Filed under: .NET, Agile, TDD 

Whilst writing some Selenium-based acceptance tests today, I bumped into a hair-pulling gotcha.  Hopefully this post will save you from the same pain.

The test was to check whether some tracking-tag JavaScript was being inserted into the page correctly.

I assumed that I could get the page source as it was delivered to the browser by calling selenium.GetHtmlSource(), and then check that for the JavaScript string I was expecting.

Unfortunately, GetHtmlSource is just a proxy for the browser's DOM innerHTML property, and that returns the HTML after it has been preprocessed by the browser.

It turns out that preprocessing does a couple of funky things, including:

  • Changing line-endings (Firefox)
  • Changing capitalization (IE6)
  • Seemingly random removal / insertion of " and ' (IE6)

So, when I was expecting a string like this:

<script language="javascript" type="text/javascript">
<!--
   var amPid = '206'';
   var amPPid = '4803';
   if (document.location.protocol=='https:')
...[snip]...

IE6 was presenting me with:

<SCRIPT language=javascript type=text/javascript>
<!--
   var amPid = '206'';
   var amPPid = '4803';
   if (document.location.protocol=='https:')
...[snip]...

A possible solution is to ignore case, whitespace and quotes when doing the comparison, with a helper method like this:

using System.Text.RegularExpressions;
using NUnit.Framework; // assumption: the (expected, actual) argument order below matches NUnit's StringAssert

/// <summary>
/// Use this to compare strings to those returned from selenium.GetHtmlSource for an Internet Explorer instance
/// (IE6 seems to change case and inclusion of quotes, especially for JavaScript).
/// </summary>
/// <param name="expected">The HTML fragment as written in the page source.</param>
/// <param name="actual">The HTML returned by selenium.GetHtmlSource().</param>
private static void AssertStringContainsIgnoreCaseWhiteSpaceAndQuotes(string expected, string actual)
{
    // Strip all whitespace (including line endings), lower-case, then drop both kinds of quote.
    string expectedClean = Regex.Replace(expected, @"\s", "").ToLower().Replace("\"", "").Replace("'", "");
    string actualClean = Regex.Replace(actual, @"\s", "").ToLower().Replace("\"", "").Replace("'", "");
    StringAssert.Contains(expectedClean, actualClean,
                          string.Format("Expected string \n\n{0} \n\nis not contained within \n\n{1}", expected, actual));
}
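
To see the normalization at work outside a test framework, here is a minimal, self-contained sketch. The class and method names (HtmlSourceNormalizer, Normalize, ContainsNormalized) are hypothetical and not part of Selenium or NUnit, and the sample strings are shortened versions of the two snippets above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlSourceNormalizer
{
    // Hypothetical helper: removes all whitespace, lower-cases the text
    // (culture-independently) and drops single and double quotes.
    public static string Normalize(string html)
    {
        string noWhitespace = Regex.Replace(html, @"\s", "");
        return noWhitespace.ToLowerInvariant().Replace("\"", "").Replace("'", "");
    }

    // True if the normalized expected fragment occurs in the normalized actual source.
    public static bool ContainsNormalized(string expected, string actual)
    {
        return Normalize(actual).Contains(Normalize(expected));
    }
}

public static class NormalizerDemo
{
    public static void Main()
    {
        // What the server sends (Unix line endings, quoted attributes).
        string expected = "<script language=\"javascript\" type=\"text/javascript\">\n<!--\n   var amPPid = '4803';";
        // What IE6 reports (upper-case tag, unquoted attributes, Windows line endings).
        string fromIe6 = "<SCRIPT language=javascript type=text/javascript>\r\n<!--\r\n   var amPPid = '4803';";

        Console.WriteLine(HtmlSourceNormalizer.ContainsNormalized(expected, fromIe6)); // prints True
    }
}
```

The trade-off is that the comparison becomes quite loose: two fragments that differ only in case, spacing or quoting are treated as identical, which is what you want here but could hide a genuine difference in other tests.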

It was the line endings that really floored me, because they were automatically normalized/corrected by my test runner when displaying the error. Aaargh!

Comments

4 Comments on Selenium gotcha – selenium.GetHtmlSource() returns processed HTML

  1. navneet on Fri, 19th Feb 2010 11:10 am

    Hi David,
    Thanks for this post.
    I have one issue with text which I am checking through selenium.GetHtmlSource();
    Application scenario:
    - We are doing automation testing for our sites.
    - When we submit a page, we then check the new page for pixel entries
    like this ” or tags
    - So I want to detect that using selenium.GetHtmlSource();

    Firefox 3.5.7 and IE 7
    1) Using view source manually (right click -> view source)
    Original on Firefox :
    Original on IE :

    After selenium.GetHtmlSource();
    In Firefox :
    In IE :

    Upper/lower case and double quotes are fine to handle.
    But how to handle attribute sequence on .

    Thanks,
    Navneet

  2. navneet on Mon, 22nd Feb 2010 7:38 am

    Sorry for the previous post; it did not contain the examples, which I had put in < >.

  3. navneet on Mon, 22nd Feb 2010 7:40 am

    Firefox 3.5.7 and IE 7
    1) Using view source manually (right click -> view source)
    Original on Firefox : <surehits account="167644" sid="navneet_12_17_Fast" />
    Original on IE : <surehits account="167644" sid="navneet_12_17_Fast" />

    After selenium.GetHtmlSource();
    In Firefox : <surehits account="167644" sid="navneet_12_17_Fast" />
    In IE : <SUREHITS sid="navneet_12_17_Fast" account="167644" />

  4. mrdavidlaing on Mon, 22nd Feb 2010 1:15 pm

    @Navneet,

    Best I can suggest is to do multiple searches: once for the tag (SUREHITS), then again for the first attribute (sid="navneet…), and again for the next attribute, etc.
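
    The multiple-searches suggestion could be sketched roughly as follows. This is only an illustration: TagSearch and ContainsAllFragments are hypothetical names, and the sample string is the IE output quoted in the comment above.

```csharp
using System;

public static class TagSearch
{
    // Hypothetical helper: true only if every fragment occurs somewhere
    // in the source, ignoring case. The order of the fragments does not matter.
    public static bool ContainsAllFragments(string htmlSource, params string[] fragments)
    {
        foreach (string fragment in fragments)
        {
            if (htmlSource.IndexOf(fragment, StringComparison.OrdinalIgnoreCase) < 0)
            {
                return false;
            }
        }
        return true;
    }
}

public static class TagSearchDemo
{
    public static void Main()
    {
        // IE reorders the attributes and upper-cases the tag name.
        string fromIe = "<SUREHITS sid=\"navneet_12_17_Fast\" account=\"167644\" />";

        bool found = TagSearch.ContainsAllFragments(
            fromIe,
            "<surehits",
            "account=\"167644\"",
            "sid=\"navneet_12_17_Fast\"");

        Console.WriteLine(found); // prints True
    }
}
```

    Note that this only checks that each fragment appears somewhere in the page, not that they all belong to the same tag, so it is weaker than a real match. If the browser also strips attribute quotes, the quotes would need to be removed from both sides first.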
