Differences

This shows you the differences between two versions of the page.

pluginto:webpage_scraping [2021/10/23 04:31] gregbalco
pluginto:webpage_scraping [2022/06/04 03:52] (current) gregbalco
Line 5: Line 5:
 For example, instead of having your program query the database for raw data and then assemble it into calculator input format, you can often have the web server do it.
  
-Let's say you want the calculator input for [[http://antarctica.ice-d.org/sample/10-MPS-006-COU|a sample called 10-MPS-006-COU]] (which is notable because 12 different nuclide concentration measurements have been made on it). The webpage generated for that sample includes the formatted calculator input. If you are looking at the page in a browser, you can just copy and paste it wherever you want. For convenience when you want a computer program to do it, however, that block of text is delimited in the HTML code by hidden tags like:
+Let's say you want the calculator input for [[http://version2.ice-d.org/antarctica/sample/10-MPS-006-COU|a sample called 10-MPS-006-COU]] (which is notable because 12 different nuclide concentration measurements have been made on it). The webpage generated for that sample includes the formatted calculator input. If you are looking at the page in a browser, you can just copy and paste it wherever you want. For convenience when you want a computer program to do it, however, that block of text is delimited in the HTML code by hidden tags like:
  
 <code>
Line 16: Line 16:
  
 % Read a webpage into a string
-s = webread('http://antarctica.ice-d.org/sample/10-MPS-006-COU');
+s = webread('http://version2.ice-d.org/antarctica/sample/10-MPS-006-COU');
 
 % Extract the formatted input data
Line 48: Line 48:
 which you can then use to calculate exposure ages, or whatever.
  
-webpage that contains Cl-36 data will have that as a separate formatted block with tags that look like
+Likewise, a webpage that contains Cl-36 data will have the text input data as a separate formatted block with tags that look like:
  
 <code>
-<!-- begin Cl36 --><pre></pre><!-- end Cl36 -->
+<!-- begin Cl36 --><pre> ... </pre><!-- end Cl36 -->
 </code>
  
-In a webpage that contains exposure age results, the XML returned by the exposure age calculator is also included in a hidden tag:
 
-<code>
-<!-- begin_xml_dump <XML GOES HERE> end_xml_dump -->
-</code>
-
-So you can extract it from the HTML string using a similar approach and do something with it
+which you can extract from the HTML string similarly
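
As a rough illustration of the Cl-36 extraction mentioned in the newer revision, here is a minimal MATLAB sketch. It assumes the block is delimited exactly by the begin/end Cl36 tags shown above. The string `s` is a made-up stand-in for what `webread` would return, and the names `cl36_text` and `cl36_lines` are hypothetical, not taken from the page.

```matlab
% Stand-in for s = webread(...); the content between the tags is placeholder text,
% not real ICE-D data.
s = sprintf('<html>\n<!-- begin Cl36 --><pre>\nplaceholder line 1\nplaceholder line 2\n</pre><!-- end Cl36 -->\n</html>');

% Capture everything between the hidden tags. The lazy (.*?) stops at the first
% closing tag, and in MATLAB regular expressions '.' also matches newlines by default.
tok = regexp(s, '<!-- begin Cl36 --><pre>(.*?)</pre><!-- end Cl36 -->', 'tokens', 'once');

if isempty(tok)
    % No Cl-36 block on this page
    cl36_text = '';
    cl36_lines = {};
else
    % Trim surrounding whitespace, then split into one cell per input line
    cl36_text = strtrim(tok{1});
    cl36_lines = strsplit(cl36_text, sprintf('\n'));
end
```

For a live page, replacing the first statement with a `webread` call on the sample URL should be enough, provided the page actually contains a Cl-36 block; otherwise `cl36_text` comes back empty.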