Scraping data from ICE-D webpages
This outlines some tricks that can be used to make the web server do some of the work for you.
For example, instead of having your program query the database for raw data and then assemble it into calculator input format, you can often have the web server do it.
Let's say you want the calculator input for a sample called 10-MPS-006-COU (which is notable because 12 different nuclide concentration measurements have been made on it). The webpage generated for that sample includes the formatted calculator input. If you are looking at the page in a browser, you can just copy and paste it wherever you want. For convenience, however, that block of text is delimited in the HTML code by hidden comment tags like:
<!-- begin v3 --><pre>….<!-- end v3 -->
So your script can just look for those in the HTML and pull out what is between them. In MATLAB, for example,
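urls = 'http://antarctica.ice-d.org/sample/10-MPS-006-COU';
s = webread(urls); % page HTML as a character vector
l1 = '<!-- begin v3 --><pre>';
l2 = '</pre><!-- end v3 -->';
% keep only the text between the two delimiters
i1 = strfind(s, l1); i2 = strfind(s, l2);
v3_input_string = s(i1(1)+length(l1) : i2(1)-1);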
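As a rough sketch of the extraction step, the following MATLAB fragment runs on a made-up HTML string rather than a live page, so it can be tried offline. The contents of `s`, the placeholder text inside it, and the variable names `k_start` and `k_end` are assumptions for illustration only; what the real page returns will differ. It also stops with an error if either delimiter is missing, which is what you would probably want if the page layout ever changed.

```matlab
% Hypothetical stand-in for what webread() would return; not real ICE-D output.
s = ['<html><body><!-- begin v3 --><pre>' ...
     'PLACEHOLDER LINE 1' char(10) 'PLACEHOLDER LINE 2' ...
     '</pre><!-- end v3 --></body></html>'];

l1 = '<!-- begin v3 --><pre>';
l2 = '</pre><!-- end v3 -->';

% Find the first opening delimiter
i1 = strfind(s, l1);
if isempty(i1)
    error('Opening delimiter not found in page');
end
k_start = i1(1) + length(l1);   % first character after the opening delimiter

% Look for the closing delimiter only after the opening one
i2 = strfind(s(k_start:end), l2);
if isempty(i2)
    error('Closing delimiter not found in page');
end
k_end = k_start + i2(1) - 2;    % last character before the closing delimiter

v3_input_string = s(k_start:k_end);
disp(v3_input_string)
```

Searching for the closing delimiter only in the text after the opening one keeps the two matches paired even if a similar closing tag happens to appear earlier in the page.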