Differences
This shows you the differences between two versions of the page.
pluginto:webpage_scraping [2021/10/23 04:31] – gregbalco | pluginto:webpage_scraping [2022/06/04 03:52] (current) – gregbalco | ||
---|---|---|---|
Line 5: | Line 5: | ||
For example, instead of having your program query the database for raw data and then assemble it into calculator input format, you can often have the web server do it. | For example, instead of having your program query the database for raw data and then assemble it into calculator input format, you can often have the web server do it. | ||
- | Let's say you want the calculator input for [[http://antarctica.ice-d.org/ | + | Let's say you want the calculator input for [[http://version2.ice-d.org/antarctica/ |
< | < | ||
Line 16: | Line 16: | ||
% Read a webpage into a string | % Read a webpage into a string | ||
- | s = webread(' | + | s = webread(' |
% Extract the formatted input data | % Extract the formatted input data | ||
Line 48: | Line 48: | ||
which you can then use to calculate exposure ages, or whatever. | which you can then use to calculate exposure ages, or whatever. | ||
- | A webpage that contains Cl-36 data will have that as a separate formatted block with tags that look like | + | Likewise, a webpage that contains Cl-36 data will have the text input data as a separate formatted block with tags that look like: |
< | < | ||
- | <!-- begin Cl36 -->< | + | <!-- begin Cl36 -->< |
</ | </ | ||
- | In a webpage that contains exposure age results, the XML returned by the exposure age calculator is also included in a hidden tag: | ||
- | < | + | which you can extract from the HTML string |
- | <!-- begin_xml_dump <XML GOES HERE> end_xml_dump --> | + | |
- | </ | + | |
- | + | ||
- | So you can extract | + | |
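
The extraction step described above for the Cl-36 block can be sketched in MATLAB roughly as follows. This is only an illustration under stated assumptions, not the page author's code: the closing marker `<!-- end Cl36 -->` is assumed by analogy with the begin marker, the `<pre>` tag and the sample text are hypothetical stand-ins, and the inline string `s` takes the place of a real `webread` call so the sketch runs without network access.

```matlab
% Sketch only. ASSUMPTIONS: the end marker '<!-- end Cl36 -->' is inferred
% from the begin marker, and the <pre> wrapper and sample text are made up.
% Check the actual page source before relying on either.

% Stand-in for the string returned by webread(...)
s = ['<html><body><!-- begin Cl36 --><pre>' ...
     'HYPOTHETICAL-SAMPLE placeholder input text' ...
     '</pre><!-- end Cl36 --></body></html>'];

% Capture everything between the two comment markers (non-greedy match)
tok = regexp(s, '<!-- begin Cl36 -->(.*?)<!-- end Cl36 -->', 'tokens', 'once');

if isempty(tok)
    error('No Cl-36 block found in this page.');
end

% Remove any HTML tags left inside the block, then trim whitespace
cl36_text = strtrim(regexprep(tok{1}, '<[^>]+>', ''));
disp(cl36_text);
```

Matching on the comment markers rather than on the surrounding HTML keeps the extraction independent of how the page happens to be laid out, provided the markers themselves do not change.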