Differences
This shows you the differences between two versions of the page.
| pluginto:webpage_scraping [2021/10/23 04:30] – gregbalco | pluginto:webpage_scraping [2022/06/04 03:52] (current) – gregbalco | ||
|---|---|---|---|
| Line 5: | Line 5: | ||
| For example, instead of having your program query the database for raw data and then assemble it into calculator input format, you can often have the web server do it. | For example, instead of having your program query the database for raw data and then assemble it into calculator input format, you can often have the web server do it. | ||
| - | Let's say you want the calculator input for [[http://antarctica.ice-d.org/ | + | Let's say you want the calculator input for [[http://version2.ice-d.org/antarctica/ |
| < | < | ||
| Line 16: | Line 16: | ||
| % Read a webpage into a string | % Read a webpage into a string | ||
| - | s = webread(' | + | s = webread(' |
| % Extract the formatted input data | % Extract the formatted input data | ||
| Line 48: | Line 48: | ||
| which you can then use to calculate exposure ages, or whatever. | which you can then use to calculate exposure ages, or whatever. | ||
| - | A webpage that contains Cl-36 data will have that as a separate formatted block with tags that look like | + | Likewise, a webpage that contains Cl-36 data will have the text input data as a separate formatted block with tags that look like: |
| < | < | ||
| - | <!-- begin Cl36 -->< | + | <!-- begin Cl36 -->< |
| </ | </ | ||
| - | In a webpage that contains exposure age results, the XML returned by the exposure age calculator is also included in a hidden tag: | ||
| - | < | + | which you can extract from the HTML string |
| - | <!-- begin_xml_dump <XML GOES HERE> end_xml_dump --> | + | |
| - | </ | + | |
| - | + | ||
| - | So you can extract | + | |
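
As a minimal sketch of the extraction step described in the current revision, the text between the Cl-36 comment markers can be pulled out of the HTML string with `strfind`. The opening marker `<!-- begin Cl36 -->` is taken from the page text; the closing marker `<!-- end Cl36 -->` and the sample string `s` are assumptions made for illustration, so check the source of a real page before relying on them. In practice `s` would come from `webread`, as in the example on the page.

```matlab
% Hypothetical stand-in for the string returned by webread();
% the content of a real page will differ.
s = ['<html><body>' char(10) ...
     '<!-- begin Cl36 -->placeholder input text<!-- end Cl36 -->' char(10) ...
     '</body></html>'];

startTag = '<!-- begin Cl36 -->';   % opening marker, as shown on the page
endTag   = '<!-- end Cl36 -->';     % ASSUMED closing marker; verify in the page source

% Locate the first opening marker
iStart = strfind(s, startTag);
if isempty(iStart)
    error('Opening marker not found.');
end
first = iStart(1) + length(startTag);   % index of first character after the marker

% Locate the first closing marker that follows the opening marker
iEnd = strfind(s, endTag);
iEnd = iEnd(iEnd >= first);
if isempty(iEnd)
    error('Closing marker not found.');
end

% Everything between the two markers, with surrounding whitespace trimmed
block = strtrim(s(first : iEnd(1) - 1));
disp(block)
```

Any HTML tags that wrap the data inside the markers are left in `block` and would need to be stripped separately before passing the text to a calculator.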