pluginto:webpage_scraping, created 2021/10/23 04:19 by gregbalco; current revision 2022/06/04 03:52 by gregbalco
For example, instead of having your program query the database for raw data and then assemble it into calculator input format, you can often have the web server do it.
Let's say you want the calculator input for [[http://version2.ice-d.org/antarctica/
<
<!-- begin v3 --><
</
So your script can just search the HTML for those tags and pull out whatever is between them. In MATLAB, for example:
<
% Read a webpage into a string
s = webread('
% Extract the formatted input data
l1 = '<
l2 = '</
v3_input_string = s((strfind(s,

</

That should produce this result:

<
v3_input_string =














'
</

which you can then use to calculate exposure ages, or whatever.

Likewise, a webpage that contains Cl-36 data will have the text input data as a separate formatted block with tags that look like:

<
<!-- begin Cl36 --><
</

which you can extract from the HTML string the same way, using the Cl36 tags in place of the v3 ones.
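The tag-based extraction described on this page works in any language and for either the v3 or the Cl-36 block. Below is a minimal Python sketch of the same idea as the MATLAB `strfind` approach. It assumes you already have the page HTML in a string, for example from `urllib.request.urlopen(url).read().decode()`. The function finds the start tag, then the next end tag after it, and returns the text in between. The `extract_block` name, the end-marker strings, and the sample HTML are hypothetical placeholders. Only the `<!-- begin v3 -->` and `<!-- begin Cl36 -->` start tags come from this page, so copy the exact start and end tags from the real page source before relying on it.

```python
from typing import Optional


def extract_block(html: str, start_marker: str, end_marker: str) -> Optional[str]:
    """Return the text between start_marker and the next end_marker.

    Same idea as the MATLAB strfind snippet: find the opening tag, then the
    closing tag that follows it, and keep everything in between. Only the
    first block is returned; None means a marker was not found.
    """
    start = html.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)          # skip over the opening marker itself
    end = html.find(end_marker, start)  # closing marker must follow the opening one
    if end == -1:
        return None
    return html[start:end]


# Start markers as they appear in the page source. The end markers below are
# hypothetical placeholders: substitute the closing tags from the real page.
V3_START = "<!-- begin v3 -->"
V3_END = "<!-- V3 END PLACEHOLDER -->"
CL36_START = "<!-- begin Cl36 -->"
CL36_END = "<!-- CL36 END PLACEHOLDER -->"

# Made-up HTML standing in for the string returned by webread (or any HTTP GET).
page = (
    "<html><body>\n"
    "<!-- begin v3 -->SAMPLE-A data line 1\nSAMPLE-A data line 2\n"
    "<!-- V3 END PLACEHOLDER -->\n"
    "<!-- begin Cl36 -->SAMPLE-B Cl-36 data\n<!-- CL36 END PLACEHOLDER -->\n"
    "</body></html>"
)

v3_input_string = extract_block(page, V3_START, V3_END)
cl36_input_string = extract_block(page, CL36_START, CL36_END)
print(v3_input_string)
print(cl36_input_string)
```

`str.find` returns only the first match, so this picks up the first block of each kind on the page. You may also want to call `.strip()` on the result to drop surrounding whitespace before handing it to the calculator.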