meta data for this page
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revision | ||
pluginto:useful_sql_queries [2023/10/19 21:29] – joetulenko | pluginto:useful_sql_queries [2024/12/03 17:29] (current) – joetulenko | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ===Notes from Joe:=== | + | ===SQL Queries - suggested structure, cautions, and examples=== |
+ | |||
+ | Find here some suggestions on how to structure your queries, some things to be aware of when making queries, and some examples of useful queries you can directly copy and paste into your SQL program that should work in ICE-D. | ||
+ | |||
+ | \\ | ||
+ | |||
+ | ==suggested query structure | ||
In my approach to writing SQL queries, I like to think of the structure of an SQL query in the following steps: | In my approach to writing SQL queries, I like to think of the structure of an SQL query in the following steps: | ||
Line 27: | Line 33: | ||
---- | ---- | ||
- | 3. JOIN: In the event you want information from multiple tables (e.g., you want information from the base_sample and base_site and base_publication tables), you need to specify how these tables are linked together, in other words which fields in which tables are used to link tables together. In ICE-D, these fields (attributes) are all going to be ID fields. There are other ways to link tables, but brief documentation reading suggests this is the most computationally efficient path. The syntax is: JOIN //target table// ON //target table.identifyer// = //starting table.identifyer// | + | 3. JOIN: In the event you want information from multiple tables (e.g., you want information from the base_sample and base_site and base_publication tables), you need to specify how these tables are linked together, in other words which fields in which tables are used to link tables together. In ICE-D, these fields (attributes) are all going to be ID fields. There are other ways to link tables, but brief documentation reading suggests this is the most computationally efficient path. The syntax is: JOIN //target table// ON //target table.identifier// = //starting table.identifier// |
example if I want information from example above, I need to connect to the base_site table and the base_publication table. This also means, since we have a two-way relation between samples and publications, | example if I want information from example above, I need to connect to the base_site table and the base_publication table. This also means, since we have a two-way relation between samples and publications, | ||
Line 74: | Line 80: | ||
---- | ---- | ||
+ | |||
+ | \\ | ||
+ | \\ | ||
==A couple cautions:== | ==A couple cautions:== | ||
**1. Beware of replicate samples from queries!**\\ | **1. Beware of replicate samples from queries!**\\ | ||
- | Certain tables are linked to others through a two-way relation, or a ' | + | Certain tables are linked to others through a two-way relation, or a ' |
For example, **samples** and **publications** are linked through a two-way relation because it is possible for a publication to have many samples (it better, right? It would hopefully be tough to publish a paper with only ONE exposure age reported in it), but it is also possible for one sample to be referenced in multiple publications (perhaps a research group ran a sample back in the day, but recently re-ran another aliquot of the same exact sample and discussed that sample in a new publication). | For example, **samples** and **publications** are linked through a two-way relation because it is possible for a publication to have many samples (it better, right? It would hopefully be tough to publish a paper with only ONE exposure age reported in it), but it is also possible for one sample to be referenced in multiple publications (perhaps a research group ran a sample back in the day, but recently re-ran another aliquot of the same exact sample and discussed that sample in a new publication). | ||
- | Thus, IF YOU DO NOT WANT TO SELECT INFORMATION FROM A SAMPLE MORE THAN ONCE and it shows up in multiple publications, | + | Thus, IF YOU DO NOT WANT TO SELECT INFORMATION FROM A SAMPLE MORE THAN ONCE and it shows up in multiple publications, |
\\ | \\ | ||
Line 95: | Line 104: | ||
---- | ---- | ||
---- | ---- | ||
+ | |||
+ | \\ | ||
+ | \\ | ||
===Example Queries=== | ===Example Queries=== | ||
Line 111: | Line 123: | ||
---- | ---- | ||
+ | |||
+ | \\ | ||
2. Select necessary attribute information formatted in the v3 exposure age calculator input for 10Be samples in a specific site NOT in Antarctica (if in Antarctica, just change the ' | 2. Select necessary attribute information formatted in the v3 exposure age calculator input for 10Be samples in a specific site NOT in Antarctica (if in Antarctica, just change the ' | ||
Line 144: | Line 158: | ||
---- | ---- | ||
+ | |||
+ | \\ | ||
3. Select/list each 14C extraction lab currently represented in the database.\\ | 3. Select/list each 14C extraction lab currently represented in the database.\\ | ||
Line 158: | Line 174: | ||
---- | ---- | ||
+ | |||
+ | \\ | ||
4. Select sample/ | 4. Select sample/ | ||
Line 199: | Line 217: | ||
---- | ---- | ||
+ | |||
+ | \\ | ||
5. Select samples only within a specific application.\\ | 5. Select samples only within a specific application.\\ | ||
- | **New Feature:** not really a new concept, but just demonstrating how **sites** are linked to **applications** and how you can isolate sites/ | + | **New Feature:** not really a new concept, but just demonstrating how **sites** are linked to **applications** and how you can isolate sites/ |
< | < | ||
Line 217: | Line 237: | ||
---- | ---- | ||
+ | |||
+ | \\ | ||
6. Selecting samples where the site type (ie base_site.what) is // | 6. Selecting samples where the site type (ie base_site.what) is // | ||
Line 235: | Line 257: | ||
\\ | \\ | ||
- | If you want an EXACT match, omit the %'s inside the parentheses. Although this statement is still not case sensitive, just FYI.\\ | + | If you want an EXACT match, omit the %'s inside the parentheses. Although this statement is still not case sensitive, just FYI.// |
+ | |||
+ | ---- | ||
+ | |||
+ | \\ | ||
+ | |||
+ | 7. Filtering selection by choosing samples from sites that have AT LEAST 3 samples per site. For example, let's say I only trust moraine ages in the database that are based on at least 3 boulder ages on the moraine, any less and the age might be uncertain.\\ | ||
+ | **New Feature:** lots of new features in this one, but mainly the GROUP_BY and HAVING COUNT (*) built-in functions. | ||
+ | |||
+ | < | ||
+ | |||
+ | SELECT base_sample.name, | ||
+ | FROM base_sample | ||
+ | JOIN base_site ON base_site.id = base_sample.site_id | ||
+ | WHERE base_site.what LIKE " | ||
+ | AND base_sample.site_id IN | ||
+ | (SELECT base_sample.site_id FROM base_sample GROUP BY base_sample.site_id HAVING COUNT(*) > 2) | ||
+ | |||
+ | </ | ||
+ | |||
+ | //note: the statement before and within the parentheses (I think) works more or less like this:\\ | ||
+ | 1. "AND base_sample.site_id IN" = give me all samples where their site_id field meets the following criteria.\\ | ||
+ | 2." | ||
+ | 3. " | ||
+ | |||
+ | ---- | ||
+ | |||
+ | \\ | ||
+ | |||
+ | 8. Query used to filter the ICE-D Alpine application for samples in the Cascades and Sierra Nevada as part of Tulenko et al. (2024): Cosmogenic 21Ne exposure ages on late Pleistocene moraines in Lassen Volcanic National Park, California, USA. | ||
+ | |||
+ | < | ||
+ | |||
+ | SELECT base_sample.name, | ||
+ | FROM base_sample | ||
+ | JOIN base_site ON base_sample.site_id = base_site.id | ||
+ | JOIN base_calculatedage ON base_calculatedage.sample_id = base_sample.id | ||
+ | WHERE base_sample.what LIKE " | ||
+ | AND base_site.what LIKE " | ||
+ | AND base_sample.site_id IN | ||
+ | (SELECT base_sample.site_id FROM base_sample GROUP BY base_sample.site_id HAVING COUNT(*) > 3) | ||
+ | AND (base_sample.lat_DD > 35 AND base_sample.lat_DD < 48) | ||
+ | AND (base_sample.lon_DD > -123 AND base_sample.lon_DD < -117) | ||
+ | AND (base_calculatedage.t_LSDn > 10000 AND base_calculatedage.t_LSDn < 35000) | ||
+ | ORDER BY base_sample.lat_DD | ||
+ | |||
+ | </ | ||
+ | |||
+ | ---- | ||
+ | |||
+ | \\ |