meta data for this page
  •  

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
pluginto:useful_sql_queries [2023/10/19 21:35] joetulenkopluginto:useful_sql_queries [2024/12/03 17:29] (current) joetulenko
Line 33: Line 33:
 ---- ----
  
-3. JOIN: In the event you want information from multiple tables (e.g., you want information from the base_sample and base_site and base_publication tables), you need to specify how these tables are linked together, in other words which fields in which tables are used to link tables together. In ICE-D, these fields (attributes) are all going to be ID fields. There are other ways to link tables, but brief documentation reading suggests this is the most computationally efficient path. The syntax is: JOIN //target table// ON //target table.identifyer// = //starting table.identifyer//+3. JOIN: In the event you want information from multiple tables (e.g., you want information from the base_sample and base_site and base_publication tables), you need to specify how these tables are linked together, in other words which fields in which tables are used to link tables together. In ICE-D, these fields (attributes) are all going to be ID fields. There are other ways to link tables, but brief documentation reading suggests this is the most computationally efficient path. The syntax is: JOIN //target table// ON //target table.identifier// = //starting table.identifier//
  
 example if I want information from example above, I need to connect to the base_site table and the base_publication table. This also means, since we have a two-way relation between samples and publications, there is a third table linking the two called base_samplepublicationsmatch: example if I want information from example above, I need to connect to the base_site table and the base_publication table. This also means, since we have a two-way relation between samples and publications, there is a third table linking the two called base_samplepublicationsmatch:
Line 87: Line 87:
  
 **1. Beware of replicate samples from queries!**\\ **1. Beware of replicate samples from queries!**\\
-Certain tables are linked to others through a two-way relation, or a 'many-to-many' realtion instead of the more typical one-way relation.+Certain tables are linked to others through a two-way relation, or a 'many-to-many' relation instead of the more typical one-way relation.
  
 For example, **samples** and **publications** are linked through a two-way relation because it is possible for a publication to have many samples (it better, right? It would hopefully be tough to publish a paper with only ONE exposure age reported in it), but it is also possible for one sample to be referenced in multiple publications (perhaps a research group ran a sample back in the day, but recently re-ran another aliquot of the same exact sample and discussed that sample in a new publication). For example, **samples** and **publications** are linked through a two-way relation because it is possible for a publication to have many samples (it better, right? It would hopefully be tough to publish a paper with only ONE exposure age reported in it), but it is also possible for one sample to be referenced in multiple publications (perhaps a research group ran a sample back in the day, but recently re-ran another aliquot of the same exact sample and discussed that sample in a new publication).
  
-Thus, IF YOU DO NOT WANT TO SELECT INFORMATION FROM A SAMPLE MORE THAN ONCE and it shows up in multiple publications, it is wise to use the DISTINCT statement after SELECT (see examples of DISTINCT in use below)+Thus, IF YOU DO NOT WANT TO SELECT INFORMATION FROM A SAMPLE MORE THAN ONCE and it shows up in multiple publications, I think it can be useful to use the DISTINCT statement after SELECT (see examples of DISTINCT in use below).
  
 \\ \\
Line 258: Line 258:
  
 If you want an EXACT match, omit the %'s inside the parentheses. Although this statement is still not case sensitive, just FYI.// If you want an EXACT match, omit the %'s inside the parentheses. Although this statement is still not case sensitive, just FYI.//
 +
 +----
 +
 +\\
 +
 +7. Filtering selection by choosing samples from sites that have AT LEAST 3 samples per site. For example, let's say I only trust moraine ages in the database that are based on at least 3 boulder ages on the moraine, any less and the age might be uncertain.\\
 +**New Feature:** lots of new features in this one, but mainly the GROUP_BY and HAVING COUNT (*) built-in functions.
 +
 +<code>
 +
 +SELECT base_sample.name, base_sample.lat_DD, base_sample.lon_DD
 +FROM base_sample
 +JOIN base_site ON base_site.id = base_sample.site_id
 +WHERE base_site.what LIKE "%moraine%"
 +AND base_sample.site_id IN
 +(SELECT base_sample.site_id FROM base_sample GROUP BY base_sample.site_id HAVING COUNT(*) > 2)
 +
 +</code>
 +
 +//note: the statement before and within the parentheses (I think) works more or less like this:\\
 +1. "AND base_sample.site_id IN" = give me all samples where their site_id field meets the following criteria.\\
 +2."SELECT base_sample.site_id FROM base_sample GROUP BY base_sample.site_id" = within the parentheses argument, select the site_id field from the samples table and group all entries in the samples table by their site_id. \\
 +3. "HAVING COUNT(*) > 2" = once samples are grouped by their site_id, find and select only samples where the number of samples per respective site is greater than 2 (ie at least 3).//
 +
 +----
 +
 +\\
 +
 +8. Query used to filter the ICE-D Alpine application for samples in the Cascades and Sierra Nevada as part of Tulenko et al. (2024): Cosmogenic 21Ne exposure ages on late Pleistocene moraines in Lassen Volcanic National Park, California, USA.
 +
 +<code>
 +
 +SELECT base_sample.name, base_site.short_name, base_calculatedage.t_LSDn, base_calculatedage.dtint_LSDn, base_sample.lat_DD, base_sample.lon_DD, base_calculatedage.nuclide
 +FROM base_sample
 +JOIN base_site ON base_sample.site_id = base_site.id
 +JOIN base_calculatedage ON base_calculatedage.sample_id = base_sample.id
 +WHERE base_sample.what LIKE "%oulder%"
 +AND base_site.what LIKE "%oraine%"
 +AND base_sample.site_id IN
 +(SELECT base_sample.site_id FROM base_sample GROUP BY base_sample.site_id HAVING COUNT(*) > 3)
 +AND (base_sample.lat_DD > 35 AND base_sample.lat_DD < 48)
 +AND (base_sample.lon_DD > -123 AND base_sample.lon_DD < -117)
 +AND (base_calculatedage.t_LSDn > 10000 AND base_calculatedage.t_LSDn < 35000)
 +ORDER BY base_sample.lat_DD
 +
 +</code>
  
 ---- ----
  
 \\ \\