help-gsl

Re: Checking GSL for Spectroscopy


From: Mike Marchywka
Subject: Re: Checking GSL for Spectroscopy
Date: Thu, 18 Mar 2021 19:21:07 -0400
User-agent: NeoMutt/20171215

On Thu, Mar 18, 2021 at 11:47:28AM -0400, Fritz Sonnichsen wrote:
>    Mark - Speed is an issue, as follows:
>      Our device is an underwater microscope to which we are adding a
>      Raman spectrometer.
>      Plankton, plastics or other objects pass through a flow cell and
>      pass a camera and the spectrometer inputs, all in a few seconds.
>      AI on the device quickly decides what it is "seeing" and records
>      this.

So the identification is done with AI not Raman?

There does seem to be some literature on Raman for bio samples, for example, 

https://core.ac.uk/reader/76958504?utm_source=linkout


>      Apparently this must all be done "in situ", e.g. no post-processing
>      on the shore.
>      Meanwhile a spectrum is taken and checked against 50,000 spectra,
>      combining them if necessary.

Raman spectra generally carry their meaning in peak locations, relative
intensities, and maybe widths. It should not be hard to reduce this to a
comparison of peak locations, and these are even ordered.

If you index peak positions, that gives you some idea where to look to
explain the peaks in your unknown sample, although tolerance may be hard
to accommodate.
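
As a rough illustration of the ordered peak comparison, here is a minimal
plain-C sketch. It is not a GSL routine and not anyone's actual code from
this thread: the function name `count_matched_peaks` and the single fixed
tolerance `tol` are assumptions made for the example. Because both peak
lists are sorted, one pass with two indices is enough.

```c
#include <assert.h>
#include <stddef.h>

/* Count the peaks in `unknown` that have a partner in `reference` within
 * +/- tol (same units as the positions, e.g. cm^-1).
 * Both arrays must be sorted in ascending order.
 * Each reference peak is paired with at most one unknown peak. */
size_t count_matched_peaks(const double *unknown, size_t n_unknown,
                           const double *reference, size_t n_reference,
                           double tol)
{
    size_t i = 0, j = 0, matched = 0;

    assert(tol >= 0.0);
    while (i < n_unknown && j < n_reference) {
        double diff = unknown[i] - reference[j];
        if (diff < -tol) {
            i++;            /* unknown peak lies below: it has no partner */
        } else if (diff > tol) {
            j++;            /* reference peak lies below: skip it */
        } else {
            matched++;      /* within tolerance: pair them and move on */
            i++;
            j++;
        }
    }
    return matched;
}
```

Dividing the returned count by the number of reference peaks gives a crude
hit rate per library entry. A real instrument would probably need a
tolerance that varies across the spectrum, which is where the difficulty
mentioned above comes in.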

>    So the camera passes the "shape" that it sees to an AI engine which
>    has to tell very quickly what plankton type it sees, using OpenCV and
>    various morphology and size measures. It is subtle: I am a physicist,
>    but the biologist tells me that small differences in structure mean a
>    lot in determining species and state of growth. (To me they all look
>    like YUK!)
>      Ocean plastics are easier. The camera does not play into these; it
>    is all about the spectrum.
>    Amazing thing is that most of this works, but we want to "see" things
>    every 5 seconds, decide if an ROI exists, and then go ahead with the
>    speciation, say every minute.
>    This is not really a Finite Element problem: the spectra are linear
>    and the images are largely 2D.

Sure, but again the idea is a library - they have all kinds of cool
things like solving huge systems of equations or matrix operations etc.
That is just one place you use a big system.
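
One place the matrix-operation view applies to this thread: the Pearson
coefficient of two spectra is the dot product of their mean-centered
versions divided by the product of their norms, so scoring one spectrum
against a whole library is essentially a matrix-vector product over
pre-centered rows. Below is a minimal plain-C sketch of that idea. The
names `center` and `best_match`, the row-major library layout, and the
choice to rank by the signed squared coefficient r*|r| (same ordering as r,
no square root needed) are all assumptions for illustration. With GSL
available, `gsl_stats_correlation` gives the coefficient for a single pair
and `gsl_blas_dgemv` could replace the inner loops.

```c
#include <assert.h>
#include <stddef.h>

/* Subtract the mean from x in place and return the sum of squares of the
 * centered values (n times the population variance). */
double center(double *x, size_t n)
{
    double mean = 0.0, ss = 0.0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++)
        mean += x[i];
    mean /= (double)n;
    for (i = 0; i < n; i++) {
        x[i] -= mean;
        ss += x[i] * x[i];
    }
    return ss;
}

/* Find the library row that correlates best with `query`.
 *   lib      : n_rows x n values, row-major, each row already centered
 *   lib_ss   : sum of squares returned by center() for each row
 *   query    : centered query spectrum, query_ss its sum of squares
 * The score is r*|r| (signed squared Pearson coefficient, in [-1, 1]).
 * Returns the best row index, or n_rows if nothing could be scored. */
size_t best_match(const double *lib, const double *lib_ss, size_t n_rows,
                  size_t n, const double *query, double query_ss,
                  double *best_score)
{
    size_t best = n_rows, r, i;
    double top = -2.0; /* below the minimum possible score of -1 */

    if (query_ss <= 0.0)
        return n_rows; /* flat query: correlation undefined */
    for (r = 0; r < n_rows; r++) {
        const double *row = lib + r * n;
        double dot = 0.0, score;

        if (lib_ss[r] <= 0.0)
            continue; /* flat library spectrum: correlation undefined */
        for (i = 0; i < n; i++)
            dot += row[i] * query[i];
        score = (dot < 0.0 ? -dot : dot) * dot / (lib_ss[r] * query_ss);
        if (score > top) {
            top = score;
            best = r;
        }
    }
    if (best_score != NULL && best < n_rows)
        *best_score = top;
    return best;
}
```

The library rows only need to be centered once, offline, so the per-sample
cost on the device is one pass over the library. Whether that is fast
enough for roughly 50,000 entries depends on the spectrum length and the
hardware, which this sketch does not address.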

>         Yes, dates should be very simple, but at least Octave, Matlab,
>    gnuplot and others that I can't think of all seem to make a mess of
>    simple dates in plots.
>    Those old card punches did indeed have "TAB". The 029 keypunch from
>    IBM had a way to set them up, and knowing this could rate a few
>    dates. We also had card duplicators and sorters. I recall the long
>    2 mile walk in 2 feet of snow from college, carrying 2 foot-long
>    boxes of cards. Inevitably some would get wet (plastic grocery bags
>    didn't exist yet!). To this day, on occasion I will pull a 1970 book
>    from my library and a card falls out. They still make good
>    bookmarkers! And yes, water in real crystals today is every bit as
>    much a problem to spectroscopy as wet cards representing a crystal!
>    So I suppose I am ranting off topic here, but in summary I think
>    science has run off the rails, computers have the tail swinging the
>    dog, and nobody is in charge to set it right again. As Richard
>    Feynman called it, many scientists have "got the bug" and put away
>    their math books to play endless games keeping up with superfluous
>    computer changes. Thankfully a few products like GSL keep their C
>    and their sanity.
>    cheers
>    fritz MA,USA
> 
>    On Thu, Mar 18, 2021 at 8:52 AM Mike Marchywka <marchywka@hotmail.com> wrote:
> 
>      On Thu, Mar 18, 2021 at 08:33:05AM -0400, Fritz Sonnichsen wrote:
>      >    Thanks Mike
>      >      as you mention the code that I am starting with (written by a 
> colleague) is more "model based" in that it
>      >    deliberately does not invoke the underpinning properties of Raman 
> optical spectra. This code is designed to run
>      really fast
>      >    on GPUs and minimizes any calculations. When I start working with 
> it I will figure out if some preprocessing to
>      categorize
>      >    the spectra for example would be of value.
>      Why is speed such an issue? Is this something to run on a distributed 
> system? You
>      could take a look at the libraries called by FEM code from national labs 
> like
>      libmesh. This is really complicated but something to look at as an 
> alternative-
>      I would stick with plain c code if you can :)
>      >       Your comment on CRAN R is of interest-i looked at it a few days 
> ago and it may be an approach--I think I will
>      know more
>      >    once I get the original code and see how sophisticated the 
> statistics usage is. But at least at this point I think
>      C with a
>      >    little GSL will do.
>      I actually wrote some of my own c++ code for graphing which sounds dumb 
> since R
>      should do that well but I had issues with dates and wanted to get svg 
> and latex
>      outputs although R supports both to some extent IIRC.
>      >      I am interested to find that Python is losing at least some 
> ground these days. I started on UNIVACs using
>      assembler and
>      >    FORTRAN on punch cards so I was always a little distressed by 
> adherence to column position issues necessary back
>      then. So
>      I'm not sure if cards had the concept of a "tab" lol that was the real 
> annoyance.
>      >    when I had  to write in Python a few years ago I actually had to 
> call a computer colleague to make sure I wasn't
>      >    missing something when the column requirements came up. Jeeesh. 
> More disturbing is the inability to write comments
>      at the
>      >    end of lines. I am a big supporter of in-line doc and I expect 
> this. Putting documents before the line takes a lot
>      of space
>      >    and more words--I have never rated code well on how many pages you 
> have to flip. But the real problems with Python
>      started
>      >    when I needed to do serial port programming--some concoction of 
> shared routines that ran differently  on ver 2 and
>      ver 3
>       I seem to recall that the Python module concept may bridge the 
> language- subroutine
>      idea.
>       I also like to "code for grep" and end of line comments are great
>      >    were needed and it was a disaster. A lot of other stories but I 
> think people here know them. It is sad to see a
>      large body
>      >    of scientists here thinking that somehow Python was necessary to 
> accommodate its rich set of routines. They have
>      never seen
>      >    another language and don't understand that those routines could 
> have been written in something more stable and
>      simpler.
>      >      Your story about writing a citation program hits home for me. I 
> worked on a system for formatting NOAA data that
>      one of
>      >    the institutions uses here. It requires a representative from each 
> department and tons of ill-formatted data must
>      be made
>      >    to comply with a generic format-all this done with unique code for 
> each case. And then the code never worked of
>      course. An
>      I have not even gotten to unstructured data- this is just trying to find 
> the structured
>      data although that may be next. Parsing formatted citations .... aarrrgh.
>      >    enormous cost of rare research funds that could have easily been 
> saved by simply making a few requirements on CSV
>      data. The
>      There is a simple line oriented key-value format called RIS and medline 
> uses
>      a similar or identical format. I prefer bibtex because it is easier to 
> read
>      , more versatile, and is my target format anyway .
>      >    computer tower of babble that has been built in recent years shows 
> to me at least that science has lost sight of
>      what it is
>      >    supposed to do and how it should get here. The tool is no longer a 
> tool-it is becoming the prime cost.
>      >       The GPL issues have been discussed here and I appreciated that 
> input. I have our business person looking at
>      this. The
>      >    Spectra library that we call must remain proprietary but the 
> operations on it are--to my mind-rather generic (I
>      have
>      >    written them more than one in C over the years) and we certainly 
> don't need to keep these close. And of course our
>      small
>      >    company is willing to pay reasonable amounts for a licence. The big 
> problem these days seems to be finding out what
>      you
>      Keeping code hidden means keeping bugs hidden too, so I guess part
>      of the model depends on the application and how users can verify
>      the code is doing something useful. In medicine, tests are often
>      ambiguous due to the issues with biological samples and ultimately
>      many seem like "black boxes", but it may be highly desirable to
>      avoid that with software. If you can keep it hidden and
>      useful/testable/verifiable/trustworthy that may be a big
>      accomplishment too.
>      >    need to do regarding licensing--and knowing it will stick once it 
> is done. Sometimes it is worse reading the US tax
>      >    code--which grew much Kudzu  as computer languages and 
> architectures do these days!
>      >    cheers
>      >    Fritz
>      >
>      >    On Thu, Mar 18, 2021 at 6:21 AM Mike Marchywka <marchywka@hotmail.com> wrote:
>      >
>      >      Thanks. I've looked briefly at a lot of different kinds of 
> "spectra" -
>      >      audio, solar, image fft, distributions,  xps, even Raman that may 
> evolve
>      >      with time -
>      >       and
>      >      as you suggest you may not be interested so much in some abstract
>      >      comparison as in extracting some model information. Comparing 
> spectra
>      >       may be with the intent of resolving a given one into component
>      >      pieces- how much of each basis element is  in the  measured thing.
>      >      Generally you have lines with some profile- gauss and lorentz 
> would
>      >      be well known - and then a continuum which could be anything
>      >      with blackbody and I guess fluorescence as examples.  Then you 
> have
>      >      instrument issues to resolve- baseline and maybe broadening could
>      >      be factors for a library.
>      >      You could imagine developing a language around common things-
>      >      consider maybe writing "R" packages that use GSL.  CRAN's R
>      >      may be a good open replacement for MATLAB.
>      >      I played with python briefly and any language that enforces white 
> space,
>      >      and IIRC earlier distinguished space and tab lol, is a bit
>      >      of a suspect ...
>      >      I've also run into various language-vs-library issues and 
> thinking about business
>      >      issues. I've got one "program" to make downloading citation 
> information
>      >      less distracting from diverse sources targeted at academics or 
> anyone doing internet
>      >      research ( this could be companies writing white papers or 
> technical reports for their own products
>      >      compared to competitors,  political hacks writing position or 
> policy papers if the
>      >      internet sites supply Bibtex for their works ). The code itself 
> is almost the opposite
>      >      of science- it is a collection of hacks tried in the order in 
> which I discovered
>      >      they may be useful to try to download citation information 
> without bothering
>      >      the user much. After looking at maybe 100's of hacks, some 
> patterns emerged
>      >      and in the conversion from an awful bash script to c/c++ it looked
>      >      like you could come up with a mini-language based on "subroutine"
>      >      or method calls.  The dev version uses readline for interaction 
> which appears
>      >      to have some licensing issues but since I almost always just 
> write for
>      >      myself I don't usually notice stuff like that.
>      >      btw, as there are likely academics here, if you have your own
>      >      horror or success stories getting citation information for
>      >      your publication efforts please share as appropriate here or
>      >      on the texhax list. Thanks.
>      >      note new address
>      >       Mike Marchywka 306 Charles Cox Drive Canton, GA 30115
>      >       2295 Collinworth  Drive Marietta GA 30062.  formerly 487 Salem 
> Woods Drive Marietta GA 30067 404-788-1216 (C)<-
>      leave
>      >      message 989-348-4796 (P)<- emergency
>      >      ________________________________________
>      >      From: Fritz Sonnichsen <sonnichs@gmail.com>
>      >      Sent: Tuesday, March 16, 2021 9:53 AM
>      >      To: Mike Marchywka
>      >      Cc: help-gsl@gnu.org
>      >      Subject: Re: Checking GSL for Spectroscopy
>      >      Mark
>      >        I am converting someone's MATLAB code so I am not sure what he 
> is doing yet--but several years ago I did
>      spectral
>      >      analysis in MATLAB and probably very similar. This is for Raman 
> and LIBS spectra.
>      >      1) "Usually" I apply a high pass filter to the spectrum. This 
> gets rid of the noise I need control over this
>      since as
>      >      you would expect the signal and noise can get pretty close! 
> Intuition comes into play here.
>      >      2) Next I baseline the spectra. This removes any constant bias.  
> For LIBS I was usually able to further filter
>      "spikes"
>      >      and then take a mean of the remaining line, subtracting this from 
> the overall spectrum. Raman can get a bit more
>      >      difficult-I am, at least,  subtracting the fluorescent line which 
> can have a lot of features (e.g. spikes). At
>      times, if
>      >      you know this background you can subtract it first but you get 
> all types of complications from normalization.
>      >      Again--intuition comes into play.
>      >      3) The resulting spectrum needs to be compared to a database. For 
> LIBS the latter is quite small--mostly
>      >      atomic/elemental data such as NIST. I could generally do a 
> discrete comparison of the spike locations using a
>      >      peak-finder, align them with the known examples and get a pretty 
> high hit rate. This was for qualitative data.
>      >      Raman is, again, much more complex. The data I was using was 
> constrained and simpler but the case in hand here is
>      much
>      >      more complex. We are doing mixed plastics at the moment. My 
> colleague found the best matches by taking a stats
>      >      correlation with 44000 entries and pulling out the values closest 
> to "one". It works remarkably well.
>      >      I don't think there is much above that cannot be written in C in 
> a reasonable amount of time. But we are looking
>      ahead
>      >      and would like to draw on the collective experience of the 
> science community. This type of analysis is quite
>      common and
>      >      there are enough new wheels out there that we don't want to 
> re-invent old ones!
>      >          Very important is that "intuition" part. I would think a lot 
> of this issue has been better solved since I was
>      doing
>      >      this. There are a lot of adjustments that could be made-for 
> example iterating trial baselines, rejecting noise at
>      varied
>      >      levels etc. Processors are faster now and the AI movement has 
> brought in PCA and a lot of other techniques that
>      begin to
>      >      transcend my current state of knowledge (I work more on the 
> physics end of things and would prefer to use
>      routines from
>      >      the communities if possible to save time).
>      >      Thanks for your interest Mark!
>      >      Fritz
>      >      On Tue, Mar 16, 2021 at 9:25 AM Mike Marchywka <marchywka@hotmail.com> wrote:
>      >      Can you comment on how you compare spectra? Just for my own
>      >      personal interest, not sure if will further the thread here 
> however..
>      >      Not sure a "dot product" in the conventional sense would help 
> much.
>      >      You could imagine comparing peak positions and relative heights
>      >      or a fit to a continuum for example.  Peaks plus black body in 
> some
>      >      vector comparison?
>      >      ________________________________________
>      >      From: Help-gsl <help-gsl-bounces+marchywka=hotmail.com@gnu.org> on behalf of Fritz Sonnichsen <sonnichs@gmail.com>
>      >      Sent: Tuesday, March 16, 2021 9:15 AM
>      >      To: help-gsl@gnu.org
>      >      Subject: Checking GSL for Spectroscopy
>      >      I am preparing to convert MATLAB code to something more general.
>      >      The new code will run on Linux and ARM processors.
>      >         For a lot of reasons I am not going to use Python. We also
>      >      want to keep this project "close" to scientists and do not
>      >      want to turn it into a full time computer programming job. So
>      >      the final word is that I am looking for something that can be
>      >      called by (and hopefully is written in) C. Worst case I will
>      >      just write the code myself, but I would prefer to start
>      >      integrating our systems into something with a lot of
>      >      pre-written and vetted routines.
>      >      GSL looks like a good choice. Maybe R comes next. We have a mix 
> of needs
>      >      but I will point out a few:
>      >      1) Baselining a spectrum
>      >      2) Finding peaks in that spectrum
>      >      3) using Pearson correlation to compare the spectrum QUICKLY to
>      >      about 50,000 recorded examples.
>      >      We also have some uses with basic statistics and we do some image
>      >      processing.
>      >      So my question is: does GSL position itself in these areas?
>      >      MATLAB (with packages) does them all.
>      >           I am not sure how active GSL is, or whether it is keeping
>      >      up with AI, imaging and spectroscopy, or is fading or giving
>      >      way to popular languages, for example. I was surprised that
>      >      the 600+ page manual did not seem to show anything relating to
>      >      the simple spectral analysis described above. Certainly I can
>      >      search the web for others' code, but at some point if I cannot
>      >      attach to a well established product I will just write it
>      >      myself.
>      >      Any comments appreciated
>      >      thanks
>      >      Fritz
>      --
>      mike marchywka
>      306 charles cox
>      canton GA 30115
>      USA, Earth
>      [mailto:marchywka@hotmail.com]marchywka@hotmail.com
>      404-788-1216
>      ORCID: 0000-0001-9237-455X

-- 

mike marchywka
306 charles cox
canton GA 30115
USA, Earth 
marchywka@hotmail.com
404-788-1216
ORCID: 0000-0001-9237-455X


