gnuastro-devel

[task #15803] Match program builds k-d tree and later read from it


From: Mohammad Akhlaghi
Subject: [task #15803] Match program builds k-d tree and later read from it
Date: Sat, 17 Apr 2021 17:20:37 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0

Follow-up Comment #8, task #15803 (project gnuastro):

Thanks a lot Sachin, I just had a fast look. Here are some points:

* We won't be rebasing the commits into one commit for the final merge any
more. We'll just rebase them all over the 'master' branch. So it is important
that the history be well preserved and to have clear commit messages. In this
spirit I made some corrections in your commit message (Commit d9ee53845357b
<https://gitlab.com/makhlaghi/gnuastro-dev/-/commit/d9ee53845357b>): 1) the
title should start with the relevant component of Gnuastro (in this case
"Library (match.h)"). 2) The first sentence of the body should start with
"Until now, ..." and fully describe *the problem* that motivated this commit
(before jumping into the solution). The solution should start in a separate
paragraph starting with "With this commit ...". 3) Highlight code with single
quotes ('), not reverse quotes (`). In Markdown, reverse quotes are indeed
used to highlight code, but commit messages aren't interpreted as Markdown
and a single quote is easier to read in plain text. You can see the
modified commit message in the P.S.

* I noticed that the lines in 'during-dev-test-data/match-query.txt' ended
with space characters. Git highlights them with red in a simple 'git diff' (or
'git log -p' after the commit). So before committing/pushing, be sure to run
either of these two and make sure your commit never has such lines ;-).
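
As a rough sketch of how this check could be scripted outside of Git (the
'trailing_ws' function name is only for illustration; it is not part of Git
or Gnuastro), something like this should list the offending lines:

```shell
# Hypothetical helper: print (with line numbers) every line that ends
# in a space or tab. Exit status is 0 only when such lines are found.
trailing_ws() {
    grep -n '[[:blank:]]$' "$@"
}

# Inside a Git repository, the staged changes can also be checked with
# Git's own whitespace test before committing:
#   git diff --cached --check
```

For example 'trailing_ws during-dev-test-data/match-query.txt' should print
nothing once the file is clean.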

After correcting the commit, I built Gnuastro over it and noticed that 'make
check' failed because the test script wasn't executable, so I made the tiny
Commit 44bedb4a <https://gitlab.com/makhlaghi/gnuastro-dev/-/commit/44bedb4a>
over it. Besides making it executable, I also removed the '-match' part in the
name: because the test is already prefixed with 'match/' and having the
'match' string repeated two times is redundant ;-).
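
To catch this before 'make check' does, a small helper along these lines
may help. It is only a sketch: the function name is made up, and it assumes
the test scripts end in '.sh'.

```shell
# Hypothetical helper: list regular '*.sh' files under the given
# directory that do not have the owner's execute bit set.
non_executable_scripts() {
    find "$1" -type f -name '*.sh' ! -perm -u+x
}

# A listed script can then be fixed with, for example:
#   chmod +x path/to/script.sh
```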

About the tests, instead of random trials (which can also be hard to reproduce
later), it may be easier to just use real data ;-). With this command you can
download all Gaia stars <https://sci.esa.int/web/gaia> around the celestial
coordinates (0,0) within a radius of 1 degree (and save them in 'all.fits'):


astquery vizier --dataset=gaiaedr3 --center=0,0 --radius=1 \
         -cEDR3Name,RAJ2000,DEJ2000,Gmag -oall.fits


It results in 10056 stars. Now, you can select a random sub-set of the rows
(for matching) using the '--rowrandom' option. For example with this command,
you can extract 1000 rows of it randomly:


asttable all.fits --rowrandom=1000 -oall-sub.fits


You can now match this sub-set with the full table :-). Since it's a sub-set,
you know a priori the total number of successful matches (1000 in the case
above), and things are easier to test.

To make the subset selection reproducible, see the Generating Random Numbers
<https://www.gnu.org/software/gnuastro/manual/html_node/Generating-random-numbers.html>
section of the book. Here is what you should do:


export GSL_RNG_TYPE=ranlxs1
export GSL_RNG_SEED=1619042801
asttable all.fits --envseed --rowrandom=1000


If you later want more stars (for more serious tests), you can simply increase
the value given to '--radius' in the 'astquery' command above :-).

The good thing with this process is that we won't need to keep data in the
commits and at the very start of the benchmarking script, we can simply
download the necessary dataset.
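
A minimal sketch of that first step could look like the following. The
'fetch_if_missing' helper is hypothetical (it is not a Gnuastro command); it
only avoids repeating the download when the file is already there:

```shell
# Hypothetical helper: run the given command only when FILE is absent.
# Usage: fetch_if_missing FILE COMMAND [ARGS...]
fetch_if_missing() {
    local file=$1
    shift
    [ -f "$file" ] || "$@"
}

# In the benchmarking script it could wrap the query shown earlier:
#   fetch_if_missing all.fits \
#       astquery vizier --dataset=gaiaedr3 --center=0,0 --radius=1 \
#                -cEDR3Name,RAJ2000,DEJ2000,Gmag -oall.fits
```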

Finally, we will probably be using this benchmarking a lot in the future to
improve/debug the '--kdtree=automatic' algorithm. So it's good to write it in
Bash and move it into the Match program directory. On the command-line, you
can use the 'time' command before any command to get the resource usage of the
command.
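
Inside a script it is often easier to capture the elapsed time in a variable
than to parse the output of 'time'. The sketch below does this with
time-stamps instead; 'elapsed_ms' is a made-up name, and it assumes GNU
date (for the '%N' nanoseconds format):

```shell
#!/bin/bash
# Hypothetical helper: run a command (discarding its output) and print
# the wall-clock time it took, in milliseconds. Needs GNU date for '%N'.
elapsed_ms() {
    local start end
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Possible use in a benchmark loop. The Match call is left elided here
# because its exact options depend on the branch being tested:
#   for n in 100 1000 10000; do
#       asttable all.fits --rowrandom=$n -osub-$n.fits
#       echo "$n $(elapsed_ms astmatch ...)"
#   done
```

Unlike 'time', this only gives the wall-clock time (not the CPU time), which
should be enough for a first comparison of the two algorithms.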

Note that for now, the benchmarking is only to see when Gnuastro's own
sort-based algorithm should be switched to the k-d tree based algorithm ;-).
For comparing with other k-d tree implementations, we'll need better control
over the environment (using Maneage <http://maneage.org>). By the way:
Maneage's paper has been accepted into IEEE's CiSE
<https://doi.org/10.1109/MCSE.2021.3072860>.

P.S. Modified commit message (Commit d9ee53845357b
<https://gitlab.com/makhlaghi/gnuastro-dev/-/commit/d9ee53845357b>)


Library (match.h): match_coordinate_ replaced by match_sort_based_
    
Until now, there was only a single match algorithm in Gnuastro, so the name
of the respective functions in the library had a 'match_coordinate_'
prefix. However, in this branch we are adding a new k-d tree based
matching, so that name for the initial algorithm could cause confusion
(because k-d tree also uses coordinates!).
    
With this commit, those same function names are now prefixed with
'match_sort_based_'. This change is done in all the relevant places:
bin/match/match.c, lib/gnuastro/match.h, lib/match.c and in
doc/gnuastro.texi.
    
Furthermore, a 'make check' test has been added for the k-dtree based
matching which does simple matching based on the predefined inputs used for
the old matching algorithm.
    
Finally, some scripts have been added in the temporary directory for
benchmarking tests to see the efficiency of k-d tree over sort-based
matching. Two scripts are written in '/during-dev-test-data/scripts'
(namely 'kdtree-gen.py', which generates the pseudo random output tests,
and 'benchmark.py') which will calculate the run time for the scripts and
generate a graph for number of times the program is executed vs. time
taken. The script for benchmarking is simple and might take long time for
>10 executions. A possible efficient method is multithreading, which will
be implemented soon. After this we can implement the --kdtree=automatic
option based on the result.


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/task/?15803>




