
Re: [bug-gawk] Make awk data structure reside in memory


From: Andrew J. Schorr
Subject: Re: [bug-gawk] Make awk data structure reside in memory
Date: Sat, 13 Apr 2019 00:05:40 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

Another option is to use the gawkextlib lmdb extension as a very fast
key-value store. You could put the lmdb database on tmpfs for even better
performance; there are some benchmarks here:

http://www.lmdb.tech/bench/inmem/

"LMDB delivers the read performance of a pure-memory database, while still 
operating as a persistent data store."
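
For reference, a rough read-side sketch. It assumes the gawkextlib lmdb
binding mirrors the C LMDB API (mdb_env_create, mdb_env_open, mdb_txn_begin,
mdb_dbi_open, mdb_get) and exposes flag constants through an MDB array, as
the extension's documentation describes; check that documentation for the
exact names. The path /dev/shm/mydb is just an example (on most Linux
systems /dev/shm is already tmpfs, so the file lives in RAM):

    gawk '@load "lmdb"
    BEGIN {
        env = mdb_env_create()
        mdb_env_open(env, "/dev/shm/mydb", MDB["NOSUBDIR"], 0600)
        txn = mdb_txn_begin(env, "", MDB["RDONLY"])  # read-only transaction
        dbi = mdb_dbi_open(txn, "", 0)               # default unnamed database
        print mdb_get(txn, dbi, "some-key")          # served from the page cache
        mdb_txn_abort(txn)                           # cheap way to end a read txn
        mdb_env_close(env)
    }'

Because LMDB is memory-mapped, every gawk process that opens the same
environment shares one copy of the data through the OS page cache, which is
exactly the load-once behavior being asked for.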

Regards,
Andy

On Fri, Apr 12, 2019 at 05:52:54PM -0500, Peng Yu wrote:
> Why can't the shared-memory approach (ipcs, ipcrm) be used? It sounds
> like the most relevant solution.
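> 
> (By "the shared memory thing" I mean SysV segments, the kind ipcs and
> ipcrm manage. As far as I can tell gawk has no built-in interface to
> those, so using them would need a C extension; POSIX shared memory, by
> contrast, is just files on the /dev/shm tmpfs, which awk can already
> read.)
> 
>     ipcs -m          # SysV segments: no gawk interface to attach to these
>     ls -l /dev/shm   # POSIX shm objects: ordinary memory-backed files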
> 
> On Fri, Apr 12, 2019 at 12:51 PM david kerns <address@hidden>
> wrote:
> 
> > Your other options include:
> > 1) creating a server that keeps running (and thus keeps the dataset in
> > memory) and services requests as they come
> > 2) using an existing awk extension (@load "rwarray") to quickly
> > save/restore an array (the dataset) (see the sketch after this list)
> > 3) writing your own extension to solve it another way
> > 4) choosing a different language (C/Python) more suitable for the task ;)
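> >
> > For option 2, a minimal sketch: writea() and reada() are the functions
> > the rwarray extension bundled with gawk documents; big.txt,
> > /dev/shm/db.bin, and smallinput.txt are just example names.
> >
> >     # parse the big dataset once and dump the array in binary form
> >     gawk '@load "rwarray"
> >     BEGIN {
> >         while ((getline line < "big.txt") > 0) {
> >             split(line, f); db[f[1]] = f[2]
> >         }
> >         writea("/dev/shm/db.bin", db)   # binary dump of the whole array
> >     }'
> >
> >     # later runs restore the array without re-parsing the text
> >     gawk '@load "rwarray"
> >     BEGIN { reada("/dev/shm/db.bin", db) }
> >     { print (($1 in db) ? db[$1] : "missing") }' smallinput.txt
> >
> > Note this still copies the data into each process, so it saves parsing
> > time rather than memory; only a memory-mapped store like lmdb truly
> > shares one copy.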
> >
> > On Fri, Apr 12, 2019 at 8:35 AM Peng Yu <address@hidden> wrote:
> >
> >> No. That means the same data would be loaded multiple times, which is
> >> not acceptable: the data is too large for multiple copies to fit in
> >> memory. A solution must load the data into memory only once.
> >>
> >> On Fri, Apr 12, 2019 at 10:08 AM david kerns <address@hidden>
> >> wrote:
> >>
> >>> If you're running on Linux (and you have the memory available), you
> >>> might consider using /dev/shm (or another RAM disk):
> >>>
> >>>     # store your DB/dataset once
> >>>     awk 'BEGIN { f = "/dev/shm/fred"
> >>>                  for (i = 0; i < 100000; i++) print i, rand() > f
> >>>                  close(f) }'
> >>>
> >>>     # each later run reloads it from RAM
> >>>     time awk 'BEGIN { f = "/dev/shm/fred"
> >>>                       while ((getline l < f) > 0) { split(l, a); db[a[1]] = a[2] }
> >>>                       close(f) }
> >>>               # ... other processing here ...'
> >>>
> >>> Re-reading from a RAM disk should be significantly faster than from a
> >>> normal disk (assuming you have the RAM and don't start swapping).
> >>>
> >>> On Fri, Apr 12, 2019 at 7:20 AM Peng Yu <address@hidden> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I need to load a large dataset into memory before I can process some
> >>>> other small data. This is inefficient if I have to load the dataset
> >>>> again and again whenever I need to process different small data. Is
> >>>> there a way to keep the large dataset in memory so that different awk
> >>>> processes can read it without having to reload it? Thanks.
> >>>>
> >>>> --
> >>>> Regards,
> >>>> Peng
> >>>>
> >>>> --
> >> Regards,
> >> Peng
> >>
> > --
> Regards,
> Peng

-- 
Andrew Schorr                      e-mail: address@hidden
Telemetry Investments, L.L.C.      phone:  917-305-1748
545 Fifth Ave, Suite 1108          fax:    212-425-5550
New York, NY 10017-3630


