Re: [GNUnet-developers] useless crap??
From: Wayne Scott
Subject: Re: [GNUnet-developers] useless crap??
Date: Mon, 29 Apr 2002 19:59:39 -0500 (EST)
From: Christian Grothoff <address@hidden>
> Hmm. I haven't seen that happen (except for some issue with
> gnunet-insert-mp3). Which processes? Some time CPU eating by gnunetd on
> startup (generation of the hostkey) is ok.
It was the client-disconnect problem in tcpserver.c (read() returns
0). I had a fix of my own before I saw that someone had beaten me to it.
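For reference, the general pattern: on a stream socket (or a pipe), read() returning 0 means end-of-file, i.e. the peer has closed its end. A select()-based server that treats only -1 as an error keeps seeing such a descriptor as readable and can spin, which would fit the CPU usage mentioned. The sketch below is a hypothetical helper (read_client and its status codes are made up for illustration), not the actual fix in tcpserver.c:

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical result codes for one read attempt on a client socket. */
enum client_status { CLIENT_OK, CLIENT_DISCONNECTED, CLIENT_ERROR };

/* Read whatever is available from fd into buf and store the byte count
 * in *got.  read() returning 0 is end-of-file: the peer closed the
 * connection, so the caller must close fd and drop it from its select()
 * set instead of trying again. */
static enum client_status read_client(int fd, char *buf, size_t len,
                                      ssize_t *got)
{
    ssize_t n;

    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);  /* retry if interrupted by a signal */

    *got = n > 0 ? n : 0;
    if (n > 0)
        return CLIENT_OK;
    if (n == 0)
        return CLIENT_DISCONNECTED;     /* EOF, not "no data yet" */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return CLIENT_OK;               /* non-blocking fd, nothing to read */
    return CLIENT_ERROR;
}
```

A caller that gets CLIENT_DISCONNECTED would close the descriptor and remove it from its select() set. A pipe whose write end has been closed behaves the same way, which makes the EOF case easy to exercise without a network.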
>
> > Also at this point I assumed you would like people testing your new
> > code.
>
> Sure. Sorry if I was rude, I'm just a bit stressed at the moment (exam time).
> And that bug has been proven particularly nasty :-)
No problem.
>
> > I just have a problem that I have NEVER seen a response to a query
> > unless I inserted the data myself.
>
> Well, the network is still fairly small (usually I see 2-4 hosts online at a
> time, ls -l ~/.gnunet/data/hosts, look at the timestamps).
>
> > What keys should I search for that are likely to be found?
> > I get nothing from 'mp3'.
>
> mp3 is a bad keyword because it would (once the system works) return far too
> many results. Thus gnunet-insert-mp3 will NOT automatically generate that
> keyword.
Right. Too many results is what I wanted. :)
Actually, mp3 would still be a useful keyword: I search for "Metallica
AND mp3" because I want music and not some other data type.
But for testing, you need a standard test file that is very likely to
be on every node, like the GPL example. Just encourage everyone to
fetch that file as a test every time a node is installed. If that
happens, the file is likely to be reachable from new nodes, and
fetching it makes a good smoke test of whether things are working.
> Also we need a better scheme for automatic keyword
> extraction from files because it is unrealistic that people will type
> in lots of keywords manually for each file they make available.
True. Your search requires keywords because the only way data can
be found is if the person who published it thought of the same
keyword that people use to search for it. Since, to begin with, most
content will just be copied from files obtained on other filesharing
networks, the obvious method is to split the filename on word
boundaries and add each word as a keyword. On those networks, which
only allow searching filenames, people already put keywords in the
filenames.
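A minimal sketch of that filename-splitting approach, assuming "word boundary" means any non-alphanumeric character, that keywords are lower-cased, that the extension counts as an ordinary word, and that one-letter words are dropped; extract_keywords and KW_MAX_LEN are hypothetical names, not part of GNUnet:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define KW_MAX_LEN 32   /* hypothetical per-keyword length limit */

/* Split a filename into lower-case keywords at every non-alphanumeric
 * character (space, '-', '_', '.', ...).  Writes at most max_words
 * keywords into out and returns how many were written.  One-letter
 * words are dropped; longer words are truncated to KW_MAX_LEN - 1. */
static size_t extract_keywords(const char *filename,
                               char out[][KW_MAX_LEN], size_t max_words)
{
    char word[KW_MAX_LEN];
    size_t n = 0, len = 0;

    for (const char *p = filename; n < max_words; p++) {
        unsigned char c = (unsigned char)*p;
        if (c != '\0' && isalnum(c)) {
            if (len < KW_MAX_LEN - 1)
                word[len++] = (char)tolower(c);
            continue;
        }
        /* Word boundary or end of string: emit the pending word. */
        if (len >= 2) {
            memcpy(out[n], word, len);
            out[n][len] = '\0';
            n++;
        }
        len = 0;
        if (c == '\0')
            break;
    }
    return n;
}
```

For "Metallica - Enter_Sandman.mp3" this yields metallica, enter, sandman and mp3, so the file type also ends up as a keyword.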
I actually find it somewhat surprising that the filename is NOT stored.
It could be encoded as part of the description. If I understand your
architecture correctly, you can't do a partial-string search on
filenames, but it would still save the user a lot of work. I would
like to be able to just use the "standard" name for a file when
extracting it.
(In fact, I might want to also save a .meta file for each thing
downloaded that contains the description and list of keywords, so that
the file can be uploaded again later idempotently. What happens if
two people publish the same file with different keywords and
descriptions?)
> There may also be performance issues (e.g. how does ext2 behave if
> data/content contains 1 million 1k files? Use reiserfs? database?);
The thing I see right now is that ~/.gnunet/data/content is a flat
directory. In most filesystems, directories are NOT indexed, so you
have to do a linear scan any time you want to find a file. So you
should do what everyone else does and add a couple more directory levels:
(~/.gnunet/data/content/FE/4F/FE4F8155230050000000000065100000C79CA8BA)
This way no single directory gets too big. I don't know the ideal
number of levels or the number of bits at each level, but I KNOW a flat
directory will be really slow on ext2.
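A sketch of that fan-out, assuming two levels of two hex digits each as in the example path above (the depth and width are tuning choices); content_path is a hypothetical helper, and creating the intermediate directories with mkdir() is left to the caller:

```c
#include <stdio.h>
#include <string.h>

/* Build "<root>/XX/YY/<hexname>" where XX and YY are the first two pairs
 * of hex digits of the block name, e.g. FE4F81... -> <root>/FE/4F/FE4F81...
 * Returns 0 on success, or -1 if the name is shorter than four characters
 * or the result does not fit in out. */
static int content_path(const char *root, const char *hexname,
                        char *out, size_t outlen)
{
    if (strlen(hexname) < 4)
        return -1;
    /* %.2s prints at most two characters of the string it is given. */
    int n = snprintf(out, outlen, "%s/%.2s/%.2s/%s",
                     root, hexname, hexname + 2, hexname);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With 256 x 256 = 65,536 leaf directories, one million blocks come to roughly 15 entries per directory instead of a million in one.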
I use reiserfs, so it isn't really a problem there. The file size is
also problematic in the long run. Do you intend multiple nodes to
share this directory? If not, I would just implement a hash table
stored in one large file.
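One way to read "a hash table stored in one large file" is a fixed-size, open-addressed table kept in a single file. The sketch below assumes fixed 1 KiB blocks, 40-hex-character block names (as in the example path), linear probing, a table size chosen up front, and POSIX semantics where seeking past end-of-file and then writing leaves a zero-filled gap; store_block, fetch_block and the slot layout are all hypothetical, not GNUnet's format:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define KEY_LEN  40                    /* hex characters in a block name */
#define DATA_LEN 1024                  /* assumed fixed 1 KiB block size */
#define SLOT_LEN (KEY_LEN + DATA_LEN)  /* key followed by data, no padding */

/* Home slot for a key: the name is already a hash, so its first
 * 8 hex digits are uniformly distributed and usable directly. */
static unsigned long home_slot(const char *key, unsigned long nslots)
{
    char prefix[9];
    memcpy(prefix, key, 8);
    prefix[8] = '\0';
    return strtoul(prefix, NULL, 16) % nslots;
}

/* Linear probing: return the slot holding key (sets *found = 1) or the
 * first empty slot on its probe sequence (*found = 0); -1 if full or on
 * I/O error.  Unwritten slots read back as zeros or hit EOF. */
static long find_slot(FILE *f, const char *key, unsigned long nslots,
                      int *found)
{
    char stored[KEY_LEN];
    unsigned long start = home_slot(key, nslots);

    for (unsigned long i = 0; i < nslots; i++) {
        unsigned long slot = (start + i) % nslots;
        if (fseek(f, (long)(slot * SLOT_LEN), SEEK_SET) != 0)
            return -1;
        if (fread(stored, 1, KEY_LEN, f) < KEY_LEN || stored[0] == '\0') {
            *found = 0;
            return (long)slot;
        }
        if (memcmp(stored, key, KEY_LEN) == 0) {
            *found = 1;
            return (long)slot;
        }
    }
    return -1;
}

/* Insert or overwrite the block named key.  Returns 0 on success. */
static int store_block(FILE *f, unsigned long nslots, const char *key,
                       const char *data)
{
    int found;
    long slot = find_slot(f, key, nslots, &found);

    if (slot < 0 || fseek(f, slot * SLOT_LEN, SEEK_SET) != 0)
        return -1;
    if (fwrite(key, 1, KEY_LEN, f) != KEY_LEN ||
        fwrite(data, 1, DATA_LEN, f) != DATA_LEN)
        return -1;
    return fflush(f) == 0 ? 0 : -1;
}

/* Copy the block named key into data.  Returns 0 if found, -1 if not. */
static int fetch_block(FILE *f, unsigned long nslots, const char *key,
                       char *data)
{
    int found;
    long slot = find_slot(f, key, nslots, &found);

    if (slot < 0 || !found)
        return -1;
    if (fseek(f, slot * SLOT_LEN + KEY_LEN, SEEK_SET) != 0)
        return -1;
    return fread(data, 1, DATA_LEN, f) == DATA_LEN ? 0 : -1;
}
```

Deletion is not handled (with linear probing it needs tombstones), and `long` offsets limit this sketch to 2 GiB where `long` is 32 bits; fseeko() with off_t would lift that on POSIX systems.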
> GNUnet is not really ready for the masses yet.
OK, I will try not to be the masses.
-Wayne