Re: [Gnumed-devel] experiments with gnumed - multiusers vnc, importing a
From: Tim Churches
Subject: Re: [Gnumed-devel] experiments with gnumed - multiusers vnc, importing an au emr
Date: Tue, 25 Apr 2006 13:23:50 +1000
User-agent: Thunderbird 1.5.0.2 (Windows/20060308)
Syan Tan wrote:
> i've processed 360,000 rows of clin.clin_narrative and parsed out all the
> words containing letters. I was thinking of using a stoplist method where
> any word appearing on the stoplist will be replaced by 'xxxx'. The stoplist
> would also include all the names listed out from dem.names.lastnames and
> dem.names.firstnames.
>
> BTW - what about a secondary structure for clin.clin_narrative, where the
> narrative consists of a list of indexes pointing into a table of words. this
> is the simplest step before having some sort of semantic linking at the word
> level (but not at the phrase level).
>
> whilst trying to recreate the gnumed database using a pg_dump, the dump
> reload seems to stall; I tried to turn off logging, table constraints,
> removing internal log table data, and fsync, which all finally worked, but
> I'm not sure what causes the stall.
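
The stoplist idea quoted above could be sketched roughly as follows. This is
only an illustration under stated assumptions: the function names
(build_stoplist, scrub) are hypothetical, plain Python lists stand in for the
contents of dem.names.lastnames and dem.names.firstnames, and a "word" is
taken to be any run of ASCII letters, so hyphenated or apostrophised names
would be matched piece by piece.

```python
import re

# Assumption: a "word" is a run of ASCII letters, as in "words containing letters".
WORD = re.compile(r"[A-Za-z]+")


def build_stoplist(*word_sources):
    """Merge any number of word collections into one lower-cased set."""
    return {word.lower() for source in word_sources for word in source}


def scrub(text, stoplist, mask="xxxx"):
    """Replace every word found on the stoplist with the mask, case-insensitively."""
    def replace(match):
        word = match.group(0)
        return mask if word.lower() in stoplist else word

    return WORD.sub(replace, text)


# Made-up stand-ins for names pulled from the demographics tables.
lastnames = ["Smith"]
firstnames = ["John"]
stoplist = build_stoplist(lastnames, firstnames)

scrubbed = scrub("John Smith seen for asthma; Mr SMITH improved.", stoplist)
```

Matching is done on lower-cased words so that differently capitalised
spellings of the same name are masked alike. Words that are not on the
stoplist, including any names missing from the name tables, pass through
unchanged.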
>
> On Mon Apr 24 18:53, Karsten Hilbert sent:
>
> On Thu, Apr 20, 2006 at 09:47:54AM +0800, Syan Tan wrote:
>
> > thinking about it, the only correct thing to do seems to be to preserve
> > the structure of the instance data and the health issue + episode
> > headings, but to scramble the text with word substitution, as well as
> > name substitution, date fudging, and address random relinking. would
> > that be de-identified enough?
> Well, I tend to think that "de-identified enough" is a range
> from "acceptably so" to "beyond use" rather than a cutoff.
> The exact value used within that range depends on what sort
> of protection you need.
>
> Yes, if you want to hide a patient's data securely from your
> fellow doctor next door you will have to scramble the medical
> content, too, as she might be able to match "real patient"
> to "problems/operations listed" by her own medical skills
> and thereby gain knowledge via the now re-identified EMR.
>
> But if you want to protect a patient's privacy from, say,
> me, it's enough to falsify the identities. I do not have
> access to your patients. I also have no idea how to find out
> who your patients actually are in order to start matching
> EMRs to patients. Hence proper protection is ensured, I dare
> say. It is akin to not storing patient names with any
> medical data and holding the EMR ID <-> patient identity
> mapping elsewhere in a secure space (say, the patient's
> brain).
>
> In a recent discussion on the openhealth list this topic was
> chanced upon and the OpenEHR guys thought the latter
> approach would be the most secure that's practically useful
> - and they were talking real live patient data in actual
> care.
I didn't mention it on the openEHR list (maybe I should) but merely
removing the direct identifiers (names, DOB etc) does not de-identify or
anonymise that data. For example, if the record reveals "32 yr old male,
with medical visits on 23/4/04, 12/6/05 and 14/01/06" then that record
has a very high probability of being unique to an individual in even a
large population. Hence if I know your age and sex (easily discovered or
ascertained) and I know that you had medical appointments on those dates
(e.g. if I had access to your work leave records, as staff in the
personnel department of your employer may have), then I can fairly
easily work out which record belongs to you. Disclosure control in microdata
almost always involves some degree of obfuscation, perturbation or
allocation to broad categories - in other words, a lot of detail needs
to be removed to make real data truly anonymous (in that it cannot be
re-identified). Also, anonymity of data is a continuum - it is not
dichotomous, and often it comes down to a risk judgement and some
assumptions about what additional information an 'attacker' who might
try to re-identify records might possess. If the data are to be made
publicly available, you can't make any assumptions about what an
attacker might or might not already know about a person, so you need to
be very conservative.
Tim C
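
The uniqueness argument above can be checked mechanically on any extract. The
following is a minimal sketch, not a full disclosure-control method: it counts
how many records share each combination of indirect identifiers and reports
the fraction that are one of a kind. The function names, field names and toy
records are all made up for illustration.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count the records sharing each combination of quasi-identifier values."""
    keys = [tuple(rec[q] for q in quasi_identifiers) for rec in records]
    return Counter(keys)


def unique_fraction(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination occurs only once."""
    if not records:
        return 0.0
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    unique = sum(1 for count in sizes.values() if count == 1)
    return unique / len(records)


# Toy data (invented): names and DOB already removed, visit dates kept.
records = [
    {"age": 32, "sex": "M", "visits": ("2004-04-23", "2005-06-12", "2006-01-14")},
    {"age": 32, "sex": "M", "visits": ("2004-05-02", "2005-06-12")},
    {"age": 32, "sex": "M", "visits": ("2004-05-02", "2005-06-12")},
    {"age": 47, "sex": "F", "visits": ("2005-11-30",)},
]

by_age_sex = unique_fraction(records, ["age", "sex"])
by_age_sex_visits = unique_fraction(records, ["age", "sex", "visits"])
```

On this toy data, adding the visit dates to age and sex raises the share of
unique records from a quarter to a half. A record that is unique on fields an
attacker could plausibly know is the kind that would need obfuscation,
perturbation or coarser categories before release.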