From: Vadim Zeitlin
Subject: Re: [lmi] Correctness and performance of varying column width mode in the census view
Date: Fri, 29 May 2020 01:52:20 +0200

On Thu, 28 May 2020 22:02:04 +0000 Greg Chicares <gchicares@sbcglobal.net> 
wrote:

GC> Okay, let's examine its behavior in detail. I'm using the very latest
GC> version of 'excel'; it's so postmodern that I can't figure out how to
GC> display what version it is, as there is no "Help | About".

 I keep forgetting that even something as basic as this could change from
version to version and should have thought to check it in the latest one.
FWIW I still have Excel 2010 (which, if you truncate it just a little bit,
becomes "Excel 20" which is clearly very current), and its behaviour is
indeed slightly different. I won't comment on the differences in detail,
however, because I don't think they materially change any of the
conclusions.

GC> Move to the "123,456,789" cell above; copy it; move to a blank column;
GC> paste it. Result: "########", with column width unchanged. That may be
GC> the "standard" behavior, but I have always found it inconvenient:

 Yes, I've never understood how this could possibly be useful either. I guess
the idea is that this is supposed to be so clearly wrong that the user is
forced to resize the column manually.
 
GC> And that's all that happens for numbers, AFAICS. For lmi, duplicating
GC> the behaviors above would be just fine. It would also be fine to widen
GC> a column to fit a pasted single-cell value--in fact, I think that would
GC> be much better, even if 'excel' disagrees, because I find their behavior
GC> jarring. Maybe the lmi case is simpler in that only single-cell values
GC> are pastable within the grid (census paste from spreadsheet is an exotic
GC> special case). Even for census paste, I'd say it's best to adjust the
GC> column widths to fit all the data, provided that it's fast enough.

 I totally agree.

GC> Then we come to strings. Strings are different, because they are of
GC> potentially limitless length....
GC> 
GC> > wraps longer strings by making the cell containing them taller if
GC> > possible. I
GC> 
GC> That's yicky IMO.
GC> 
GC> > wonder if we should consider doing this too?
GC> 
GC> IIRC, lmi's census manager has absolutely uniform fixed row height, and
GC> I think that's appropriate.

 This is definitely much simpler and more efficient, so I'm glad to hear
that it's also appropriate.

GC> I've observed the goofy behavior you mention:
GC> 
GC>   make the
GC>   cell
GC>   taller so
GC>   that
GC>   everything
GC>   fits, like
GC>   columns in
GC>   a
GC>   newspaper
GC>   with poor
GC>   typography
GC> 
GC> in some unremembered circumstance, but it's wrong and horrid.

 I see it when I enter such text into a cell whose right neighbour is
non-empty, i.e. which can't overflow into the next cell.

GC> Ideally, I think we'd automatically widen columns as needed, to make all
GC> numbers fit as they're entered--whether by typing their digits, or pasting
GC> a scalar value,

 We definitely can do this quickly, i.e. in O(1) time.

GC> or even pasting a census...

 We can't do this in constant time without cheating, but we can cheat (see
below).

GC> and maybe even when the case or class defaults are modified and applied
GC> to all cells, as long as that doesn't introduce a painful delay.

 I'm not sure if we can find out how the displayed values are going to
change after such a change. If we can do it, we could also do it in O(1) by
measuring the default values once and then just checking if they're greater
than the currently used widths. In the worst case, we'd do the same as for
pasting the census.

GC> Entering strings shouldn't cause the column width to change at all.

 I understand that truncating (actually, ellipsizing) strings is not as bad
as doing it for numbers, but couldn't we still expand the column widths
for them just for consistency, both of the UI and the implementation? The
latter shouldn't dictate the former, of course, but it's simpler to do the
same thing for all cells, so if it's not actively harmful, why not do it
for the strings too?

GC> It's useful to have a resize-column-width-to-fit verb as a menu command.

 OK.

GC> > GC> >  Now there are other complications, e.g. the operation would still
GC> > GC> > be O(N), where N is the number of rows, if we wanted to also reduce
GC> > GC> > the column width, but IMO this is much less important and we could
GC> > GC> > avoid doing this.
GC> 
GC> Most of the time, we're just changing one cell, so it's O(1):
GC>  - calculate w_new, the width needed for the new value
GC>  - compare to w_stored, the current width of the column
GC>  - if w_new <= w_stored, do nothing; else w_stored = w_new, and re-render

 Yes, exactly.
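
 To make the three steps above concrete, here is a minimal sketch. It is not
lmi or wxGrid code: measure_width() and column_widths are hypothetical names,
and the fixed 8 pixels per character is only an assumption standing in for a
real text extent measurement.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for measuring rendered text: a real grid would ask
// its device context for the text extent in pixels.
inline int measure_width(std::string const& text)
{
    int const pixels_per_char = 8; // assumed fixed-pitch font
    return pixels_per_char * static_cast<int>(text.size());
}

// Stored widths, one per column (w_stored in the steps above).
struct column_widths
{
    std::vector<int> stored;

    // Returns true if the column was widened, in which case the caller
    // would re-render it. The cost does not depend on the number of rows.
    bool widen_to_fit(std::size_t column, std::string const& new_value)
    {
        assert(column < stored.size());
        int const w_new = measure_width(new_value);
        if(w_new <= stored[column])
            return false;       // the new value already fits: do nothing
        stored[column] = w_new; // grow only, never shrink
        return true;
    }
};
```

 Because the width only ever grows here, no other row of the column needs to
be examined when a single cell changes.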

GC> Okay, then we want the excel behavior, i.e.:
GC>   automatically widen column to fit newly-entered data

 Somewhat unexpectedly, we seem to agree.

GC> I'm not sure what you mean by caching "best" widths. Above, I had
GC> convinced myself that the wxGrid control knows the present width of
GC> each column. How does the "best" width differ from that? Does it
GC> differ only in that it may be narrower than the present width?

 Yes, best for me is the minimum width big enough to avoid truncation (or
ellipsization) of the column values.

GC> And if so, couldn't it so easily have become stale that we couldn't
GC> rely on it?

 Yes, it could become stale, but we'd be aware of it and recompute it when
that happens. It would be more complicated, though still doable if we really
needed it; as we seem to agree that we don't, I won't pursue this further.


GC> >  The more I think about this, the more sure I become that we really
GC> > should put some time limit on auto-sizing wxGrid, whatever else we do,
GC> > as it just shouldn't be possible for it to take arbitrarily long...
GC> 
GC> Rhetorical question: what should it do if the time limit is exceeded?

 Same thing as wxDVC does: return the max width computed so far. The way
wxDVC auto-sizing code works is that it has some hard-coded timeout (20ms
currently) and computes the widths of as many items as possible, starting
from the top one, until this timeout expires. If there are still more rows
remaining, it also computes the widths of the same number of rows starting
from the bottom (which takes roughly 20ms more) and then it also computes
the widths of all the currently visible items (which, in practice, takes
significantly less than 20ms). All in all, this imposes an upper limit of
maybe ~50ms on the computation, which is enough time to compute the maximum
width of a few hundred to a couple of thousand top and bottom items as well
as all the visible ones. This seems to be good enough in practice: at least
we've never received any complaints about this behaviour since implementing
it quite a few years ago.
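
 For illustration only, the three phases described above could be sketched
roughly as follows. This is not the actual wxWidgets implementation:
estimate_column_width() and its parameters are hypothetical names, and
width_of_row stands for whatever measures a single row.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Returns the maximum width found within the time budget. The result may be
// smaller than the true maximum if the budget expires before all rows have
// been measured.
inline int estimate_column_width
    (std::size_t                            row_count
    ,std::function<int(std::size_t)> const& width_of_row
    ,std::size_t                            first_visible
    ,std::size_t                            visible_count
    ,std::chrono::milliseconds              budget
    )
{
    assert(static_cast<bool>(width_of_row));
    using clock = std::chrono::steady_clock;
    int best = 0;

    // Phase 1: measure rows from the top until the budget expires.
    auto const deadline = clock::now() + budget;
    std::size_t top = 0;
    while(top < row_count && clock::now() < deadline)
        {
        best = std::max(best, width_of_row(top));
        ++top;
        }

    // Phase 2: measure the same number of rows from the bottom, without
    // re-measuring rows already covered by phase 1.
    std::size_t const bottom = std::min(top, row_count - top);
    for(std::size_t i = row_count - bottom; i < row_count; ++i)
        best = std::max(best, width_of_row(i));

    // Phase 3: always measure the rows that are currently visible.
    std::size_t const last_visible =
        std::min(row_count, first_visible + visible_count);
    for(std::size_t i = first_visible; i < last_visible; ++i)
        best = std::max(best, width_of_row(i));

    return best;
}
```

 Only the first phase checks the clock; the second is bounded by the number
of rows the first one managed to measure, and the third by the number of
visible rows, so the total time stays bounded.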

GC> Here's what I think lmi does today, which I consider a misfeature:
GC>  - maintain a boolean global state: autofit, or not
GC>  - if "not", do nothing (so far, so good)
GC>  - if "autofit", then whenever the grid's contents change, perform
GC>      an all-encompassing O(#rows * #columns) resize (too costly),
GC>      for all future changes, until "autofit" is turned off

 Again, I think the problem is in the implementation and not in the feature
itself. IOW it's a QoI issue (could I fit 3 TLAs in a single sentence if I
tried harder?).

GC> Instead, I think we should adjust columns widths in this way:
GC>  - if one string cell changes, do nothing: O(0)
GC>  - if one numeric cell changes, widen its column if needed: O(1)
GC>    do that whether the new value is typed in or pasted in
GC>  - census pasting and applying cell or class changes seem to be
GC>    O(#rows * #columns), so that's the tough case that requires
GC>    more investigation and thought--a full resize would be nice,
GC>    but might be too costly

 I think it should be bearable if we implement the same trick in wxGrid as
in wxDVC (and we can do it relatively easily, especially because the logic
described above has been abstracted into its own class, independent of
wxDVC).

GC>  - when the user gives the autofit command: well, that could
GC>    just take a while, and the best we could do is to make it as
GC>    fast as possible

 There is also a question of whether we should still find the best
approximation of the fitting width even when the user selects this command
or if we should take as much time as needed to really find the ideal width.
I suspect you might prefer the latter, but personally I think the former
works well enough for any non-degenerate (i.e. specifically constructed to
make it fail) cases that it would be acceptable to do it even here.

GC> And then should we remove the "fixed width" menu command and
GC> toolbar button in lmi? Perhaps it still has value: if you've
GC> autosized all columns, and some string columns are more than
GC> half a screen wide, is it important to have a button that gets
GC> the grid back to a workable state with strings truncated?
GC> I'm not so sure it's needed: 'excel' doesn't seem to have it.

 I think this case is sufficiently rare that resizing the column manually
wouldn't be a huge chore.

GC> Okay, then I've changed my mind, as above:
GC>  - O(0) behavior for strings
GC>  - O(1) behavior for single-cell changes: if that's noticeably
GC>    slow, then we're doing something wrong
GC>  - O(#selected columns * #rows) behavior on explicit demand only
GC>  - O(#columns * #rows) behavior for census paste etc.: maybe,
GC>    but only if it's always really fast; otherwise, preserve widths
GC>    of old columns that persist, and use default width for any new
GC>    columns introduced

 We definitely can do this. I still have the question of O(0) for strings,
but if it's a conscious choice and not just an attempt to make things
simpler (because the actual effect is the opposite), we can do this too, of
course.

 BTW, we need to define what exactly you mean by strings, because there
are several string-like types: we can have just free form strings
(datum_string), we can have enum elements, or we can have sequences
(datum_sequence) that are also shown as strings. I suspect you really mean
only the first kind of strings, but please correct me if I'm wrong.


GC> > GC> > - Editing/adding/deleting cells will not do it, according to your answer,
GC> > GC> >   so there is no need to optimize doing it.
GC> > GC> 
GC> > GC> Agreed.
GC> > GC> 
GC> > GC> Unless, of course, you find a way to make it fast as lightning.
GC> > 
GC> >  For editing/adding, yes, we definitely can make it O(1). For deleting,
GC> > this is obviously not possible in general, i.e. if we want to make it 100%
GC> > precise (== "correct" in my initial message), but we could approximate it
GC> > or just not do anything in this particular case.
GC> 
GC> AFAICT, if you delete an 'excel' column, the widths of the remaining
GC> columns are unaffected. That sounds just right.

 Just to be clear, I meant deleting the rows. In this case finding the
perfectly correct new best width is O(N), as you could have deleted the row
containing the widest value for this column.
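
 A small sketch of why deletion is the awkward case, assuming (hypothetically)
that the per-row widths of the column and its previous best width are at
hand; best_width_after_delete() is an invented name, not an existing
function:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Best width of a column after one row has been deleted.
// remaining_widths holds the widths of the rows that are left.
inline int best_width_after_delete
    (std::vector<int> const& remaining_widths
    ,int                     old_best
    ,int                     deleted_width
    )
{
    assert(deleted_width <= old_best);
    if(deleted_width < old_best)
        return old_best; // a wider value survives elsewhere: O(1)

    // The deleted row was (one of) the widest, so nothing short of a
    // rescan of the remaining rows gives the exact answer: O(N).
    int best = 0;
    for(int w : remaining_widths)
        best = std::max(best, w);
    return best;
}
```

 Approximating, or simply keeping the old width, amounts to skipping the
rescan in the second branch.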

GC> > GC> > - But what about the operations affecting all cells (and so already
GC> > GC> >   taking O(N) time), such as editing class/case or pasting census
GC> > GC> >   from clipboard, should they still resize the columns to fit their
GC> > GC> >   contents?
GC> > GC> 
GC> > GC> No.
GC> > 
GC> >  OK, thanks.
GC> 
GC> As mentioned a few paragraphs above, I now think we should probably
GC> preserve the present widths of previous columns that remain after
GC> such a change,

 Yes, ideally we'd do this... Currently the code is so simple that it
doesn't track the remaining columns and just recreates all of them (this
was/is also the case for wxDVC).

GC> and default the widths of any new columns introduced by such an
GC> operation. Resizing all columns is likely to take too long.

 There is indeed the question of time complexity in the number of columns,
which I have mostly omitted so far by speaking about O(N) with N being the
number of rows. I suppose that there won't be as many columns as there can
be rows, but this could still be a problem and we should, of course, avoid
recalculating the widths of the columns which didn't change. The trouble is
that this requires more code, hence more complexity.
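
 The extra code need not be large, though. As a rough sketch, assuming that
columns can be identified by name (an assumption, as is the function name
carry_over_widths()), keeping the old widths and defaulting the new ones
could look like this:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Widths for the columns present after the change: columns that already
// existed keep their present width, new ones get the default width, and
// columns that disappeared are dropped. No cell contents are measured.
inline std::map<std::string,int> carry_over_widths
    (std::map<std::string,int> const& old_widths
    ,std::vector<std::string> const&  new_columns
    ,int                              default_width
    )
{
    assert(0 < default_width);
    std::map<std::string,int> result;
    for(auto const& name : new_columns)
        {
        auto const it = old_widths.find(name);
        result[name] = (it != old_widths.end()) ? it->second : default_width;
        }
    return result;
}
```

 The cost depends only on the number of columns, not on the number of rows.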

GC> For an initial version, it's perfectly okay to go with simple behavior.

 OK, so I'll implement all the things discussed above, but later, i.e.
after integrating the first version using wxGrid into lmi. The only
possible exception is the trick with limiting the time spent on grid
auto-sizing, as I believe it would be worth doing it right now to avoid
situations in which the program would appear to hang when Ctrl-] is pressed
when a big census is opened.

GC> >  Of course, please let me know if you disagree with anything here. TIA!
GC> 
GC> I guess I've disagreed with much of what I'd previously written, but
GC> such is the unity and interpenetration of opposites.

 And, similarly miraculously, combining thesis and antithesis results in
useful synthesis instead of uncontrolled release of destructive energy as
could be expected from the point of view of basely materialistic physics.

VZ
