discuss@lists.openscad.org

OpenSCAD general discussion Mailing-list

View all threads

search function is broken

AM
Adrian Mariano
Tue, Feb 22, 2022 11:39 AM

Yeah, I suspected that the reason for handling lists might have been the
non-existence of list comprehensions.  But that no longer applies.

I would approve of modifying search() to add "atomic search".  I don't
really care which syntax is chosen to make it happen.  I proposed find()
with a function predicate as a way of showing that a simple definition
could be constructed that would be just as powerful as search(), where all
the cases that really never actually arise can be handled by choosing a
suitable function, so you can do everything if you really need to, and in
fact you can do some other useful and interesting things like approximate
search.

In answer to Rob, the reason it's important to not break code is that it
means you have taken someone's working code and made it not work any more.
That's annoying.  My code worked yesterday.  What's going on?  And how do I
fix it?  Why should I care that search() has a stupid syntax if I've got
thousands of lines of code that use it that I've written between 2012 and
today that I'm just using and not re-coding all the time?

On Tue, Feb 22, 2022 at 1:22 AM MichaelAtOz oz.at.michael@gmail.com wrote:

I actually think there's no value in being able to search for multiple

things, and just extra confusion.

Perhaps, other may think differently, including the original author.

In 2015 clothbot, the author said:

"A few comments/history behind my original writing of search():

When I wrote it (in 2012):

  1. The ‘undef’ didn’t exist as a return option so I settled on

returning empty lists which could be detected (list of length 0) as ‘no
match’ conditions - it predated the Value rewrite of the code-base.

  2. The ‘concat’ list construction operator didn’t exist; I needed a

way to search for a string-of-characters (aka. an ordered list of character
values) and get the results back in order, as a list.

  3. The ‘let’ operator didn’t exist.

  4. Lists were statically defined; [ for() … ] dynamically generated

lists weren’t possible.

  5. Function recursion was (and still is to some degree) dog-slow for

more ‘elegant’ list construction approaches.

  6. The text() module didn’t exist.

        - See example023.scad combined with MCAD/fonts.scad for insight

into how I was generating text, and the original motivation behind coding
up search().

  7. The no-match warnings are gone as of last week; you’ll have to

build from source to see that.

It’s not buggy, just written within the constraints of the time. ;-)"

search() is a legacy of its time.

However, changing search() to not match on multiples/sub-vectors, is IMHO
too much of a breakage. There is 10 odd years of search() usage in the
wild.

My suggestion provides a means to do atomic search, without breaking
'normal' past usage.

It does not make search simpler, that requires something new, but it give
you built-in speed, likely sooner.

I don't particularly need this, so I'm not wasting my breath any further.

-----Original Message-----

From: Adrian Mariano [mailto:avm4@cornell.edu]

Sent: Tue, 22 Feb 2022 11:32

To: OpenSCAD general discussion

Subject: [OpenSCAD] Re: search function is broken

Is it consistent to treat strings as atomic data types?  Right now, if

you want to search for 5 you need to write search([5],...) and if you

want to search for "hello" you do search (["hello"],...) so the

behavior is consistent.

I actually think there's no value in being able to search for multiple

things, and just extra confusion.  There are two reasons for this.

One is that in practice it just never seems to be needed.  In BOSL2 we

never use search() to search for more than one item.  The second

reason is that there is no computational advantage to providing this

feature.  If you really need to search for multiple items you can

use a list comprehension, and the run time is identical.  Instead we

always write search([item],....)[0].  Of course, there's the other

issue that we often might be interested in searching for approximate

matches (within epsilon), which points towards a find function that

looks for a function to be satisfied instead of just simple equality.

I don't know if this ends up being a big enough efficiency gain to

make it worthwhile.  But if we had a find that took a predicate and

returned the entries on which the predicate is true (like was

suggested a ways up) and it ran as fast as search() does right now

with a default predicate of simple equality testing, that seems like

it would be pretty useful, and if you really want to look at the 17th

position in a vector your predicate can do it, so it provides all the

power and more compared to search() without any interface complexity.

Perhaps it could be:  find(search_target, list, num_matches)

where if search_target is a function, it returns all i such that

search_target(list[i]) is true, and otherwise it returns all i such

that list[i]==search_target.

Of course, this is only worthwhile if it's fast the way the current

search() is.  if list can also be a range then you have the

possibility of testing a predicate on a list of values without having

to construct the list first.  Not sure if that gives a speed advantage

or not.

On Mon, Feb 21, 2022 at 7:16 PM Jordan Brown

On 2/21/2022 4:11 PM, MichaelAtOz wrote:

Remember this was proposed as a readily doable fix in the near term,

v's the time it

would take to gestate a new find().

search() is difficult to understand at the best of times... gluing on

an additional bit

of complexity will make it incrementally worse.

It seems like it should be possible to come up with a find() that's

more

understandable and as powerful, or at least nearly as powerful.  (It

would start with

treating strings as atomic data items, not as arrays of characters, at

least by default.)

But I admit that I haven't tried to write down what the specification

for such a

function would be.


OpenSCAD mailing list

To unsubscribe send an email to discuss-leave@lists.openscad.org


OpenSCAD mailing list

To unsubscribe send an email to discuss-leave@lists.openscad.org

Yeah, I suspected that the reason for handling lists might have been the non-existence of list comprehensions. But that no longer applies. I would approve of modifying search() to add "atomic search". I don't really care which syntax is chosen to make it happen. I proposed find() with a function predicate as a way of showing that a *simple* definition could be constructed that would be just as powerful as search(), where all the cases that really never actually arise can be handled by choosing a suitable function, so you can do everything if you really need to, and in fact you can do some other useful and interesting things like approximate search. In answer to Rob, the reason it's important to not break code is that it means you have taken someone's working code and made it not work any more. That's annoying. My code worked yesterday. What's going on? And how do I fix it? Why should I care that search() has a stupid syntax if I've got thousands of lines of code that use it that I've written between 2012 and today that I'm just using and not re-coding all the time? On Tue, Feb 22, 2022 at 1:22 AM MichaelAtOz <oz.at.michael@gmail.com> wrote: > > I actually think there's no value in being able to search for multiple > > things, and just extra confusion. > > > > Perhaps, other may think differently, including the original author. > > > > In 2015 clothbot, the author said: > > "A few comments/history behind my original writing of search(): > > When I wrote it (in 2012): > > 1. The ‘undef’ didn’t exist as a return option so I settled on > returning empty lists which could be detected (list of length 0) as ‘no > match’ conditions - it predated the Value rewrite of the code-base. > > 2. The ‘concat’ list construction operator didn’t exist; I needed a > way to search for a string-of-characters (aka. an ordered list of character > values) and get the results back in order, as a list. > > 3. The ‘let’ operator didn’t exist. > > 4. Lists were statically defined; [ for() … ] dynamically generated > lists weren’t possible. > > 5. Function recursion was (and still is to some degree) dog-slow for > more ‘elegant’ list construction approaches. > > 6. The text() module didn’t exist. > > - See example023.scad combined with MCAD/fonts.scad for insight > into how I was generating text, and the original motivation behind coding > up search(). > > 7. The no-match warnings are gone as of last week; you’ll have to > build from source to see that. > > It’s not buggy, just written within the constraints of the time. ;-)" > > > > search() is a legacy of its time. > > > > However, changing search() to not match on multiples/sub-vectors, is IMHO > too much of a breakage. There is 10 odd years of search() usage in the > wild. > > > > My suggestion provides a means to do atomic search, without breaking > 'normal' past usage. > > It does not make search simpler, that requires something new, but it give > you built-in speed, likely sooner. > > > > I don't particularly need this, so I'm not wasting my breath any further. > > > > > > > -----Original Message----- > > > From: Adrian Mariano [mailto:avm4@cornell.edu] > > > Sent: Tue, 22 Feb 2022 11:32 > > > To: OpenSCAD general discussion > > > Subject: [OpenSCAD] Re: search function is broken > > > > > > Is it consistent to treat strings as atomic data types? Right now, if > > > you want to search for 5 you need to write search([5],...) and if you > > > want to search for "hello" you do search (["hello"],...) so the > > > behavior is consistent. > > > > > > I actually think there's no value in being able to search for multiple > > > things, and just extra confusion. There are two reasons for this. > > > One is that in practice it just never seems to be needed. In BOSL2 we > > > never use search() to search for more than one item. The second > > > reason is that there is no computational advantage to providing this > > > feature. If you *really* need to search for multiple items you can > > > use a list comprehension, and the run time is identical. Instead we > > > always write search([item],....)[0]. Of course, there's the other > > > issue that we often might be interested in searching for approximate > > > matches (within epsilon), which points towards a find function that > > > looks for a function to be satisfied instead of just simple equality. > > > I don't know if this ends up being a big enough efficiency gain to > > > make it worthwhile. But if we had a find that took a predicate and > > > returned the entries on which the predicate is true (like was > > > suggested a ways up) and it ran as fast as search() does right now > > > with a default predicate of simple equality testing, that seems like > > > it would be pretty useful, and if you really want to look at the 17th > > > position in a vector your predicate can do it, so it provides all the > > > power and more compared to search() without any interface complexity. > > > > > > Perhaps it could be: find(search_target, list, num_matches) > > > > > > where if search_target is a function, it returns all i such that > > > search_target(list[i]) is true, and otherwise it returns all i such > > > that list[i]==search_target. > > > > > > Of course, this is only worthwhile if it's fast the way the current > > > search() is. if list can also be a range then you have the > > > possibility of testing a predicate on a list of values without having > > > to construct the list first. Not sure if that gives a speed advantage > > > or not. > > > > > > > > > > > > > > > On Mon, Feb 21, 2022 at 7:16 PM Jordan Brown > > > <openscad@jordan.maileater.net> wrote: > > > > > > > > On 2/21/2022 4:11 PM, MichaelAtOz wrote: > > > > > > > > Remember this was proposed as a readily doable fix in the near term, > v's the time it > > > would take to gestate a new find(). > > > > > > > > > > > > search() is difficult to understand at the best of times... gluing on > an additional bit > > > of complexity will make it incrementally worse. > > > > > > > > It *seems* like it should be possible to come up with a find() that's > more > > > understandable and as powerful, or at least nearly as powerful. (It > would start with > > > treating strings as atomic data items, not as arrays of characters, at > least by default.) > > > > > > > > But I admit that I haven't tried to write down what the specification > for such a > > > function would be. > > > > > > > > _______________________________________________ > > > > OpenSCAD mailing list > > > > To unsubscribe send an email to discuss-leave@lists.openscad.org > > > _______________________________________________ > > > OpenSCAD mailing list > > > To unsubscribe send an email to discuss-leave@lists.openscad.org > > > <http://www.avg.com/email-signature?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=emailclient> Virus-free. > www.avg.com > <http://www.avg.com/email-signature?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=emailclient> > <#m_-2987481677903847091_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> > _______________________________________________ > OpenSCAD mailing list > To unsubscribe send an email to discuss-leave@lists.openscad.org >
JB
Jordan Brown
Tue, Feb 22, 2022 5:36 PM

On 2/21/2022 11:27 PM, Rob Ward wrote:

Why is important to not break code that uses clumsy syntax in the
first place? Why not just get it right?

Because it really annoys people when their program that's been working
for years stops working - especially if fixing it makes it incompatible
with older versions of the platform.

Sometimes you just need to bite it and do that, but it's something to be
avoided.  I wouldn't do it for a function, where there's plenty of
namespace and you can just create a new function and relegate the old
one to a "legacy" pile.

On 2/21/2022 11:27 PM, Rob Ward wrote: > Why is important to not break code that uses clumsy syntax in the > first place? Why not just get it right? Because it really annoys people when their program that's been working for years stops working - especially if fixing it makes it incompatible with older versions of the platform. Sometimes you just need to bite it and do that, but it's something to be avoided.  I wouldn't do it for a function, where there's plenty of namespace and you can just create a new function and relegate the old one to a "legacy" pile.
RW
Rogier Wolff
Fri, Feb 25, 2022 9:35 AM

On Tue, Feb 22, 2022 at 12:16:01AM +0000, Jordan Brown wrote:

On 2/21/2022 4:11 PM, MichaelAtOz wrote:

Remember this was proposed as a readily doable fix in the near term,
v's the time it would take to gestate a new find().

search() is difficult to understand at the best of times... gluing on an
additional bit of complexity will make it incrementally worse.

It seems like it should be possible to come up with a find() that's
more understandable and as powerful, or at least nearly as powerful. 
(It would start with treating strings as atomic data items, not as
arrays of characters, at least by default.)

Hmm. I think this would not be the best idea.

I would suggest that a string is an array(vector) of characters.

Now... find would need an argument of "what to find" and an
array of those to find it in.

so if we define:
find (<needle>, <haystack>);

then
find ('s', "haystack");
would find it in the 4th position at index 3.

and
find ("pear", ["apple", "pear", "banana"]);

would find it in the second spot index 1.

I was hoping it would also automatically be able to find substrings,
but writing this convinced me that's something different.

Roger. 

P.S. My mail is down. You'll get this probably (long) after I've
written/sent this.

But I admit that I haven't tried to write down what the specification
for such a function would be.


OpenSCAD mailing list
To unsubscribe send an email to discuss-leave@lists.openscad.org

--
** R.E.Wolff@BitWizard.nl ** https://www.BitWizard.nl/ ** +31-15-2049110 **
**    Delftechpark 11 2628 XJ  Delft, The Netherlands.  KVK: 27239233    **
f equals m times a. When your f is steady, and your m is going down
your a is going up.  -- Chris Hadfield about flying up the space shuttle.

On Tue, Feb 22, 2022 at 12:16:01AM +0000, Jordan Brown wrote: > On 2/21/2022 4:11 PM, MichaelAtOz wrote: > > Remember this was proposed as a readily doable fix in the near term, > > v's the time it would take to gestate a new find(). > > search() is difficult to understand at the best of times... gluing on an > additional bit of complexity will make it incrementally worse. > > It *seems* like it should be possible to come up with a find() that's > more understandable and as powerful, or at least nearly as powerful.  > (It would start with treating strings as atomic data items, not as > arrays of characters, at least by default.) Hmm. I think this would not be the best idea. I would suggest that a string is an array(vector) of characters. Now... find would need an argument of "what to find" and an array of those to find it in. so if we define: find (<needle>, <haystack>); then find ('s', "haystack"); would find it in the 4th position at index 3. and find ("pear", ["apple", "pear", "banana"]); would find it in the second spot index 1. I was hoping it would also automatically be able to find substrings, but writing this convinced me that's something different. Roger. P.S. My mail is down. You'll get this probably (long) after I've written/sent this. > > But I admit that I haven't tried to write down what the specification > for such a function would be. > _______________________________________________ > OpenSCAD mailing list > To unsubscribe send an email to discuss-leave@lists.openscad.org -- ** R.E.Wolff@BitWizard.nl ** https://www.BitWizard.nl/ ** +31-15-2049110 ** ** Delftechpark 11 2628 XJ Delft, The Netherlands. KVK: 27239233 ** f equals m times a. When your f is steady, and your m is going down your a is going up. -- Chris Hadfield about flying up the space shuttle.
JB
Jordan Brown
Fri, Feb 25, 2022 5:26 PM

On 2/22/2022 4:38 AM, Rogier Wolff wrote:

On Tue, Feb 22, 2022 at 12:16:01AM +0000, Jordan Brown wrote:

It seems like it should be possible to come up with a find() that's
more understandable and as powerful, or at least nearly as powerful. 
(It would start with treating strings as atomic data items, not as
arrays of characters, at least by default.)

Hmm. I think this would not be the best idea.

I would suggest that a string is an array(vector) of characters.

Now... find would need an argument of "what to find" and an
array of those to find it in.

so if we define:
find (<needle>, <haystack>);

then
find ('s', "haystack");
would find it in the 4th position at index 3.

and
find ("pear", ["apple", "pear", "banana"]);

would find it in the second spot index 1.

I was hoping it would also automatically be able to find substrings,
but writing this convinced me that's something different.

Your thinking parallels a discussion that Adrian and I have been having.

Our current thoughts on the "find" function are roughly:

  • find(needle, haystack) returns a single integer, or undef if not found.
  • find(needle, haystack, some-option) returns multiple matches as a list.
    o Not clear on what some-option is.  Obvious is something like
    "limit=n", perhaps with n=0 meaning "all", but n=0 being "all"
    is a bit wrong and it's not clear that there are real use cases
    for anything other than "first" and "all".  Maybe "all=boolean",
    or maybe find() versus findall().
  • needle can be any OpenSCAD value for which equality is defined, or a
    function.
  • If needle is a function, it's called for each entry in the haystack,
    and it should return true if the entry matches the desired criteria.
    o Obscure case:  this disallows searching for a function.  If
    you really want to do that, wrap the function and each of the
    elements of the haystack in lists, so that you are searching for
    a list.  Alternatively, use a function to compare with the
    desired function.
  • No explicit provision for searching for "any of these".  The use
    case seems limited, and having a list for "needle" is ambiguous with
    searching for a list - especially given the equivalence between
    strings and lists.  Note that you can use a function to implement
    "search for any of these".
  • No explicit provision for "search by column".  You can use a
    function.  Also note that objects, when support is added for user
    programs to create them, will address a number of the use cases there.
  • find(char, string) will work as you suggest, because of string-list
    equivalence, but is not considered a primary goal.
  • substring searches are out of scope.  (But if there was a separate
    substring search function, you could use it in a needle function.)

(Adrian, did I miss any points?)

The case where the needle is a function is very powerful, and can be
used to address a lot of less-common cases.  However, it's not clear how
fast it is - and it looks slow.  Using list comprehension and "for ...
if ..." as a proxy for "find all", it appears that using a function to
do a simple comparison may be over a hundred times slower than a simple
search(), and only slightly faster than a recursive solution.

On 2/22/2022 4:38 AM, Rogier Wolff wrote: > On Tue, Feb 22, 2022 at 12:16:01AM +0000, Jordan Brown wrote: >> It *seems* like it should be possible to come up with a find() that's >> more understandable and as powerful, or at least nearly as powerful.  >> (It would start with treating strings as atomic data items, not as >> arrays of characters, at least by default.) > Hmm. I think this would not be the best idea. > > I would suggest that a string is an array(vector) of characters. > > Now... find would need an argument of "what to find" and an > array of those to find it in. > > so if we define: > find (<needle>, <haystack>); > > then > find ('s', "haystack"); > would find it in the 4th position at index 3. > > and > find ("pear", ["apple", "pear", "banana"]); > > would find it in the second spot index 1. > > I was hoping it would also automatically be able to find substrings, > but writing this convinced me that's something different. Your thinking parallels a discussion that Adrian and I have been having. Our current thoughts on the "find" function are roughly: * find(needle, haystack) returns a single integer, or undef if not found. * find(needle, haystack, some-option) returns multiple matches as a list. o Not clear on what some-option is.  Obvious is something like "limit=n", perhaps with n=0 meaning "all", but n=0 being "all" is a bit wrong and it's not clear that there are real use cases for anything other than "first" and "all".  Maybe "all=boolean", or maybe find() versus findall(). * needle can be any OpenSCAD value for which equality is defined, or a function. * If needle is a function, it's called for each entry in the haystack, and it should return true if the entry matches the desired criteria. o Obscure case:  this disallows searching *for* a function.  If you really want to do that, wrap the function and each of the elements of the haystack in lists, so that you are searching for a list.  Alternatively, use a function to compare with the desired function. * No explicit provision for searching for "any of these".  The use case seems limited, and having a list for "needle" is ambiguous with searching for a list - especially given the equivalence between strings and lists.  Note that you can use a function to implement "search for any of these". * No explicit provision for "search by column".  You can use a function.  Also note that objects, when support is added for user programs to create them, will address a number of the use cases there. * find(char, string) will work as you suggest, because of string-list equivalence, but is not considered a primary goal. * substring searches are out of scope.  (But if there was a separate substring search function, you could use it in a needle function.) (Adrian, did I miss any points?) The case where the needle is a function is very powerful, and can be used to address a lot of less-common cases.  However, it's not clear how fast it is - and it looks slow.  Using list comprehension and "for ... if ..." as a proxy for "find all", it appears that using a function to do a simple comparison may be over a hundred times slower than a simple search(), and only slightly faster than a recursive solution.
RP
Ronaldo Persiano
Fri, Feb 25, 2022 6:30 PM

Although a function as "needle" is powerful (and the only proposed way to
search by column)  I would expect that it would be nearly as fast as a tail
recursion because the function evaluation runtime should be dominant. For
simple functions, the recursion alternative could avoid function calls by
direct evaluation with an additional gain. For find_all() not even a
recursion is needed.

Although a function as "needle" is powerful (and the only proposed way to search by column) I would expect that it would be nearly as fast as a tail recursion because the function evaluation runtime should be dominant. For simple functions, the recursion alternative could avoid function calls by direct evaluation with an additional gain. For find_all() not even a recursion is needed.
JB
Jordan Brown
Fri, Feb 25, 2022 7:19 PM

On 2/25/2022 10:30 AM, Ronaldo Persiano wrote:

Although a function as "needle" is powerful (and the only proposed way
to search by column)  I would expect that it would be nearly as fast
as a tail recursion because the function evaluation runtime should be
dominant. For simple functions, the recursion alternative could avoid
function calls by direct evaluation with an additional gain. For
find_all() not even a recursion is needed.

I ran a few tests, using list comprehension as a proxy for find().

x = [ for (i = [0:99999], j=[0:99]) i+j ];
//y = search(500, x, num_matches=0);
//y = [ for (v = x) if (v == 500) v ];
//f = function(v) v == 500;
//y = [ for (v = x) if (f(v)) v ];

function f2(x, i=0) =
    i >= len(x)
    ? []
    : x[i] == 500
        ? concat(f2(x, i+1), x[i])
        : f2(x, i+1);
y = f2(x);

and here's the results:

Version
Net time
(not including
constructing x)
search
vs no-function
list comp
vs function
list comp
vs
recursive
search
0.3
-
0.07
0.01x
0.006x
no-function list comprehension
4.3
14x
-
0.1x
0.09x
function list comprehension
43.3
144x
10x
-
0.9x
recursive (scaled to 10M)
~50
167x
12x
1.2x
-

The recursive version blew up on the 10M entry table, so I ran it for 1M
entries and scaled up.

Note that the time for search() is so small that it's probably dominated
by noise.  The primary thing to learn from its results is that search()
is really, really fast.

On 2/25/2022 10:30 AM, Ronaldo Persiano wrote: > Although a function as "needle" is powerful (and the only proposed way > to search by column)  I would expect that it would be nearly as fast > as a tail recursion because the function evaluation runtime should be > dominant. For simple functions, the recursion alternative could avoid > function calls by direct evaluation with an additional gain. For > find_all() not even a recursion is needed. I ran a few tests, using list comprehension as a proxy for find(). x = [ for (i = [0:99999], j=[0:99]) i+j ]; //y = search(500, x, num_matches=0); //y = [ for (v = x) if (v == 500) v ]; //f = function(v) v == 500; //y = [ for (v = x) if (f(v)) v ]; function f2(x, i=0) = i >= len(x) ? [] : x[i] == 500 ? concat(f2(x, i+1), x[i]) : f2(x, i+1); y = f2(x); and here's the results: Version Net time (not including constructing x) search vs no-function list comp vs function list comp vs recursive search 0.3 - 0.07 0.01x 0.006x no-function list comprehension 4.3 14x - 0.1x 0.09x function list comprehension 43.3 144x 10x - 0.9x recursive (scaled to 10M) ~50 167x 12x 1.2x - The recursive version blew up on the 10M entry table, so I ran it for 1M entries and scaled up. Note that the time for search() is so small that it's probably dominated by noise.  The primary thing to learn from its results is that search() is really, really fast.
AM
Adrian Mariano
Fri, Feb 25, 2022 10:31 PM

I don't understand what you mean.  Are you saying that you would
expect a function as "needle" to be as slow as tail recursion
because each one is evaluating the same number of function calls?
Nobody is proposing recursion as playing any part in this process.  I
think Jordan just mentioned it for completeness.

The way I see it, the advantage of search() is speed, resulting from
either early termination if you don't ask for all the hits, or
resulting from search() just being a really fast internal function if
you do want all the hits.

If a find() method were to be added as an alternative/replacement then
it needs to be fast.  The formulation using needle() as a function is
elegant because it allows for any case of interest to be run,
including searching on columns or approximate searching.  But it's
only a good thing if it's still fast.  If it's faster for me to
write a hard coded list comprehension then this alternative isn't very
satisfying.

I don't think the takeaway from Jordan's experiments is that search()
is really fast.  The takeaway is that running within OpenSCAD, using
functions is really slow (100x slower than doing the same thing
without function calls).  That's troubling, and may mean that this
idea won't really be practical.

On Fri, Feb 25, 2022 at 1:30 PM Ronaldo Persiano rcmpersiano@gmail.com wrote:

Although a function as "needle" is powerful (and the only proposed way to search by column)  I would expect that it would be nearly as fast as a tail recursion because the function evaluation runtime should be dominant. For simple functions, the recursion alternative could avoid function calls by direct evaluation with an additional gain. For find_all() not even a recursion is needed.


OpenSCAD mailing list
To unsubscribe send an email to discuss-leave@lists.openscad.org

I don't understand what you mean. Are you saying that you would expect a function as "needle" to be as *slow* as tail recursion because each one is evaluating the same number of function calls? Nobody is proposing recursion as playing any part in this process. I think Jordan just mentioned it for completeness. The way I see it, the advantage of search() is speed, resulting from either early termination if you don't ask for all the hits, or resulting from search() just being a really fast internal function if you do want all the hits. If a find() method were to be added as an alternative/replacement then it needs to be fast. The formulation using needle() as a function is elegant because it allows for any case of interest to be run, including searching on columns or approximate searching. But it's only a good thing if it's still *fast*. If it's faster for me to write a hard coded list comprehension then this alternative isn't very satisfying. I don't think the takeaway from Jordan's experiments is that search() is really fast. The takeaway is that running within OpenSCAD, using functions is really slow (100x slower than doing the same thing without function calls). That's troubling, and may mean that this idea won't really be practical. On Fri, Feb 25, 2022 at 1:30 PM Ronaldo Persiano <rcmpersiano@gmail.com> wrote: > > Although a function as "needle" is powerful (and the only proposed way to search by column) I would expect that it would be nearly as fast as a tail recursion because the function evaluation runtime should be dominant. For simple functions, the recursion alternative could avoid function calls by direct evaluation with an additional gain. For find_all() not even a recursion is needed. > > _______________________________________________ > OpenSCAD mailing list > To unsubscribe send an email to discuss-leave@lists.openscad.org
JB
Jordan Brown
Fri, Feb 25, 2022 10:57 PM

On 2/25/2022 2:31 PM, Adrian Mariano wrote:

I don't understand what you mean. Are you saying that you would expect
a function as "needle" to be as slow as tail recursion because each
one is evaluating the same number of function calls? Nobody is
proposing recursion as playing any part in this process. I think
Jordan just mentioned it for completeness.

Right:  today, if you want to do an arbitrary search that stops at the
first hit, you have to use recursion.  (An arbitrary search that
doesn't stop at the first hit is naturally a list comprehension, which
makes me wonder whether that's the answer for "find all".)

So the question was "how would find(function, ...) compare to recursion?".

The way I see it, the advantage of search() is speed, [...]

Even with its warts, I'd say that for the simple cases that it handles,
search() is also a lot easier to read and write than the recursive solution.

If a find() method were to be added as an alternative/replacement then
it needs to be fast. The formulation using needle() as a function is
elegant because it allows for any case of interest to be run,
including searching on columns or approximate searching. But it's only
a good thing if it's still fast. If it's faster for me to write a
hard coded list comprehension then this alternative isn't very satisfying.

Hmm.  There's also the question of what's easier to read and write. 
List comprehensions are a lot easier than recursion, but they aren't as
easy as the basic "find" would be.  Whether they are easier than find
with a function as the needle is less clear.  But a list comprehension
also naturally wants to return all hits, requiring a [0] if you only
want one.

I'm sure that there are plenty of search/find applications that process
gazillions of entries, but I expect that there are also some that
process a dozen entries, and for those simplicity would be paramount.

Now, on the other hand, let me scan my work and count the number of
times that I've used search() in an actual project... um, zero. 
Experiments, three.  (Two of which have small tables, and one of which
can have a large (hundreds of thousands) table.)

On 2/25/2022 2:31 PM, Adrian Mariano wrote: > > I don't understand what you mean. Are you saying that you would expect > a function as "needle" to be as *slow* as tail recursion because each > one is evaluating the same number of function calls? Nobody is > proposing recursion as playing any part in this process. I think > Jordan just mentioned it for completeness. > Right:  today, if you want to do an arbitrary search that stops at the first hit, you have to use recursion.  (An arbitrary search that *doesn't* stop at the first hit is naturally a list comprehension, which makes me wonder whether that's the answer for "find all".) So the question was "how would find(function, ...) compare to recursion?". > The way I see it, the advantage of search() is speed, [...] > Even with its warts, I'd say that for the simple cases that it handles, search() is also a lot easier to read and write than the recursive solution. > If a find() method were to be added as an alternative/replacement then > it needs to be fast. The formulation using needle() as a function is > elegant because it allows for any case of interest to be run, > including searching on columns or approximate searching. But it's only > a good thing if it's still *fast*. If it's faster for me to write a > hard coded list comprehension then this alternative isn't very satisfying. > Hmm.  There's also the question of what's easier to read and write.  List comprehensions are a lot easier than recursion, but they aren't as easy as the basic "find" would be.  Whether they are easier than find with a function as the needle is less clear.  But a list comprehension also naturally wants to return all hits, requiring a [0] if you only want one. I'm sure that there are plenty of search/find applications that process gazillions of entries, but I expect that there are also some that process a dozen entries, and for those simplicity would be paramount. Now, on the other hand, let me scan my work and count the number of times that I've used search() in an actual project... um, zero.  Experiments, three.  (Two of which have small tables, and one of which can have a large (hundreds of thousands) table.)
JB
Jordan Brown
Fri, Feb 25, 2022 11:01 PM

On 2/25/2022 9:26 AM, Jordan Brown wrote:

  • find(needle, haystack, some-option) returns multiple matches as a
    list.
    o Not clear on what some-option is.  Obvious is something like
    "limit=n", perhaps with n=0 meaning "all", but n=0 being "all"
    is a bit wrong and it's not clear that there are real use
    cases for anything other than "first" and "all".  Maybe
    "all=boolean", or maybe find() versus findall().

For the "find all" case, a list comprehension may be an appropriate
alternative.

A list comprehension with an inline condition is ~10x faster, estimated,
than find with a needle function would be.

OTOH, a find with a constant needle might be as much as 10x faster than
a list comprehension.

On 2/25/2022 9:26 AM, Jordan Brown wrote: > > * find(needle, haystack, some-option) returns multiple matches as a > list. > o Not clear on what some-option is.  Obvious is something like > "limit=n", perhaps with n=0 meaning "all", but n=0 being "all" > is a bit wrong and it's not clear that there are real use > cases for anything other than "first" and "all".  Maybe > "all=boolean", or maybe find() versus findall(). > For the "find all" case, a list comprehension may be an appropriate alternative. A list comprehension with an inline condition is ~10x faster, estimated, than find with a needle function would be. OTOH, a find with a constant needle might be as much as 10x faster than a list comprehension.
AM
Adrian Mariano
Fri, Feb 25, 2022 11:52 PM

The BOSL2 code has 55 invocations of search().  It's sometimes hard to
judge how big the search space is from the grep output, but it's true
that a lot of them are small.  A major use case for search which may
not be small is to compute the index of max() or min(), that is,
search([max(foo)], foo, 1)[0].  Many of the uses are small string
searches, or are lookup table search, e.g. for selecting which
polyhedron to produce in the regular_polyhedron module.  Clearly run
time is irrelevant for cases like this.  The code for determining if a
polygon is clockwise uses search, which could be on a few hundred
points.  If you do that a lot in a loop it could add up.  I suppose
another question is what applications would a generic first-hit search
have if it took a function.  How much benefit is there from having a
search function vs having to write a recursive function?  And how
important is run time there.  I was thinking about search problems
like: given a list of points defining a path, find the segment that is
distance L along the path from one end.  Of course, doing this search
requires state, which is a lot more complicated.

On Fri, Feb 25, 2022 at 6:01 PM Jordan Brown
openscad@jordan.maileater.net wrote:

On 2/25/2022 9:26 AM, Jordan Brown wrote:

find(needle, haystack, some-option) returns multiple matches as a list.

Not clear on what some-option is.  Obvious is something like "limit=n", perhaps with n=0 meaning "all", but n=0 being "all" is a bit wrong and it's not clear that there are real use cases for anything other than "first" and "all".  Maybe "all=boolean", or maybe find() versus findall().

For the "find all" case, a list comprehension may be an appropriate alternative.

A list comprehension with an inline condition is ~10x faster, estimated, than find with a needle function would be.

OTOH, a find with a constant needle might be as much as 10x faster than a list comprehension.


OpenSCAD mailing list
To unsubscribe send an email to discuss-leave@lists.openscad.org

The BOSL2 code has 55 invocations of search(). It's sometimes hard to judge how big the search space is from the grep output, but it's true that a lot of them are small. A major use case for search which may not be small is to compute the index of max() or min(), that is, search([max(foo)], foo, 1)[0]. Many of the uses are small string searches, or are lookup table search, e.g. for selecting which polyhedron to produce in the regular_polyhedron module. Clearly run time is irrelevant for cases like this. The code for determining if a polygon is clockwise uses search, which could be on a few hundred points. If you do that a lot in a loop it could add up. I suppose another question is what applications would a generic first-hit search have if it took a function. How much benefit is there from having a search function vs having to write a recursive function? And how important is run time there. I was thinking about search problems like: given a list of points defining a path, find the segment that is distance L along the path from one end. Of course, doing this search requires state, which is a lot more complicated. On Fri, Feb 25, 2022 at 6:01 PM Jordan Brown <openscad@jordan.maileater.net> wrote: > > On 2/25/2022 9:26 AM, Jordan Brown wrote: > > find(needle, haystack, some-option) returns multiple matches as a list. > > Not clear on what some-option is. Obvious is something like "limit=n", perhaps with n=0 meaning "all", but n=0 being "all" is a bit wrong and it's not clear that there are real use cases for anything other than "first" and "all". Maybe "all=boolean", or maybe find() versus findall(). > > > For the "find all" case, a list comprehension may be an appropriate alternative. > > A list comprehension with an inline condition is ~10x faster, estimated, than find with a needle function would be. > > OTOH, a find with a constant needle might be as much as 10x faster than a list comprehension. > > _______________________________________________ > OpenSCAD mailing list > To unsubscribe send an email to discuss-leave@lists.openscad.org