1 February 2013

  • Present: Lee, Tom, Emma, Kwaku (remotely)
  • Scribe: Tom

NOTE: Next week we will meet from 9 to 11 AM on Friday, Feb. 8th.

Clojure Refactor

Emma has recommended some changes to Clojush, including adding command line arguments. We decided to take an incremental approach to making the changes. More information will come by email.

Uniform Crossover

  • Why: We think uniform crossover may be useful for a variety of reasons, including to combat bloat.
  • Idea: We want every point in the program to be equally likely to be changed.
  • How: Lee has suggested a crossover / mutation technique that allows each point to be taken from one parent or the other, using either zippers or layers.

Other note: Tom suggested using a linear uniform crossover, where each token (parenthesis or instruction or literal) could be chosen for the swap point. This would allow for a variety of crossover and mutation techniques, many of which could be borrowed from linear GAs (one or two point crossover, uniform crossover, etc.). The only downside is that some sort of repair would be necessary to ensure parenthesis matching after the crossover or mutation.
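A rough sketch of the linear idea, to make the repair step concrete. This is only an illustration: the function names, the 50/50 choice per position, the handling of leftover tokens from the longer parent, and the repair rule (drop unmatched closes, append missing closes) are all assumptions, not something we settled on.

```python
import random

def tokenize(program):
    """Split a Push-like program string into tokens: '(', ')', or atoms."""
    return program.replace("(", " ( ").replace(")", " ) ").split()

def repair(tokens):
    """Drop unmatched ')' tokens and append ')' for every '(' left open."""
    fixed, depth = [], 0
    for tok in tokens:
        if tok == ")":
            if depth == 0:
                continue  # unmatched close: skip it
            depth -= 1
        elif tok == "(":
            depth += 1
        fixed.append(tok)
    return fixed + [")"] * depth

def linear_uniform_crossover(parent_a, parent_b, rng):
    """Take the token at each position from either parent with equal probability."""
    a, b = tokenize(parent_a), tokenize(parent_b)
    child = [rng.choice(pair) for pair in zip(a, b)]
    # Assumption: past the end of the shorter parent, keep each leftover
    # token of the longer parent with probability 0.5.
    longer = a if len(a) > len(b) else b
    child += [tok for tok in longer[len(child):] if rng.random() < 0.5]
    return repair(child)
```

Calling linear_uniform_crossover("(a (b c) d)", "((x) y (z w) v u)", random.Random(0)) returns a token list with matched parentheses. Note that the repaired child can contain several top-level expressions, so a real version would probably wrap the result in one outer pair.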

11 January 2013

Administrative:

meeting time returns to 1-3 for next week, but beyond that it is up in the air

everyone should let Lee know when they are available in the spring semester and, if they don’t know yet, when they will know

Cluster testing / benchmarking / distributed Clojure:

Should we use a “cloud” clustery thing? … meh

Why isn’t our cluster doing the awesome? We’re not sure; more cores do not appear to help…

If clooj won’t work, launch stuff from a command line with java -jar

Calculator Problem:

selection methods

-these are modal problems, so we should not average over cases, but look at the individuals that work well on individual cases

–lexicase selection

–removing redundant cases

–lexicase with small tournaments to determine which problems are valued

–lexicase which after narrowing down the pool for a correct in one section, will look for other correct combinations

–Pareto stuff

–start with simple problems and then once they solve you add new aspects to the problems, and you could tie cases together, randomly

–seeding

–lexicase stops the intelligence of evolution regarding prioritizing which part of the problem to chase
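
For reference, a minimal sketch of plain lexicase selection as it is usually described: shuffle the test cases, then filter the candidate pool one case at a time, keeping only the individuals with the best error on that case. The error-matrix layout and the function name are assumptions made for this example; it is not the Clojush code.

```python
import random

def lexicase_select(errors, rng):
    """Select one parent index.

    errors[i][c] is the error of individual i on test case c (lower is better).
    """
    candidates = list(range(len(errors)))
    cases = list(range(len(errors[0])))
    rng.shuffle(cases)  # a fresh random case ordering for every selection
    for c in cases:
        best = min(errors[i][c] for i in candidates)
        candidates = [i for i in candidates if errors[i][c] == best]
        if len(candidates) == 1:
            break
    # Any individuals still tied after all cases are chosen among at random.
    return rng.choice(candidates)
```

With errors = [[0, 1], [1, 0], [1, 1]], individuals 0 and 1 are each selected when their best case comes first, while individual 2 is never selected. This is the "don't average" behavior noted above: a specialist on one case can win even if its total error is no better than the others'.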

operators

-amalgamation

–walks through 2 parents and makes a child that roughly takes what is in the mama or papa or neither or both

-parenthesis adder

–can be added anywhere

-tagger

–randomly replaces chunks of code with a tag and moves that code chunk to tagspace

-if or tagger

–makes an if or statement for two tags, and then fills them with random crap in the tagspace

Things to discuss:

  • Weighted lexicase selection

Present Scribe: Micah

4 January 2013

Present : Lee, Tom, Emma, Ian

Scribe : Emma

IMPORTANT ISSUES

We need to address the problems that we aren’t solving but need to be, i.e. Kata bowling and the calculator problem.

Next week’s meeting is from 10am-noon.

99 Problems

Lee – focusing on case redundancy elimination

If we’re driving evolution with software test cases, there are going to be hundreds of cases that are going to be solved correctly with the current population. Result : cases are never actually redundant.

Solution : grouped lexicase selection? The problem with this is that it’s way too slow.

New idea : GENETICS – this has something to do with Lee’s amalgamation operator.

Program Synthesis

Emma is currently working on implementing Sumit’s string processing grammar and learning algorithm, looking for ways in which we can improve upon it, addressing problems that the current implementation cannot solve.

Statistical Possibilities

We’ve been contacted by some folks Lee spoke with at GECCO. They are interested in our work on new metrics and statistical tests. We discussed paper possibilities, including using the bioavailability data. Thus far no news on the contacts.

Visualisation

  • Lee is proposing a visualization project for Ian to work on. The idea is to implement a visual version of Push execution state. Previous work was done in Processing and Incanter. Future work to be done in Quil(?). The idea here is to have a self-contained project to be incorporated into Clojush.
  • Another visualization project would be to show the tag space. The idea would be to have an animation that shows how tags are being used during execution. Lee would like to see both where tags are used in the tag range, as well as which tags are called during execution. Emma suggested using a heatmap to show usage, where the color represents presence of a tag and intensity represents frequency of hits.

21 December 2012

Present: Lee, Tom, Emma, Micah (GHangout), Kwaku (GHangout)

Scribe: Tom

Note: If you need jobs done soon, you can send them to Emma to have them done on the UMass cluster.

GECCO Ideas

Here are some things that we can do to try to solve the kata bowling and calculator problems.

Selection

Both Lee and Tom will try out weighted lexicase with tournaments and redundant test case removal.

Weighted Lexicase

The idea here is to give more pressure toward important test cases.

  • Lee’s idea: keep 2 lists of test cases, one in random order and one sorted “easiest to hardest” by solution rate. To build the lexicase ordering, pick the next case from one list half the time and from the other list the other half, removing the chosen case from both lists.
    • Has some problems, such as the “worst” test case getting considerably more attention than the second to “worst” case.
  • Tournament-based weighted lexicase:
    • Run tournaments based on solution rates of test cases. This means you select n test cases at random, and then select the test case with the lowest solution rate to go next in the lexicase ordering. Repeat, with removal, over the test cases until all are in the list.
    • Seems reasonable. Both Tom and Lee will try this out.
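
A minimal sketch of the tournament-based ordering described above, assuming solution rates are given as one number per test case (fraction of the population that solves it). The function and parameter names are made up for illustration; this is not the version Tom or Lee will actually run.

```python
import random

def tournament_case_ordering(solution_rates, tournament_size, rng):
    """Build a lexicase case ordering that tends to put hard cases first.

    solution_rates[c] is the fraction of the population solving case c.
    """
    remaining = list(range(len(solution_rates)))
    ordering = []
    while remaining:
        # Select n test cases at random (fewer once the pool runs low).
        k = min(tournament_size, len(remaining))
        entrants = rng.sample(remaining, k)
        # The entrant with the lowest solution rate goes next in the ordering.
        winner = min(entrants, key=lambda c: solution_rates[c])
        ordering.append(winner)
        remaining.remove(winner)  # repeat with removal
    return ordering
```

The tournament size is the knob on the pressure: with size 1 this is just the random ordering of plain lexicase, and with a size at least as large as the number of cases the ordering is fully sorted from hardest to easiest.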

Clustering / Redundancy Test Case Removal

The idea here is to reduce the importance of similar test cases.

  • Lee’s Redundant Test Case Removal
    • Idea: If two (or more) test cases create the same exact ordering amongst the population, then treat them as 1 test case.
    • In initial experiments of the calculator problem, this resulted in some test cases being grouped together in the first few generations, but after that all 34 test cases reacted exactly the same.
    • Same thing happened when only looking at the elite set for each test case.
    • But, these runs were done when not using the measure during selection; we might see different results if we’re using the measure.
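
One way the grouping step could look, as a sketch only. It assumes "same exact ordering" means that two test cases rank the individuals identically, with ties counted as equal rank; the error-matrix layout and the function name are also assumptions.

```python
from collections import defaultdict

def group_redundant_cases(errors):
    """Group test cases that rank the population identically.

    errors[i][c] is the error of individual i on test case c.
    Returns a list of groups, each a list of case indices.
    """
    groups = defaultdict(list)
    for c in range(len(errors[0])):
        column = [row[c] for row in errors]
        # Dense rank of each individual on this case (equal errors share a rank).
        rank_of = {e: r for r, e in enumerate(sorted(set(column)))}
        signature = tuple(rank_of[e] for e in column)
        groups[signature].append(c)
    return list(groups.values())
```

For example, with errors [[0, 0, 5], [1, 10, 2], [2, 20, 2]] the first two cases order the three individuals the same way and are grouped together, while the third case stays on its own. Each group would then be treated as 1 test case during selection.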

Tom thinks other clustering algorithms might also be worth trying, though he agrees with Lee that these might introduce annoying parameters / other issues.

Genetic Operators

Lee has been trying some things:

  • Mutation:
    • Tagging mutations – replace some code with a tagged_# call, and put (tag_#) at the beginning of the program.
    • Paren adding mutations
  • Crossover:
    • Complementary lexicase mate selection
    • Amalgamation

Instructions

It may be worthwhile to experiment with different instructions / new instructions / combinatorics of instructions.

14 December 2012

Present: Lee, Emma, Tom, Kwaku, Josiah, Tom, Omri

Scribe: Emma

Scheduling

  • Next meeting – 21st at Lee’s house
  • No Meeting on the 28th
  • Next Meeting : January 4th, usual time and place

Grant

Submission to an NSF software engineering grant – continue doing what we’re already doing, but with more of an emphasis on introduction to programming problems, rather than agents

Papers

With Kata Bowling not working out, submissions for GECCO are unclear.

Benchmarking

  • Josiah wants to check out price per performance over 5 years for hardware
  • We had been making the assumption that multicore machines were improving performance
  • Lee’s runs are deterministic; Tom’s are not
  • In many cases, multithreading results in a performance hit
  • The problem seems to be inherent in Clojure
  • Josiah is checking out the numbers with a company that designs high-performance JVMs

7 December 2012

Scribe: Tom

Attending: Lee, Tom, Emma, Josiah

Benchmarking the Cluster

Lots of weird stuff is going on. See emails for more information. Josiah is going to try lein 2, map, and removing parallel garbage collection.

Memory issues on Cluster

We’re getting weird things with out of memory errors. For now, Tom is only going to use racks 1 and 2, which haven’t given any issues.

GECCO Paper Things

COSMOS paper followup is on hold.

Calculator

Lee’s going to try some things to figure this out:

  • Weird genetic operators
  • Big instruction crutches (like mult_by_10) to see if it can be solved.
  • Another big crutch could be taking out all math and seeing if it’s solved. If not, then there’s something more fundamental going on.
  • Maybe try staging of test cases from easy to hard.

Kata Bowling

Tom will try some things here as well:

  • Validation set to test near-solutions, to see if they’re close to general or just memorizing solutions.
    • If they aren’t generalizing, it might be worthwhile to do something more interesting with test cases such as changing which test cases are used every (few) generation(s).
  • Maybe try bigger instruction crutches here too.

This led to a discussion of burn-in period that could be used before evolution to make sure the initial guys are sufficiently good.

30 November 2012

Details

  • Scribe: Emma
  • Present: Lee, Micah, Tom, Zeke, Kwaku, Emma
  • Reading for next week: None

Agenda

Tom wants to discuss GSXover random code generation and how we can fix it. It might be a good idea to re-skim the emails related to this if you wish to be involved in the discussion.

Lee’s updates

  • Funding – current grants, upcoming projects, different directions
  • New focus back on the calculator problem, modality, modularity, concepts related to our recent work using GSX, etc.
  • number pressing to number, boolean output
  • run program once to establish tag space
  • map of button to tag (hard-coded) -> each button has a particular tag value
  • tagging, boolean, float, exec stacks
  • how wrong is it -> need a gradient, rather than 0-1 loss
  • due to tag spaces, new genetic operators like “concatenation” might be useful. new one called “amalgamation”
  • TODO: investigate tradeoffs between redundancy and modularity (Kwaku’s done much recent work on this)
  • refactoring

Micah’s update

Kwaku’s update

<super secret things about modularity>

Tom’s update

  • kata bowling runs – searching for consistency in what’s causing some runs to do well and what’s causing some to perform poorly
  • gsxover – should we continue working on this? if yes, how can we make it work?
  • tom’s recent work has raised the probability of getting something successful

– added scope -> lee suggesting a scope macro/interface -> how to get adfs without explicitly having adfs -> connection with mutation?

– changed scope (specific function set for random code in GSX)

Zeke’s update

  • validation, things related to the Domingos paper Lee forwarded.
  • held up by chemdoodle release delay

16 November 2012

Meeting Notes

next week we need to have a meeting, ONLINE!!!!

GECCO deadline is in January; we should have results and bits of text by next week

“Man will do anything in order to avoid the true difficulty of thinking” -proverb by Lee

Go Zeke’s publication for Chemistry!

Options with Ranjan

Do linear regression guided by Ranjan on problems in chemistry that we have no idea about

or we could try and evolve a material (organic or inorganic), which is evolved with fabrication and physical testing

– takes about two days to do each generation

– we need to focus on making each result good, rather than a hodgepodge of garbage with gems in it

– Fitness prediction would be nice!

– in between: simulate enzymes and run GP on them, no physical testing

Ranjan’s STUFFFFF

-mechanistic deterministic modeling of biological and chemical systems

– has a teammate (Chris Corneleus) who can do polymers and stuff, but Ranjan doesn’t have supplies to do that

– there’s a big initiative to do materials stuff called “materials genome”

– Weird stuff happens when you mix polymers, it is not always the geometric mean, definitely non-intuitive and non-linear

– It is mostly explored with intuition

– applications range: water filtration, fuel stuff,

– we want problems where “the proof is in the pudding”, so this is bawler

– we would be making recipes for these polymers

– the throughput is hella low, so we need to make sure that it is unusually likely to land you in good places of a searchspace, and mutates/crosses you over towards that

– note we are evolving every part of the recipe

– based on experience, manipulating the temperature matters a lot

– the starting components and processing are both incredibly important (particularly looking at water filtration)

– the tests would be assessment of water purity, flow output (diffusion of the polymer), swelling of polymers

– so non-linear that regression sucks like hell, the systems are multidimensional and non-linear

– ok, not necessarily the case that symbolic regression has not been tested with it

– there are weird metrics like tortuosity, which is a metric of how strangely a molecule moves

– potentially be useful to run grammatical evolution or developmental evolution (so the programs can be weird, by changing stuff and such things, but the starting seed is always good)

– the seed formula is take two material mix them, heat them, cool them

– getting the max yield from a small sample size is very, very, very important

– check out evolutionary robotics field

– simple story: there is a shit ton of fitness cases, so they look for situations in which the population varies significantly and then only look at those fitness cases

– unfortunately that is not a perfect match

– Josh Bongard used a simulation for most of the tests, but would occasionally do a real life test which they would use to improve the simulation

– we could try to evolve the simulator using fitness predictors

-yay, there’s a lot of data already!

– there is a lot of proprietary going into parrot front

– there’s some data already

-probs at least on the scale of a few hundred

– that’s enough to at least to start

– making something to predict efficacy of future recipes based on current recipes and their results

– what step was x or y

– try lots of heuristics

– want to make sure we give the real test predicted solutions which are distinct, not just successful ones

– do we keep individuals from the last test in the new test

– Solubility is unpredictable, simulated spectra would be nice, we could do it, there’s also a shit ton of data on it already

– we could attempt something to do with enzymes and protein folding and doing predictions (sounds scary)

– what if we could make something sufficiently good to do a de novo

– in proteins there is a sequence space, a shape space and a function space, they’re hella weird to predict and are not always super successful attempts

– there’s a bunch of stuff which suggests that shapespace may not even exist

-would this be from data or simulation

– it’s not got much data, but if you choose the right problem it could work

Lee’s writing on the board: zee’s writing – we start with data that goes like { recipe: measurement }, …

we have a population of R:M pairs

we evolve a function that given an R gives an M

we need to figure out how to chew up R

that fitness predictor is used in another round of evolution

where we evolve programs that change a recipe to maximize its fitness in measurements

then we physically test a hundred of them, selected somehow

now we have a new (larger) set of RM pairs

-a shit ton of ideas about how this should work

– fitness predictors (how many, do we do lexicase selection, do we do multiple runs)

– should we choose a diverse set of programs from the fitness predictor set (to clarify, ones that are distinct and will produce interesting darts on the search space)

what are scopes good for?

gsxo? kata bowling?

when you don’t have scopes it seems clearly like a bad idea to do something that operates on the whole stack.

code map does crazy shit

could also be used for strings

Pre-Meeting Agenda

  • Ranjan Srivastava will be joining us

9 November 2012

  • Scribe: Tom
  • Present: Lee, Tom, Emma, Zeke, Micah
  • Reading for next week: none
  • Next week Ranjan Srivastava will be joining us

Pre-Meeting Agenda

  • Practice talk for Zeke
  • Return to paper: learning programs: a hierarchical Bayesian approach (link on previous lab meeting)
  • Emma’s publication prospects
  • Scope — we want to have more focused conversation next week
  • Trivial geography in Zeke’s Javascript GP
  • Review upcoming paper list
  • GSXover Random Code – how to fix this
    • Scope?
    • GSX-specific instruction set?

Meeting

  • Zeke’s Javascript trivial geography bug – might be useful, but for now, just get rid of it
  • Zeke’s practice presentation

Emma’s Paper Suggestions

  • Machine Learning stuff – vague possibilities here
  • COSMOS – maybe compare to other statistical tests
  • TSM – different potential here. Also, not clear how to make it work well

GSXover

  • Using different random code generators (or different percents) may be worthwhile here

Scope in Push

  • We don’t want to change the usage or meaning of parentheses
  • Lee gave his proposal, which was partly new and partly reflected Tom’s and Omri’s suggestions
  • After discussion, we agreed on a proposal. This was implemented by Tom, and the documentation can be found here: Clojush Environments Documentation

02 November 2012

ci lab notes 11/2

Scribe : Zeke

LEE PAY ATTENTION: SCOPE!

AGENDA FOR NEXT WEEK:

  • practice talk for zeke
  • return to paper: learning programs: a hierarchical Bayesian approach (link on previous lab meeting)
  • emma’s publication prospects
  • Scope — we want to have more focused conversation next week
  • review upcoming paper list

Lee wants low-weight debugging backtrace tool that looks at locals

Tom’s stuff

  • Tom has unpublished work
  • Kata bowling
    • a paper on how others can’t solve it
    • assuming we solve it that’d be cool
    • one of 7 things might make it work
      1. lexicase in some capacity
      2. geometric semantic xover (could be a totally different paper)
      3. tags (perhaps we’re not using enough, “homeopathic amounts…”)
      4. scope
      5. uniform genetic operators — Micah — later in terms of publications
      6. weighted lexicase selection
      7. age layered population structure — probably not on the front burner