Author Archives: dnh10

16 November 2012

Meeting Notes

next week we need to have a meeting, ONLINE!!!!

The GECCO deadline is in January; we should have results and bits of text by next week

“Man will do anything in order to avoid the true difficulty of thinking” -proverb by Lee

Go Zeke’s publication for Chemistry!

Options with Ranjan

Do linear regression guided by Ranjan on problems in chemistry that we have no idea about

or we could try and evolve a material (organic or inorganic), which is evolved with fabrication and physical testing

– takes about two days to do each generation

– we need to focus on making each result good, rather than a hodgepodge of garbage with gems in it

– Fitness prediction would be nice!

– in between: simulate enzymes and run GP on them, no physical testing

 

Ranjan’s STUFFFFF

– mechanistic, deterministic modeling of biological and chemical systems

– has a teammate (Chris Corneleus) who can do polymers and stuff, but Ranjan doesn’t have supplies to do that

– there’s a big initiative to do materials stuff called “materials genome”

– Weird stuff happens when you mix polymers, it is not always the geometric mean, definitely non-intuitive and non-linear

– It is mostly explored with intuition

– applications include water filtration and fuel stuff

– we want problems where “the proof is in the pudding”, so this is bawler

– we would be making recipes for these polymers

– the throughput is hella low, so we need to make sure the search is unusually likely to land us in good places of the search space, and mutates/crosses over towards them

– note we are evolving every part of the recipe

– based on experience, manipulating the temperature matters a lot

– the starting components and processing are both incredibly important (particularly looking at water filtration)

– the tests would be assessment of water purity, flow output (diffusion of the polymer), swelling of polymers

– so non-linear that regression sucks like hell, the systems are multidimensional and non-linear

– ok, it's not necessarily the case that symbolic regression has not been tested with it

– there are weird metrics like tortuosity, which is a metric of how strangely a molecule moves

– it could potentially be useful to run grammatical evolution or developmental evolution (so the programs can be weird, by changing stuff and such things, but the starting seed is always good)

– the seed formula is: take two materials, mix them, heat them, cool them

– getting the max yield from a small sample size is very, very, very important

– check out evolutionary robotics field

– simple story: there is a shit ton of fitness cases, and then they look at situations in which the population varies significantly and only look at those fitness cases

– unfortunately that is not a perfect match

– Josh Bongard used a simulation for most of the tests, but would occasionally do a real-life test, which they would use to improve the simulation

– we could try to evolve the simulator using fitness predictors

-yay, there’s a lot of data already!

– there is a lot of proprietary going into parrot front

– there’s some data already

-probs at least on the scale of a few hundred

– that’s enough to at least start

– making something to predict efficacy of future recipes based on current recipes and their results

– what step was x or y

– try lots of heuristics

– want to make sure we give the real test predicted solutions which are distinct, not just successful ones

– do we keep individuals from the last test in the new test

– Solubility is unpredictable, simulated spectra would be nice, we could do it, there’s also a shit ton of data on it already

– we could attempt something to do with enzymes and protein folding and doing predictions (sounds scary)

– what if we could make something sufficiently good to do a de novo

– in proteins there is a sequence space, a shape space and a function space; they’re hella weird to predict, and attempts are not always super successful

– there’s a bunch of stuff which suggests that shape space may not even exist

-would this be from data or simulation

– it’s not got much data, but if you choose the right problem it could work

 

Lee’s writing on the board: zee’s writing – we start with data that goes like { recipe: measurement }, …

we have a population of R:M pairs

we evolve a function that given an R gives an M

we need to figure out how to chew up R

that fitness predictor is used in another round of evolution

where we evolve programs that change a recipe to maximize its fitness in measurements

then we physically test a hundred of them, selected somehow

now we have a new (larger) set of RM pairs
-a shit ton of ideas about how this should work

– fitness predictors (how many, do we do lexicase selection, do we do multiple runs)

– should we choose a diverse set of programs from the fitness predictor set (to clarify: ones that are distinct and will produce interesting darts on the search space)
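
A rough sketch of one round of the loop on the board, under loud assumptions: recipes are flat lists of numbers, the evolved fitness predictor is swapped for a nearest-neighbour lookup, recipe-changing programs are swapped for plain Gaussian mutation, and `measure` stands in for the physical test. Every name here is hypothetical.

```python
import random


def distance(a, b):
    """Euclidean distance between two recipes (lists of numbers)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def predict(recipe, data):
    """Placeholder fitness predictor: measurement of the nearest known recipe.
    (The notes propose evolving this function; nearest-neighbour is a stand-in.)"""
    _nearest_recipe, measurement = min(data, key=lambda rm: distance(rm[0], recipe))
    return measurement


def propose(data, n_candidates, rng):
    """Mutate known recipes and rank the children by predicted measurement."""
    children = []
    for _ in range(n_candidates):
        parent, _measurement = rng.choice(data)
        children.append([x + rng.gauss(0, 0.1) for x in parent])
    return sorted(children, key=lambda r: predict(r, data), reverse=True)


def pick_diverse(ranked, batch_size):
    """Greedy max-min pick: start from the best-predicted recipe, then keep
    adding the candidate farthest from everything already chosen."""
    chosen = [0]
    while len(chosen) < min(batch_size, len(ranked)):
        rest = [i for i in range(len(ranked)) if i not in chosen]
        chosen.append(max(rest, key=lambda i: min(distance(ranked[i], ranked[c])
                                                  for c in chosen)))
    return [ranked[i] for i in chosen]


def one_round(data, measure, batch_size, rng):
    """Propose, pick a diverse batch, 'physically test' it, grow the R:M set."""
    ranked = propose(data, n_candidates=10 * batch_size, rng=rng)
    batch = pick_diverse(ranked[: 3 * batch_size], batch_size)
    return data + [(recipe, measure(recipe)) for recipe in batch]
```

Picking the batch from the top-predicted candidates by distance, rather than just taking the top few, is one possible answer to the "distinct, not just successful" concern; it is not the only one.
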
what are scopes good for?

gsxo? kata bowling?

when you don’t have scopes it seems clearly like a bad idea to do something that operates on the whole stack.

code map does crazy shit

could also be used for strings

Pre-Meeting Agenda

  • Ranjan Srivastava will be joining us

9 November 2012

  • Scribe: Tom
  • Present: Lee, Tom, Emma, Zeke, Micah
  • Reading for next week: none
  • Next week Ranjan Srivastava will be joining us

Pre-Meeting Agenda

  • Practice talk for Zeke
  • Return to paper: Learning Programs: A Hierarchical Bayesian Approach (link on previous lab meeting)
  • Emma’s publication prospects
  • Scope — we want to have more focused conversation next week
  • Trivial geography in Zeke’s Javascript GP
  • Review upcoming paper list
  • GSXover Random Code – how to fix this
    • Scope?
    • GSX-specific instruction set?

Meeting

  • Zeke’s Javascript trivial geography bug – might be useful, but for now, just get rid of it
  • Zeke’s practice presentation

Emma’s Paper Suggestions

  • Machine Learning stuff – vague possibilities here
  • COSMOS – maybe compare to other statistical tests
  • TSM – different potential here. Also, not clear how to make it work well

GSXover

  • Using different random code generators (or different percents) may be worthwhile here

Scope in Push

  • We don’t want to change the usage or meaning of parentheses
  • Lee gave his proposal, which was partly new and partly reflected Tom’s and Omri’s suggestions
  • After discussion, we agreed on a proposal. This was implemented by Tom, and the documentation can be found here: Clojush Environments Documentation

02 November 2012

ci lab notes 11/2

Scribe : Zeke

LEE PAY ATTENTION: SCOPE!

AGENDA FOR NEXT WEEK:

  • practice talk for zeke
  • return to paper: learning programs: a hierarchical Bayesian approach (link on previous lab meeting)
  • emma’s publication prospects
  • Scope — we want to have more focused conversation next week
  • review upcoming paper list

Lee wants low-weight debugging backtrace tool that looks at locals

Tom’s stuff

  • Tom has unpublished work
  • Kata bowling
    • a paper on how others can’t solve it
    • assuming we solve it that’d be cool
    • one of 7 things might make it work
      1. lexicase in some capacity
      2. geometric semantic xover (could be a totally different paper)
      3. tags (perhaps we’re not using enough, “homeopathic amounts…”)
      4. scope
      5. uniform genetic operators — Micah — later in terms of publications
      6. weighted lexicase selection
      7. age layered population structure — probably not on the front burner

19 October 2012

Scribe: Omri

Present: Lee, Micah, Omri, Emma, Tom, Kwaku

Administrivia

  • There will be no meeting next week, but Lee will have open office hours after 3pm. Next meeting will be the week after, November 2 from 1-3pm.
  • At the meeting on November 2, we will be meeting with Ranjan Srivastava (from UConn–whom Lee met on the train leaving GECCO).
  • Also at the meeting on November 2, we will do a more thorough discussion of Learning Programs : A Hierarchical Bayesian Approach, which had been scheduled for today, but we didn’t have enough time for.
  • Despite not meeting next week, Lee would like everybody to submit very realistic publication ideas and plans by next Friday.
  • For next year’s GECCO, Lee doesn’t yet know what sort of funding for travel will be possible.

News

  • There is a possible collaboration with William LaCava at UMass on a MATLAB-based project.
  • Lee met with Adam Kalai, from Microsoft Cambridge (he works with Sumit Gulwani), who had interesting things to say about program creation by example. Reading some of his papers may uncover good problems for us to try and tackle using automatic software creation.
  • Parent reversion is not a part of Clojush. @Lee: parent reversion should change such that an unimproved child will always survive if it is smaller.
  • Lee, Omri, and Kwaku may collaborate to propose a workshop for GP using stack-based languages.
  • Jon Klein seems to have resurfaced, and has written a version of Push for Ruby. He wants assertion tests that are themselves Push instructions, which would leave true on the boolean stack if the assertion succeeds, false otherwise.
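
The parent reversion change noted in the list above could be sketched like this. It assumes parent reversion means an unimproved child is replaced by its parent with some probability; the `error` and `size` fields and the function name are hypothetical, and none of this is Clojush code.

```python
import random


def survivor(parent, child, reversion_probability, rng=random):
    """Decide whether the child or its parent goes into the next generation.
    Each individual is a dict with 'error' (lower is better) and 'size'."""
    if child["error"] < parent["error"]:
        return child   # improved children always survive
    if child["size"] < parent["size"]:
        return child   # proposed rule: unimproved but smaller still survives
    if rng.random() < reversion_probability:
        return parent  # otherwise revert to the parent with some probability
    return child
```
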

Discussion

  • We discussed how to manage the size explosion of GSXover, leaving with the ideas that we could try increasing the iterations of auto-simplification; decreasing the rate of GSXover; increasing parent reversion; and/or implementing the new deletion operator.
  • We briefly discussed Learning Programs : A Hierarchical Bayesian Approach
  • Flea Market may only need explanation/examples/documentation for now. There may be pathologies that will surface, but hopefully they would be trivial to fix.
  • We briefly discussed scoping for Push, but ultimately decided to have the real discussion take place through email.

12 October 2012

Scribe : Emma

Present: Lee, Micah, Zeke, Tom, Emma, Kwaku

Administrivia

Lee – pushing things incrementally forward; potential meeting with Sumit Gulwani’s colleague in Boston; meeting one of Kourash Dani’s new students; corresponding with David Clark (tree GP – makes more sense than Push), will try using lexicase selection on it; geometric semantic crossover; things to add to Clojush: random on failure, to combat the default of reproduction when crossover fails due to child size (i.e. fail to random instead of mom)

Tom’s suggestion for handling bloat: use simplification operator BEFORE fitness evaluation.

David Clark is interested in a new crossover operator that swaps the chunks of code that are crossed over.

Status Stuff

Status updates were framed primarily by issues of bloat and a desire to find better operators that either adapt to the state of the system or adapt to the particular parameters of the problem. In unrelated updates, Zeke is doing a Public Health hackathon and wanted to know if anyone had suggestions for uses of GP in this context.

Tom: Geometric Semantic Crossover gets really big really fast

Micah: Discuss places where humans are systematically outperformed by algorithms. Perhaps these would be good places to try producing better algorithms with GP.

Lee – implementing Family Feud

Tom – implementing deletion operators; having problems with random code generation (should perhaps be problem-specific) -> this is particularly problematic for language control structures

Lee (in response to some of Tom’s concern) – We should seriously consider implementing a notion of scope -> using tag space machines to capture scope (where the tag space is a stack of frames) -> maybe something halfway between push and tsm?

Consider the fact that not having scope makes it considerably more difficult to evolve larger programs.

Problems with variation operators – as a group, we’re really leaning towards

Zeke – interested in adaptive operators to handle plateaus

Issues with simplification – it’s really a mutation-delete operator.
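
To make the "mutation-delete" point concrete, here is a minimal sketch of simplification as hill climbing over deletions. It assumes a program is a flat list of points and that `error_fn` (hypothetical) scores a program with lower being better; it is not Clojush's actual simplifier.

```python
import random


def auto_simplify(program, error_fn, steps, rng):
    """Repeatedly delete one random point, keeping the deletion only when the
    error does not get worse. Each step is just a delete mutation plus a check."""
    best = list(program)
    best_error = error_fn(best)
    for _ in range(steps):
        if len(best) <= 1:
            break
        i = rng.randrange(len(best))
        candidate = best[:i] + best[i + 1:]
        candidate_error = error_fn(candidate)
        if candidate_error <= best_error:
            best, best_error = candidate, candidate_error
    return best
```

Run before fitness evaluation, as Tom suggests, this costs `steps` extra error evaluations per individual, which is worth keeping in mind.
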

Paper for next time

Learning Programs : a Hierarchical Bayesian Approach

5 October 2012

  • Scribe: Tom
  • Members Present: Lee, Tom, Emma, Micah, Zeke, Omri (digitally)

    Pre-Meeting Agenda

    • UConn contact contacted
    • Lein 2?
    • web gp / push / javascript gp applications
    • bioavailability … bug?
    • lexicase + semantic xover on mux, valiant
    • database for visualization work (?)
    • uniform subtree mutation etc.
    • size-based tournaments future paper?

    Meeting Notes

    Administrivia

    • Put things you want to talk about in next week’s meeting notes.
    • Lein1 vs. Lein2 – it appears that Lein2’s deps is not compatible with Clooj, so use lein1 if you need to pull deps for Clooj.
    • It would be great if someone can figure out how to run Clojure code, hit an exception, and be able to examine the state and the variables. It would be preferable if this did not include using emacs, but it’s fine if it does.

    Lexicase + Geometric Semantic Genetic Operators

    • Good success on MUX-6, though Tom notes that this problem seems ideal for this method
    • Les Valiant boolean problem
      • Lee is trying a 50 out of 100 version currently
      • Programs get big, even with simplification
    • Lee should implement general operators

    Size-based Tournaments

    • We can use the GECCO 2012 poster results as a beginning / reason for using 2/2
    • Test on 1 or 2 hard problems in ECJ
    • Maybe do some Clojush runs, but only for our benefit and not for the paper

    Database for Visualization

    Zeke, Tom, Emma, and Josiah will be working on this in various capacities.

    Web GP / Javascript

    Possible uses include a Push demo that could be an interactive interpreter or a demo of PushGP.

    Cell Phone Load Management

    This is a potential full app for an Android phone. Maybe worth pursuing?

    Uniform subtree mutation and crossover

    • Micah will try using Lee’s tree GP.
    • If we want to publish results, it should be relatively easy to implement in ECJ.

28 September 2012

Reading: File:Nchembio.689.pdf Efficient discovery of anti-inflammatory small-molecule combinations using evolutionary computing – Zeke to lead

Administrivia

  • Merging updates with master: Since I (Tom) am pushing a lot of Clojush updates recently, I’d be happy to do the actual merging etc. as well. This would probably involve Lee giving an ok to the merge before it happens.

Things Tom wants to Discuss

  • Bloat theory and uniform operators
  • Real-time graphing and statistics of cluster runs.

Notes:

article:

   this paper was about innate immunity rather than the immunity you develop in response to pathogens
   There are certain white blood cells that just respond to patterns of bad.
   The white blood cells they looked at respond to certain things and signal other cells to act with IL1beta
   The drug company wanted to use drugs synergistically to control IL1beta, because using a lot will kill you
   they had 33 drugs and wanted to combine them in order to attempt to produce a specific amount of IL1beta in response to LPS
   note that the drugs can kill the macrophages, so they were trying to get low macrophage death for low IL1beta production
   they used a simple binary GA with fitness measured in the real world
       the trick to it is the way you treat multiple objectives for fitness in parents. There's a lot of talk about hypervolume in attempting to improve this. Most are Pareto based. This one wasn't.

LEE!!! ~ Contact UConn guy. Invite him to come discuss projects with us.

UMASS may have automated labs that could do the wet side of some cool gp stuff.

we should have a blackboard of ideas —>> we could build a space —>> we could have a google doc, Oh, ya great. Coool, now we have one.

Experiment idea: Lexicase selection is a way of making parents that are talented at specific aspects of problems. Similarly, any parent that is capable of solving a novel combination of answers should be valued. Perhaps scaled lexicase selection should be pursued. The two together (lexicase and gsgp) could be mashed up so they might hit on answers and then blend success.

quasi-gsgp where instead of (if ___ A B), you do (if ______ (chunk A) (chunk B)) even better if you can do it intelligently. The goal of this is to use gsgp ideas to increase modularity, rather than acting towards hitting the geometric mean.

You can have a GP program which is a bunch of numbers and they follow lists choosing options and diving down to deeper lists.

code could be executed in its own scope <- relating to chunk gsgp

why do people mutate in Push so that stuff is added and removed in the same spot? why even do both?

there are some papers which claim to have conclusively disproven the idea of bloat as a defense against crossover and mutation. Lee doesn’t believe them. There’s lots of stuff that causes bloat; it’s context dependent.

Thomas and Zeke: a web page that displays graphs and logfiles of the generations as they’re run

   currently only outputting population data
   should test to see how much it's slowing things down
   you could link children to parents to get genealogies

read File:Uniform mutation.pdf

21 September 2012

Present: Lee, Tom, Zeke, Micah, Kwaku, Emma
Administrivia

– Must send updates early! (This is for everyone)

– Clojush macro issue: Tom’s solution seems to be working; it will be pushed with the oral bioavailability code

– Readings: whoever suggested the reading leads the discussion and will be expected to have a greater depth of understanding. Everyone else is expected to at least skim the paper and have a high-level view of it.

– The Valiant Challenge: Lee is still working on it, knows what to try next, but may have a conflict with Valiant over the definition of evolution

– Emphasis on escalating research to journal articles and making incremental progress on all projects.
File:Operator Equalisation, Bloat and Overfitting – A Study on Human Oral Bioavailability Prediction.pdf discussion

– New sample problem + new technique

– Possible future task: apply Silva work in Push, leveraging ease with which programs of varying size can be made using Push

– do an experiment/analysis of whether bloat really does protect against crossover (possible literature review?)
TSM

– mutation -> consider making uniform; four types: (1) deletes entry, (2) inserts entry, (3) mutate tag, (4) mutate value -> use the same probabilities as the initial random tag mutation

– toggle between eda and improving the machine
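
The four uniform mutation types for TSMs could look roughly like this. The sketch assumes a tag space is a dict from integer tags to values and that every entry is independently hit with probability `rate`; `random_value`, `max_tag` and the equal split among the four types are all made up for illustration.

```python
import random


def mutate_tag_space(space, rate, max_tag, random_value, rng):
    """Return a mutated copy of a tag space (dict: integer tag -> value).
    Tag collisions simply overwrite whatever was stored there before."""
    result = {}
    for tag, value in space.items():
        if rng.random() >= rate:
            result[tag] = value  # entry left untouched
            continue
        kind = rng.choice(["delete", "insert", "mutate_tag", "mutate_value"])
        if kind == "delete":
            continue                                           # (1) delete entry
        if kind == "insert":
            result[tag] = value
            result[rng.randrange(max_tag)] = random_value(rng)  # (2) insert entry
        elif kind == "mutate_tag":
            result[rng.randrange(max_tag)] = value              # (3) mutate tag
        else:
            result[tag] = random_value(rng)                     # (4) mutate value
    return result
```
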

14 September 2012

Tom wants to discuss:

  • Hardness-biased lexicase selection
  • Web-based graphs of cluster runs

IN MEETING NOTES CI Lab notes

Administrative

  • none

Talking about clojush speed

  • Lee codes for readability overall
  • profiling can be tricky
  • No one really wants to take this on, but someone should do this…

Talking about paper

  • Lee summarizes
  • We discuss
  • we could use this in clojush .. might work well with lexicase selection
  • POSSIBLE TASK: Implement semantic crossover in clojush with lexicase selection
  • POSSIBLE TASK: differentiate a few code size limits and make more parameters
  • POSSIBLE TASK: improve simplification by testing parts of code for constants.

7 September 2012

Scribe: Tom

Paper to read for next week: Geometric Semantic GP. Feel free to skim the sections that get very mathy.

Fall Membership

  • Members now include: Lee, Tom, Emma, Zeke, and Micah, with Kwaku and Omri being maybes. Someone should probably change the main lab page to reflect this.
  • Lee should pester Kwaku and Omri about their membership.

Tag Space Machines

  • So far, we aren’t seeing much evolutionary progress.
  • Most initial programs are probably too small. We’ll focus for now on generating a viable generation 0, so that evolution has something to work with.

Uniform Mutation and Crossover

We got sidetracked by a large discussion about uniform mutation and crossover, both in TSMs and in PushGP. This may well be worth pursuing further in Push.

  • Option 1: Set a mutation percentage, and then each point independently has that probability of being mutated.
  • Option 2: For a program of size N, mutate (mutation percent * N) points within the program.
  • Either of these options may also work with crossover, where the inserted code is taken from the “father” instead of created from scratch.
  • These are also potentially doable in tree GP, which would be of higher interest to the community at large.
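
The two options above, plus the crossover variant, can be sketched on flat programs as follows. This assumes a program is a plain list of points and that `random_point` (hypothetical) generates a fresh point; taking the inserted point from a random position in the father is also an assumption, since the notes don't say where in the father it comes from.

```python
import random


def mutate_per_point(program, rate, random_point, rng):
    """Option 1: each point is independently mutated with probability `rate`."""
    return [random_point(rng) if rng.random() < rate else p for p in program]


def mutate_fixed_count(program, rate, random_point, rng):
    """Option 2: mutate round(rate * N) distinct points of a size-N program."""
    count = round(rate * len(program))
    chosen = set(rng.sample(range(len(program)), count))
    return [random_point(rng) if i in chosen else p for i, p in enumerate(program)]


def crossover_fixed_count(mother, father, rate, rng):
    """Option 2 as crossover: replacement points come from the father
    instead of being created from scratch."""
    count = round(rate * len(mother))
    chosen = set(rng.sample(range(len(mother)), count))
    return [rng.choice(father) if i in chosen else p for i, p in enumerate(mother)]
```

The practical difference: option 1 gives a variable number of changes per child (binomially distributed), while option 2 always changes the same fraction of the program.
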

Kata Bowling

  • Tom has been working on some graphing things.
    • Heatmaps of best individuals of a run over all test cases.
    • Piano roll plots of fitnesses and sizes within a run.
  • What can we do to solve this problem?
    • Some sort of combination of historically assessed hardness and lexicase selection may be useful. This could add bias to have harder test cases near the beginning of the sorted list of test cases for lexicase selection.
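
A minimal sketch of the hardness-bias idea, assuming `hardness` is some historically assessed score per test case (higher means harder) and `errors[individual][case]` holds each individual's error. The weighted shuffle uses random keys of the form u^(1/w), a standard trick for weighted sampling without replacement; how hardness should actually be computed is left open, as in the notes.

```python
import random


def biased_case_order(hardness, rng):
    """Weighted shuffle of test cases: harder cases tend to come earlier."""
    keys = {case: rng.random() ** (1.0 / (h + 1e-9)) for case, h in hardness.items()}
    return sorted(keys, key=keys.get, reverse=True)


def lexicase_select(population, errors, case_order, rng):
    """Plain lexicase filtering: go through the cases in order, each time
    keeping only the candidates with the best error on that case."""
    candidates = list(population)
    for case in case_order:
        best = min(errors[ind][case] for ind in candidates)
        candidates = [ind for ind in candidates if errors[ind][case] == best]
        if len(candidates) == 1:
            break
    return rng.choice(candidates)
```

Ordinary lexicase selection is the special case where `case_order` is a uniform shuffle, so the bias can be dialed back by flattening the hardness scores.
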

Geometric Semantic GP

  • Singles out good features of parents and combines them (somehow).
  • Problem: This creates a lot of bloat.
  • Solution? Maybe use Push with simplification.
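
For boolean problems, geometric semantic crossover builds the child as (parent1 AND mask) OR (parent2 AND NOT mask), where the mask is a random boolean program. The sketch below uses nested tuples as a stand-in program representation (an assumption, not Push) to show where the bloat comes from.

```python
def gsx(parent1, parent2, mask):
    """Boolean geometric semantic crossover: the child contains both parents whole."""
    return ("or", ("and", parent1, mask), ("and", parent2, ("not", mask)))


def evaluate(expr, env):
    """Evaluate a nested-tuple boolean expression; strings are variable names."""
    if isinstance(expr, str):
        return env[expr]
    op, *args = expr
    vals = [evaluate(a, env) for a in args]
    if op == "and":
        return vals[0] and vals[1]
    if op == "or":
        return vals[0] or vals[1]
    if op == "not":
        return not vals[0]
    raise ValueError(op)


def size(expr):
    """Number of points (operators plus variables) in an expression."""
    if isinstance(expr, str):
        return 1
    return 1 + sum(size(a) for a in expr[1:])
```

Since the child has size size(parent1) + size(parent2) + 2 * size(mask) + 4, program size at least doubles every generation unless something like simplification cuts it back.
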

Leslie Valiant

If we can solve the 500-500 boolean parity problem with GP, that would be very good!