Theory of Computing Blog Aggregator

In this paper we study the degree of non-constant symmetric functions $f:\{0,1\}^n \to \{0,1,\ldots,c\}$, where $c\in \mathbb{N}$, when represented as polynomials over the real numbers. We show that as long as $c < n$ it holds that deg$(f)=\Omega(n)$. As we can have deg$(f)=1$ when $c=n$, our result shows a surprising threshold phenomenon. The question of lower bounding the degree of symmetric functions on the Boolean cube was previously studied by von zur Gathen and Roche who showed the lower bound deg$(f)\geq \frac{n+1}{c+1}$ and so our result greatly improves this bound. When $c=1$, namely the function maps the Boolean cube to $\{0,1\}$, we show that if $n=p^2$, when $p$ is a prime, then deg$(f)\geq n-\sqrt{n}$. This slightly improves the previous bound of von zur Gathen and Roche for this case.

at March 10, 2010 11:04 AM UTC

Consider the following two-player communication process to decide a language $L$: The first player holds the entire input $x$ but is polynomially bounded; the second player is computationally unbounded but does not know any part of $x$; their goal is to cooperatively decide whether $x$ belongs to $L$ at small cost, where the cost measure is the number of bits of communication from the first player to the second player. For any integer $d \geq 3$ and positive real $\epsilon$ we show that if satisfiability for $n$-variable $d$-CNF formulas has a protocol of cost $O(n^{d-\epsilon})$ then coNP is in NP/poly, which implies that the polynomial-time hierarchy collapses to its third level. The result even holds when the first player is conondeterministic, and is tight as there exists a trivial protocol for $\epsilon = 0$. Under the hypothesis that coNP is not in NP/poly, our result implies tight lower bounds for parameters of interest in several areas, namely sparsification, kernelization in parameterized complexity, lossy compression, and probabilistically checkable proofs. By reduction, similar results hold for other NP-complete problems. For the vertex cover problem on $n$-vertex $d$-uniform hypergraphs, the above statement holds for any integer $d \geq 2$. The case $d=2$ implies that no NP-hard vertex deletion problem based on a graph property that is inherited by subgraphs can have kernels consisting of $O(k^{2-\epsilon})$ edges unless coNP is in NP/poly, where $k$ denotes the size of the deletion set. Kernels consisting of $O(k^2)$ edges are known for several problems in the class, including vertex cover, feedback vertex set, and bounded-degree deletion.

at March 10, 2010 12:13 AM UTC

Authors: Vamsi Kundeti, Sanguthevar Rajasekaran, Hieu Dinh
Download: PDF
Abstract: Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories -- based on the data structures which they employ. The first class uses an overlap/string graph and the second type uses a de Bruijn graph. However with the recent advances in short read sequencing technology, de Bruijn graph based algorithms seem to play a vital role in practice.

Efficient algorithms for building these massive de Bruijn graphs are very essential in large sequencing projects based on short reads. In Jackson et. al. ICPP-2008, an $O(n/p)$ time parallel algorithm has been given for this problem. Here $n$ is the size of the input and $p$ is the number of processors. This algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating $\Theta(n\Sigma)$ messages.

In this paper we present a $\Theta(n/p)$ time parallel algorithm with a communication complexity equal to that of parallel sorting and is not sensitive to $\Sigma$. The generality of our algorithm makes it very easy to extend it even to the out-of-core model and in this case it has an optimal I/O complexity of $\Theta(\frac{n\log(n/B)}{B\log(M/B)})$. We demonstrate the scalability of our parallel algorithm on a SGI/Altix computer. A comparison of our algorithm with that of Jackson et. al. ICPP-2008 reveals that our algorithm is faster. We also provide efficient algorithms for the bi-directed chain compaction problem.

at March 10, 2010 01:30 AM UTC

The news has hit the wires -- Chuck Thacker is this year's Turing Award winner.

I had the great pleasure of getting to know Chuck while I worked at DEC SRC.  He's a character, a tinkerer, and a great and curious mind.  I think recognizing his work -- the Alto -- is a great choice.

 

by Michael Mitzenmacher (noreply@blogger.com) at March 09, 2010 04:19 PM UTC

A couple of commenters have asked what software package I use to create the graphs that appear in bit-player posts–illustrations like the one below, which is a slightly improved version of something I posted last week. Let’s call it Figure 1.

rms-graph2-revised.png

Prompted by these inquiries, I immodestly ask myself: Why do my graphs look so darn good? I immodestly answer: It’s not because of any packaged software! I don’t need a cake mix, or even a recipe. These are home-baked graphs, made from scratch out of locally grown organic pixels.

I have strong opinions about the aesthetics of scientific illustrations, and I could certainly spout off about the design elements of Figure 1, such as that putty-colored background, just dark enough to allow drop-out white grid lines, yet neutral enough to avoid competing with the data curves, which also have a distinctive color scheme on which I could discourse at length. Yes, I can talk the Tufte talk. But I think the commenters were really asking how I create the graphs rather than why they’re so elegant, and so I’m going to focus here on the practical programming problem.

Most of my experience in drawing pictures with a computer comes from the world of print publishing, where the final product is ink on paper rather than pixels on a screen. Compared with the online environment, print has some advantages, notably higher resolution (up to 1,000 dots per centimeter) and precise control over typography and color. But print also has obvious limitations: On a magazine page, there are no mouseovers or clickable buttons, and you can’t make a square knot twirl in 3D.

Thirty years ago, the big challenge for computer-generated illustrations was not how to draw the picture but how to get it out of the computer and onto the printing press. You couldn’t just export a PDF and place it in a Quark or InDesign document; none of those things existed. The only practical option was to print out the artwork, photograph it, and “strip” the negative into the page-size film that would be used to make the press plate. Because of this emphasis on printouts, most of the effort went into programming the printer rather than the computer.

The figure below is the first published computer-generated illustration I had a hand in creating. It appeared in Scientific American in 1983.

epson-freq-table.png

The array of 282 tiny bar graphs was produced with an Epson MX-80 dot-matrix printer, using escape codes to fire combinations of the eight pins in the printhead. Of course the MX-80 was a black-and-white device. The two-color illustration was created from two separate printouts. Also, the Epson letterforms were replaced with typeset characters.

The world of computer-generated illustrations changed dramatically with the arrival of PostScript, the “page description language” created by John Warnock and his colleagues at Adobe Systems (based in part on earlier work at Evans and Sutherland and Xerox PARC). PostScript was designed as a complete programming language rather than just a file format or a set of drawing commands. And something else set it apart as well: attention to details of graphic design. With most earlier software (such as programs based on the Apple Quickdraw library), trying to create publishable figures was an exercise in frustration. For example, the apparent weight of a line would vary depending on its orientation: lighter when vertical or horizontal, heavier when diagonal. PostScript allows very precise control over such niceties of presentation. To take another example, where lines meet the edge of a graph, you don’t want to have to choose between falling short and overshooting; PostScript provides the tools needed to make it look right.

edge-effects.png

(The version in the rightmost panel is created by allowing the colored lines to extend outside the background box, and then applying a clipping mask that cuts off all objects at the boundary of the box.)

Obsessing over minute details like these may seem comically fussy, but I believe that neatness counts in these matters. To some extent, illustration is an art of illusion. Graphs and diagrams work best when you can look through them rather than at them. The viewer should be seeing the underlying information or abstraction–the array of correlation coefficients, the function y = f(x), or whatever–rather than noticing the mechanics of how the drawing was constructed. A ragged edge is the kind of distraction that destroys the illusion.

Although PostScript was a giant step forward from the MX-80 command set, in the early years it was still just another printer language, not a computer language. The only way I could execute a PostScript program was to send it to a laser printer and wait to see what came out. Sometimes it was a long wait. I had no way of running a PostScript program on the computer itself. (Ghostscript came later.)

ChernoffFaces.pngMy first PostScript illustrations were created as hand-written PostScript programs; the same language was used both for doing the computations and for presenting the results. The faces at right were created in this way. (They were inspired by the work of Herman Chernoff and drawn to illustrate an American Scientist article by Robert Levine in 1990.) The dual role of the language caused me a moment of disorientation just now when I went looking for my records of this project. I found an EPS (encapsulated PostScript) file, which I knew was the finished illustration, but where was the source code? And then I remembered: It’s the same file! Open it up in Ghostscript or Adobe Illustrator and you see those silly faces smiling or scowling at you; open the same file in a text editor, and you see procedures for drawing elements of the faces:

   /draweyes
     { newpath
       dx dy eyewidth eyeheight 0 360 ellipse stroke
       ex ey eyewidth eyeheight 0 360 ellipse stroke
     } bind def
   /drawpupils
     { fx fy pupilsize pupilsize 0 360 ellipse fill
       gx gy pupilsize pupilsize 0 360 ellipse fill
     } bind def

Bill Casselman, the graphics editor of the Notices of the American Mathematical Society, still favors this direct-to-PostScript methodology. He has written an excellent guidebook, taking you from the basics of PostScript through an elaborate library for rendering three-dimensional objects.

But here I part company from Casselman; I’d rather not do all my computing in PostScript. It’s not that I have anything against the language itself, but the development environment is not to my taste. I therefore adopted the modus operandi of writing a program in my language of choice (usually some flavor of Lisp) and having that program write a PostScript program as its output. After doing this on an ad hoc basis a few times, it became clear that I should abstract out all the graphics-generating routines into a separate module. The result was a program I named lips (for Lisp-to-PostScript).

Most of what lips does is trivial syntactic translation, converting the parenthesized prefix notation of Lisp to the bracketless postfix of PostScript. Thus when I write (lineto x y) in Lisp, it comes out x y lineto in PostScript. The lips routines also take care of chores such as opening and closing files and writing the header and trailer lines required of a well-formed PostScript program.

But the lips interface is low-level, confined to drawing individual dots, line segments, rectangles and the like. Assembling a complete graph out of these primitives is tedious. For example, the grid of white lines in Figure 1 would have to be drawn one line at a time, with each line specified by a sequence of commands such as

    (newpath)
    (moveto u v)
    (lineto x y)
    (stroke)

Before you can issue those commands, you have to calculate u, v, x and y. Clearly, a higher-level front end is needed; like everyone else, I call mine plot.

At the core of any plotting program is a simple operation: mapping points from an abstract user space to coordinates in a rectangular pane, the page space. In Figure 1, the y axis runs from 0 to 5000; values in this range have to be scaled to the dimensions of the graph, which is about 300 PostScript points, or 11 centimeters. Mathematically, the transformation is straightforward. Indeed, if I wished I could leave all the arithmetic to the PostScript interpreter, simply passing in the appropriate matrix elements for scaling and translation. This is an attractive option; it would allow plot to work entirely in user space. But a few niggling details get in the way. Consider the tick marks along the y axis in Figure 1. Their vertical positions are conveniently expressed in user coordinates: one tick every 500 units. But what about the length of the ticks–their horizontal extent? This dimension is purely concerned with the appearance of the graph and has nothing to do with the content; it ought to be expressed in unscaled units of points or pixels.

Here’s a possible solution: Let everything inside the rectangular frame of the graph–the area with the putty-colored background in Figure 1–go through the scaling engine, but define everything outside the frame, including the tick marks and the axis labels, directly in page coordinates. If you think this is the final answer, take a look at Figure 2:

figure2.png

In this nonsensical graph (constructed just for this occasion), data points are indicated by stars, crosses and diamonds. The positions of those glyphs ought to be defined in user space, but the drawing commands that create the shapes are properly defined in page coordinates. If we tried to draw the glyphs in user space, their size and shape would vary with position in the graph.

What’s the best way to deal with this messy situation? Is there some tidy solution that will reconcile the two coordinate systems and allow all dimensions to be treated uniformly? I don’t believe so; it’s just in the nature of graphs to mix up elements from these two disparate realms. We look through a window into a world of data or mathematical abstractions, but we also draw our own little doodles on the window itself.

Of course there are solutions; they’re just not as pretty as I would like. My own strategy for coping is to attach extra information to each geometric point, indicating whether or not the x and y coordinates are to go through the scaling transformation. This is less troublesome than it might seem; from the user’s point of view, it’s almost always invisible.

In writing the lips and plot programs, I walk a path that is already worn smooth by many earlier footsteps. I don’t know who wrote the first computer program for plotting data, but it probably came soon after the first program for producing data. Today we have hundreds of clever, comprehensive, well-designed and well-maintained programs for plotting and graphing. Gnuplot is very capable; Grace is one I’ve never used but I’ve heard good things about it; Mathematica, Sage, R, MATLAB, Octave and the like all have elaborate graphics facilities built in; the Python world, as usual, has an overabundance of options; there are a few libraries for my beloved Lisp; you can even do dataviz online.

All of which raises the question of why I bother to roll my own. I’ll never keep up–or even catch up–with the efforts of major software companies or the huge community of open-source developers. In my own program, if I want something new–treemaps? vector fields? the third dimension?–nobody is going to code it for me. And, conversely, anything useful I might come up with will never benefit anyone but me.

The trouble is, every time I try working with an external graphics package, I run into a terrible impedance mismatch that gives me a headache. Getting what I want out of other people’s code turns out to be more work than writing my own. No doubt this reveals a character flaw: Does not play well with others.

In any case, the time for change is coming. My way of working is woefully out of date and out of fashion. PostScript is a technology that even Adobe seems to regard as outmoded. And making ultraprecise PostScript graphs is quite silly when their destination is the web; before I can put them online, I have to convert them to low-res PNG images. Furthermore, a PostScript-based workflow loses out on all the interactive richness of the web. These are deathly still images. How can I expect to earn any web cred when my work is not even clickable, much less multitouch-enabled?

If I continue in my stubborn, do-it-yourself mode, I could replace the PostScript back end with one that generates SVG. This wouldn’t be a major undertaking. But is SVG the right answer? It’s been around for more than a decade and you still don’t see much of it in the wild. And there are horrid browser incompatibilities. I suspect that Javascript (and JQuery) has a brighter future. And if I can get over my abreaction to libraries, there are plenty of options. Advice anyone?

by brian at March 09, 2010 03:44 PM UTC

The last blog entry had lots of good comments about different HW policies. I enumerate them and say PROS and CONS
  1. Hard Deadline. PRO- uniform, no favoritism, can post HW Solutions or go over HW in class as soon as it is handed in. CON- there could be legitimate reasons for lateness that are short of a doctors note. CON- you want the student to DO the HW even if it will be late. CON- you need to be TOUGH to say NO.
  2. Moral Deadline (what I do, see last post). Same as Hard Deadline, but its a bit easier to say NO.
  3. Penalty for lateness. PRO- the students will still do the HW. CON- delay in posting solution. CON- slackers are still slackers. ODDITY- the penalty is supposed to discourage lateness. But it may encourage it (gee, 10% off if I hand it in one day late. OKAY, its a deal)
  4. Look at late HW only if they affect the final grade. PRO- less to look at likely, CON- Don't really want to keep track of these things. CON- student may not be discouraged from handing things in late.
  5. Only count (say) 10 of the 12 HWs, and have HARD DEADLINES. PRO- same as HARD DEADLINE. CON- students will blow off 2 HW's, possibly the last two which may be important for the final. CAVEAT- raises the much bigger question of whether to treat students like adults or like ...students.
  6. Students get x number of late days (this one was new to me). PRO- well defined rule, flexible but no favoritism. CON- delay in posting solutions. CON- keeping track of it.
  7. If you miss a HW then the others will count more (up to some limit). PRO- uniform. CON- students may still miss some HW they should do.
  8. HW are OPTIONAL. PRO- they sink or swim on their own. CON- they sink or swim on their own.
Diff topic- how much to COUNT HW? I often count it low (like 10-20 percent) so that I don't' have to worry too much about cheating. Actually I think its GOOD if students help each other but BAD if students copy each other, but it can be hard to tell.

Which of these work best? Depends alot on the school and the course and even the profs willingness to say NO.

by GASARCH (noreply@blogger.com) at March 09, 2010 03:38 PM UTC

The list of accepted papers for EC is up.  I'm happy to say that our Swoopo paper made the list.

I was surprised in the acceptance letter to find that there were 45 acceptances out of 136 papers -- an acceptance rate of about 1/3.  (Compare with WSDM.)  This makes it one of the less "competitive" CS conferences I know of, although a little research shows this is a bit unusual -- last year the numbers were 40/160 or so, so they accepted more papers and had fewer submissions this year.  Is that a trend in the making or an accident of timing this year?  Also, while I'm an EC newbie, the list of papers looks very interesting, with plenty of top-tier names.  "Competitive" or not, I'm expecting high quality.  I'm really looking forward to it -- and not just because its location makes it remarkably convenient for those of us in the greater Boston area.      

by Michael Mitzenmacher (noreply@blogger.com) at March 09, 2010 02:16 PM UTC


Monadic second order logic and treewidth

Ron Fagin is a great theorist who has made many important contributions to diverse aspects of computing. He is perhaps most famous for his brilliant categorization of polynomial time by second order logic—Fagin’s Theorem.

Today I plan on talking about a related theorem—Courcelle’s—proved in 1990. Yet the theorem is not, in my opinion, as well known as it should be. The truth is I did not know this pretty theorem until I ran across it the other day. So perhaps everyone else knows Courcelle’s result except for me. Oh well.

Courcelle’s Theorem

Bruno Courcelle’s theorem is quite striking, in my opinion:

Theorem: Let {\phi} be any graph property that is definable in MSO logic. For any fixed {k}, the there is a linear time algorithm for testing the property {\phi} on any graph of treewidth at most {k}.

I will explain the theorem and try to give a high level view of its proof. But, even if the concepts of MSO—Monadic Second Order—logic and treewidth are new to you, the theorem should still be striking. There are a few theorems of the form: any graph property expressible in a powerful logic is computable on a class of graphs in linear time.

Treewidth

The concept of treewidth is fairly widely known, and was used extensively in the famous work of Neil Robertson and Paul Seymour on graph minors. I will include a definition here for completeness. My intuition is that for most graph problems the case of trees is easy: for example, there is a linear time algorithm for tree isomorphism. The notion of small tree width is an attempt—a very successful attempt—to generalize trees to include a much larger class of graphs.

A tree decomposition of a graph {G = (V,E)} is a composed of a family of sets of vertices, called bags, {B_{1},\dots,B_{m}} and a tree {T} on the nodes {1,\dots,m} with these properties:

  • Every vertex of {G} is in at least one bag.
  • Every edge of {G} has both endpoints in some bag.
  • For all vertices of {G}, the set of bags that contain the vertex form a subtree of {T}.

The last is more precisely: for all vertices {v} of {G} the set {\{i \mid v \in B_{i} \}} is a subtree of {T}. The width of the decomposition is the maximum of {|B_{i}|-1} over all {i}. The treewidth of a graph {G}, {\mathbf{tw}(G)} is the minimum such width over all valid tree decompositions. The {-1} is there to make the treewidth of a tree {1}.

Here is an example, from our friends at Wikipedia, of a graph and a decomposition:

MSO

Suppose that {E(x,y)} stands for the edge relationship of some simple undirected graph. Then, first order sentences allow variables only over vertices. For example, the sentence

\displaystyle  \forall x,y,z \ E(x,y) \wedge E(y,z) \rightarrow \neg E(y,x)

means that the graph has no triangle. This is a first order sentence, since all the variables range over vertices of the graph.

First order sentences can capture some interesting properties—they can express the property of having a {k}-clique for any fixed {k}. However, first order is too weak to expressive many important graph properties such as the property: a graph is {k}-colorable.

This leads to the idea to extend the first order logic to allow more powerful variables. A natural notion is to allow variables to range over arbitrary sets of tuples of vertices: this is called Second Order (SO) logic. This theory is very powerful, since Fagin’s Theorem is:

Theorem: Properties expressible by second-order logic existential sentences are precisely the complexity class NP.

An existential sentence is a sentence of the form:

\displaystyle  \exists R \exists S \dots \phi(R,S)

where {R} and {S} and {\dots} range over relations and {\phi} is a first order formula.

Clearly, there are properties of this form requiring more than linear time—even on trees. This is the reason that Courcelle’s Theorem must restrict the properties to Monadic Second Order (MSO) logic. A sentence is a monadic one provided all the second order variables range over sets. Notice that being expressible in MSO logic is a stronger requirement than being expressible in Second Order logic: MSO is a subset of Second Order logic.

It is now easy to define {k}-colorable. Here is how to express {3}-colorable:

\displaystyle  \exists A \exists B \exists C \ \phi(A,B,C)

where {\phi(A,B,C)} uses first order quantifiers to check {A,B,C} form a partition of the vertices, and that if {E(x,y)} then {x} and {y} are colored differently. Think of {A(x)} as meaning {x} is colored “A,” and the same for the other sets.

The logic MSO is quite powerful, but is still not powerful enough to define some properties: for example, the property of having a Hamiltonian Circuit, is not definable in MSO.

Proof Idea

The proof of Courcelle’s theorem is based on two key ideas:

  1. The ability to find a {k}-treewidth decomposition in linear time.
  2. The ability to check the MSO property on a tree decomposition in linear time.

The first is a result due to Hans Bodlaender, who improved the original algorithm of Robertson and Seymour from quadratic time; see this for a nice survey and discussion of algorithms for finding treewidth.

Once a tree decomposition is found the key is a tree automaton can be constructed to check the MSO sentence. Tree automata are a generalization of finite automata from linear words to finite trees. Luckily many of the same constructions and properties of finite automata carry over to tree automata, this allows the checking to be done in linear time.

Open Problems

The Courcelle theorem is, in my opinion, quite pretty. However, the hidden constants are huge, and an obvious question is when can “real” linear time algorithms be found for properties expressible in MSO? Another interesting open problem is what is the largest class of sentences of SO that lead to linear time algorithms on graphs of bounded treewidth?


by rjlipton at March 09, 2010 12:24 AM UTC

Writing a paper takes a tremendous amount of time. So, one of the frequent complaints that authors  make is when PC members submit half-baked, clearly below-threshold reviews on a paper just to get the resume bullet and claim to have done their reviewing duties. Personally, I feel intense anger when receiving  crappy reviews that come with not the slightest bit of insight, and then am expected to rebut them or accept them. Not to mention the long-term psychological damage incurred by having papers rejected one after another. 

The problem is that reviewing a paper for a conference is free: all it takes is a few clicks of the mouse to upload your PDF file. (Of course, I'm not accounting for the cost of doing the research  (ha!) and actually reviewing the paper.)

Let's estimate the costs associated with doing research and submitting papers to conferences. I spend many months working, writing and submitting papers to conferences. A highly competitive conference will assign three reviewers to my paper, and with a lot of luck one of them might even tangentially be aware of my research area. After I make up a bunch of numbers, the cost of rejection of my paper amounts to over 3 gazillion dollars, none of which I can recoup. It's clear that conferences, which only survive if people submit, should be paying me to submit !

Of course, imposing this kind of a fee would no doubt drastically reduce the number of papers that are submitted. But this seems like a good thing: it would probably reduce the number of conferences, and remove the fiction that conferences actually do "quality control", leaving them with their original purpose of networking and creating a sense of community. Conferences could generate revenue by charging reviewers for the opportunity to preview the new works being submitted: this  would potentially also improve the quality of the reviews as well.  Although the financial incentive is not that great, getting paid should encourage TPC members to take the process more seriously.

The only downside I can see is people who submit a ton of papers everywhere and become "professional paper writers", but TPC chairs would clearly have to balance this against the research credentials of the people submitting papers. Note that many journals impose author fees for publication of the paper, so this provides a nice offset against that cost. 

It just seems crazy to me that the research community provides this free paper previewing service for committees with no negative ramifications for writing totally bogus reviews.

Disclaimers for the sarcasm-challenged:
  • Yes, I am obviously aware of Matt Welsh's post on this topic
  • Yes, this is a total ripoff/parody of his post
  • Yes in fact, I disagree with his point.


by Suresh (noreply@blogger.com) at March 08, 2010 10:34 PM UTC

In Google Wave, when a user is editing a wave, every keystroke entails sending messages back and forth between that user, the Wave servers, and every other person who has the same wave open at the same time, so that all users can see what everybody else is typing as they type it. But that's not so bad: it's just a small message, and everything is still pretty fast, at least on a new wave with not much in it yet. Despite all this communication the system can be very usable and responsive.

However, when a wave grows to a hundred or so messages and a few tens of thousands of characters, everything slows down to a crawl. Each keystroke takes a second or so to process. Typing directly into a wave can become so painful that it can be much easier to switch back and forth between a separate text editor and Wave and to copy and paste large chunks of text (each a single operation, as fast as a single keystroke) rather than editing directly within the wave. The only way to return to a responsive system seems to be to archive the wave and start a fresh one.

So far this is verifiable personal experience but the rest is speculation. My wild guess at the underlying cause for this lack of scalability: as each keystroke is processed, it entails an amount of work by the browser, or worse an amount of communication between the browser and the Wave servers, that's linear in the size of the wave. For instance, it's plausible to me (though I don't know the details of the communication protocol) that each update causes the server to communicate the entire new updated state of the wave to the browser. The net effect is that, if one creates a wave by typing a character at a time, the number of bytes communicated, over the course of the wave, is quadratic in the number of characters in the wave. And as we all (should) know, linear algorithms scale well to large amounts of data and quadratic algorithms don't.

The moral: asymptotic analysis matters for usability, even for problem sizes that are not massive (there is no problem fitting a whole wave into main memory) and even in situations such as this one where the designers are probably thinking less about algorithmic efficiency and more about functionality and user interface design. If your system doesn't scale, it will become slow and unusable when its users push its boundaries, and that will limit the uses people make of it.

at March 08, 2010 09:06 PM UTC

As reported earlier, we had a very accelerated hiring season at UMD — all our candidates were brought in for interviews before the end of last semester(!)

I am pleased to announce that three faculty will be joining our department, including one “core theorist” and two other “theory-friendly” profs. These are:

I look forward to seeing them all here!


by jonkatz at March 08, 2010 04:33 PM UTC

This semester I am using the following HW policy.
HW is due on Tuesday. However, your dog died! Hence you get an extension to Thursday. That is, for all people in the class I assume you have a quasi-legit reason to ask for an extension to Thursday. Hence you can hand it in Thursday for full credit. However, if you want an extension past that you will not get it since I already gave you an extension to Thursday. (There may be some severe exceptions which will have to be documented.)
  1. This will save alot of time in terms of students asking permission to hand it in late since I will say I already have you an extension and you are asking for another one?
  2. Some students will get into the habit of handing it in Thursday. This is okay so long as they do not ask for an extension past that.
  3. Clyde tells me that this is really a cheat- the HW really is due Thursday. I may have a higher moral ground when telling them they can't hand it in later than Thursday, but they will still feel that they deserve an extension if their dog dies on Wednesday. My response: they do not.
  4. I do make sure that they have enough knowledge to do the HW by Tuesday.
  5. I am teaching one Junior-Senior class and one honors-class so these are already pretty good students. They (I hope) know what I mean when I say that they cannot ask for an extension past Thursday. Also they will likely not need them. I have not tried this in a Freshman class. I would like to but they are usually co-taught and large so it would be harder to manage.

by GASARCH (noreply@blogger.com) at March 08, 2010 03:28 PM UTC

We present new explicit constructions of *deterministic* randomness extractors, dispersers and related objects. We say that a distribution $X$ on binary strings of length $n$ is a $\delta$-source if $X$ assigns probability at most $2^{-\delta n}$ to any string of length $n$. For every $\delta>0$ we construct the following poly($n$)-time computable functions: 1) (2-source disperser:) $D:(\{0,1\}^n)^2 \rightarrow \{0,1\}$ such that for any two independent $\delta$-sources $X_1,X_2$ we have that the support of $D(X_1,X_2)$ is $\{0,1\}$. 2) (Bipartite Ramsey graph:) Let $N=2^n$. A corollary is that the function $D$ is a 2-coloring of the edges of $K_{N,N}$ (the complete bipartite graph over two sets of $N$ vertices) such that any induced subgraph of size $N^{\delta}$ by $N^{\delta}$ is not monochromatic. (3) (3-source extractor:) $E:(\{0,1\}^n)^3 \rightarrow \{0,1\}$ such that for any three independent $\delta$-sources $X_1,X_2,X_3$ we have that $E(X_1,X_2,X_3)$ is $o(1)$-close to being an unbiased random bit. No previous explicit construction was known for either of these for any $\delta<1/2$, and these results constitute significant progress to long-standing open problems. A component in these results is a new construction of condensers that may be of independent interest: This is a function $C:\{0,1\}^n \rightarrow (\{0,1\}^{n/c})^d$ (where $c$ and $d$ are constants that depend only on $\delta$) such that for every $\delta$-source $X$ one of the output blocks of $C(X)$ is (exponentially close to) a $0.9$-source. (This result was obtained independently by Ran Raz). The constructions are quite involved and use as building blocks other new and known objects. A recurring theme in these constructions is that objects which were designed to work with independent inputs, sometimes perform well enough with correlated, high entropy inputs. Preliminary version of this paper appeared in STOC 2005.

at March 08, 2010 02:03 PM UTC

We say that a polynomial $f(x_1,\ldots,x_n)$ is {\em indecomposable} if it cannot be written as a product of two polynomials that are defined over disjoint sets of variables. The {\em polynomial decomposition} problem is defined to be the task of finding the indecomposable factors of a given polynomial. Note that for multilinear polynomials, factorization is the same as decomposition, as any two different factors are variable disjoint. In this paper we show that the problem of derandomizing polynomial identity testing is essentially equivalent to the problem of derandomizing algorithms for polynomial decomposition. More accurately, we show that for any reasonable circuit class there is a deterministic polynomial time (black-box) algorithm for polynomial identity testing of that class if and only if there is a deterministic polynomial time (black-box) algorithm for factoring a polynomial, computed in the class, to its indecomposable components. An immediate corollary is that polynomial identity testing and polynomial factorization are equivalent (up to a polynomial overhead) for multilinear polynomials. In addition, we observe that derandomizing the polynomial decomposition problem is equivalent, in the sense of Kabanets and Impagliazzo, to proving arithmetic circuit lower bounds to NEXP. Our approach uses ideas from a previous work in which we showed that the polynomial identity testing problem for a circuit class $\mathcal M$ is essentially equivalent to the problem of deciding whether a circuit from $\mathcal M$ computes a polynomial that has a read-once arithmetic formula.

at March 08, 2010 12:05 PM UTC

The 3rd International Symposium on Algorithmic Game Theory, SAGT2010, will be held on October 18-20, 2010 in Athens, GREECE.  Submission deadline is May 3rd.


by noamnisan at March 08, 2010 09:49 AM UTC

(this is part of an occasional series of essays on clustering: for all posts in this topic, click here)

It's time to take a brief break from the different clustering paradigms, and ponder probably THE most vexing question in all of clustering.
How do we choose k, the number of clusters ?
This topic is so annoying, that I'm going to devote more than one post to it. While choosing k has been the excuse for some of the most violent hacks in clustering, there are at least a few principled directions, and there's a lot of room for further development.

(ed. note. The Wikipedia page on this topic was written by John Meier as part of his assignment in my clustering seminar. I think he did a great job writing the page, and it was a good example of trying to contribute to the larger Wikipedia effort via classroom work)

With the exception of correlation clustering, all clustering formulations have an underconstrained optimization structure where the goal is to trade off quality for compactness of representation. Since it's always possible to go to one extreme, you always need a kind of "`regularizer"' to make a particular point on the tradeoff curve the most desirable one. The choice of $k$, the number of clusters, is one such regularizer - it fixes the complexity of the representation, and then asks you to optimize for quality.

Now one has to be careful to see whether 'choosing k' even makes sense. Case in point: mixture-model clustering. Rather than asking for a grouping of data, it asks for a classification. The distinction is this: in a classification, you usually assume that you know what your classes are ! Either they are positive and negative examples, or one of a set of groups describing intrinsic structures in the data, and so on. So it generally makes less sense to want to "`choose"' $k$ - $k$ usually arises from the nature of the domain and data.


But in general clustering, the choice of $k$ is often in the eyes of the beholder. After all, if you have three groups of objects, each of which can be further divided into three groups, is $k$ 3 or 9 ? Your answer usually depends on implicit assumptions about what it means for a clustering to be "`reasonable"' and I'll try to bring out these assumptions while reviewing different ways of determining $k$.

The Elbow Method

The oldest method for determining the true number of clusters in a data set is inelegantly called the elbow method. It's pure simplicity, and for that reason alone has probably been reinvented many times over (ed. note: This is a problem peculiar to clustering; since there are many intuitively plausible ways to cluster data, it's easy to reinvent techniques, and in fact one might argue that there are very few techniques in clustering that are complex enough to be 'owned' by any inventor). The idea is this:

Start with $k=1$, and keep increasing it, measuring the cost of the optimal quality solution. If at some point the cost of the solution drops dramatically, that's the true $k$.

The intuitive argument behind the elbow method is this: you're trying to shoehorn $k$ boxes of data into many fewer groups, so by the pigeonhole principle, at least one group will contain data from two different boxes, and the cost of this group will skyrocket. When you finally find the right number of groups, every box fits perfectly, and the cost drops.

Deceptively simple, no ? It has that aspect that I've mentioned earlier - it defines the desired outcome as a transition, rather than a state. In practice of course, "`optimal quality"' becomes "`whichever clustering algorithm you like to run"', and "`drops dramatically"' becomes one of those gigantic hacks that make Principle and Rigor run away crying and hide under their bed.


The Alphabet-Soup Criteria

So can we make the elbow method a little more rigorous ? There have been a few attempts that work by changing the quantity that we look for the elbow in. A series of "`information criteria"'  (AIC, BIC, DIC, and so on) attempt to measure some kind of shift in information that happens as we increase $k$, rather than merely looking at the cost of the solution.

While they are all slightly different, they basically work the same way. They create a generative model with some kind of term that measures the complexity of the model, and another term that captures the likelihood of the given clustering. Combining these in a single measure yields a function that can be optimized as $k$ changes. This is not unlike the 'facility-location' formulation of the clustering problem, where each "`facility"' (or cluster) must be paid for with an 'opening cost'. The main advantage of the information criteria though is that the quantification is based on a principled cost function (log likelihood) and the two terms quantifying the complexity and the quality of the clustering have the same units.

Coming up next: Diminishing returns, the ROC curve, and phase transitions.


by Suresh (noreply@blogger.com) at March 08, 2010 06:03 AM UTC

The "final version" for our NSDI paper, Carousel: Scalable Logging for Intrusion Prevention Systems, is now up.

Here's the main idea.  In IPS systems, the logger can get overwhelmed during an attack.  Bad sources (with other related info) need to be recorded for later examination, but there's only a small amount of on chip memory to buffer bad sources, and only a small about of bandwidth (compared to the rate data is coming into the system) from the memory to the more robust recording infrastructure.  How do you get all, or almost all, of the sources?

In our model, bad sources are hitting the system repeatedly -- the denial of service attack setting.  On the plus side, this means you don't have to log a bad source the first time it appears - the assumption is it will come back again.  On the negative side, you want to avoid sending the same source to the recorder multiple times, as it wastes your small bandwidth.  (Dups can be removed at the recording side, though.)

The baseline solution is to just grab a bad source to record for the buffer as soon as you have room after you send one out.  We consider a random model -- there's a bunch of bad sources, and the next one to appear is random from that set -- that shows that this approach is bad;  by connecting it to the coupon collector's problem, we show it can be a logarithmic factor off of optimal.  Other experiments show it can be even worse than this in realistic situations. 

Our solution has two parts.  We hash-and-partition the bad sources, adaptively finding a partition size that so that the bad sources in each partition fit into our small memory.  That is, we hash each source, and put in a partition according to the last k bits.  This breaks our N bad sources into groups of size (roughly) N/2^k, and if N/2^k is small enough so that the partition fits into memory, then (assuming we can avoid duplicates), we no longer have a memory problem.  We just run through all the partitions, giving us our "Carousel".   

To deal with duplicates, we do the obvious -- we use a Bloom filter (or equivalent structure) to avoid sending duplicates within each partition.

This hash-and-partition plus Bloom filter type framework seems like a potential generally useful trick;  more details -- including both theoretical analysis and experiments -- are in the paper.

An interesting thing came up in the reviews.  We pointed out that the "straw man" approach -- do nothing -- was quite bad.  We mentioned that just using a Bloom filter -- without partitioning -- wouldn't really help, but then ignored that option.  Apparently, this was a big concern for the reviewers;  my understanding is that it almost "sunk" the paper.  They wanted to see more details, including simulations, on this option.  Luckily, it was still accepted, and in the final version, in response to the reviews, we've added some theory and simulations to prove our point.  (The point is you'd need a really, really big Bloom filter -- too big for your memory -- to track all the sources, so eventually you have to clear the Bloom filter, which causes you to lose whatever it was going to gain you in the first place.)  I'm not sure what the takeaway is there.  The right outcome happened -- they should have accepted the paper, and we could add the appropriate stuff.  But I'd have hated for what in my mind is a minor issue to have killed the paper.  Perhaps we should have said more about it, although at some point space prevents you from providing details on every possible variation you could consider (even if you have actually considered them).    

This seems like a good place to remind people about this book -- Algorithms for Next Generation Networks -- which includes a survey about these sorts of hashing applications in networks by Adam Kirsch, George Varghese, and me.  The book does seem a little pricey;  we have a slightly older version of the survey available online.   

   


by Michael Mitzenmacher (noreply@blogger.com) at March 08, 2010 03:48 AM UTC

In a tutorial on average-case complexity that I gave at FOCS 2008, I mentioned four of my favorite open questions in the theory. I have some progress to report on one of the questions (#3 in this post), and on two related problems.

The question is whether a certain exponential loss is necessary in Levin’s proof that there are problems that are “NP-complete on average.” The answer is yes, it is necessary, unless the polynomial hierarchy collapses. The two related problems are whether related exponential losses in the analysis of Levin’s universal search algorithm and of Levin’s universal one-way function are also necessary. The answers are, respectively, yes unless {P=NP}, and yes relative to an oracle.

This is one of those now fashionable papers whose arguments are “obvious in retrospect.” (I would like to claim, however, that the proof of the result for Levin’s average-case completeness, while obvious in hindsight, is not so obvious without hindsight.) In all three cases, the exponential loss is in terms of the code length of certain programs. The idea in each case is to construct programs that contain in their code information about a specific instance of a problem of interest.

1. Levin’s Universal Search

It is simpler to start the discussion from Levin’s universal search algorithm. Levin, in his 1973 paper independently discovering NP-completeness, suggested the following procedure to solve NP search problems: look for solutions in order of increasing resource-bounded Kolmogorov complexity. In other words, try all possible algorithms, with a time-out, for all possible time-outs, with a rather smart schedule to choose algorithms and time-outs.

In the concrete case of 3SAT, Levin’s algorithm has the property that for every algorithm {A} there exists a constant {c_A} such that if {A} finds a satisfying assignment for a formula {\phi} in time {t_A(\phi)}, then Levin’s algorithm finds a satisfying assignment for {\phi} in time at most

\displaystyle   c_A \cdot t_A(\phi) + |\phi|^{O(1)} \ \ \ \ \ (1)

That is, for every algorithm {A}, the running time of Levin’s algorithm is at most a constant factor slower than {A} on every input. The value of {c_A}, however, is exponential in the length of the code of {A}.

Could we hope to have a universal search algorithm for 3SAT that, for every algorithm {A} and satisfiable {\phi} as above has running time upper bounded by a polynomial in the length of {A}, in {t_A(\phi)}, and in {|\phi|}?

There is a very easy argument (which, as far as I know, has not been observed before) showing that such a “fully polynomial” universal search algorithm would imply P=NP.

Suppose such an algorithm existed, call it {FPS}; I claim that, for every satisfiable formula {\phi}, {FPS} outputs a satisfying assignment for {\phi} in time {|\phi|^{O(1)}}. ({FPS} may not terminate on unsatisfiable formulas, but the fact that it finds satisfying assignments, when they exist, in polynomial time suffices to conclude that {P=NP}.) To prove the claim, let {\phi} be a satisfiable formula, and consider the algorithm {A_\phi} that, on input a formula {\psi}, does the following: if {\psi=\phi} then {A_\phi (\psi)} outputs a satisfying assignment for {\phi} that is explicitly specified as part of the code of {A_\phi}; if {\psi\neq\phi} then {A_\phi(\psi)} performs a brute-force search for a satisfying assignment for {\psi}, and it outputs the first one it finds, if any.

The length of the code of {A_\phi} is linear in {|\phi|}, and the running time of {A_\phi(\phi)} is also linear in {|\phi|}, and so {FPS(\phi)} must output a satisfying assignment for {\phi} in polynomial time.

Indeed, even this completely trivial proof is an overkill: we could have let {A_\phi(\psi)} just output a satisfying assignment of {\phi} regardless of the input {\psi}. The definition of {A_\phi(\cdot)} given above, however, rules out even universal search algorithm whose performance is guaranteed only against algorithms that are provably correct on all inputs. Hutter has studied such algorithms, and proved that the exponential dependency on the length of the code of {A} (and, in his case, exponential dependency on the length of the proof of correctness of {A} and the proof establishing time bounds on {A}) can be pushed to an additive term, rather than being a factor that multiplies the running time of {A}, as in (1). The above construction, however, shows that the additive term cannot be polynomial in the code length and proof length.

There are two other famous results of Levin which involve exponential complexity in the length of the code of certain algorithms, and, in those cases, the necessity of the exponential dependency is a somewhat less trivial question.

2. Levin’s Universal One-Way Function

Quite related to his universal search algorithm is Levin’s proof that there exists a universal one-way function, that is, a polynomial time computable function that is one-way provided that one-way functions exist.

Levin’s idea is the following. Define the function {u(\cdot,\cdot)} such that for every algorithm {A} and input {x} we have

\displaystyle  u(A,x) = (A,y)

where {y} is the output of {A(x)} if the computation of {A(x)} terminates within, say, {|x|^2} steps, and {y} is, say, a string of {|x|} zeroes otherwise. We call such a function {u} because it is essentially a (time-bounded) universal turing machine.

Now it is easy to see that if one-way functions exist, then there exists a one-way function computable in time at most {|x|^2} on input {x}. Call this function {f} and let {A_f} be the algorithm that computes {f(x)} in time {< |x|^2}. Then the mapping

\displaystyle  x \rightarrow u(A_f,x )

is one-way (in fact, it is essentially the mapping {x \rightarrow f(x)}), and it accounts for a {1/2^{|M_f|}} fraction of the inputs of {u}. This implies that {u} is a weak one-way function.

Finally, define {f_{Levin}} to be the function

\displaystyle  f_{Levin} (z_1,\ldots,z_m) := (u(z_1),\ldots,u(z_m))

where an input of length {n} has been parsed into {m = \lfloor \sqrt n \rfloor} strings of length {m}. By Yao’s amplification of hardness for one-way functions, if {u} is a weak one-way function then {f_{Levin}} is a one-way function in the standard sense.

Where is the exponential dependency here?

This is a subtle but important point that I learnt from Charles Rackoff. Suppose that we have a polynomial time algorithm {Inv_{Levin}} that inverts {f_{Levin}} on an inverse polynomial fraction of inputs. Let {g} be any polynomial time computable function. Then by using {Inv_{Levin}} and the reduction outlined before we can invert {g} on, say, half of the inputs, in time that is polynomial in the length of the input, but exponential in the length of the program of {g}.

Now, the strength of a universal one-way function is the fact that one can use it as a primitive and claim that either it is secure or otherwise all one-way functions can be inverted and thus no cryptography is possible. But if we assume that Levin’s function is not one-way, the inverter that we get for, say, AES, has complexity much higher than brute-force search. In general, if we have a finite function whose code length is comparable to its input length (as is the case of functions constructed using {S}-boxes and other ad-hoc tables of values) then no non-trivial attack for such a function would follow from breaking Levin’s function.

It would be much more desirable to construct a universal one-way function {f_{U}} with the property that if there is a polynomial time algorithm that inverts {f_U} on an inverse polynomial time fraction of inputs, then we can use such an inverter to invert any other function {g} on an inverse polynomial fraction of inputs in time polynomial in the length of the input and on the length of the code of {g}.

We show that there is an oracle relative to which one-way functions exist but there is no algorithm that is able to invert every {g} on an inverse polynomial time fraction of inputs and in time polynomial in the code length of {g} and in the length of the given inputs.

3. Levin’s Average-Case Complete Problem

The exponential dependency in Levin’s average-case complexity is even harder to explain.

Perhaps it helps to start from Cook’s theorem that 3SAT is NP-complete, in the standard worst-case complexity, setting. Cook proves that for every language {L} in {NP} there is a polynomial time computable reduction function {f_L} such that for every string {x} we have

\displaystyle  x \in L \Leftrightarrow f_L(x) \in 3SAT \ \ \ \ \ (2)

So if we have a worst-case polynomial time algorithm for solving 3SAT, we also get a worst-case polynomial time algorithm for {L} which works as follows: on input {x}, apply the 3SAT algorithm for {f_L(x)}.

An important point is that such reductions {f_L} not only exist, but are also explicitly and efficiently given if one has a description of {L} in the form of a non-deterministic polynomial time algorithm that decides {L}. Indeed, Cook’s theorem shows the existence of a single function {f(\cdot,\cdot,\cdot)} such that if {M} is a non-deterministic algorithm that decides a language {L} in time {p(n)}, then the mapping

\displaystyle  x \rightarrow f(M,x,p(|x|))

is a reduction {f_L} from {L} to {3SAT}. Furthermore, {f(M,x,t)} is computable in time polynomial in {|M|}, in {|x|} and in {t}.

Note that, in order to establish the {NP}-completeness of 3SAT it would have been perfectly adequate if {f(M,x,t)} had been computable in time exponential or doubly exponential in {|M|}, as long as it is polynomial in {|x|} and {t}. This hypothetical result would, however, be much less satisfying, because in practice it would not be true that “an efficient algorithm for 3SAT implies an efficient algorithm for each {L} in {NP},” unless we restrict ourselves to languages {L} that have extremely short descriptions.

In the theory of average-case complexity, instead of languages {L} we study distributional problems, that is pairs {(L,D)} where {L} is a language and {D} is a distribution over the inputs. A reduction from {(L_1,D_1)} to {(L_2,D_2)} is a polynomial time computable function {f} such that {x\in L_1} iff {f(x) \in L_2}, as in the worst-case complexity theory, but with the additional property that {f(D_1)} is “polynomially dominated” by {D_2}. This means that if we look at any set of “bad” instances {B} that has probability {\beta} according to {D_2}, then if sample a random {x} according to {D_1}, the probability of {f(x) \in B} is at most {poly(|x|) \cdot \delta}.

The point of this definition is that we would like to say the following: if there is a good average-case algorithm {A_2} for {(L_2,D_2)}, then there also is a good average-case algorithm for {(L_1,D_1)}, which works as follows: on input {x}, run algorithm {A_2} on {f(x)}. The good-on-average algorithm {A_2}, however, might provide incorrect answers (or time out, depending on the flavor of definition) on some inputs, which have small probability mass according to {D_2}. The domination property ensures us that the distribution of {f(x)}, with {x} sampled from {D_1}, is not too likely to land into this set of bad inputs for {A_2}.

Consider now the non-deterministic bounded-halting problem {BH} defined as follows: on input {(M,x,y)}, where {M} is a non-deterministic algorithm and {x} and {y} are bit strings, is there a computational path of {M} that accepts {x} in time at most {|y|}? Let {U} be the uniform distribution (ensemble) over bit strings.

Here is a proof that the distributional problem {(BH,U)} is average-case complete for all distributional problems of the form {(L,U)}, where {L} is in {NP}: to reduce {(L,U)} to {(BH,U)}, use the (randomized) reduction

\displaystyle  f_L(x) := (M_L, x, y )

where {M_L} is a non-deterministic Turing machine that decides {L} in time {p(n)} for some polynomial {p}, and {y} is a randomly chosen string of length {p(|x|)}.

It is clearly true that, for all {x} and all choices of {y}, {x\in L \Leftrightarrow f_L(x) \in BH}, and the domination property holds with a loss in probability which is about {2^{|M_L|}}.

For reasons that would take some time to elucidate, this loss in the domination factor is roughly equivalent to having a factor of {2^{|M_L|}} in the running time of the reduction, and this is a rather undesirable feature of the reduction.

(Compare with the hypothetical discussion above on a proof of Cook’s theorem in which the reduction from a generic {L} to {SAT} takes time exponential in the description of the non-deterministic algorithm for {L}.)

Could we have a loss in probability which is polynomial in {|M_L|} and in {|x|}? (We call a reduction with such parameters a “fully polynomial” reduction.)

Suppose that we had a worst-case to average-case reduction within NP, from, say, 3SAT to a distributional problem {(L,U)}. That is a randomized transformation that starting from a 3SAT instance {\phi} produces a distribution {f(\phi)} that is dominated by the uniform distribution and such that {\phi \in 3SAT} if and only if {f(\phi) \in L}. Then we would also have a “fully polynomial” reduction proving that {(L,U)} is complete on average for {(NP,PSamp)}. To reduce {(L',D) \in (NP,PSamp)} to {(L,U)} we would first reduce from {L'} to {3SAT} using Cook’s theorem, and then we would reduce from worst-case 3SAT to {(L,U)} using the hypothetical worst-case to average-case reduction. We can check that the domination parameter would be polynomial in the length of the code of the non-deterministic algorithm defining {L'}.

The observation that I consider the main point of the paper is that the converse is true. If we had a completeness result in {(NP,U)} via a fully polynomial reduction, then we would also have a worst-case to average-case reduction.

Here is the proof. Suppose a distributional problem {(L,U)} were complete for {(NP,U)} under fully polynomial reductions. We want to show that there is a worst-case to average-case reduction from 3SAT to {(L,U)}. Toward this goal, consider the language {L_\phi} recognized by the non-deterministic algorithm {M_\phi} that, on every input {x}, accepts if and only if {\phi} is satisfiable, thus {L_\phi} is either {\{ 0,1 \}^*} or {\emptyset} depending on whether {\phi} is satisfiable or not. Note that the running time and length of {M_\phi} are polynomial in {|\phi|}. We assumed that {(L,U)} is complete for {(NP,U)} via a fully polynomial reduction {f(\cdot,\cdot,\cdot)}, so the mapping

\displaystyle  x \rightarrow f(M_\phi, x, |\phi|^{O(1)} )

is computable in time polynomial in {|x|} and {|\phi|} and, for a random {x}, the output is dominated by the uniform distribution with a domination parameter polynomial in {|x|} and {|\phi|}. To finish the argument, notice that for every {\phi} and {x}

\displaystyle  \phi \in 3SAT \Leftrightarrow x \in L_\phi \Leftrightarrow f(M_\phi,x,|\phi|^{O(1)} ) \in L

and so the mapping that, given {\phi}, outputs {f(M_\phi, x, |\phi|^{O(1)} ) } for a random {x} is a worst-case to average-case reduction from {3SAT} to {(L,U)}.

In past work with Andrej Bogdanov, we showed that such a worst-case to average-case reduction would imply that the polynomial hierarchy collapses. In fact, this is true even if the worst-case to average-case reduction is a “truth-table reduction.” (A polynomial time reduction that is allowed to make non-adaptive queries to the algorithm for the target problem.)

With the above argument, it follows that a fully polynomial completeness result proved with a truth-table reduction would also imply the collapse of the polynomial hierarchy.

Could there be a fully polynomial completeness result via a “Cook reduction”? (A polynomial time reduction that is allowed to make arbitrary, possibly adaptive, oracle queries to the algorithm for the target problem.) This is exactly the same question as whether there is an adaptive worst-case to average-case reduction within in NP, which remains open.


by luca at March 08, 2010 12:51 AM UTC

Authors: Jaroslaw Byrka, Aravind Srinivasan, Chaitanya Swamy
Download: PDF
Abstract: We give a new randomized LP-rounding 1.725-approximation algorithm for the metric Fault-Tolerant Uncapacitated Facility Location problem. This improves on the previously best known 2.076-approximation algorithm of Swamy & Shmoys. To the best of our knowledge, our work provides the first application of a dependent-rounding technique in the domain of facility location. The analysis of our algorithm benefits from, and extends, methods developed for Uncapacitated Facility Location; it also helps uncover new properties of the dependent-rounding approach. An important concept that we develop is a novel, hierarchical clustering scheme. Typically, LP-rounding approximation algorithms for facility location problems are based on partitioning facilities into disjoint clusters and opening at least one facility in each cluster. We extend this approach and construct a laminar family of clusters, which then guides the rounding procedure. It allows to exploit properties of dependent rounding, and provides a quite tight analysis resulting in the improved approximation ratio.

at March 08, 2010 01:30 AM UTC

Authors: Ulrike von Luxburg, Agnes Radl, Matthias Hein
Download: PDF
Abstract: The commute distance between two vertices in a graph is the expected time it takes a random walk to travel from the first to the second vertex and back. According to folklore opinion, it has the property that vertices in the same cluster of the graph are close to each other while vertices in different clusters are far from each other. We study the behavior of the commute distance and hitting times on random geometric graphs ($\epsilon$-graphs, $k$-nearest neighbor graphs and Gaussian similarity graphs). It turns out that as the size of the graph increases, the suitably rescaled hitting times and commute distances can be approximated by extremely simple expressions. However, these expressions no longer take into account the cluster structure of the graph, which leads to a completely meaningless distance function. Consequently, the use of the commute distance for machine learning purposes is discouraged for large graphs and in high dimensions. Our paper also makes several important technical contributions such as bounding the spectral gap in random geometric graphs with general support and distribution.

at March 08, 2010 01:30 AM UTC

Authors: Dániel Marx, Ildikó Schlotter
Download: PDF
Abstract: We investigate a special case of the Induced Subgraph Isomorphism problem, where both input graphs are interval graphs. We show the NP-hardness of this problem, and we prove fixed-parameter tractability of the problem with non-standard parameterization, where the parameter is the difference |V(G)|-|V(H)|, with G and H being the larger and the smaller input graph, respectively. Intuitively, we can interpret this problem as "cleaning" the graph G, regarded as a pattern containing extra vertices indicating errors, in order to obtain the graph H representing the original pattern. We also prove W[1]-hardness for the standard parameterization where the parameter is |V(H)|.

at March 08, 2010 01:30 AM UTC

Authors: Viswanath Gunturi, Shashi Shekhar, Arnab Bhattacharya
Download: PDF
Abstract: Given a spatio-temporal network (ST network) whose edge properties vary with time, a time-sub-interval minimum spanning tree (TSMST) is a collection of distinct minimum spanning trees of the ST network. The TSMST computation problem aims to identify a collection of distinct minimum spanning trees and their respective time-sub-intervals under the constraint that the edge weight functions are piecewise linear. This is an important problem in ST network application domains such as wireless sensor networks (e.g., energy efficient routing). Computing TSMST is challenging because the ranking of candidate spanning trees is non-stationary over a given time interval. Existing methods such as dynamic graph algorithms and kinetic data structures assume separable edge weight functions. In contrast, this paper proposes novel algorithms to find TSMST for large ST networks by accounting for both separable and non-separable piecewise linear edge weight functions. The algorithms are based on the ordering of edges in edge-order-intervals and intersection points of edge weight functions.

at March 08, 2010 01:30 AM UTC

Authors: Volkmar Putz, Karl Svozil
Download: PDF
Abstract: We propose to "boost" the speed of communication and computation by immersing the computing environment into a medium whose index of refraction is smaller than one, thereby trespassing the speed-of-light barrier.

at March 08, 2010 01:30 AM UTC

Authors: Deepak Ponvel Chermakani
Download: PDF
Abstract: One of my recent papers transforms an NP-Complete problem into the question of whether or not a feasible real solution exists to some Linear Program. The unique feature of this Linear Program is that though there is no explicit bound on the minimum required number of linear inequalities, which is most probably exponential to the size of the NP-Complete problem, the Linear Program can still be described efficiently. The reason for this efficient description is that coefficients keep repeating in some pattern, even as the number of inequalities is conveniently assumed to tend to Infinity. I discuss why this convenient assumption does not change the feasibility result of the Linear Program. I conclude with two Conjectures, which might help to make an efficient decision on the feasibility of this Linear Program.

at March 08, 2010 01:30 AM UTC

We give new pseudorandom generators for \emph{regular} read-once branching programs of small width. A branching program is regular if the in-degree of every vertex in it is (0 or) $2$. For every width $d$ and length $n$, our pseudorandom generator uses a seed of length $O((\log d + \log\log n + \log(1/\epsilon))\log n)$ to produce $n$ bits that cannot be distinguished from a uniformly random string by any regular width $d$ length $n$ read-once branching program, except with probability $\epsilon$. We also give a result for general read-once branching programs, in the case that there are no vertices that are reached with small probability. We show that if a (possibly non-regular) branching program of length $n$ and width $d$ has the property that every vertex in the program is traversed with probability at least $\gamma$ on a uniformly random input, then the error of the generator above is at most $2 \epsilon/\gamma^2$.

at March 07, 2010 09:47 PM UTC

Three fundamental results of Levin involve algorithms or reductions whose running time is exponential in the length of certain programs. We study the question of whether such dependency can be made polynomial. (1) Levin's ``optimal search algorithm'' performs at most a constant factor more slowly than any other fixed algorithm. The constant, however, is exponential in the length of the competing algorithm. We note that the running time of a universal search cannot be made ``fully polynomial'' (that is, the relation between slowdown and program length cannot be made polynomial), unless P=NP. (2) Levin's ``universal one-way function'' result has the following structure: there is a polynomial time computable function $f_{Levin}$ such that if there is a polynomial time computable adversary $A$ that inverts $f_{Levin}$ on an inverse polynomial fraction of inputs, then for every polynomial time computable function $g$ there also is a polynomial time adversary $A_g$ that inverts $g$ on an inverse polynomial fraction of inputs. Unfortunately, again the running time of $A_g$ depends exponentially on the bit length of the program that computes $g$ in polynomial time. We show that a fully polynomial uniform reduction from an arbitrary one-way function to a specific one-way function is not possible relative to an oracle that we construct, and so no ``universal one-way function'' can have a fully polynomial security analysis via relativizing techniques. (3) Levin's completeness result for distributional NP problems implies that if a specific problem in NP is easy on average under the uniform distribution, then every language $L$ in NP is also easy on average under any polynomial time computable distribution. The running time of the implied algorithm for $L$, however, depends exponentially on the bit length of the non-deterministic polynomial time Turing machine that decides $L$. We show that if a completeness result for distributional NP can be proved via a ``fully uniform'' and ``fully polynomial'' time reduction, then there is a worst-case to average-case reduction for NP-complete problems. In particular, this means that a fully polynomial completeness result for distributional NP is impossible, even via randomized truth-table reductions, unless the polynomial hierarchy collapses.

at March 07, 2010 07:47 PM UTC

This year I’m on the PC of the following two conferences:

The Third International Workshop on Computational Social Choice will take place in Düsseldorf, Germany, September 13–16, 2010.  Deadline for submissions is May 15th.

The Second Symposium on Innovations in Computer Science (ICS 2011) will be held in Beijing, China, January 7-9, 2011.  Deadline for submissions is August 2nd.


by noamnisan at March 06, 2010 08:34 PM UTC

In this paper we give the first construction of a pseudorandom generator, with seed length $O(\log n)$, for $\mathrm{CC}_0[p]$, the class of constant-depth circuits with unbounded fan-in $\mathrm{MOD}_p$ gates, for some prime $p$. More accurately, the seed length of our generator is $O(\log n)$ for any constant error $\epsilon>0$. In fact, we obtain our generator by fooling distributions generated by low degree polynomials, over $\mathbb{F}_p$, when evaluated on the Boolean cube.This result significantly extends previous constructions that either required a long seed [LVW93] or that could only fool the distribution generated by linear functions over $\mathbb{F}_p$, when evaluated on the Boolean cube [LRTV09,MZ09]. Enroute of constructing our PRG, we prove two structural results for low degree polynomials over finite fields that can be of independent interest. 1. Let $f$ be an $n$-variate degree $d$ polynomial over $\mathbb{F}_p$.Then, for every $\epsilon>0$ there exists a subset $S \subset [n]$, whose size depends only on $d$ and $\epsilon$, such that $\sum_{\alpha \in \mathbb{F}_p^n: \alpha \ne 0, \alpha_S=0}|\hat{f}(\alpha)|^2 \leq \epsilon$. Namely, there is a constant size subset $S$ such that the total weight of the nonzero Fourier coefficients that do not involve any variable from $S$ is small. 2. Let $f$ be an $n$-variate degree $d$ polynomial over $\mathbb{F}_p$.If the distribution of $f$ when applied to uniform zero-one bits is $\epsilon$-far (in statistical distance) from its distribution when applied to biased bits, then for every $\delta>0$, $f$ can be approximated, up to error $\delta$, by a function of a small number (depending only on $\epsilon,\delta$ and $d$) of lower degree polynomials.

at March 06, 2010 02:16 PM UTC


A new system of awards for excellence in theory

Oscar is, of course, the name of the most prestigious award in film making. The award ceremony is this coming Sunday, and it will be watched by millions. This made me think about awards for our community.

Today I will talk about how we might create a new type of award for excellence in theory, based on a model that is quite different from all the current awards in computer science.

The Award Name

I need a name for the new awards, so for now let’s call them a Euclid. Obviously, the awards are named after the famous Eucild of Euclidean Geometry. It seems fitting to have an award named for the “reporter” of one of the great algorithms of all time: the Euclidean algorithm for computing the greatest common divisor (gcd) of two integers. This algorithm is beautiful, fast, and in use over 2,000 years later—without the ability to compute gcd’s fast, modern cryptography would just not work.

There is general agreement on who invented the algorithm: someone else besides Euclid. Beyond this I cannot find any definite claim of who invented it. Some believe it was Pythagoras’ school, others credit Eudoxus of Cnidus, others believe {\dots} Whoever invented it, the algorithm is associated with Euclid, and so his name will serve as an initial name for the award.

The Award Process

I think the place where we can be most inventive is in the process of selecting who gets the awards. I suggest a multiple phase process:

  • First, a select committee will propose a list of 3-5 candidates in each category for the awards. More on categories in a moment, but for now think “best result.”
  • Then, members of the theory community would vote for their choices. The members could be restricted to the obvious societies or other groups, and the voting would be done on-line.
  • Finally, after the vote the results would be announced on a known day to maximize the press.

Rich DeMillo suggested the whole process could be an online social system. He suggests some social filter or clustering method could eliminate any “select committee.” I am not sure, but this does seem like a possible approach.

The Award Categories

I think that we could easily have many categories following the Oscar model. I do not suggest an award for “best editing,” but we could have a number of awards. For example, here are a few suggested categories:

  • Best paper.
  • Best survey article.
  • Best exposition.
  • Best book: textbook, monograph, or other.
  • Best conference presentation.
  • Best new result.
  • Best new method.
  • Best Journal Editor.

DeMillo had some very interesting ideas on categories—I would usually have him add these as comments, but they are too neat to not include now:

Why not best actor awards? You can have best contributor in a starring role (full co-authorship). That way results that span many individuals get recognized. Same thing with best contributor in a supporting role (acknowledgements or citations of the form “this result is based on {\dots}” get recognized). Also best director to recognize heads of groups, labs, funding offices, etc {\dots}

In our area, theory, the significance of anything takes time to be realized. This has been the rationale for making existing awards “best paper of the last ten years,” rather than “best paper for 2010.” I wonder whether this should be a requirement for the Euclid’s. To be different, perhaps, we could actually have awards that are based only on work done in the last year. Or we could have a specific category of “best paper of the last twenty years,” and so on.

An interesting study of the Oscar awards has shown they have not been very accurate in identifying the “best” films in the long run. Best pictures have not always stood the test of time, and correspondingly pictures skipped for Oscars have often become classics. A quick look at the IMDb Top 250 tells one that 6 of the top 10 movies, including the first, haven’t won the best picture award. I think the Euclid’s might have exactly the same problem, but who cares. Why not increase excitement and honor more—if a paper or result is honored and later we discover it should not have been what is the lasting damage?

The Rationale for the Euclid

There are many reasons for the Euclid. One key reason is even those nominated would get some recognition. Today, most—all?—of our awards are completely secret. Those not in the inner circle have no idea who was nominated, we only find out who won. Being nominated for a Euclid and not getting one would be still a great honor. This is a simple amplifier used in many other awards from Oscars to the Booker Prize: why can not we also use the same amplification method? These other awards use essentially a short list notion: a list of those who are being considered for the award.

We currently announce our awards in a way set to minimize the impact. I would suggest that we have a pre-set day, a day known to all, and on the day announce all the winners in all the categories. We can do the same with the nominations, all would take place on the same day. For example, for next year we could:

  • On February 5, 2011 we announce all the nominations—the short list. There would be a formal press release and also the list would be announced on the web in all the obvious places.
  • On April 5, 2011 we would announce all the winners. Again it would be released via the web and other means.

I understand from Atish Das Sarma some conferences already do something like this with the “Best Paper Award.” At WSDM they announced the nominees months before the business meeting, and there they announce the single winner.

Winners would get some medal—perhaps a statue of Euclid—we can figure it out later. They would not be asked to give a talk, nor to give a speech, or to do anything else. They would just enjoy the honor of being the recipient of the 2011 “Best New Result Euclid.” The point here is what they do not get: they will not get a large check. My hypothesis is an award’s prestige is not linear in the amount of money given. The Oscars are a perfect example: the amount is {0}. The same with many of the literary prizes.

Open Problems

Is this a good idea? What about the name? What about the categories? What do you think?


by rjlipton at March 06, 2010 01:45 PM UTC

A commenter asked me a while back to say what I thought of the SIGCOMM submissions this year.

I'm finally getting around to reading and reviewing. (First round reviews aren't due for at least a week!) And so far, by and large, the papers I'm getting are pretty terrible.

This generally seems to happen to me on the first round, but this year is extreme. My first several papers just don't belong at this conference. (Arguably, they don't belong at any conference...) There's some number of papers submitted at every conference that are just not serious submissions, and apparently I got more than my fair share on the first round.

This makes it harder to judge the other papers -- it's hard to calibrate when you start with a lot of junk. Many of my other papers are theoretically oriented, and I'm not too optimistic about them. There's room for theory papers at SIGCOMM, but I think the bar is, rightly, pretty high. When I read a theoretical paper for SIGCOMM, I look for one of two things. First, it could be the paper has a nice theoretical idea that's actually useful. The problem there is that it's incumbent on the paper to clearly demonstrate the utility, and most fall down in that regard. I quote the SIGCOMM call: "SIGCOMM is a highly selective conference where full papers typically report novel results firmly substantiated by experimentation, simulation, or analysis." A mathematical analysis alone generally does not count as a firm substantiation. [Such papers generally have a better chance at INFOCOM -- which I think is a good, and very important, thing! There needs to be an outlet for more theoretical networking work, and perhaps it's just better suited for a big conference. Many such papers will end up having minimal impact, but once in a while, a good idea gets built on and has a real impact.]

Second, it could be the paper really challenges our way of thinking, introducing a new framework that seems a clearly important guide for future work. Such papers are rare, but important. I seem to have a number of economics-networking papers that are very high-level, and I'm really trying to understand if any of them have that character. Again, because SIGCOMM is so selective, I think the bar is very high for such papers. I'm really looking for something that enhances our fundamental understanding of the network.

That's it for now. If you have a submission, don't let my comments make you antsy -- the meeting is still a long way away, and I'm quite sure I'm not reading your paper anyway.

by Michael Mitzenmacher (noreply@blogger.com) at March 05, 2010 10:36 PM UTC

(It’s been observed that posts with technical content get no comments, while “controversial” posts get more, so here goes…)

Michael Mitzenmacher encourages the theory community to figure out what STOC/FOCS should be, while David Karger suggests that STOC/FOCS should accept more applied results. Though I love applications of theory to problems in other areas, I find myself disagreeing with Karger and thinking almost exactly the opposite.

Here’s one radical proposal, that would have the effect of making the “STOC/FOCS experience” simultaneously more and less competitive, while also (I think) attracting more people and making the conferences themselves more relevant.

The idea is to have STOC/FOCS contain only those results of broad appeal and sufficient importance to the TCS community at large, interpreted stringently. I.e., these should be the papers that everyone in the community should be reading. I think this is the intent of STOC/FOCS anyway, but this intent has been lost by accepting too many papers. As others have so often complained, STOC/FOCS has become a “vanity” conference, driven by fads, where technically strong papers solving (sometimes) problems of marginal interest often appear. That is not to say these papers aren’t good, or aren’t important in their respective subfields. All I am questioning is whether they are of interest to the TCS community at large, instead of just being of interest to some smaller sub-community.

Can we find 60 papers, twice a year, that reach this high bar? Probably not. So let’s cut the number of papers accepted. Maybe have one day of talks, with no parallel sessions (so everyone could actually attend all the talks). Moreover, I would suggest accepting fewer papers if it is judged that there weren’t enough submissions of broad enough interest. For the purposes of this post, let’s refer to this 1-day conference as SF1.

But wouldn’t this make STOC/FOCS even more competitive than it is now? Yes…and (paradoxically) no. On the one hand, getting a paper into SF1 would a great achievement, and much more difficult. On the other hand, there would no longer be the expectation (read: pressure) of graduating with 4-5 STOC/FOCS publications just to have a chance on the job market, or of trying to maintain the pace of publishing several STOC/FOCS papers per year. Moreover, there would be several “SF2″ tracks where people could publish, as I describe next.

Surely people won’t attend a one-day conference where they don’t even have a paper to present. So let’s add 2-3 days of conferences in specialized areas. (Think FCRC, but on a smaller scale.) Whatever community wants one could have one, and we could call it SF2:Algorithms, SF2:Crypto, etc.. Papers could be submitted to SF1, and considered for the appropriate track of SF2 if they are rejected from SF1. Within SF2, papers would be judged by their interest to the appropriate community (as they are now for SODA, Crypto, etc.) rather than a TCS-wide standard. Seems to me a win-win.

Let me summarize what I see as the benefits of this approach:

  • Would highlight to the TCS community what are the important papers that everyone should be reading. Would foster a sense of community among TCS researchers. Would highlight to the broader CS community what are the breakthroughs in theory.
  • Would increase attendance: everyone would want to attend SF1, and once there would also attend the SF2 in their area.
  • Would reduce some of the pressure related to STOC/FOCS, because (1) it wouldn’t be expected to have many SF1 publications, while (2) it would be easier to have an SF2 publication than it is to get into STOC/FOCS now.

Comments?


by jonkatz at March 05, 2010 09:51 PM UTC

After our success in exploring the phrase ”more or less” in many languages here is a task of a similar nature

There is a saying in Hebrew: 

“Troubles come in packages” 

צרות באות בצרורות 

“Tzarot Baot bitzrorot”.

 I am curious about analogs in other languages of this phrase.

Is there such a phrase in your language? (Or some other language that you know.) And what does it literally sais? Please, please contribute! (Other comments, links, and relevant pictures are welcomed.)

I got interested in this saying in the context of studying noise-models for fault-tolerant computation and specifically spontaneous error-synchronization, it  can also serve us also in the context of financial collapses.


by Gil Kalai at March 05, 2010 09:07 PM UTC

I set up a site to collect technical + research material about ad exchanges. I will keep the site updated. For now, it has two new research papers:
  • Selective call-out and real time bidding. T. Chakraborty, E. Even-Dar, S. Guha, Y. Mansour and S. Muthukrishnan. ArXivs 2010. When publishers access the exchange, the exchange will call out ad networks selectively in real time if they have a bound on how many auctions they can participate. The paper studies two different pricing models, has some technical connection to buffer management in networks, uses primal-dual techniques for online approximation, and shows some experimental results. It is fair to say that the authors are excited about the way they have modeled this problem that is increasingly a challenge in practice.
  • Auctions with Intermediaries. J. Feldman, V. Mirrokni, S. Muthukrishnan and M. Pai. 2010. Myerson's theory says how to auction a good for optimal revenue. In the ad exchange, we have two levels of auctions with the exchange auctioning good to the networks and the networks in turn selling to their downstream advertisers. What is the optimal revenue auction for either party in this case? This paper presents optimal auctions for ad networks and for the exchange, there are some interesting twists to the theory.
More in the future.

by metoo (noreply@blogger.com) at March 05, 2010 09:05 PM UTC

Valerie King and I recently finished a paper that I wanted to talk about here.  First, some motivation.  For a whole slew of reasons, adversarial, aka Byzantine, attacks are garnering increasing attention now in the systems community.  For example, the following position paper, from ‘08, makes a great case for Byzantine Fault Tolerance (BFT): “BFT: The Time is Now“.  Unfortunately, system folk have been less than enamored with the efficiency of Byzantine agreement algorithms:

  • “Unfortunately, Byzantine agreement requires a number of messages quadratic in the number of participants, so it is infeasible for use in synchronizing a large number of replicas” [1]
  • “Eventually batching cannot compensate for the quadratic number of messages [of Practical Byzantine Fault Tolerance (PBFT)]” [2]
  • “The communication overhead of Byzantine Agreement is inherently large” [3]

There have been many attempts to circumvent this inefficiency by 1) trying to ensure Byzantine fault tolerance in a particular system without solving Byzantine agreement; and 2) trying to solve Byzantine agreement in special cases for a type of adversary that is unique to that paper.

I think this is wrong.  Why?  To build secure systems, we need secure and trusted components, and so we shouldn’t create new components (i.e. subroutines robust to Byzantine faults) from scratch for each new system.

So why the problem with Byzantine agreement?  The problem is not the problem. Despite what naysayers may say about Byzantine agreement(BA) being “too paranoid”, “too hard”, or  “too abstract”, BA is the right problem formulation for building secure networks.  To see this, read the intros of the dozen systems papers from the last several years that say that a key component of what they want to do reduces to BA.  The problem is the solution.  Or rather the lack of a solution.  We still don’t have a general algorithm to solve BA that is practical enough to plug in to the many systems that want to solve it.  So despite all the work in this area, I think BA is still a major unsolved problem in distributed computing.

So now is the point where I’d like to tell you that we solved this problem.  Sadly we haven’t.  However, we have been working hard in the past couple of years designing algorithms for BA that reduce communication overhead.  In particular, we’ve been able to design algorithms where the number of messages each node sends out is about O(sqrt(n)) instead of O(n) (technically soft-O(sqrt(n)), which ignores log factors).  The price we pay is a non-zero probability of failure, but a probability of failure that is polynomially small in the network size.  The bigger the network, the smaller the probability of failure.

So then what about this new paper?  Our new paper describes an algorithm that solves BA with about O(sqrt(n)) messages and polylog latency, against an adaptive adversary, which can choose which nodes to take over at any time during the algorithm, up to taking over up to a 1/3 fraction of the nodes.  This is the first time we went head-to-head with an adaptive adversary and it definitely makes the problem more challenging and interesting.  We had to use a slew of new techniques (samplers, a kind of iterated secret sharing, a new technique for going from almost everywhere to everywhere BA) that I’d like to talk about in another post.  Standard caveats for theory research apply (hidden constants, hidden log terms, etc, etc) but I do think there are new ideas in this paper that can be useful in designing a practical algorithm.

OK, without any additional fanfare,  here is the paper.

References:

[1] S. Rhea, P. Eaton, D. Geels, H. Weatherspoon, B. Zhao, and J. Kubiatowicz. Pond: the OceanStore prototype. In Proceedings of the 2nd USENIX Conference on File and Storage Technologies, pages 1–14, 2003.

[2] J. Cowling, D. Myers, B. Liskov, R. Rodrigues, and L. Shrira. Hq replication: A hybrid quorum protocol for byzantine fault tolerance. In In Proceedings of Operating Systems Design and Implementation (OSDI), San Diego, CA, USA, 2005.

[3] Chien-Fu Cheng, Shu-Ching Wang, and Tyne Liang. The anatomy study of server-initial agreement for general hierarchy wired/wireless networks. Computer Standards & Interfaces, 31(1):219 – 226, 2009.


by Jared at March 05, 2010 07:25 PM UTC

I’m thrilled to announce the launch of Predictalot, a combinatorial prediction market for the NCAA Men’s Basketball playoffs. Predict almost anything you can think of, like Duke will advance further than UNC, or Every final four team name will start with U. Check the odds and invest points on your favorites. Sell your predictions anytime, even as you follow the basketball games live.

The basic game play is simple: select a prediction type, customize it, and invest points on it. Yet you’ll never run out of odds to explore: there are hundreds of millions of predictions you can make. The odds on each update continuously based on other players’ predictions and the on-court action.

Predictalot is a Yahoo! App, so you can play it at apps.yahoo.com or you can add it to your Yahoo! home page. I have to admit, it’s an incredible feeling to play a game I helped design right on the Yahoo! home page.

Predicalot app on the Yahoo! home page

That’s all you need to get started. If you’re curious and would like a peek under the hood, read on: there’s some interesting technology hidden in the engine.

Background and Details

Predictalot is a true combinatorial prediction market of the sort academics like us and Robin Hanson have been dreaming about since early in the decade. We built the first version during an internal Yahoo! Hack Day. Finally, we leveraged the Yahoo! Application Platform to quickly build a public version of the game. (Note that anyone can develop a YAP app that’s visible to millions — there’s good sample code, it supports YUI and OpenSocial, and it’s easy to get started.) After many fits and starts, late nights, and eventually all nights, we’re proud and excited to go live with Predictalot version 1.0. I can’t rave enough about the talent and dedication of the research engineers who gave the game a professional look and feel and production speed, turning a pie-in-the-sky idea into reality. We have many features and upgrades in mind for future versions, but the core functionality is in place and we hope you enjoy the game.

In the tournament, after the play-in game, the 64 top college basketball teams play 63 games in a single elimination tournament. So there are 2 to the power 63 or 9.2 quintillion total possible outcomes, or ways the entire tournament can unfold. Predictalot implicitly keeps track of the odds for them all. To put this in perspective, it’s estimated that there are about 10 quintillion individual insects on Earth. Of course, for all practical purposes, we can’t store 9.2 quintillion numbers, even with today’s computers. Instead, we compute the odds for any outcome on the fly by scanning through the predictions placed so far.

A prediction is a statement, like Duke will win in the first round, that will be either true or false in the final outcome. In this case, the prediction is true in exactly half, or 2 to the power 62 outcomes. (Note this does not mean the odds are 50% — remember the outcomes themselves are not all equally likely.) In theory, Predictalot can support predictions on any set of outcomes. That’s 2 to the power 2 to the power 63, or more than a googol predictions. For now, we restrict you to “only” hundreds of millions of predictions categorized into thirteen types. Computing the odds of a prediction precisely is too slow. Technically, the problem is #P-hard: as hard as counting SAT and harder than the travelling salesman problem. So we must resort to approximating the odds by randomly sampling the outcome space. Sampling is a tricky business — equal parts art and science — and we’re still actively exploring ways to increase the speed, stability, and accuracy of our sampling.

Because we track all possible outcomes, the predictions are automatically interconnected in ways you would expect. A large play on Duke to win the tournament instantly and automatically increases the odds of Duke winning in the first round; after all, Duke can’t win the whole thing without getting past the first round.

With 9.2 quintillion outcomes, Predictalot is to our knowledge the largest prediction market built, testing the limits of what the wisdom of crowds can produce. Predictalot is a game, and we hope it’s fun to play. We’d also like to pave the way for serious use of combinatorial prediction market technology.

Why did Yahoo! build this? Predictalot is a smarter market, letting humans and computers each do what they do best. People enter predictions in simple terms they understand like how one team fares against another. The computer handles the massive yet methodical number crunching needed to combine all the pieces together into a coherent overall prediction of a complex event. Markets like Predictalot, WeatherBill, CombineNet, and Internet advertising systems, to name a few, represent the evolution of markets in the digital age, empowering users with extreme customization. More and more, matching buyers with sellers — the core function of markets — requires sophisticated algorithms, including machine learning and optimization. Predictalot attempts to illustrate this trend in an entertaining way.

David Pennock
Mani Abrol, Janet George, Tom Gulik, Mridul Muralidharan, Sudar Muthu, Navneet Nair, Abe Othman, Daniel Reeves, Pras Sarkar

by David Pennock at March 05, 2010 12:10 PM UTC

The Tile Assembly Model is a Turing universal model that Winfree introduced in order to study the nanoscale self-assembly of complex (typically aperiodic) DNA crystals. Winfree exhibited a self-assembly that tiles the first quadrant of the Cartesian plane with specially labeled tiles appearing at exactly the positions of points in the Sierpinski triangle. More recently, Lathrop, Lutz, and Summers proved that the Sierpinski triangle cannot self-assemble in the "strict" sense in which tiles are not allowed to appear at positions outside the target structure. Here we investigate the strict self-assembly of sets that approximate the Sierpinski triangle. We show that every set that does strictly self-assemble disagrees with the Sierpinski triangle on a set with fractal dimension at least that of the Sierpinski triangle (roughly 1.585), and that no subset of the Sierpinski triangle with fractal dimension greater than 1 strictly self-assembles. We show that our bounds are tight, even when restricted to supersets of the Sierpinski triangle, by presenting a strict self-assembly that adds communication fibers to the fractal structure without disturbing it. To verify this strict self-assembly we develop a generalization of the local determinism method of Soloveichik and Winfree.

at March 05, 2010 08:43 AM UTC

We systematically study the hardness and the approximability of combinatorial multi-objective NP optimization problems (multi-objective problems, for short). We define solution notions that precisely capture the typical algorithmic tasks in multi-objective optimization. These notions inherit polynomial-time Turing reducibility from multivalued functions, which allows us to compare the solution notions and to define corresponding NP-hardness notions. For both we prove reducibility and separation results. Furthermore, we define approximative solution notions and investigate in which cases polynomial-time solvability translates from one to another notion. For problems where all objectives have to be minimized, approximability results translate from single-objective to multi-objective optimization such that the relative error degrades only by a constant factor. Such translations are not possible for problems where all objectives have to be maximized, unless P=NP. As a consequence we see that in contrast to single-objective problems, where the solution notions coincide, the situation is more subtle for multiple objectives. So it is important to exactly specify the NP-hardness notion when discussing the complexity of multi-objective problems.

at March 05, 2010 08:25 AM UTC

Last week was the Statistical and Learning-Theoretic Challenges in Data Privacy, which I co-organized with Cynthia Dwork, Steve Fienberg and Sesa Slavkovic. As I explained in my initial post on the workshop, the goal was to tie together work on privacy in statistical databases with the theoretical foundations of learning and statistics. Slides for most talks [...]

by adamdsmith at March 05, 2010 01:55 AM UTC

Authors: Joerg Endrullis
Download: PDF
Abstract: In [EGZ09] it has been shown that infinitary strong normalization (SNi) is Pi-1-1-complete. Suprisingly, it turns out that infinitary weak normalization (WNi) is a harder problem, being Pi-1-2-complete, and thereby strictly higher in the analytical hierarchy.

at March 05, 2010 01:30 AM UTC