Tuesday, September 23, 2014

Nash Equilibrium for Inter Dependent Repeating Decision Making Events

Hi All,

The concept of Nash Equilibrium is well known. The idea is that the outcome of multiple decision making activities is dependent on the best decision taken by each decision making party and considering the decision by every other decision making party.

Quoting Wikipedia, if Bill and Amy are playing a game, if Bill makes his best possible decision taking into account Amy's decision and Amy makes her best possible decision taking into account Bill's decision, Bill and Amy are in Nash Equilibrium.

Now, consider that Bill and Amy are meeting for the second time. In the first meeting,  Bill and Amy did not have a great experience. Say Bill offered Amy a gift and Amy did not like the gift. The second time Bill and Amy meet, there may or may not be a partial overlap of the original input set (Bill may or may not offer Amy a gift). However, Amy's interpretation of the second event will be influenced by the outcome of the previous meeting.

Consider an advertisement rebidding auction. An intelligent system shall take into consideration the results of previous bidding outcomes. A consistent winner shall have a bigger constant of victory probability.

Based on mixed strategies Nash Equilibrium, where players choose a probability distribution, simply put, given a set of repeating interactions between a set of decision making parties with events which may or may not have overlapping input sets, each event shall lead to a certain amount of biasing of the probability distribution towards the winner.

This can help in modelling real world scenarios where improving the probability of output in a mixed strategies Nash Equilibrium state for repeated interactions between same decision making parties.
A bias towards successful interactions can lead to better and higher winning probabilities in the equilibrium state.

Nash Equilibrium assumes a same state of initial influence of each decision making party. The above model can help if the initial states are itself biased i.e. if one decision making party has a higher influence, the bias can be more towards him. However, if the party does not have an optimal equilibrium, the biasing can shift away hence normalizing the distribution.

The system is essentially introducing a 'feedback' mechanism to the decision making process for further repetitions. The probability distribution functions shall be arranged in a manner that any further interactions shall have a higher bias towards the upper half of the probability distribution spectrum.

Essentially, the idea in the theory that I am talking about in this post is that Nash Equilibrium does not have Markov property hence can be fruitfully biased based on previous events to have a better probability distribution.

I ran some tests for the theory and the results seem to be inline with what I researched above. The biasing constant can be a constant for each decision making party involved or can be experimentally determined for each decision making party involved or can be started off with a basic value and changed (learned) over time.

Of course, it is a pretty basic form and I shall have to put in more research to develop this further and submit it for peer review journals, but the initial outlook seems good!

Feedback would be great.

Peace,

Atri

Tuesday, July 15, 2014

Time to retire B Tree from database storage? Rise of cross correlation and the need for change in design

B Tree has been the standard data structure to store data in a database. Pages for a database are maintained in a BTree with various semantics. BTree has been the stable choice for such applications for a long time. BTree is perfect for many reasons. It has more width, allowing for lesser disk access even for a large number of pages, allowing for better storage structures, faster access of pages.

But, BTree has a drawback, which is actually a drawback for all one dimensional data structures. BTree gives no way to track correlations between multiple attributes or capture the "one-ness" of the various attributes present in the same tuple.

Building a BTree index on top of the data doesnt help as well. All attributes are treated independently and the data is arranged per page or per attribute.

The problem here is that the current design can lead to query planner not knowing anything about correlated data or the possibility of correlation between the actual data present in two attributes of a relation. This can lead to query planner not using the correlation to generate more optimal plans.

Let us discuss a case where this kind of design can lead to sub optimal query plans. Consider the following data:

CarAge
(months)
Minimum Stopping at 40 kph
(metres)
A928.4
B1529.3
C2437.6
D3036.2
E3836.5
F4635.3
G5336.2
H6044.1
I6444.8
J7647.2

A strong correlation is present between car age and minimum stopping at 40 kph. Looking for a query which could be a part of an analytical application. A predictive application trying to look at predicting the next tuple should get plans based on the already existing correlation between car age and minimum stopping at 40 kph.

If we use a data structure like Doubly Chained Tree (http://en.wikipedia.org/wiki/Doubly_chained_tree), we can effectively capture the data that is "multi dimensional". The capturing can be as simple maintaining pointers to data items for the same attribute to having a complex design capturing the actual correlation in terms of median and other statistical terms.

Using a multi dimensional data structure also allows for faster lookups from disk for queries that involve fetching large data for a single attribute and then joining it to another attribute of the same table. Using DCT, we can simply go and fetch the values for both the attributes simultaneously, using multiple pointers present.

This shall require a lot of research. Lets hope we get positive results for this one.

For more details, please look at:  https://github.com/atris/Concerted

Wednesday, May 21, 2014

The Unicorn Chase

Unicorns.

The mythical, mystical, beautiful inhabitants of our fairy tale world who represent love, beauty, peace, passion, glory, happiness and the general attributes of everything good.

Unicorns are supposed to be pure, untouched, innocent. They represent everything that the human spirit can think of. Deep down within, we all wish to be in a state as the unicorns. Happy, contented, carefree, and masters of our selves.

This leads us to the unicorn chase. Unicorn chase represents the run after happiness that we are all engaged in. We try to get to a point where we can be happy in totality. Hence, we start running after 'Unicorns' or goals, then bigger goals, then even bigger goals.

While that is never a bad idea, one thing that we forget is that the path of life does not mandate us to seek happiness in all aspects of life. Happiness does not mean having the best in *all* parts of our life. We try to achieve a state where everything we think and do should be completed, achieved, and found in glory.

Sit back and think when was the last time you tried appreciating the beauty of the voice of your friend, notice the small sounds that occur outside your window, or just breath freely, openly, and in a carefree state?

Happiness is different for each one of us. We try to achieve happiness by the de facto standards, which isnt really a good idea. We must all find our own unicorns, then achieve them.

Our chases are different, and they arent really a chase. Its a path, where you tread, move, and reach the destination. You shall find your unicorn standing there.

Sometimes, we try to drive life beyond a point and forget that there are somethings that happen totally for our good, even if we dont realize that at this point of time.

For e.g., when I did not get into a top grade engineering college, I was a bit disappointed, like anyone else in that situation would be (although since my expectations were low, I was not that sad). I thought that the way I could be happiest or even happy is if I get to a top grade institution.

When I joined JIIT, I was happy but that pang of sadness hadnt really gone away. With time, what I  did realize was that JIIT was probably the best thing that happened to me. I could stay with my parents at home, and since my home is close by, I could have the fun of hostlers as well. This gave me the time to focus on my studies and work and achieve things that I felt were important. Also, JIIT gave me the freedom to work on things I liked, and passing wasnt such a hard task there, so I had ample time to go and chase my unicorn.

I am not saying that I wouldnt have been happy at some more prestigious institution. I would probably have more avenues and opportunities to do things there, but would I be upto the challenge? Would I really be at home in a bigger institution? Would my priorities, like freedom, autonomy, anytime access to anybody in the institution be present? Would my unicorn be waiting for me there?

We normally do not get all that we seek in a single opportunity. If we keep looking for the opportunity that gives us everything we want at once, we would probably get nowhere. What we need to do is identify what matters to us, who matters to us and what we are willing to sacrifice. Then, take the path to our destiny and achieve our unicorn.

Never be ashamed to be scared. Never be ashamed to be emotional. Never be scared to dream bigger. Never be scared to love passionately. Never be ashamed to recognize and identify your weakness.

Above all, never be scared to admit that your unicorn is imperfect. It may be imperfect, but it is yours.

Happy traveling down your path to achieve your unicorn.

Peace,

Atri



Wednesday, April 2, 2014

Query plan risk estimation -- 'Query Fudge Factor'

Hi all,

Recent discussion on PG-Hackers has led me to this blog post. The concept of query risk estimation is a concept which shall again trod down the same path i.e. the path of improving the estimation done by the query planner.

The idea started with a discussion on planner hints and why PostgreSQL doesn't have them or shouldn't have them. The main part is that the user cant be allowed to give a plan to the query planner and force its execution, but the user knows more about the data than we do, hence there may be patterns in the data that can lead to good/bad plans and that we aren't aware of.

Lets take an example. One of the most common examples we see is the lack of correlation between multiple attributes in a table. The data for many use cases is some real world data (insurance data for e.g.). Now, PostgreSQL query planner does not take cross column statistics into consideration and hence there may be patterns that we are missing.

Is allowing the user to tell a plan or even a part of plan and the planner following only that plan really a good idea? Maybe not, but we could go and actually get some hinting from the user in place for the selectivity it expects for joining attributes. We shall not have the hints that Oracle has. We can have something which is much cleaner and easier to use,though. That will need more thinking and planning than what we currently know.

There is one thing that can be done which can prove helpful though. There is a source of problems that arise in the query planner due to a reason I shall explain below:

Suppose a table is made with insertion of some data. The query planner maintains data about the distribution of values in the attributes in the table. The data is maintained in two forms MCVs and histogram of the remaining values. MCVs are Most Common Values are values which are most frequent values in the attribute and the exact frequency of MCVs are maintained.

The statistical details maintained in query planner are updated by executing ANALYZE command or are updated by VACUUM.

Suppose the table is inserted with a lot of data/ deletion of a lot data that happens suddenly. The autovaccum run is still pending and hence the distribution in the actual data has changed a lot.n The known stats in the query planner are outdated now and selectivity estimates on those query planner stats will be wrong. Very wrong.

This is not the only case. For e.g., as mentioned by Tom lane, LIMIT assumes a normal distribution of the data and then plans, which can be quite incorrect.

A recent discussion about that was introduction of a query plan risk estimation. The point is to add a new factor in query planning which is the amount of confidence we have in the query plan or selectivity estimate the query planner has just given.

One way is to use a statistical method to predict the amount of deviation of the current actual histogram of an attribute of a table from the known stats in the query planner. The farther the distance, the lesser the confidence in the plan. The same can be achieved using Central Limit Theorem.

A method I proposed is:

http://www.postgresql.org/message-id/CAOeZVieVa+V-6NYzLLofG6uXfs9Y7N4cUzeoYrYKaQg23nohuw@mail.gmail.com

 I havent researched it into deep yet but plan to soon enough. I would love to get some feedback on this though.

I hope that we have something on the lines of query risk factor in soon so that our query plans are better performing.

Peace,

Atri

Monday, March 31, 2014

Life, the universe and everything

Hi all,

I have had been aiming to write the current blog post for quite a long time now. I was recently asked to write up something on placement techniques and how to prepare for placements. I thought of writing this post NOT as a placement help blog post but as a post I have been meaning to write for some time now.

Please note that the current blog post isnt directly what you expect for placement help, but will help you in getting you a job if you do try to make an effort and follow your heart. Also, the blog shall require patience to read and follow. Please bear with me.

The title of the post refers to a famous programming problem with the same name. It also encompasses the spirit of this post. This post encompasses what a major part of my world constitutes of.

I shall start with a small story. Circa 2010, a nerdy boy was sweating it out because he was bad at programming. He couldnt do a problem to print star pattern or anything simple. Hell, he couldnt even figure out simple problems that involved brute force technique. His parents are quite accomplished programmers (they had two startups under their belts which they had done 14 and 17 years ago). As a student of first year in B.Tech Information Technology, he had no idea where he was heading.

His mom sat down with him and taught him some basics. She showed him how to think and approach problems. The boy started with the star pattern, then developed an interest. He tried solving more problems, gained more interest, solved more problems, learnt more, tried more and went along.

Fast forward to today, the boy is writing to you through this post. I am no big star or something, but there is one accomplishment I am proud of. I am an obsessed learner now, and my hunger grows everyday.

What drives me to work in office, come back and work in PostgreSQL? What drives me to take books and read them whenever I can? What drives me to discover, explore, invent, have fun?

The feeling of love. The feeling of joy. The feeling of ownership. The sense of accomplishment when building something. The feeling of constant learning.

This is what I want to share today. I know its a pretty cliché thing I am telling.

There is learning where there is love. Do not aspire for a job where you are not in the domain which you love.

If you are reading till here, I think that you love what you do and think that this is where you should be.

The next thing is what to do when you know you love this. Technical advice:

1) Read as much as you can. Read Cormen, The Algorithm Design Manual, Tardos, Sedgewick, Sahni, Trembley and Sorenson. Pick out specific topics from each and read.

2) Practise. Solve any problem that comes in your way. The thing is that you have to think of different approaches for each problem and not stay on one solution. Also, try to optimize your solution. For optimizing your solution you have to identify the areas where your solution is slow. Break down your problem and solution into tangible components and try to optimize them each and also try to see how the multiple components interact with each other.

3) Think about the problem. Do NOT start solving on the first step. Think and re read the problem. List the input and output and then see if there is anything you can see, a pattern or anything.

In the end its all about what you feel is the right way for your brain to engulf the problem and give a solution workable and good. The level of goodness depends on your gut.

Above all, it all boils down to one thing. It is to have fun. You have to have fun in order to do anything sane and last long enough to make an impact in any field.

If you feel that the work you are doing completes you, if you feel what you are striving for is worth it, and if you feel you can stick to it, you shall find the way which is correct for yourself.

Be open, have an open mind, debate and  take a stand for what you think is correct. Listen to your instinct, it knows the solution to the problem you are trying to solve.

For more information on something relevant read the following:

http://norvig.com/21-days.html

This is the best post about our trade I have ever read. I hope you find it useful.

Peace,

Atri