Sunday, 8 December 2013

Useful Java commands

-To run a Java program in debug mode:
    pass -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=<port_NO> as JVM arguments
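
For example, a complete command line might look like this (the port number 8000 and the class name MyApp are illustrative placeholders, not values from the original note):

    java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8000 MyApp

A remote debugger (for example, Eclipse) can then attach to port 8000; with suspend=y the JVM would instead wait for a debugger to attach before running main. On Java 5 and later, the single option -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000 is the preferred equivalent.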

Thursday, 5 December 2013

Useful Windows commands and default directory structures for common software


  • To start, stop, pause, or resume a service in Windows from the command line:
    • net start/stop/pause/continue <service_name>
  • To start the ClearCase service in Windows:
    • net start albd
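
For example (using the standard Windows Print Spooler service purely as an illustration; it is not mentioned in the original post):

    net stop "Print Spooler"
    net start "Print Spooler"

Service names that contain spaces must be quoted, and running net start with no arguments lists the services that are currently running.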

Tuesday, 26 November 2013

Most-used commands and standard file/directory paths in Linux

To modify hostname and IP address mappings in Linux, edit the /etc/hosts file.

Create a bond interface (/etc/sysconfig/network-scripts/ifcfg-bond0):
DEVICE=bond0
ONBOOT=yes
USERCTL=no
BOOTPROTO=none
BONDING_OPTS="primary=eth0"
IPADDR=xx.xx.xxx.xx
NETMASK=xxx.xxx.xxx.x
GATEWAY=xx.xx.xxx.x
BROADCAST=xx.xx.xxx.xxx
Configure the slave network interface (/etc/sysconfig/network-scripts/ifcfg-eth0):
DEVICE=eth0
ONBOOT=yes
USERCTL=no
BOOTPROTO=none
SLAVE=yes
MASTER=bond0

To change the domain/host name, edit:
/etc/sysconfig/network (the HOSTNAME entry)
/etc/hosts
then restart networking:
service network restart
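
A typical sequence might look like this (server01.example.com is a placeholder hostname, not one from the original post):

    hostname server01.example.com
    # make the change permanent: set HOSTNAME=server01.example.com in /etc/sysconfig/network
    service network restart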

To delete user in Linux 

userdel -r userName   (-r also deletes the user's home directory)
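
For example, to remove a user named jsmith (a placeholder name) along with his home directory:

    userdel -r jsmith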

Monday, 16 September 2013

12 Little Known Laws of Karma (That Will Change Your Life)

What is Karma? Karma is the Sanskrit word for action. It is equivalent to Newton's law that every action must have a reaction. When we think, speak or act, we initiate a force that will react accordingly. This returning force may be modified, changed or suspended, but most people will not be able to eradicate it.

This law of cause and effect is not punishment, but is wholly for the sake of education or learning. A person may not escape the consequences of his actions, but he will suffer only if he himself has made the conditions ripe for his suffering. Ignorance of the law is no excuse whether the laws are man-made or universal. To stop being afraid and to start being empowered in the worlds of karma and reincarnation, here is what you need to know about karmic laws. 


  1. THE GREAT LAW
    • "As you sow, so shall you reap". This is also known as the "Law of Cause and Effect".
    • Whatever we put out in the Universe is what comes back to us.
    • If what we want is Happiness, Peace, Love, Friendship... then we should BE Happy, Peaceful, Loving and a True Friend.
  2. THE LAW OF CREATION
    • Life doesn't just HAPPEN, it requires our participation.
    • We are one with the Universe, both inside and out.
    • Whatever surrounds us gives us clues to our inner state.
    • BE yourself, and surround yourself with what you want to have present in your Life.
  3. THE LAW OF HUMILITY
    • What you refuse to accept will continue for you.
    • If what we see is an enemy, or someone with a character trait that we find to be negative, then we ourselves are not focused on a higher level of existence.
  4. THE LAW OF GROWTH
    • "Wherever you go, there you are".
    • For us to GROW in Spirit, it is we who must change - and not the people, places or things around us.
    • The only given we have in our lives is OURSELVES, and that is the only factor we have control over.
    • When we change who and what we are within our heart, our life follows suit and changes too.
  5. THE LAW OF RESPONSIBILITY
    • Whenever there is something wrong in my life, there is something wrong in me.
    • We mirror what surrounds us - and what surrounds us mirrors us; this is a Universal Truth.
    • We must take responsibility for what is in our life.
  6. THE LAW OF CONNECTION
    • Even if something we do seems inconsequential, it is very important that it gets done, as everything in the Universe is connected.
    • Each step leads to the next step, and so forth and so on.
    • Someone must do the initial work to get a job done.
    • Neither the first step nor the last is of greater significance; both were needed to accomplish the task.
    • Past, Present and Future are all connected...
  7. THE LAW OF FOCUS
    • You cannot think of two things at the same time.
    • When our focus is on Spiritual Values, it is impossible for us to have lower thoughts such as greed or anger.
  8. THE LAW OF GIVING AND HOSPITALITY
    • If you believe something to be true, then sometime in your life you will be called upon to demonstrate that particular truth.
    • Here is where we put what we CLAIM to have learned into actual PRACTICE.
  9. THE LAW OF HERE AND NOW
    • Looking backward to examine what was prevents us from being totally in the HERE AND NOW.
    • Old thoughts, old patterns of behavior, old dreams... prevent us from having new ones.
  10. THE LAW OF CHANGE
    • History repeats itself until we learn the lessons that we need to change our path.
  11. THE LAW OF PATIENCE AND REWARD
    • All Rewards require initial toil.
    • Rewards of lasting value require patient and persistent toil.
    • True joy follows doing what we're supposed to be doing, and waiting for the reward to come in its own time.
  12. THE LAW OF SIGNIFICANCE AND INSPIRATION
    • You get back from something whatever YOU have put into it.
    • The true value of something is a direct result of the energy and intent that is put into it.
    • Every personal contribution is also a contribution to the Whole.
    • Lackluster contributions have no impact on the Whole, nor do they work to diminish it.
    • Loving contributions bring life to, and inspire, the Whole.
Sources: Bliss Returned and Raven Emrys 

Monday, 25 February 2013

Java technology, IBM style: Garbage collection policies, Part 1

Differing policies provide flexibility

Summary:  One of the great benefits of the Java™ platform is that it takes care of much of the work of garbage collection for you, but there are occasions when you still want to tweak the way garbage collection takes place. With the latest Java technology implementation from IBM®, you can choose among several garbage collection policies to help you get the most out of your application. In this second article in the series Java technology, IBM style, Java developer Mattias Persson explores the available options and details the situations in which each might be appropriate.


Garbage collection (GC) in the IBM Developer Kit for the Java 5.0 Platform (IBM SDK) can be configured using four different policies. This article, the first of two parts on GC, introduces the different garbage collection policies and discusses their general characteristics. You should have a basic understanding of garbage collection in the Java platform before you begin. Part 2 presents a quantitative approach to choosing a policy, along with some examples.

About the series

The Java technology, IBM style series takes a look at the latest releases of the IBM implementations of the Java platform. You'll learn how IBM has implemented some of the advances built into version 5.0 of the Java platform, and you'll find out how to use some of the value-added features built into new releases from IBM.
Contact the authors individually with comments or questions about their articles. To comment on the series as a whole, you may contact series lead Chris Bailey. For more on the concepts discussed here and links where you can download the latest IBM releases, see Resources.
The availability of differing GC policies gives you increased capabilities. There are many different algorithms available for GC, each of which has advantages and drawbacks, depending on the type of workload. (If you are unfamiliar with the general topic of GC algorithms, see Resources for further reading.) In the IBM SDK 5.0, you can configure the garbage collector with one of four policies, each of which uses its own algorithm. The default policy is sufficient for most applications. If you do not have any particular performance requirements for your application, then the content of this article (and the next) may not be interesting to you; you can run the IBM SDK 5.0 without changing the GC policy. However, if your application requires optimal performance or if you are generally concerned with the length of GC pauses, then read on. You will see that the latest release from IBM has even more choice than its predecessors.
So why doesn't the IBM implementation of the Java runtime make the choice automatically for you? This isn't always possible. It's difficult for the runtime to understand your needs. In some situations, you may want to run the application with high throughput; in others, you might prefer to reduce the pause times.
Table 1 lists the available policies and explains when you should use each one. The following sections describe the characteristics of each policy in more detail.

Table 1. GC policies in the IBM SDK 5.0
  • Optimize for throughput (-Xgcpolicy:optthruput; the default, so the option is optional): Typically used for applications where raw throughput is more important than short GC pauses. The application is stopped each time that garbage is collected.
  • Optimize for pause time (-Xgcpolicy:optavgpause): Trades high throughput for shorter GC pauses by performing some of the garbage collection concurrently. The application is paused for shorter periods.
  • Generational concurrent (-Xgcpolicy:gencon): Handles short-lived objects differently than objects that are long-lived. Applications that have many short-lived objects can see shorter pause times with this policy while still producing good throughput.
  • Subpooling (-Xgcpolicy:subpool): Uses an algorithm similar to the default policy's but employs an allocation strategy that is more suitable for multiprocessor machines. We recommend this policy for SMP machines with 16 or more processors. This policy is only available on IBM pSeries® and zSeries® platforms. Applications that need to scale on large machines can benefit from this policy.
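
A policy is selected by passing the corresponding option on the java command line. For example (the heap sizes and the class name MyApp are illustrative placeholders, not values from this article):

    java -Xgcpolicy:gencon -Xms256m -Xmx1024m MyApp

Omitting -Xgcpolicy entirely runs the default optthruput policy.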


Some terms defined

Throughput is the amount of data processed by an application. Throughput must be measured with an application-specific metric.
Pause time is the duration of time in which the garbage collector has paused all application threads to collect the heap.
In this article, I use the abbreviations from the command-line options detailed in Table 1 to denote these policies: optthruput for Optimize for throughput, optavgpause for Optimize for pause time, gencon for Generational concurrent, and subpool for Subpooling.
I advise that you always start with the default GC policy. Before you move away from the default, you need to understand the circumstances under which you should explore an alternative policy. Table 2 presents a set of reasons that could apply:

Table 2. Reasons for switching to an alternative GC policy
Switch to optavgpause when:
  • My application cannot tolerate the length of the GC pauses. A degradation in performance is acceptable as long as the GC pause time is reduced.
  • I am running on a 64-bit platform and use a very large heap -- more than 3 or 4GB.
  • My application is a GUI application and I'm concerned about the user response times.
Switch to gencon when:
  • My application allocates many short-lived objects.
  • The heap space is fragmented.
  • My application is transaction-based (that is, objects in the transaction don't survive beyond the transaction commit).
Switch to subpool when:
  • I have scalability problems on a large multiprocessor machine.

Let me stress that the reasons mentioned in Table 2 are not sufficient to conclude that the alternative policies will perform better; they are simply hints. In all cases, you should run the application and measure throughput and/or response times in combination with GC pause times. The next part of this series shows examples of this sort of testing.
The remaining sections of this article describe in more detail the differences between the GC policies.

Signs of heap fragmentation

The most common sign of heap fragmentation is the presence of premature allocation failures. This shows up in verbose garbage collection (-verbose:gc) as GC being triggered even though there is free space on the heap. Another sign is the presence of small thread allocation caches (see the subsection "Heap lock and thread allocation caches"). You can use -Xtgc:freelist to determine the average size. Both options are explained in the IBM SDK 5 Diagnostics Guide available in Resources.
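
For example, you can watch for this pattern by running with verbose GC enabled (MyApp is a placeholder class name):

    java -verbose:gc MyApp

Each collection is then logged, including what triggered it and how much free space was available, which is where premature allocation failures of the kind described above show up.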
optthruput is the default policy. It is a tracing collector, referred to as a mark-sweep-compact collector. The mark and the sweep phases always run during GC, but compaction only happens under certain circumstances. The mark phase finds and marks all live objects. The sweep phase removes all unmarked objects. A third and optional step is compaction. There are various circumstances under which compaction might occur; the most common one is the system's inability to reclaim enough free space.
Fragmentation occurs when objects are allocated and freed so often that only small chunks of free space are left in the heap. There may be a large amount of free space overall in the heap, but the contiguous regions are small, causing allocation failures. Compaction moves all the objects down to the beginning of the heap and aligns them so that no space exists between them. This removes fragmentation from the heap but is an expensive task, so it is only performed when necessary.
Figure 1 shows an outline of the heap layout after the different phases: mark, sweep, and compaction. The dark regions represent objects and the bright regions are free space.

Mark and sweep

The mark phase traverses all objects that can be referenced from thread stacks, statics, interned strings, and JNI references. As part of this process, we create a mark bit vector that defines the beginning of all live objects.
The sweep phase uses the mark bit vector generated by the mark phase to identify the chunks of heap storage that can be reclaimed for future allocations; these chunks are added to the free list.

Figure 1. Heap layout before and after garbage collection
Heap layout before and after garbage collection
The details of how the different GC phases work are beyond the scope of this article; my focus is on ensuring you understand the runtime characteristics. I encourage you to read the Diagnostics Guide (see Resources) for more information.
Figure 2 illustrates how the execution time is distributed between the application thread (or mutator) and GC threads. The horizontal axis is elapsed time and the vertical axis contains the threads, where n represents the number of processors on the machine. For this illustration, assume that the application is using one thread per processor. GC is illustrated by the blue boxes, which show stopped mutators and GC threads running. These collections consume 100 percent of the CPU's resources, and the mutator threads are idle. The figure overgeneralizes a bit so that we can compare this policy to other policies in this article. In actuality, GC duration and frequency vary depending on application and workload.

Figure 2. Distribution of CPU time between mutators and GC threads in the optthruput policy
Distribution of CPU time between mutators and GC threads in the optthruput policy

Mutator vs. GC threads

A mutator thread is an application thread that allocates objects; another word for mutator is application. A GC thread is part of the memory management subsystem and performs the garbage collection.
The optthruput policy uses a contiguous heap area that all threads in the application share. A thread needs exclusive access to the heap to reserve space for new objects. This lock, called the heap lock, ensures that only one thread at a time can allocate an object. On machines with multiple CPUs, this lock can cause scaling problems because multiple allocation requests can occur at the same time, but each needs exclusive access to the heap lock.
To reduce this problem, each thread reserves a small chunk of memory called a thread allocation cache (also referred to as thread local heap, or TLH). This piece of storage is exclusive to a thread, so while allocating from it, the heap lock is not used. When the allocation cache is full, the thread goes back to the heap and asks for a new one, using the heap lock.
Fragmentation of the heap can prevent the thread from getting large TLHs, so a TLH fills quickly, causing the application thread to come back frequently to the heap for a new one. In this situation, the heap lock becomes a bottleneck; the gencon or subpool policies can provide good alternatives.
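
As a rough illustration of the kind of workload that stresses the heap lock, here is a minimal sketch (entirely hypothetical; the class name and constants are not from the article) in which every available processor runs a thread that does nothing but allocate small, short-lived objects:

// Hypothetical allocation-heavy workload: one thread per processor,
// each allocating small, short-lived objects in a tight loop. Under
// optthruput, a thread that exhausts its allocation cache (TLH) must
// take the global heap lock to obtain a new one.
class AllocationStress {
  static volatile Object sink; // keeps the allocations from being optimized away

  public static void main(String[] args) throws InterruptedException {
    int cpus = Runtime.getRuntime().availableProcessors();
    Thread[] workers = new Thread[cpus];
    for (int i = 0; i < cpus; i++) {
      workers[i] = new Thread(new Runnable() {
        public void run() {
          for (int j = 0; j < 10000000; j++) {
            sink = new byte[64]; // small object; becomes garbage almost immediately
          }
        }
      });
      workers[i].start();
    }
    for (int i = 0; i < cpus; i++) {
      workers[i].join();
    }
  }
}

Timing such a program under different -Xgcpolicy settings, together with verbose GC output, is exactly the kind of measurement this series recommends before committing to a policy.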
For many applications, throughput is not as important as fast response time. Consider an application that processes work items that cannot take longer than 100 milliseconds to complete. With GC pauses in the 100 millisecond range, you get items that cannot be processed within this timeframe. A problem with garbage collection is that the pause time increases the maximum time it takes to process an item. Large heap sizes (available on 64-bit platforms) increase this effect because more objects are processed.
optavgpause is an alternative GC policy designed to keep pauses to a minimum. It does not guarantee a particular pause time, but pauses are shorter than those produced by the default GC policy. The idea is to perform some garbage collection work concurrently while the application is running. This is done in two places:
  • Concurrent mark and sweep: Before the heap fills up, each mutator helps out and marks objects (concurrent mark). There is still a stop-the-world GC, but the pause is significantly shorter. After GC, the mutator threads help out and sweep objects (concurrent sweep).
  • Background GC thread: One (or more) low-priority background GC threads perform marking while the application is idle.
There is a throughput performance penalty of 5 to 10 percent over the default GC policy, depending on the application.
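Selecting this policy is again a single command-line option (MyApp is a placeholder class name):

    java -Xgcpolicy:optavgpause MyApp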
Figure 3 illustrates how the execution time can be divided between GC threads and mutator threads using optavgpause. The background tracing thread is not shown because it should not affect application performance.

Figure 3. Distribution of CPU time between mutators and GC threads in the optavgpause policy
Distribution of CPU time between mutators and GC threads in the optavgpause policy
The gray areas in the figure indicate that concurrent tracing is enabled and that each mutator thread has to give up some of its processing time. Each concurrent phase is followed by a full garbage collection that completes the marking and sweeping that did not happen during the concurrent phase. The resulting pause should be much shorter than the normal GC pause seen in optthruput, as indicated by a smaller box on the timescale in Figure 3. The gap between the end of one GC and the start of the next concurrent phase varies, but during this gap there is no significant performance impact.
A generational garbage collection strategy considers the lifetime of objects and places them in separate areas of the heap. In this way, it tries to overcome the drawbacks of a single heap in applications where most objects die young -- that is, where they do not survive many garbage collections.
With generational GC, objects that tend to survive for a long time are treated differently from short-lived objects. The heap is split into a nursery and a tenured area, as illustrated in Figure 4. Objects are created in the nursery and, if they live long enough, are promoted to the tenured area. Objects are promoted after having survived a certain number of garbage collections. The idea is that most objects are short-lived; by collecting the nursery frequently, these objects can be freed up without paying the cost of collecting the entire heap. The tenured area is garbage collected less often.

Figure 4. New and old area in gencon GC
New and old area in gencon garbage collection
As you can see in Figure 4, the nursery is in turn split into two spaces: allocate and survivor. Objects are allocated into the allocate space and, when that fills up, live objects are copied into the survivor space or into the tenured space, depending on their age. The spaces in the nursery then switch use, with allocate becoming survivor and survivor becoming allocate. The space occupied by dead objects can simply be overwritten by new allocations. Nursery collection is called a scavenge; Figure 5 illustrates what happens during this process:

Figure 5. Example of heap layout before and after GC
Example of heap layout before and after GC
When the allocate space is full, garbage collection is triggered. Live objects are then traced and copied into the survivor space. This process is really inexpensive if most of the objects are dead. Furthermore, objects that have reached a copy threshold count are promoted into the tenured space. The object is then said to be tenured.
As the name Generational concurrent implies, the gencon policy has a concurrent aspect to it. The tenured space is concurrently marked with an approach similar to the one used in the optavgpause policy, except without concurrent sweep. All allocations pay a small throughput tax during the concurrent phase. With this approach, the pause time incurred from the tenure space collections is kept small.
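The policy itself is enabled with -Xgcpolicy:gencon. The IBM SDK also provides options for sizing the nursery explicitly; -Xmn is the usual starting point, but confirm the exact nursery-sizing flags in the Diagnostics Guide. The 64m value and the class name MyApp below are illustrative placeholders:

    java -Xgcpolicy:gencon -Xmn64m MyApp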
Figure 6 shows how the execution time maps out when running gencon GC:

Figure 6. Distribution of CPU time between mutators and GC threads in gencon
Distribution of CPU time between mutators and GC threads in gencon
A scavenge is short (shown by the small red boxes). Gray indicates that concurrent tracing starts followed by a collection of the tenured space, some of which happens concurrently. This is called a global collection, and it includes both a scavenge and a tenure space collection. How often a global collection occurs depends on the heap sizes and object lifetimes. The tenured space collection should be relatively quick because most of it has been collected concurrently.
The subpool policy can help increase performance on multiprocessor systems. As I mentioned earlier, this policy is available only on IBM pSeries and zSeries machines. The heap layout is the same as that for the optthruput policy, but the structure of the free list is different. Rather than having one free list for the entire heap, there are multiple lists, known as subpools. Each pool has an associated size by which the pools are ordered, so an allocation request of a certain size can quickly be satisfied by going to the pool of that size. High-performance, platform-dependent atomic instructions are used to pop a free list entry off the list, avoiding serialized access. Figure 7 shows how the free chunks of storage are organized by size:

Figure 7. Subpool free chunks ordered by size
Subpool free chunks ordered by size
When the JVM starts, or after a compaction has occurred, the subpools are not used because large areas of the heap are free. In these situations, each processor gets its own dedicated mini-heap to satisfy requests. When the first garbage collection occurs, the sweep phase starts populating the subpools, and subsequent allocations mainly use subpools.
The subpool policy can reduce the time it takes to allocate objects. Atomic instructions ensure that allocations happen without acquiring a global heap lock. Mini-heaps local to a processor increase efficiency because cache interference is reduced. This has a direct effect on scalability, especially on multiprocessor systems. On platforms where subpool is not available, generational GC can provide similar benefits.
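As with the other policies, subpool is selected with a single option (and, as noted above, only on pSeries and zSeries platforms; MyApp is a placeholder class name):

    java -Xgcpolicy:subpool MyApp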
This article has highlighted the different GC policy choices available in the IBM SDK 5.0 and some of their characteristics. The default policy is sufficient for most applications; however, in some situations, other policies perform better. I have described some general scenarios in which you should consider switching to optavgpause, gencon, or subpool. Measuring application performance when evaluating a policy is very important, and Part 2 demonstrates this evaluation process in more detail.

Tuesday, 16 October 2012

clone() and the Cloneable Interface in Java

Most of the methods defined by Object are discussed elsewhere in this book. However, one deserves special attention: clone( ). The clone( ) method generates a duplicate copy of the object on which it is called. Only classes that implement the Cloneable interface can be cloned.
The Cloneable interface defines no members. It is used to indicate that a class allows a bitwise copy of an object (that is, a clone) to be made. If you try to call clone( ) on a class that does not implement Cloneable, a CloneNotSupportedException is thrown. When a clone is made, the constructor for the object being cloned is not called. A clone is simply an exact copy of the original.
Cloning is a potentially dangerous action, because it can cause unintended side effects. For example, if the object being cloned contains a reference variable called obRef, then when the clone is made, obRef in the clone will refer to the same object as does obRef in the original. If the clone makes a change to the contents of the object referred to by obRef, then it will be changed for the original object, too. Here is another example. If an object opens an I/O stream and is then cloned, two objects will be capable of operating on the same stream. Further, if one of these objects closes the stream, the other object might still attempt to write to it, causing an error.
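To make the reference-sharing problem concrete, here is a minimal sketch (the Document class and its text field are hypothetical names, not from the book) in which an overridden clone( ) deep-copies a mutable field so that the copy and the original no longer interfere:

// Hypothetical example: deep-copying a mutable array field inside clone().
class Document implements Cloneable {
  char[] text; // mutable state that a plain bitwise copy would share

  Document(String s) {
    text = s.toCharArray();
  }

  public Object clone() {
    try {
      Document copy = (Document) super.clone(); // bitwise copy first
      copy.text = (char[]) text.clone();        // then replace the shared reference
      return copy;
    } catch (CloneNotSupportedException e) {
      throw new InternalError(); // cannot happen: this class implements Cloneable
    }
  }
}

After this, modifying copy.text leaves the original's text untouched, which super.clone( ) alone would not guarantee.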
Because cloning can cause problems, clone( ) is declared as protected inside Object. This means that it must either be called from within a method defined by the class that implements Cloneable, or it must be explicitly overridden by that class so that it is public. Let's look at an example of each approach.
The following program implements Cloneable and defines the method cloneTest( ), which calls clone( ) in Object:

// Demonstrate the clone() method.
class TestClone implements Cloneable {
  int a;
  double b;

  // This method calls Object's clone().
  TestClone cloneTest() {
    try {
      // call clone in Object.
      return (TestClone) super.clone();
    } catch (CloneNotSupportedException e) {
      System.out.println("Cloning not allowed.");
      return this;
    }
  }
}
class CloneDemo {
  public static void main(String args[]) {
    TestClone x1 = new TestClone();
    TestClone x2;
    x1.a = 10;
    x1.b = 20.98;
    x2 = x1.cloneTest(); // clone x1
    System.out.println("x1: " + x1.a + " " + x1.b);
    System.out.println("x2: " + x2.a + " " + x2.b);
  }
}

Here, the method cloneTest( ) calls clone( ) in Object and returns the result. Notice that the object returned by clone( ) must be cast into its appropriate type (TestClone). The following example overrides clone( ) so that it can be called from code outside of its class. To do this, its access specifier must be public, as shown here:
// Override the clone() method.
class TestClone implements Cloneable {
  int a;
  double b;

  // clone() is now overridden and is public.
  public Object clone() {
    try {
      // call clone in Object.
      return super.clone();
    } catch (CloneNotSupportedException e) {
      System.out.println("Cloning not allowed.");
      return this;
    }
  }
}
class CloneDemo2 {
  public static void main(String args[]) {
    TestClone x1 = new TestClone();
    TestClone x2;
    x1.a = 10;
    x1.b = 20.98;
    // here, clone() is called directly.
    x2 = (TestClone) x1.clone();
    System.out.println("x1: " + x1.a + " " + x1.b);
    System.out.println("x2: " + x2.a + " " + x2.b);
  }
}
The side effects caused by cloning are sometimes difficult to see at first. It is easy to think that a class is safe for cloning when it actually is not. In general, you should not implement Cloneable for any class without good reason.


Monday, 8 October 2012

Analytic functions by Example


This article provides a clear, thorough concept of analytic functions and their various options through a series of simple yet concept-building examples. The article is intended for SQL coders who might not be using analytic functions due to unfamiliarity with their cryptic syntax or uncertainty about their logic of operation. Often I see people reinvent the features provided by analytic functions with native join and sub-query SQL. This article assumes the reader is familiar with basic Oracle SQL, sub-queries, joins and group functions. Based on that familiarity, it builds the concept of analytic functions through a series of examples.
It is true that whatever an analytic function does can be done with native SQL, using joins and sub-queries. But the same routine done with analytic functions is generally faster, or at least as fast, compared with native SQL. Moreover, that comparison does not even count the time spent coding, testing, debugging and tuning the native SQL.
The general syntax of analytic function is:
Function(arg1,..., argn) OVER ( [PARTITION BY <...>] [ORDER BY <....>] [<window_clause>] )
<window_clause> takes the form "ROWS <...>" or "RANGE <...>"
All the keywords will be dealt with in detail as we walk through the examples. The script for creating the schema (SCOTT) on which the example queries of this article run can be found in ORACLE_HOME/sqlplus/demo/demobld.sql of any standard Oracle installation.

How are analytic functions different from group or aggregate functions?

SELECT deptno,
COUNT(*) DEPT_COUNT
FROM emp
WHERE deptno IN (20, 30)
GROUP BY deptno;

DEPTNO                 DEPT_COUNT             
---------------------- ---------------------- 
20                     5                      
30                     6                      

2 rows selected
Query-1
Consider Query-1 and its result. Query-1 returns departments and their employee counts. Most importantly, it groups the records into departments in accordance with the GROUP BY clause. As such, no non-"group by" column is allowed in the select clause.
SELECT empno, deptno, 
COUNT(*) OVER (PARTITION BY 
deptno) DEPT_COUNT
FROM emp
WHERE deptno IN (20, 30);

     EMPNO     DEPTNO DEPT_COUNT
---------- ---------- ----------
      7369         20          5
      7566         20          5
      7788         20          5
      7902         20          5
      7876         20          5
      7499         30          6
      7900         30          6
      7844         30          6
      7698         30          6
      7654         30          6
      7521         30          6

11 rows selected.
Query-2
Now consider the analytic function query (Query-2) and its result. Note the repeating values of the DEPT_COUNT column.
This brings out the main difference between aggregate and analytic functions. Though analytic functions compute an aggregate result, they do not group the result set; they return the group value with each record. As such, any other non-"group by" column or expression can be present in the select clause, for example, the column EMPNO in Query-2.
Analytic functions are computed after all joins, the WHERE clause, GROUP BY and HAVING have been evaluated. The main ORDER BY clause of the query operates after the analytic functions. So analytic functions can appear only in the select list and in the main ORDER BY clause of the query.
In the absence of any PARTITION BY or <window_clause> inside the OVER( ) portion, the function acts on the entire record set returned by the WHERE clause. Note the results of Query-3 and compare them with the result of the aggregate function query Query-4.
SELECT empno, deptno, 
COUNT(*) OVER ( ) CNT
FROM emp
WHERE deptno IN (10, 20)
ORDER BY 2, 1;

     EMPNO     DEPTNO        CNT
---------- ---------- ----------
      7782         10          8
      7839         10          8
      7934         10          8
      7369         20          8
      7566         20          8
      7788         20          8
      7876         20          8
      7902         20          8
Query-3
SELECT COUNT(*) FROM emp
WHERE deptno IN (10, 20);

  COUNT(*)
----------
         8
Query-4

How to break the result set into groups or partitions?

It might be obvious from the previous example that the clause PARTITION BY is used to break the result set into groups. PARTITION BY can take any non-analytic SQL expression.
Some functions support the <window_clause> inside the partition to further limit the records they act on. In the absence of any <window_clause>, analytic functions are computed on all the records of the partition.
The functions SUM, COUNT, AVG, MIN, MAX are the common analytic functions the result of which does not depend on the order of the records.
Functions like LEAD, LAG, RANK, DENSE_RANK, ROW_NUMBER, FIRST, FIRST_VALUE, LAST and LAST_VALUE depend on the order of the records. The next example shows how to specify that.

How to specify the order of the records in the partition?

The answer is simple: by the "ORDER BY" clause inside the OVER( ) clause. This is different from the ORDER BY clause of the main query, which comes after WHERE. In this section we go ahead and introduce each of the very useful functions LEAD, LAG, RANK, DENSE_RANK, ROW_NUMBER, FIRST, FIRST_VALUE, LAST and LAST_VALUE, and show how each depends on the order of the records.
The general syntax of specifying the ORDER BY clause in analytic function is:
ORDER BY <sql_expr> [ASC or DESC] NULLS [FIRST or LAST]
The syntax is self-explanatory.

ROW_NUMBER, RANK and DENSE_RANK

All three of the above functions assign integer values to the rows depending on their order, which is why they are grouped together here.
ROW_NUMBER( ) gives a running serial number to a partition of records. It is very useful in reporting, especially in places where different partitions need their own serial numbers. In Query-5, the function ROW_NUMBER( ) is used to give separate running serial numbers to the employees of departments 10 and 20, based on their HIREDATE.
SELECT empno, deptno, hiredate,
ROW_NUMBER( ) OVER (PARTITION BY
deptno ORDER BY hiredate
NULLS LAST) SRLNO
FROM emp
WHERE deptno IN (10, 20)
ORDER BY deptno, SRLNO;

EMPNO  DEPTNO HIREDATE       SRLNO
------ ------- --------- ----------
  7782      10 09-JUN-81          1
  7839      10 17-NOV-81          2
  7934      10 23-JAN-82          3
  7369      20 17-DEC-80          1
  7566      20 02-APR-81          2
  7902      20 03-DEC-81          3
  7788      20 09-DEC-82          4
  7876      20 12-JAN-83          5

8 rows selected.
Query-5 (ROW_NUMBER example)
RANK and DENSE_RANK both assign ranks to the records based on some column value or expression. If two records tie at position N, RANK assigns both of them position N, skips position N+1, and gives the next record position N+2. DENSE_RANK also assigns both records position N, but does not skip position N+1.
Query-6 shows the usage of both RANK and DENSE_RANK. For DEPTNO 20 there are two contenders for the first position (EMPNO 7788 and 7902). Both RANK and DENSE_RANK declare them joint toppers. RANK then skips position 2, so the next employee, EMPNO 7566, is given position 3. With DENSE_RANK there are no such gaps.
SELECT empno, deptno, sal,
RANK() OVER (PARTITION BY deptno
ORDER BY sal DESC NULLS LAST) RANK,
DENSE_RANK() OVER (PARTITION BY
deptno ORDER BY sal DESC NULLS
LAST) DENSE_RANK
FROM emp
WHERE deptno IN (10, 20)
ORDER BY 2, RANK;

EMPNO  DEPTNO   SAL  RANK DENSE_RANK
------ ------- ----- ----- ----------
  7839      10  5000     1          1
  7782      10  2450     2          2
  7934      10  1300     3          3
  7788      20  3000     1          1
  7902      20  3000     1          1
  7566      20  2975     3          2
  7876      20  1100     4          3
  7369      20   800     5          4

8 rows selected.
Query-6 (RANK and DENSE_RANK example)

LEAD and LAG

LEAD has the ability to compute an expression on the next rows (rows which are going to come after the current row) and return the value to the current row. The general syntax of LEAD is shown below:
LEAD (<sql_expr>, <offset>, <default>) OVER (<analytic_clause>)
<sql_expr> is the expression to compute from the leading row.
<offset> is the index of the leading row relative to the current row.
<offset> is a positive integer with default 1.
<default> is the value to return if the <offset> points to a row outside the partition range.
The syntax of LAG is similar except that the offset for LAG goes into the previous rows.
Query-7 and its result show simple usage of LAG and LEAD function.
SELECT deptno, empno, sal,
LEAD(sal, 1, 0) OVER (PARTITION BY deptno ORDER BY sal DESC NULLS LAST) NEXT_LOWER_SAL,
LAG(sal, 1, 0) OVER (PARTITION BY deptno ORDER BY sal DESC NULLS LAST) PREV_HIGHER_SAL
FROM emp
WHERE deptno IN (10, 20)
ORDER BY deptno, sal DESC;

 DEPTNO  EMPNO   SAL NEXT_LOWER_SAL PREV_HIGHER_SAL
------- ------ ----- -------------- ---------------
     10   7839  5000           2450               0
     10   7782  2450           1300            5000
     10   7934  1300              0            2450
     20   7788  3000           3000               0
     20   7902  3000           2975            3000
     20   7566  2975           1100            3000
     20   7876  1100            800            2975
     20   7369   800              0            1100

8 rows selected.
Query-7 (LEAD and LAG)

FIRST_VALUE and LAST_VALUE functions

The general syntax is:
FIRST_VALUE(<sql_expr>) OVER (<analytic_clause>)
The FIRST_VALUE analytic function picks the first record from the partition after doing the ORDER BY. The <sql_expr> is computed on the columns of this first record and results are returned. The LAST_VALUE function is used in similar context except that it acts on the last record of the partition.
-- How many days after the first hire of each department were the next
-- employees hired?

SELECT empno, deptno, hiredate - FIRST_VALUE(hiredate)
OVER (PARTITION BY deptno ORDER BY hiredate) DAY_GAP
FROM emp
WHERE deptno IN (20, 30)
ORDER BY deptno, DAY_GAP;

     EMPNO     DEPTNO    DAY_GAP
---------- ---------- ----------
      7369         20          0
      7566         20        106
      7902         20        351
      7788         20        722
      7876         20        756
      7499         30          0
      7521         30          2
      7698         30         70
      7844         30        200
      7654         30        220
      7900         30        286

11 rows selected.
Query-8 (FIRST_VALUE)

FIRST and LAST functions

The FIRST function (or more properly the KEEP FIRST function) is used in a very special situation. Suppose we rank a group of records and find several records tied at the first rank. Now we want to apply an aggregate function to just those first-ranked records. KEEP FIRST enables that.
The general syntax is:
Function( ) KEEP (DENSE_RANK FIRST ORDER BY <expr>) OVER (<partitioning_clause>)
Please note that FIRST and LAST are the only functions that deviate from the general syntax of analytic functions: the ORDER BY goes inside the KEEP clause rather than the OVER clause, and no <window_clause> is supported. The ranking done in FIRST and LAST is always DENSE_RANK. The query below shows the usage of the FIRST function. The LAST function is used in a similar context to perform computations on the last-ranked records.
-- How does each employee's salary compare with the average salary of the
-- first-year hires of their department?

SELECT empno, deptno, TO_CHAR(hiredate,'YYYY') HIRE_YR, sal,
TRUNC(
AVG(sal) KEEP (DENSE_RANK FIRST
ORDER BY TO_CHAR(hiredate,'YYYY') )
OVER (PARTITION BY deptno)
     ) AVG_SAL_YR1_HIRE
FROM emp
WHERE deptno IN (20, 10)
ORDER BY deptno, empno, HIRE_YR;

     EMPNO     DEPTNO HIRE        SAL AVG_SAL_YR1_HIRE
---------- ---------- ---- ---------- ----------------
      7782         10 1981       2450             3725
      7839         10 1981       5000             3725
      7934         10 1982       1300             3725
      7369         20 1980        800              800
      7566         20 1981       2975              800
      7788         20 1982       3000              800
      7876         20 1983       1100              800
      7902         20 1981       3000              800

8 rows selected.
Query-9 (KEEP FIRST)

How to specify the Window clause (ROW type or RANGE type windows)?

Some analytic functions (AVG, COUNT, FIRST_VALUE, LAST_VALUE, MAX, MIN and SUM among the ones we discussed) can take a window clause to further sub-partition the result and apply the analytic function. An important feature of the windowing clause is that it is dynamic in nature.
The general syntax of the <window_clause> is

[ROWS or RANGE] BETWEEN <start_expr> AND <end_expr>
<start_expr> can be any one of the following:
  1. UNBOUNDED PRECEDING
  2. CURRENT ROW
  3. <sql_expr> PRECEDING or FOLLOWING
<end_expr> can be any one of the following:
  1. UNBOUNDED FOLLOWING
  2. CURRENT ROW
  3. <sql_expr> PRECEDING or FOLLOWING
For ROW type windows the definition is in terms of row numbers before or after the current row. So for ROW type windows <sql_expr> must evaluate to a positive integer.
For RANGE type windows the definition is in terms of values before or after the current ORDER BY value. We will take this up in detail later.
ROWS and RANGE windows cannot appear together in one OVER clause. The window clause is defined in terms of the current row, but it may or may not include the current row: both the start point and the end point of the window can lie before the current row or after it. The only restriction is that the start point cannot come after the end point. If either point of the window is left unspecified, the default is UNBOUNDED PRECEDING for <start_expr> and UNBOUNDED FOLLOWING for <end_expr>.
If the end point is the current row, a shorthand syntax in terms of the start point alone can be used:
[ROWS or RANGE] [<start_expr> PRECEDING or UNBOUNDED PRECEDING]
[ROWS or RANGE] CURRENT ROW is also allowed, but it is redundant: the function then behaves as a single-row function and acts only on the current row.

ROW Type Windows

For analytic functions with ROW type windows, the general syntax is:
Function( ) OVER (PARTITION BY <expr1> ORDER BY <expr2,..> ROWS BETWEEN <start_expr> AND <end_expr>)
or
Function( ) OVER (PARTITION BY <expr1> ORDER BY <expr2,..> ROWS [<start_expr> PRECEDING or UNBOUNDED PRECEDING])
For ROW type windows the windowing clause is in terms of record numbers.
Query-10 has no apparent real-life description (except column FROM_PU_TO_C), but the various windowing clauses are illustrated by a COUNT(*) function. The count simply shows the number of rows inside the window definition. Note the build-up of the count in each column for the year 1981.
The column FROM_P3_TO_F1 shows an example where the start point of the window is before the current row and the end point is after the current row. This is a 5-row window; counts less than 5 appear at the beginning and end of the partition.
-- The query below has no apparent real-life description (except
-- column FROM_PU_TO_C) but is remarkable in illustrating the various
-- windowing clauses with a COUNT(*) function.
 
SELECT empno, deptno, TO_CHAR(hiredate, 'YYYY') YEAR,
COUNT(*) OVER (PARTITION BY TO_CHAR(hiredate, 'YYYY')
ORDER BY hiredate ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) FROM_P3_TO_F1,
COUNT(*) OVER (PARTITION BY TO_CHAR(hiredate, 'YYYY')
ORDER BY hiredate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM_PU_TO_C,
COUNT(*) OVER (PARTITION BY TO_CHAR(hiredate, 'YYYY')
ORDER BY hiredate ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) FROM_P2_TO_P1,
COUNT(*) OVER (PARTITION BY TO_CHAR(hiredate, 'YYYY')
ORDER BY hiredate ROWS BETWEEN 1 FOLLOWING AND 3 FOLLOWING) FROM_F1_TO_F3
FROM emp
ORDER BY hiredate;

 EMPNO  DEPTNO YEAR FROM_P3_TO_F1 FROM_PU_TO_C FROM_P2_TO_P1 FROM_F1_TO_F3
------ ------- ---- ------------- ------------ ------------- -------------
  7369      20 1980             1            1             0             0
  7499      30 1981             2            1             0             3
  7521      30 1981             3            2             1             3
  7566      20 1981             4            3             2             3
  7698      30 1981             5            4             3             3
  7782      10 1981             5            5             3             3
  7844      30 1981             5            6             3             3
  7654      30 1981             5            7             3             3
  7839      10 1981             5            8             3             2
  7900      30 1981             5            9             3             1
  7902      20 1981             4           10             3             0
  7934      10 1982             2            1             0             1
  7788      20 1982             2            2             1             0
  7876      20 1983             1            1             0             0

14 rows selected.
Query-10 (ROW type windowing example)
The column FROM_PU_TO_C shows an example where the start point of the window is before the current row and the end point of the window is the current row. This is the only column with some real-world significance: it can be thought of as the yearly employee build-up of the organization as each employee is hired.
The column FROM_P2_TO_P1 shows an example where start point of the window is before the current row and end point of the window is before the current row. This is a 3 row window and the count remains constant after it has got 3 previous rows.
The column FROM_F1_TO_F3 shows an example where start point of the window is after the current row and end point of the window is after the current row. This is a reverse of the previous column. Note how the count declines during the end.

RANGE Windows

For RANGE windows the general syntax is the same as that of ROWS:
Function( ) OVER (PARTITION BY <expr1> ORDER BY <expr2> RANGE BETWEEN <start_expr> AND <end_expr>)
or
Function( ) OVER (PARTITION BY <expr1> ORDER BY <expr2> RANGE [<start_expr> PRECEDING or UNBOUNDED PRECEDING])
For <start_expr> or <end_expr> we can use UNBOUNDED PRECEDING, CURRENT ROW, or <sql_expr> PRECEDING or FOLLOWING. However, for RANGE type windows <sql_expr> must evaluate to a value compatible with the ORDER BY expression <expr2>.
<sql_expr> is a logical offset. It must be a constant or expression that evaluates to a positive numeric value or an interval literal. Only one ORDER BY expression is allowed.
If <sql_expr> evaluates to a numeric value, then the ORDER BY expr must be a NUMBER or DATE datatype. If <sql_expr> evaluates to an interval value, then the ORDER BY expr must be a DATE datatype.
Note the example (Query-11) below which uses RANGE windowing. The important thing here is that the size of the window in terms of the number of records can vary.
-- For each employee in departments 20 and 30, count the employees earning
-- no more than half their salary (CNT_LT_HALF) and the employees earning
-- at least one and a half times their salary (CNT_MT_HALF).
 
SELECT deptno, empno, sal,
Count(*) OVER (PARTITION BY deptno ORDER BY sal RANGE
BETWEEN UNBOUNDED PRECEDING AND (sal/2) PRECEDING) CNT_LT_HALF,
COUNT(*) OVER (PARTITION BY deptno ORDER BY sal RANGE
BETWEEN (sal/2) FOLLOWING AND UNBOUNDED FOLLOWING) CNT_MT_HALF
FROM emp
WHERE deptno IN (20, 30)
ORDER BY deptno, sal;

 DEPTNO  EMPNO   SAL CNT_LT_HALF CNT_MT_HALF
------- ------ ----- ----------- -----------
     20   7369   800           0           3
     20   7876  1100           0           3
     20   7566  2975           2           0
     20   7788  3000           2           0
     20   7902  3000           2           0
     30   7900   950           0           3
     30   7521  1250           0           1
     30   7654  1250           0           1
     30   7844  1500           0           1
     30   7499  1600           0           1
     30   7698  2850           3           0

11 rows selected.
Query-11 (RANGE type windowing example)

Order of computation and performance tips

Defining the PARTITION BY and ORDER BY clauses on indexed columns (with the index ordered first by the PARTITION BY columns and then by the ORDER BY columns of the analytic function) will provide optimum performance. For Query-5, for example, a composite index on the (deptno, hiredate) columns will prove effective.
It is advisable to always use the cost-based optimizer (CBO) for queries using analytic functions. The tables and indexes should be analyzed, and the optimizer mode should be CHOOSE.
Even in the absence of indexes, analytic functions provide acceptable performance, but they need to sort the data to compute the partitioning and ordering. If the query contains multiple analytic functions, sorting and partitioning on two different column sets should be avoided unless both are indexed.

Conclusion

The aim of this article is not to make the reader force analytic functions into every complex SQL statement. It is meant for SQL coders who have so far avoided analytic functions, even in complex analytic queries, painstakingly reinventing the same features with native SQL and joins. Its job is done if such a reader finds analytic functions clear, understandable and usable after going through this article, and starts using them.