向 Spring 官方提交了一个 Issue

警告
本文最后更新于 2020-12-11,文中内容可能已过时。

昨天给spring提了一个issue,这个问题在 声明式事务注意事项 中有描述过。

关于spring声明式事务传播行为 REQUIRES_NEW,如果不做额外处理,可能会导致所有数据库连接被占用的问题。一直搞不明白,这里隐藏着很大的问题,为什么要有这个传播行为。于是就将问题提交给了spring官方。

issue链接传送门

当多个线程争用资源时,每个线程必须在结束任务之前抢占多个资源,这可能是一个隐患。因为此方法可能导致多个线程仅抢夺它们需要的部分资源,它们无法获得所需要的所有资源而释放所获得的东西。但是,到这个时候,资源已经用尽了。

spring声明式事务中的传播行为 REQUIRES_NEW 正是以上情况的表现。

最终的沟通结果截图:

沟通结果截图
沟通结果截图

详细的沟通过程:

最初提交的issue,我仅仅发了一段伪代码。内容如下:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
A method that opens a transaction calls another method that starts a new transaction,if all connections are exhausted before the last new transaction method is executed,then all threads in the process will block,this process will fail.

Pseudo code:

@Transactional
methodA(){
    // All the threads that started the transaction were executed here, but the connection was exhausted.
    // The latter method execution will get a new connection,but it will never get it.
    @Transactional(propagation=Propagation.REQUIRES_NEW)
    methodB(){
    }
}

刚开始对方并没有认真的查看,仅回复What's the problem?。之后数次的回复貌似不在一个频道。也许是因为沟通时礼貌的原因、又或者是我问题描述的不够清楚。经过几次沟通后,差点把对方惹毛了,对方回复道:这里不是论坛,如果您认为是错误,请粘贴测试用例,如果您认为是改进,请提交PR,讨论到此为止。

以下是原文:

1
It's not forum here, if you think it's a bug please paste test case, if you think it's an improvement please submit PR, end of discussion.

看到对方不耐烦了,考虑到自己说话方式的问题,一番思虑后,决定再回复一下。回复内容尽可能将这个潜在问题描述清楚。 回复如下:

1
Sorry, I didn't make it clear. This is the test code, where it blocks the real process. Methodb() will never execute.

百度翻译:抱歉,是我没有表达清楚。这是测试代码,它阻塞了整个进程。 methodB() 将永远不会执行。

示例代码:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
@slf4j
@service
public class PropagationExampleServiceImpl implements PropagationExampleService {
@Autowired
PropagationExampleService service;

@Transactional
@Override
public void methodA() throws InterruptedException {
	Thread.sleep(100);
	log.info("methodA");
	service.methodB();
}

@Transactional(propagation = Propagation.REQUIRES_NEW)
@Override
public void methodB() throws InterruptedException {
	log.info("methodB");
	Thread.sleep(100);
}
}

@SpringBootTest
public class PropagationExampleServiceTest {
@Autowired
PropagationExampleService service;
ExecutorService executor = new ThreadPoolExecutor(10, 10, 60,
TimeUnit.SECONDS, new ArrayBlockingQueue<>(1000));

@Test
public void methodATest() {
	for (int i = 0; i < 20; i++) {
		executor.execute(() -> {
			try {
				service.methodA();
			} catch (InterruptedException e) {
				e.printStackTrace();
			}
		});
	}
	LockSupport.park();
}
}

连接池配置及日志:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
spring:
datasource:
name: druidDataSource
type: com.alibaba.druid.pool.DruidDataSource
druid:
max-active: 10

Here is the log:
2020-12-11 10:15:56.385 INFO 10864 --- [pool-1-thread-1] c.j.s.s.i.PropagationExampleServiceImpl : methodA
2020-12-11 10:15:56.385 INFO 10864 --- [pool-1-thread-2] c.j.s.s.i.PropagationExampleServiceImpl : methodA
2020-12-11 10:15:56.385 INFO 10864 --- [pool-1-thread-5] c.j.s.s.i.PropagationExampleServiceImpl : methodA
2020-12-11 10:15:56.385 INFO 10864 --- [pool-1-thread-6] c.j.s.s.i.PropagationExampleServiceImpl : methodA
2020-12-11 10:15:56.385 INFO 10864 --- [pool-1-thread-8] c.j.s.s.i.PropagationExampleServiceImpl : methodA
2020-12-11 10:15:56.385 INFO 10864 --- [pool-1-thread-9] c.j.s.s.i.PropagationExampleServiceImpl : methodA
2020-12-11 10:15:56.385 INFO 10864 --- [pool-1-thread-3] c.j.s.s.i.PropagationExampleServiceImpl : methodA
2020-12-11 10:15:56.385 INFO 10864 --- [ool-1-thread-10] c.j.s.s.i.PropagationExampleServiceImpl : methodA
2020-12-11 10:15:56.385 INFO 10864 --- [pool-1-thread-7] c.j.s.s.i.PropagationExampleServiceImpl : methodA
2020-12-11 10:15:56.385 INFO 10864 --- [pool-1-thread-4] c.j.s.s.i.PropagationExampleServiceImpl : methodA

好吧,对方以为是我使用过程遇到了问题,在像他求助,回复道:

1
2
I think the result is expected, you should increase max-active > concurrency, so If you change max-active to 21, it will pass, waiting for your confirmation.
You should set a max-wait on dataSource, then getConnection will timeout instead of endless blocking.

百度翻译:我认为结果是预期的,您应该增加 max-active 并发性,因此,如果将max-active更改为21,它将通过,等待确认。您应该在dataSource上设置一个max-wait,然后getConnection将超时而不是无限阻塞。


以下是我的回复:

1
2
3
4
5
I know, I just want to express that. In fact, the number of connections in the connection pool is generally not large. In the above example, only one layer is nested. If there are multiple layers of newly started transactions in the code, only a small amount of concurrency is needed to trigger the above result.

I just think that when multiple threads compete for resources, each thread must grab multiple resources before ending the task, which may be a hidden danger.

Because this method may cause multiple threads to rob only a part of the resources they need. They can't release what they have acquired without getting all the resources they need. By this time, however, resources have been exhausted.

百度翻译:我知道,我只想表达这一点。实际上,连接池中的连接数通常不大。在上面的示例中,仅嵌套了一层。如果代码中有多层新启动的事务,则只需少量的并发即可触发上述结果。我只是认为,当多个线程争用资源时,每个线程必须在结束任务之前抢占多个资源,这可能是一个隐患。因为此方法可能导致多个线程仅抢夺它们需要的部分资源。他们无法获得所需要的所有资源而释放所获得的东西。但是,到这个时候,资源已经用尽了。


对方回复:

1
2
Your test blocked because you didn't set timeout for ds.getConnection(), for example maxWait for druid and connectionTimeout for HikariCP.
It's really nothing to do with spring transaction.

百度翻译:由于未为ds.getConnection()设置超时(例如,为druid设置了maxWait,为HikariCP设置了connectionTimeout),因此测试被阻止。确实与spring transaction无关。

对方的回复让我觉得有点甩锅的意思。😢

最终决定,把我的意图整体发过去,然后就不再纠结了。

1
2
3
4
5
6
7
8
9
I'm sorry to have taken you so long.

I just want to put forward what I think may be hidden dangers. It's not that I have encountered problems in use. I just don't quite understand why this place is designed in this way.

Even if Max wait is set here, the above situation may occur. It's just going to wait for a while, not the whole process, but it's bad enough.

For this reason, I choose to avoid using REQUIRES_NEW, if I need to start a new transaction in a method, I will use asynchronous or transactionSynchronization.afterCommit ()。

You may say that the outer method will not be able to sense the exception, but why should it? These are two transactions. If I need the outer layer to sense the inner layer's anomaly, I can use NESTED.

百度翻译: 很抱歉耽误你这么久。

我只想提出我认为可能存在的隐患。并不是说我在使用中遇到了问题。我只是不太明白为什么这个地方是这样设计的。

即使在这里设置了Max wait,也可能发生上述情况。只是要等一段时间,而不是整个进程挂掉,但这也足够糟糕了。

因此,我会选择尽可能避免使用REQUIRES_NEW,如果我必须在一个方法中启动一个新事务,我将使用asynchronous或transaActionSynchronization.afterCommit()将两个事务提交隔离。

您可能会说,外部方法将无法感知异常。但为什么要这样做呢?这本就是两个事务。如果我需要外层来感知内层的异常,我可以使用NESTED。


最终对方给了一个我意想不到的回复:

1
2
I admit your title "Propagation.REQUIRES_NEW May cause all connections to be occupied".
application should deal REQUIRES_NEW properly, there is no bug or improve space here, at most adding note in docs. Do you agree?

百度翻译:我承认您的标题“ Propagation.REQUIRES_NEW可能导致所有连接被占用”。应用程序应该正确处理REQUIRES_NEW,此处没有错误或没有优化空间,最多只能在文档中添加注释。你同意吗?


如果项目中有事务的传播行为设置为 REQUIRES_NEW,一定要仔细思考当前这个节点,是否真的需要新启事务,或者如果打乱外围方法和它之间的执行顺序,是否会有所影响?考虑这些问题的目的就是为了,尽可能避免这种传播行为的使用。如果真的避不开,就做异步处理或者使用事务同步器,进行后置处理。

如果这种方式还是会打乱原有业务,必须得使用 REQUIRES_NEW,也不允许做其他处理(我实在想不到这种场景),那最好对其进行限流。理论上进程中所有新启事务的方法个数要小于连接池连接数。允许的最大并发量要小于连接池连接数除以这种方法的个数,当然这是下下策。最终这些方法会以极慢的速度进行,甚至还不如直接塞到一个事务中。能拆分成两个事务解决的问题,使用一个事务一定也是可以解决的。

我个人认为 REQUIRES_NEW 这种传播行为机制是有一定问题的,至少这个问题在上面代码案例中已经表现出来了,而且触发的几率极高。替代这种传播行为的手段很多,何必要直接使用这种传播行为,埋下一个项目随时可能宕机(如果设置了超时,前文中说的结果被触发时,会一直卡顿到连接超时,事务回滚。)的隐患呢?

之所以没有直接发 issue 截图,是因为文章使用md文档写的,提交到其他博客网站时,图片链接可能会无法加载。所以以文字的形式先复制下来。后面再附上部分截图。

github部分沟通时的截图

spring-github-issue-transaction-01
spring-github-issue-transaction-01

spring-github-issue-transaction-02
spring-github-issue-transaction-02

spring-github-issue-transaction-03
spring-github-issue-transaction-03

spring-github-issue-transaction-04
spring-github-issue-transaction-04